How to store a HashSet in the index?

2007-12-10 Thread Rishabh Joshi
Hi,

Can anyone help me with how to efficiently store in the index, and later
retrieve, a HashSet object that contains multiple string arrays?
I just want to store the HashSet in the index, and not search on it. The
HashSet should be returned with the document when I perform a search on any
other fields.

Regards,
Rishabh


[jira] Commented: (SOLR-303) Distributed Search over HTTP

2007-12-10 Thread Sabyasachi Dalal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549970
 ] 

Sabyasachi Dalal commented on SOLR-303:
---

I fixed the issue with the patch and it works with version 594268.
Now, I am trying to make it work with the latest trunk. I am facing a problem:
the FedSearchComponent needs a handle to the handler in order to execute on
the local shard. I am trying to figure out how to pass the handler during
component initialization.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-409) Allow configurable class loader sharing between cores

2007-12-10 Thread Walter Ferrara (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550002
 ] 

Walter Ferrara commented on SOLR-409:
-

Thanks for committing this patch.
I noticed that when you have just one core (with no multicore.xml), logging
says null, i.e.
INFO: [null] Registered new searcher
This could be fixed in several ways:
* by giving a (meaningful) name to the core when multicore is not used (like
schema.getName())
* by not adding the name of the core when multicore is not enabled (which
maybe means reusing the logStr function and checking if MultiCore is enabled)
This null is also present in stats.jsp, where it is printed before the
uppercase bold CORE string.
IMHO, we should set the name of the single core when multicore is not set;
this may make things easier. Setting it to the name of its schema could be a
solution.


 Allow configurable class loader sharing between cores
 -

 Key: SOLR-409
 URL: https://issues.apache.org/jira/browse/SOLR-409
 Project: Solr
  Issue Type: Sub-task
Affects Versions: 1.3
Reporter: Henri Biestro
Priority: Minor
 Fix For: 1.3

 Attachments: solr-350_409.patch, solr-350_409.patch, 
 solr-350_409.patch, solr-350_409.patch, solr-350_409.patch, 
 solr-350_409.patch, solr-350_409_414.patch, solr-409.patch, solr-409.patch


 WHAT:
 This patch allows configuring, in multicore.xml, the parent class loader
 of all core class loaders used to dynamically create instances.
 WHY:
 Current behavior allocates one class loader per config, and thus per core.
 However, there are cases where one would like different cores to share some
 objects that are dynamically instantiated (i.e., where the class name is
 used to find the class through the class loader and instantiate it). In the
 current form, since each core possesses its own class loader, static members
 are indeed different objects. For instance, there is no way of implementing
 a singleton shared between 2 request handlers.
 Originally from
 http://www.nabble.com/Post-SOLR215-SOLR350-singleton-issue-tf4776980.html
 HOW:
 The sharedLib attribute is extracted from the XML (multicore.xml)
 configuration file and parsed in the MultiCore load method. The directory
 path is used to create a URL class loader that will become the parent class
 loader of all core class loaders; since class resolution is performed on a
 parent-first basis, this allows sharing instances between different cores.
 STATUS:
 operational in conjunction with solr-350
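The parent-first sharing idea described above can be sketched in a few lines. This is an illustrative sketch only (the class name SharedLibSketch and method coreLoaders are mine, not from the patch): one URLClassLoader is built from the shared lib directory and handed to every per-core loader as its parent, so any class found there resolves to a single Class object across cores.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class SharedLibSketch {
    // Build one loader for the shared lib directory; it becomes the parent
    // of every per-core loader. Parent-first delegation means a class found
    // in the shared directory is loaded once, so its static members are the
    // same objects in every core.
    public static ClassLoader[] coreLoaders(File sharedLibDir) throws Exception {
        ClassLoader shared = new URLClassLoader(
                new URL[] { sharedLibDir.toURI().toURL() },
                SharedLibSketch.class.getClassLoader());
        ClassLoader core0 = new URLClassLoader(new URL[0], shared);
        ClassLoader core1 = new URLClassLoader(new URL[0], shared);
        return new ClassLoader[] { core0, core1 };
    }
}
```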




[jira] Commented: (SOLR-281) Search Components (plugins)

2007-12-10 Thread Sabyasachi Dalal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550031
 ] 

Sabyasachi Dalal commented on SOLR-281:
---

I am updating the distributed search patch (SOLR-303) with this patch.
I added the dist search components as, 

  <searchComponent name="gstat"
      class="org.apache.solr.handler.federated.component.GlobalCollectionStatComponent" />
  <searchComponent name="mqp"
      class="org.apache.solr.handler.federated.component.MainQPhaseComponent" />
  <searchComponent name="aqp"
      class="org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent" />

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>

    <arr name="last-components">
      <str>gstat</str>
      <str>mqp</str>
      <str>aqp</str>
    </arr>

  </requestHandler>

But it was not working. On debugging, I found that these added components
were not getting registered.

I made the following change in SolrCore.loadSearchComponents,

// NamedListPluginLoader<SearchComponent> loader =
//     new NamedListPluginLoader<SearchComponent>( xpath, searchComponents );
NamedListPluginLoader<SearchComponent> loader =
    new NamedListPluginLoader<SearchComponent>( xpath, components );

 Search Components (plugins)
 ---

 Key: SOLR-281
 URL: https://issues.apache.org/jira/browse/SOLR-281
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: SOLR-281-ComponentInit.patch, 
 SOLR-281-ComponentInit.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 solr-281.patch, solr-281.patch, solr-281.patch


 A request handler with pluggable search components for things like:
   - standard
   - dismax
   - more-like-this
   - highlighting
   - field collapsing 
 For more discussion, see:
 http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274




Re: How to store a HashSet in the index?

2007-12-10 Thread Mike Klaas

On 10-Dec-07, at 12:09 AM, Rishabh Joshi wrote:


Can anyone help me on, as to how I can go about efficiently indexing
(actually, storing in the index) and retrieving, a HashSet object, which
contains multiple string arrays?
I just want to store the HashSet in the index, and not search on it. The
HashSet should be returned with the document when I perform a search on any
other fields.


I don't know what "efficient" means in your context, but why not serialize
to bytes and base64 encode, then store as you would a text field in Solr?
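That serialize-then-base64 suggestion can be sketched roughly like this. A minimal sketch only, not tested against Solr itself; the class and method names are mine, and it assumes the field is stored but not indexed:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;
import java.util.HashSet;

public class HashSetCodec {
    // Serialize the HashSet to bytes, then base64-encode the bytes so the
    // result can be stored as an ordinary stored (non-indexed) text field.
    public static String encode(HashSet<String[]> set) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(set);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the process when the field comes back with a search result.
    @SuppressWarnings("unchecked")
    public static HashSet<String[]> decode(String field)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(field);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (HashSet<String[]>) in.readObject();
        }
    }
}
```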


-Mike


[jira] Updated: (SOLR-415) LoggingFilter for debug

2007-12-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-415:


Attachment: SOLR-415.patch

Attached a revised patch, as Hoss kindly suggested.

 LoggingFilter for debug
 ---

 Key: SOLR-415
 URL: https://issues.apache.org/jira/browse/SOLR-415
 Project: Solr
  Issue Type: Improvement
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: SOLR-415.patch, SOLR-415.patch, SOLR-415.patch, 
 SOLR-415.patch


 logging version of analysis.jsp




solrj for distributed search

2007-12-10 Thread Yonik Seeley
I've been hacking on SOLR-303 (distributed search), and I started to
write my own XML parsing code utilizing stax (streaming), when I
realized that the code had already been written (in SolrJ).
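For reference, the kind of StAX pull-parsing being described might look like this minimal sketch. The response shape and names here are illustrative, not the actual SOLR-303 or SolrJ code:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {
    // Stream through a Solr-style XML response and collect the values of
    // <str name="..."> elements matching the requested field, without ever
    // building a DOM.
    public static List<String> fieldValues(String xml, String field)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "str".equals(reader.getLocalName())
                    && field.equals(reader.getAttributeValue(null, "name"))) {
                out.add(reader.getElementText());
            }
        }
        return out;
    }
}
```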

Should we use SolrJ for making and parsing the distributed search requests?
One downside is that SolrJ would need to be moved into the core, but I
was planning on migrating to HttpClient at some point anyway.

Thoughts?

-Yonik


Re: Confluence wiki vs MoinMoin

2007-12-10 Thread Chris Hostetter

: What is really missing is that we don't (at least I don't) have a clear sense
: where what type of docs should go.  Some in javadocs, some on the wiki, almost
: none on the forrest site.  Javadocs work great since they are attached to
: sources and get included in releases.  But solr's users are not all javadoc
: readers (nor should they be).  Solr docs really should be in a non java
: specific context.

Once upon a time the plan (or at least my plan) was that how/what/why
documentation for provided plugins (dismax, fieldtypes, analysis
factories, etc...) would live (close to the code) in class-level javadocs
-- our users may not be javadoc readers, but we could link straight to the
good bits from user-centric forrest-based overview pages. The
wiki would be a way for users to write tips-and-tricks type docs.

But things didn't really work out that way ... as simple as forrest is to
use to generate pages, it's not the most friendly tool for adding to and
organically growing a set of documentation ... plus we made the decision early
on to start a lot of docs on the wiki to flesh them out and make them
easier to tweak, with the intention of eventually migrating them to
official forrest docs ... except that we didn't know then what we know
now about the legal issues -- but even before we knew about the legal
issues, no one ever really had the inclination to migrate any of the docs.

Even if we switch to cwiki, I still think javadocs are the best way to go
for official plugin docs because of the code/doc proximity advantages
... but if they aren't user-friendly enough for typical users then maybe
we could look into hacking together a custom doclet to just output the
class-level docs and not the method details?

: Having read all the rules, this is my proposal:

+1 to the bulk of your proposal, but a few comments...

I would like to suggest a step #0: There doesn't seem to be a cwiki
sandbox we can use to test stuff out, so after getting a solr cwiki
created, let's do some experiments with the exporting and make sure we can
viably export docs that:
   1) use all relative links (like forrest)
   2) don't contain user comments from non-CLA users
...so we can be confident the exports can be included as documentation with
releases before we spend a lot of time building up the new docset.

: 2. We keep http://wiki.apache.org/solr/ as an unofficial sandbox and pre 1.3
: docs.  Anyone can edit it, but it is not official.

I'm assuming we might eventually want to migrate this to a separate
cwiki space just for our own sanity (single syntax, single look/feel,
etc...) but I agree this doesn't need to happen any time soon.

: For now, i think we should stick with forrest for the website and tutorial.
: When the tutorial gets revisited, http://cwiki.apache.org/SOLRxSITE/ may be a

I think the current site (including the tutorial) would probably make the
best initial docs to put into the new cwiki to test it out, since we
*know* the legal issues with them are okay and we know they should be
included in all releases. Eliminating forrest from the equation
early on would also help simplify the documentation-dilution issues of
having forrest docs, wiki docs, and cwiki docs all at once -- especially
if in Solr 1.3 (or 1.4 ... whenever it happens) the release itself
includes overview docs and a tutorial generated by forrest, with other docs
generated from a cwiki dump ... the odds of getting those all to hyperlink
with each other cleanly seem very low.



-Hoss



Re: solrj for distributed search

2007-12-10 Thread Otis Gospodnetic
I think (re)using solrj is a good idea.  As a client, I'd rather have one API 
to use for both distributed and non-distributed calls to Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Tuesday, December 11, 2007 12:13:43 AM
Subject: solrj for distributed search

I've been hacking on SOLR-303 (distributed search), and I started to
write my own XML parsing code utilizing stax (streaming), when I
realized that the code had already been written (in SolrJ).

Should we use SolrJ for making and parsing the distributed search
 requests?
One downside is that SolrJ would need to be moved into the core, but I
was planning on migrating to HttpClient at some point anyway.

Thoughts?

-Yonik





purpose of MultiCore default ?

2007-12-10 Thread Chris Hostetter


Forgive me if I'm off base with some stuff here ... I'm still trying to
wrap my head around some of the new multicore stuff.


Ryan's comments in SOLR-428 have made me realize that the default core
means more than I thought. I had misunderstood it to be a way of
specifying what the legacy singleton core should be ... but based on
SOLR-428 I'm now getting the sense that the default core identifies what
core to use if no core is specified in the URL, so if this is your
multicore.xml...


  <multicore adminPath="/admin/multicore" persistent="true">
    <core name="core0" instanceDir="core0" default="true" />
    <core name="core1" instanceDir="core1" />
  </multicore>

...then these two URLs are equivalent, correct?

   http://localhost:8983/solr/@core0/select?q=*:*
   http://localhost:8983/solr/select?q=*:*

If I may ask: what is the motivation for this? Isn't it fair to assume
that if people want to use multiple cores they can include the core name
in every URL?


The one use case I can think of is that, based on the SETASDEFAULT option
of the MultiCoreHandler, I suspect people want to do stuff like this...



  1. start up server with a single core0 as default
  2. use default URLs all day long...
   GET http://localhost:8983/solr/select?q=bar
   POST http://localhost:8983/solr/update ...
   GET http://localhost:8983/solr/select?q=foo
  3. decide you want to change the schema or something,
 load a new core1
  4. rebuild your index using core1 urls...
   POST http://localhost:8983/solr/@core1/update ...
  5. once you're happy with core1, set it as the default,
 and unload core0...
   GET
http://localhost:8983/solr/admin/multicore?action=SETASDEFAULT&core=core1
   GET http://localhost:8983/solr/admin/multicore?action=UNLOAD&core=core0
  6. keep using core1 just like you used to use core0 with
 default urls...
   GET http://localhost:8983/solr/select?q=bar
   POST http://localhost:8983/solr/update ...
   GET http://localhost:8983/solr/select?q=foo

...this seems like a really cool use case for multicores, but it also seems
like it is incompatible with the primary goal of multicores: having lots
of different indexes; after all, there's only one default, so you can only
use this trick with one of your indexes.


It seems like if this is the only perk of having a default core, it
would make more sense to require a core name in every URL (when multicore
support is turned on) and replace the SETASDEFAULT operation with a
RENAME operation that changes the name of a core (unloading any previous
core that was using that name) ... or maybe even support multiple names
per core, with some ADDNAME, REMOVENAME, and MOVENAME options...


 1 /admin/multicore?action=ADDNAME&coreDir=cores/dir0&name=yak
 2 /@yak/select?q=*:*
 3 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=foo
 4 /@foo/select?q=*:*
 5 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=bar
 6 /@bar/select?q=*:*
 (#4 and #6 are now equivalent)
 7 /admin/multicore?action=REMOVENAME&coreDir=cores/dir1&name=foo
 (now #4 no longer works)
 8 /admin/multicore?action=MOVENAME&coreDir=cores/dir0&name=bar
 (now #2 and #6 are equivalent)

thoughts?

-Hoss



multicore and admin pages?

2007-12-10 Thread Chris Hostetter


I notice this in the MultiCore wiki...


To access the admin pages for each core visit:
http://localhost:8983/solr/admin/?core=core0
http://localhost:8983/solr/admin/?core=core1


...trying this out using the example multicore setup didn't seem to work
(the admin screen said core0 even for the second URL) -- but in general
I'm curious if there's a specific desire for the admin pages to work with
URLs like this (the core name as a URL param) instead of having the
core in the path, like for the rest of the URLs?


Sure, the admin pages are (mostly) JSPs, but before the Dispatcher forwards
the request/response up the chain, it could pull the core name out of the
path and include it as a request attribute, right?
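A minimal sketch of that path-parsing step (class and method names are mine, and the "/@coreName/..." convention is assumed from the URLs discussed earlier in this thread, not taken from actual Solr code):

```java
public class CorePathSketch {
    // Pull a "@coreName" segment off the front of the servlet path, the way
    // the Dispatcher could before stashing the name as a request attribute
    // and forwarding on to the admin JSPs.
    // Returns null when the path carries no core segment.
    public static String extractCore(String path) {
        if (path != null && path.startsWith("/@")) {
            int slash = path.indexOf('/', 2);
            return slash < 0 ? path.substring(2) : path.substring(2, slash);
        }
        return null;
    }
}
```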





-Hoss