Multi tokenizer

2008-12-10 Thread Antonio Zippo
Hi all,

I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes,

but if I use HTMLStripStandardTokenizerFactory it strips only HTML, not
apostrophes.

If I use PatternTokenizerFactory, I don't know if I can create a pattern that
tokenizes on all of these characters (HTML, apostrophes, ...).
I can filter these chars with the pattern [^0-9A-Za-z], but if I use that in a
filter with a replacement it breaks my text.
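For illustration, an analyzer chain along these lines might cover all three
cases. This is only a sketch; the WordDelimiterFilterFactory settings are
assumptions, not tested values:

  <fieldType name="text_stripped" class="solr.TextField">
    <analyzer>
      <!-- strips HTML, then tokenizes roughly like StandardTokenizer -->
      <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
      <!-- splits the remaining tokens on punctuation and apostrophes -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>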

could you help me to solve this problem?

Bye



  

dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr

Hi,

I would like to understand the difference between q=text:+toto and q=toto.

/select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">text: toto</str>
  <str name="qt">dismax</str>

/select?fl=*&qt=dismax&q=toto : 5682 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">toto</str>
  <str name="qt">dismax</str>


My schema just has a stored text field; I don't understand this big difference.
Thanks a lot for your time, 

-- 
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
Sent from the Solr - User mailing list archive at Nabble.com.



Value based boosting - Design Help

2008-12-10 Thread ayyanar

We have a requirement for a keyword search in one of our projects and we are
using Solr/Lucene for the same.   

We have the data: link_id, title, url, and a collection of keywords
associated with a link_id. Right now we have indexed link_id, title, url and
keywords (multivalued field) in a single index.
 

Also, in our requirement each keyword value has a weight associated with it,
and this weight is calculated based on certain factors (e.g. if the keyword
exists in the title then it takes a specific weight, etc.). This weight should
drive the relevancy of the search results. For example, when a user enters a
keyword such as "Biology" and clicks search, we search the keywords field in
the index. The document that contains the searched keyword with the higher
weight should come first.

 

Eg:

 

Document 1:

LinkID = 100

Title = Biology

Keywords = Biology, BioNews, Bio, Bio chemistry

 

Document 2:

LinkID = 102

Title = Nutrition

Keywords = Biology, Nutrition, Dietetics

 

In the above example, document 1 should come first because we will associate
more weight with the keyword biology for link id 100 in document 1.

 

We understand that this weight can be applied as a boost to a field. The
problem is that in Solr/Lucene we cannot associate a different boost with
different values of the same field.

 

It would be very helpful for us if you can provide your thoughts/inputs on
how to achieve this requirement in Lucene:

 

Do we have a way to associate a different boost to different values of the
same field?
Can we maintain the list of keywords associated to each link_id in a
separate index, so that we can associate weight to each keyword value? If
so, how do we relate the main index and the keyword index? 
 


-- 
View this message in context: 
http://www.nabble.com/Value-based--boosting---Design-Help-tp20934304p20934304.html
Sent from the Solr - User mailing list archive at Nabble.com.



Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi,
 
I have a request handler in my solrconfig.xml : /spellCheckCompRH 
It utilizes the search component spellcheck.
 
When I specify following query in browser, I get correct spelling
suggestions from the file dictionary.
 
http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file
 
Now I write a java program to achieve the same result:
 
Code snippet

 .
 .
 
server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 .
 .
SolrQuery query = new SolrQuery();
query.setQuery("solr");
query.setFields("*,score");
query.set("qt", "spellCheckCompRH");
query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_DICT, "file");
query.set(SpellingParams.SPELLCHECK_Q, "solt");
 .
 .
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
SpellCheckResponse srsp = rsp.getSpellCheckResponse();
 
I get documents for my query but I do not get any spelling suggestions.
I think that the request handler is not getting set for the query
correctly.
 
Can someone please help. 
 
Best Regards,
Mukta


Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread Erik Hatcher
dismax doesn't support field selection in its query syntax, only via
the qf parameter.


Add debugQuery=true to see how the queries are being parsed; that'll
reveal what is going on.
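For example (assuming "text" is one of the qf fields):

  /select?qt=dismax&q=toto&qf=text&debugQuery=true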


Erik


On Dec 10, 2008, at 5:07 AM, sunnyfr wrote:



Hi,

I would like to understand the difference between q=text:+toto and q=toto.

/select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">text: toto</str>
  <str name="qt">dismax</str>

/select?fl=*&qt=dismax&q=toto : 5682 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">toto</str>
  <str name="qt">dismax</str>


My schema just has a stored text field; I don't understand this big difference.
Thanks a lot for your time,

--
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: full-import and empty ./core/data/index

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED]wrote:


 Is there any way to start Solr with the index folder empty without getting
 an error? What I would like to do is start with the empty folder, do a
 full-import (which would create the index from scratch) and from there keep
 updating it with delta-import.
 At the moment I must have something in the index folder at the beginning.
 Otherwise I get an error.


You can delete the index folder (but keep the data folder) and Solr will
create it at the start. There should be no errors.
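In other words, something like this before starting Solr (the path here is
hypothetical):

  rm -rf ./core/data/index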

-- 
Regards,
Shalin Shekhar Mangar.


Re: Value based boosting - Design Help

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:54 PM, ayyanar
[EMAIL PROTECTED]wrote:


 Also, in our requirement each keyword value has a weight associated with it,
 and this weight is calculated based on certain factors (e.g. if the keyword
 exists in the title then it takes a specific weight, etc.). This weight
 should drive the relevancy of the search results. For example, when a user
 enters a keyword such as "Biology" and clicks search, we search the keywords
 field in the index. The document that contains the searched keyword with the
 higher weight should come first.

 It would be very helpful for us if you can provide your thoughts/inputs on
 how to achieve this requirement in Lucene:

 Do we have a way to associate a different boost to different values of the
 same field?


So you are searching only on the keywords field and not the title field? You
can search on both the title and the keywords field and provide different
boosts to the title field.

Why do you want to assign weights to keywords? If all keywords which are in
the title are supposed to be more relevant than keywords that appear only in
the keywords field, then assigning a boost value to the title field is
enough. Is there any other use-case?



 Can we maintain the list of keywords associated to each link_id in a
 separate index, so that we can associate weight to each keyword value? If
 so, how do we relate the main index and the keyword index?


No, joins like these are not possible in Lucene/Solr. Lucene has payloads,
which can be used for boosting a particular term, but that functionality is
not available in Solr. Look at BoostingTermQuery in Lucene for how to use it.

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/payloads/BoostingTermQuery.html
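For illustration, a rough sketch of the query side against the Lucene 2.4
payload API. This is only a sketch: the index-time side (a TokenFilter that
stores each keyword's weight as a payload via Token.setPayload) is assumed,
and the single-byte weight encoding is just an illustration.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.payloads.BoostingTermQuery;

public class PayloadSearchSketch {
  public static void main(String[] args) throws Exception {
    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical
    // Make the indexed payload contribute to the score of each matching term.
    searcher.setSimilarity(new DefaultSimilarity() {
      public float scorePayload(String field, byte[] payload, int offset, int length) {
        return payload[offset]; // assumes the weight was stored as a single byte
      }
    });
    BoostingTermQuery q = new BoostingTermQuery(new Term("keywords", "biology"));
    TopDocs hits = searcher.search(q, null, 10);
    System.out.println("total hits: " + hits.totalHits);
  }
}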

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problems with SOLR-236 (field collapsing)

2008-12-10 Thread Doug Steigerwald
The first output is from the query component.  You might just need to  
make the collapse component first and remove the query component  
completely.


We perform geographic searching with localsolr first (if we need to),  
and then try to collapse those results (if collapse=true).  If we  
don't have any results yet, that's the only time we use the standard  
query component.  I'm making sure we set the  
builder.setNeedDocSet=false and then I modified the query component to  
only execute when builder.isNeedDocSet=true.


In the field collapsing patch that I'm using, I've got code to remove  
a previous 'response' from the builder.rsp so we don't have duplicates.


Now, if I could get field collapsing to work properly with a docSet/ 
docList from localsolr and also have faceting work, I'd be golden.


Doug

On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote:


Hi Tracy,

Well, I managed to get it working (I think) but the weird thing is,  
in the XML output it gives both recordsets (the filtered and  
unfiltered - filtered second).  In the JSON (the one I actually use  
anyway, at least) I only get the filtered results (as expected).


In my core's solrconfig.xml, I added:

  <searchComponent name="collapse"
class="org.apache.solr.handler.component.CollapseComponent" />


(I'm not sure if it's supposed to go anywhere in particular but for  
me it's right before StandardRequestHandler)


and then within StandardRequestHandler:

 <requestHandler name="standard" class="solr.StandardRequestHandler">
   <!-- default values for query parameters -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <!--
     <int name="rows">10</int>
     <str name="fl">*</str>
     <str name="version">2.1</str>
     -->
   </lst>
   <arr name="components">
     <str>query</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>debug</str>
     <str>collapse</str>
   </arr>
 </requestHandler>


Which is basically all the default values plus collapse.  Not sure  
if this was needed for prior versions, I don't see it in any patch  
files (I just got a vague idea from looking at a comment from  
someone else who said it wasn't working for them).  It would kinda  
be nice if someone working on the code might throw us a bone and say  
explicitly what the right options to put in the config file are (if  
there are even supposed to be any - for all I know, this is just a  
bandaid over a larger problem).  I know it's not done yet though...  
just a pointer for this patch might be handy, it's really a useful  
feature if it works (I was kinda shocked this wasn't part of the  
standard distribution since it's something I had to do so often with  
mysql, kinda lucky I guess that it only came up now).


Another issue I'm having now is the faceting doesn't seem to change
- even if I set the collapse.facet option to "after"...  I should
really try "before" and see what happens.


Of course, I just realized the integrity of my collapse field is not  
so great so I have to go back and redo the data :-)


Best of luck.

--
Steve

On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote:


Steve,

I need this too. As my previous posting said, I adapted the 1.2  
field collapsing back at the beginning of the year, so I'm somewhat  
familiar.


I'll try and get a look this weekend. It's the earliest I'm likely
to get spare cycles. I'll post any results.


Tracy

On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote:


Hi,

I'm trying to use field collapsing with our SOLR but I just can't  
seem to get it to do anything.


I've downloaded a dist copy of solr 1.3 and applied Ivan de  
Prado's patch - reading through the source code, the patch  
definitely was applied successfully (all the changes are in the  
right places, I've checked every single one).


I've run ant clean, ant compile, and ant dist to produce the war  
file in the dist/ folder, and then put the war file in place and  
restarted jetty.  According to the logs, jetty is definitely  
loading the right war file.  If I expand the war file and grep  
through the files, it would appear the collapsing code is there.


However, when I add any sort of collapse parameters (I've tried
every combination of collapse=true, collapse.field=link_id,
collapse.threshold=1, collapse.type=normal, and collapse.info.doc=true),
the result set is no different from a normal query, and there is no
collapse data returned in the XML.


I'm not a java developer, this is my first time using ant period,  
and I'm just following basic directions I found on google.



Here is the output of the compilation process:



I really need this patch to work for a project...  Can someone  
please tell me what I'm missing to get this to work?  I can't  
really find any documentation beyond adding the collapse options  
to the query string, so it's hard to tell - is there an option in  
solrconfig.xml or in the core configuration that needs to be set?   
Am I going about this entirely the wrong way?


Thanks for any advice, I 

Re: How can I look for tom & jerry

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:12 PM, sunnyfr [EMAIL PROTECTED] wrote:


 When I look for this expression it stops the search at the &, taking
 that for a parameter I guess.


You will need to URL encode the query parameter before you make the request.

URLEncoder.encode("tom & jerry", "UTF-8");

If you are using SolrJ, it will automatically take care of this.
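For example, building the request URL by hand (host and handler hypothetical):

  String q = URLEncoder.encode("tom & jerry", "UTF-8"); // yields tom+%26+jerry
  String url = "http://localhost:8983/solr/select?q=" + q;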

-- 
Regards,
Shalin Shekhar Mangar.


Re: full-import and empty ./core/data/index

2008-12-10 Thread Marc Sturlese

Thanks, it did work.


Shalin Shekhar Mangar wrote:
 
 On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese
 [EMAIL PROTECTED]wrote:
 

 Is there any way to start Solr with the index folder empty without getting
 an error? What I would like to do is start with the empty folder, do a
 full-import (which would create the index from scratch) and from there keep
 updating it with delta-import.
 At the moment I must have something in the index folder at the beginning.
 Otherwise I get an error.
 Otherwise I get an error.

 
 You can delete the index folder (but keep the data folder) and Solr will
 create it at the start. There should be no errors.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/full-import-and-empty-.-core-data-index-tp20932981p20933620.html
Sent from the Solr - User mailing list archive at Nabble.com.



Can we extract contents from two Core folders

2008-12-10 Thread payalsharma

Hi All,

Issue: Need to fetch the data available in different core folders.
Scenario: 
We are storing the information in different core folders specific to website
ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
region gets stored in a specific core folder; e.g. for India-specific
information, the CoreIndia folder is used.

Now the requirement is that we have to access the information stored in
multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
Is it possible to do so, and if so, what is the mechanism?

Thanks in advance
Payal
-- 
View this message in context: 
http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20933745.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can we extract contents from two Core folders

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:19 PM, payalsharma [EMAIL PROTECTED] wrote:


 We are storing the information in different core folders specific to website
 ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
 region gets stored in a specific core folder; e.g. for India-specific
 information, the CoreIndia folder is used.

 Now the requirement is that we have to access the information stored in
 multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
 Is it possible to do so, and if so, what is the mechanism?


Make two queries :-)

-- 
Regards,
Shalin Shekhar Mangar.


Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr

Thanks Erik,
Have a good day,


Erik Hatcher wrote:
 
 dismax doesn't support field selection in its query syntax, only via
 the qf parameter.
 
 Add debugQuery=true to see how the queries are being parsed; that'll
 reveal what is going on.
 
   Erik
 
 
 On Dec 10, 2008, at 5:07 AM, sunnyfr wrote:
 

 Hi,

 I would like to understand the difference between q=text:+toto and q=toto.

 /select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
 <lst name="params">
   <str name="fl">*</str>
   <str name="q">text: toto</str>
   <str name="qt">dismax</str>

 /select?fl=*&qt=dismax&q=toto : 5682 docs found.
 <lst name="params">
   <str name="fl">*</str>
   <str name="q">toto</str>
   <str name="qt">dismax</str>


 My schema just has a stored text field; I don't understand this big difference.
 Thanks a lot for your time,

 -- 
 View this message in context:
 http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932585.html
Sent from the Solr - User mailing list archive at Nabble.com.



Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread RaghavPrabhu

Hi all,

 I want to index rich text documents like .doc, .xls, .ppt files. I had
applied the patch for updating rich documents by following the instructions
at this URL: http://wiki.apache.org/solr/UpdateRichDocuments

When I index a doc file, I'm getting the following error in the
browser.


HTTP ERROR: 500
lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.RichDocumentRequestHandler'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
... 21 more
Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
... 24 more

RequestURI=/solr/update/rich



Better solutions will be appreciated even more.

Thanks a lot
Prabhu.K
-- 
View this message in context: 
http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can we extract contents from two Core folders

2008-12-10 Thread Mark Miller

payalsharma wrote:

Hi All,

Issue: Need to fetch the data available in different core folders.
Scenario: 
We are storing the information in different core folders specific to website
ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
region gets stored in a specific core folder; e.g. for India-specific
information, the CoreIndia folder is used.

Now the requirement is that we have to access the information stored in
multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
Is it possible to do so, and if so, what is the mechanism?

Thanks in advance
Payal
  

Try distributed search over the cores.


Re: Can we extract contents from two Core folders

2008-12-10 Thread payalsharma

Hi,

Will you please explain what exactly you mean by :
Distributed search over the cores.

Please provide some context around this.

Thanks


markrmiller wrote:
 
 payalsharma wrote:
 Hi All,

 Issue: Need to fetch the data available in different core folders.
 Scenario: 
 We are storing the information in different core folders specific to website
 ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
 region gets stored in a specific core folder; e.g. for India-specific
 information, the CoreIndia folder is used.

 Now the requirement is that we have to access the information stored in
 multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
 Is it possible to do so, and if so, what is the mechanism?

 Thanks in advance
 Payal
   
 Try distributed search over the cores.
 
 

-- 
View this message in context: 
http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20937150.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: snappuller issue with multicore

2008-12-10 Thread Bill Au
I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED]wrote:

 Hi,



  We are seeing a strange behavior with snappuller



 We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
 5.  execute snappuller from slave machines (once for hotel core & once
 for location core)



 However, the hotel core snapshot is pulled into the location data dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu




RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
Bill,

   Yes, I do have a scripts.conf for each core. However, all the options
needed for snappuller are specified on the command line itself (-D, -S,
etc...)

-Raghu

-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
[EMAIL PROTECTED]wrote:

 Hi,



  We are seeing a strange behavior with snappuller



 We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
 5.  execute snappuller from slave machines (once for hotel core & once
 for location core)



 However, the hotel core snapshot is pulled into the location data dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu




RE: Can we extract contents from two Core folders

2008-12-10 Thread Kashyap, Raghu
-Original Message-
From: payalsharma [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Can we extract contents from two Core folders


Hi,

Will you please explain what exactly you mean by :
Distributed search over the cores.

Please provide some context around this.

Thanks


http://wiki.apache.org/solr/DistributedSearch 
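For example, a single distributed request across two cores might look like
this (hosts and ports hypothetical):

  http://localhost:8983/solr/CoreUSA/select?q=india&shards=localhost:8983/solr/CoreUSA,localhost:8983/solr/CoreUK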





Re: Look for three words, just two are weighted ?

2008-12-10 Thread Erik Hatcher


On Dec 10, 2008, at 9:58 AM, sunnyfr wrote:
Second question: if I want to weight status_official:true^2, should I do it
this way for weighting the "true" one? Thanks.

/select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true


Use bq (boosting query) for boosting by status:
bq=status_official:true^2, and remove it from the qf parameter.  That
should do the trick.


Erik



Re: Look for three words, just two are weighted ?

2008-12-10 Thread sunnyfr

Yes, but when I check the debug output there is no weight for it:
/select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&bq=status_official:true^12&qf=owner_login^10+title^3&debugQuery=true

And it's as if it doesn't weight my word cartoontv either? OK, maybe the
doc which contains these three words is just not weighted enough?

<str name="rawquerystring"> tom jerry cartoontv</str>
<str name="querystring"> tom jerry cartoontv</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((owner_login:tom^10.0 | title:tom^3.0)~0.01)
DisjunctionMaxQuery((owner_login:jerry^10.0 | title:jerri^3.0)~0.01)
DisjunctionMaxQuery((owner_login:cartoontv^10.0 |
title:cartoontv^3.0)~0.01))~2) DisjunctionMaxQuery((text:"tom jerri
cartoontv"~100^0.2)~0.01) status_official:true^12.0
</str>
<str name="parsedquery_toString">
+(((owner_login:tom^10.0 | title:tom^3.0)~0.01 (owner_login:jerry^10.0 |
title:jerri^3.0)~0.01 (owner_login:cartoontv^10.0 |
title:cartoontv^3.0)~0.01)~2) (text:"tom jerri cartoontv"~100^0.2)~0.01
status_official:T^12.0
</str>
<lst name="explain">
<str name="559170">

0.59949005 = (MATCH) sum of:
  0.59949005 = (MATCH) product of:
0.899235 = (MATCH) sum of:
  0.37824848 = (MATCH) max plus 0.01 times others of:
0.37824848 = (MATCH) weight(title:tom^3.0 in 918085), product of:
  0.077876315 = queryWeight(title:tom^3.0), product of:
3.0 = boost
7.771266 = idf(docFreq=8887, numDocs=7753783)
0.003340353 = queryNorm
  4.8570414 = (MATCH) fieldWeight(title:tom in 918085), product of:
1.0 = tf(termFreq(title:tom)=1)
7.771266 = idf(docFreq=8887, numDocs=7753783)
0.625 = fieldNorm(field=title, doc=918085)
  0.52098656 = (MATCH) max plus 0.01 times others of:
0.52098656 = (MATCH) weight(title:jerri^3.0 in 918085), product of:
  0.09139661 = queryWeight(title:jerri^3.0), product of:
3.0 = boost
9.120454 = idf(docFreq=2305, numDocs=7753783)
0.003340353 = queryNorm
  5.7002835 = (MATCH) fieldWeight(title:jerri in 918085), product
of:
1.0 = tf(termFreq(title:jerri)=1)
9.120454 = idf(docFreq=2305, numDocs=7753783)
0.625 = fieldNorm(field=title, doc=918085)
0.667 = coord(2/3)
</str>



Erik Hatcher wrote:
 
 
 On Dec 10, 2008, at 9:58 AM, sunnyfr wrote:
 Second question: if I want to weight status_official:true^2, should I do it
 this way for weighting the "true" one? Thanks.

 /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true
 
 Use bq (boosting query) for boosting by status:
 bq=status_official:true^2, and remove it from the qf parameter.  That
 should do the trick.
 
   Erik
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Look-for-three-words%2C-just-two-are-weighted---tp20936945p20937676.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: snappuller issue with multicore

2008-12-10 Thread Doug Steigerwald
Try using the -d option with the snappuller so you can specify the
path to the directory holding the index data on the local machine.


Doug

On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote:


Bill,

  Yes I do have scripts.conf for each core. However, all the options
needed for snappuller is specified in the command line itself (-D -S
etc...)

-Raghu

-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
[EMAIL PROTECTED]wrote:


Hi,



We are seeing a strange behavior with snappuller



We have 2 cores: Hotel & Location



Here are the steps we perform



1.  index hotel on master server
2.  index location on master server
3.  execute snapshooter for hotel core on master server
4.  execute snapshooter for location core on master server
5.  execute snappuller from slave machines (once for hotel core & once
for location core)



However, the hotel core snapshot is pulled into the location data  
dir.




Here are the commands that we execute in our ruby scripts



system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
masterServer -D /solr/data/hotel')

system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
-S /solr/data -D /solr/data/location")



Thanks,

Raghu






Re: Setting Request Handler

2008-12-10 Thread Grant Ingersoll

Inline below...

Also, though, you should note that the /spellCheckCompRH that is
packaged with the example is not necessarily the best way to actually
use the SpellCheckComponent.  It is intended to be used as a
component in whatever your MAIN request handler is; the example merely
shows how to hook it in.



On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote:


Hi,

I have a request handler in my solrconfig.xml : /spellCheckCompRH
It utilizes the search component spellcheck.

When I specify following query in browser, I get correct spelling
suggestions from the file dictionary.

http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file

Now I write a java program to achieve the same result:

Code snippet

.
.

server = new CommonsHttpSolrServer("http://localhost:8080/solr");
.
.
SolrQuery query = new SolrQuery();
query.setQuery("solr");
query.setFields("*,score");
query.set("qt", "spellCheckCompRH");


Is "spellCheckCompRH" a variable?  Does it equal "/spellCheckCompRH"?
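If the handler is registered as /spellCheckCompRH in solrconfig.xml, my guess
is that the qt value needs the leading slash as well, i.e. something like:

  query.set("qt", "/spellCheckCompRH");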



query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_DICT, "file");
query.set(SpellingParams.SPELLCHECK_Q, "solt");
.
.
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
SpellCheckResponse srsp = rsp.getSpellCheckResponse();

I get documents for my query but I do not get any spelling  
suggestions.

I think that the request handler is not getting set for the query
correctly.

Can someone please help.

Best Regards,
Mukta


--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
OK, I think the problem is what Bill mentioned earlier. The rsync port
was the same for both cores, due to which it was copying the same
snapshot for both cores.

Thanks for all the help

-Raghu
-Original Message-
From: Kashyap, Raghu [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 10:27 AM
To: solr-user@lucene.apache.org
Subject: RE: snappuller issue with multicore

Doug,

  That doesn't help

-Raghu

-Original Message-
From: Doug Steigerwald [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:35 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

Try using the -d option with the snappuller so you can specify the
path to the directory holding the index data on the local machine.

Doug

On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote:

 Bill,

   Yes I do have scripts.conf for each core. However, all the options
 needed for snappuller is specified in the command line itself (-D -S
 etc...)

 -Raghu

 -Original Message-
 From: Bill Au [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 10, 2008 9:17 AM
 To: solr-user@lucene.apache.org
 Subject: Re: snappuller issue with multicore

 I noticed that you are using the same rsyncd port for both cores.  Do you
 have a scripts.conf for each core?

 Bill

 On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
 [EMAIL PROTECTED]wrote:

 Hi,



 We are seeing a strange behavior with snappuller



  We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
  5.  execute snappuller from slave machines (once for hotel core & once
  for location core)



 However, the hotel core snapshot is pulled into the location data  
 dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu





Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Otis Gospodnetic
Hi,

There is a ClassNotFoundException in there.  Make sure you rebuild the war,
completely remove the old one, and properly deploy the new one.  Peek into the
war and look for the class that the error below says is missing, to make sure
the class is really there.  Also get the latest code from
http://wiki.apache.org/solr/ExtractingRequestHandler and try that.
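For example, a quick-and-dirty way to peek into the war (file names here are
hypothetical; grep matches the entry names stored in the jar headers):

  unzip apache-solr-1.3.0.war -d war-contents
  grep -l RichDocumentRequestHandler war-contents/WEB-INF/lib/*.jar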


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: RaghavPrabhu [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 7:09:17 AM
 Subject: Error, when i update the rich text documents such as .doc, .ppt 
 files.
 
 
 Hi all,
 
 I want to index rich text documents like .doc, .xls, .ppt files. I had
 applied the patch for updating rich documents by following the instructions
 at this URL: http://wiki.apache.org/solr/UpdateRichDocuments
 
 When I index a doc file, I'm getting the following error in the
 browser.
 
 
 HTTP ERROR: 500
 lazy loading error
 
 org.apache.solr.common.SolrException: lazy loading error
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Caused by: org.apache.solr.common.SolrException: Error loading class
 'solr.RichDocumentRequestHandler'
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
 at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
 ... 21 more
 Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
 at
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
 at java.lang.ClassLoader.loadClassInternal(Unknown Source)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Unknown Source)
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
 ... 24 more
 
 RequestURI=/solr/update/rich
 
 
 
 Better solutions will be appreciated even more.
 
 Thanks a lot
 Prabhu.K
 -- 
 View this message in context: 
 http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Grant Ingersoll

Hi Raghav,

Recently, integration with Tika was completed for SOLR-284 and it is
now committed on the trunk (but it does not use the old
RichDocumentHandler approach).  See
http://wiki.apache.org/solr/ExtractingRequestHandler for how to use and
configure it.


Otherwise, it looks to me like the jar file for the RichDocHandler is  
not in your WAR or in the Solr Home lib directory.


HTH,
Grant

On Dec 10, 2008, at 7:09 AM, RaghavPrabhu wrote:



Hi all,

I want to index rich text documents like .doc, .xls, .ppt files. I had
applied the patch for updating rich documents by following the instructions
at this URL: http://wiki.apache.org/solr/UpdateRichDocuments

When I index a doc file, I'm getting the following error in the
browser.


HTTP ERROR: 500
lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.RichDocumentRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
... 21 more
Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
... 24 more

RequestURI=/solr/update/rich



Better solutions will be appreciated even more.

Thanks a lot
Prabhu.K
--
View this message in context: 
http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Dates in Solr

2008-12-10 Thread Tricia Williams

Hi All,

  I'm curious about what people have done with dates.

We Require:

 1. multiple granularities to query and facet on: by year, by
year/month, by year/month/day
 2. sortability: sort/order by date
 3. time typically isn't important to us
 4. some of these items don't have a day or month associated with them
 5. possibly consider seasonal publications, with e.g. "FALL" as a date

This is the bulk of what I found documented in the mailing list and wiki:

  * http://www.nabble.com/dates---times-td10417533.html#a10421952
  * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
  * 
http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd 



o 
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html


o 
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html


  * any queries on those fields (typically range queries) should use
either the complete ISO 8601 date syntax that field supports, or
the DateMath syntax
(http://trac.library.utoronto.ca/projectKO/wiki/DateMath)
to get relative dates

This is great and valuable.  I would like to be able to use the existing 
functionality but I'm not sure how I can use the DateField to specify a 
year without a time (what I guess would actually be a range of time) for 
a document.  Any ideas?


Tricia


Solr Newbie question

2008-12-10 Thread Rakesh Sinha
Hi -
  I am a new user of the Solr tool and came across the introductory
tutorial here: http://lucene.apache.org/solr/tutorial.html
I am planning to use Solr in one of my projects. I see that the
tutorial mentions a REST API / interface to add documents and to
query the same.

I would like to create  the indices locally , where the web server (or
pool of servers ) will have access to the database directly , but use
the query REST api to query for the results.

 I am curious how this could be possible without going through the HTTP REST
API submission to add to the indices. (For the sake of simplicity - we can
assume it would be just one node to store the index but multiple
readers / query machines that could potentially connect to the solr
web service and retrieve the query results. Also the index might be
locally present in the same machine as that of the Solr host or at
least accessible through NFS etc. )

Thanks for helping out to some starting pointers regarding the same.


Re: Dates in Solr

2008-12-10 Thread Otis Gospodnetic
Tricia,

I think you might have missed the key nugget at the bottom of 
http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Tricia Williams [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 12:12:11 PM
 Subject: Dates in Solr
 
 Hi All,
 
   I'm curious about what people have done with dates.
 
 We Require:
 
 1. multiple granularities to query and facet on: by year, by
 year/month, by year/month/day
 2. sortability: sort/order by date
 3. time typically isn't important to us
 4. some of these items don't have a day or month associated with them
 5. possibly consider seasonal publications, with e.g. "FALL" as a date
 
 This is the bulk of what I found documented in the mailing list and wiki:
 
   * http://www.nabble.com/dates---times-td10417533.html#a10421952
   * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
   * 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd
  
 
 
 o 
 http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html
 
 o 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 
   * any queries on those fields (typically range queries) should use
 either the complete ISO 8601 date syntax that field supports, or
 the DateMath syntax
 to get relative dates
 
 This is great and valuable.  I would like to be able to use the existing 
 functionality but I'm not sure how I can use the DateField to specify a year 
 without a time (what I guess would actually be a range of time) for a 
 document.  
 Any ideas?
 
 Tricia



Re: Limitations of Distributed Search ....

2008-12-10 Thread Otis Gospodnetic
Hi,

I have not worked with a 50-node Solr cluster, but I've worked with pure Lucene
clusters of that size, with very high query and data volumes.  I don't imagine
a dist search involving 50 nodes will be a problem for Solr.  As for handling
query slave failures, I'm sure you'll want to involve a LB that can detect
those, and have multiple replicas of each query node behind it for fail-over.

As for the manageability, I think you'll find that management is really mostly 
on you - Solr doesn't provide tools for cluster / shard management.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: souravm [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Sunday, December 7, 2008 12:40:34 AM
 Subject: Limitations of Distributed Search 
 
 Hi,
 
 We are planning to use Solr for processing a large volume of application log
 files (around 10 billion documents, 5-6 TB in size).
 
 One of the approach we are considering for the same is to use Distributed 
 Search 
 extensively. 
 
 What we have in mind is distributing the log files across multiple boxes on
 a monthly or weekly basis - where at the weekly level alone the volume can
 reach 200 M documents. And a search query can spread across all weeks (e.g.
 the number of a given txn for the first 6 months of a year).

 However, what we are not sure of is how well distributed search would scale
 when we use around 50-60 boxes to distribute indexed documents on a weekly
 basis.
 
 a) How would be the impact on the performance when a query spreads over 50 
 boxes
 b) Is there any hard limit on the number of slaves which can be contacted 
 from 
 the master server?
 c) How much load will this type of approach create on master server for 
 merging 
 data, keeping the track whether a slave is down or not
 d) Any other manageability issues with so many slaves
 
 If anyone of you have deployed Solr in such a environment it would be great 
 if 
 you can share your experience on the same.
 
 Thanks in advance.
 
 Regards,
 Sourav
 
 
 



Re: solr performance

2008-12-10 Thread Ryan McKinley

For a similar idea, check:
https://issues.apache.org/jira/browse/SOLR-906

This opens a single stream and writes all documents to it.  It could
easily be extended to have multiple threads draining the same queue.



On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



I guess this is the best idea . Let us have a new BatchHttpSolrServer
which can help achieve this
--Noble

On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller [EMAIL PROTECTED]  
wrote:
Kick off some indexing more than once - eg, post a folder of docs,  
and while

thats working, post another.

I've been thinking about a multi threaded UpdateProcessor as well  
- that

could be interesting.


Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to, and it would manage a number of threads under
the covers to maximize throughput.  Not sure what would be best
for error handling though - perhaps just polling (allow the user to ask
for failed or successful operations).

-Yonik





--
--Noble Paul
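For illustration, a minimal multi-threaded SolrJ feeder along the lines
discussed above might look like the sketch below. It is only a sketch - the
thread count, queue handling, and error handling are assumptions, not the
SOLR-906 design - and it relies on CommonsHttpSolrServer being safe to share
across threads.

import java.util.concurrent.*;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: N worker threads drain a shared queue and post docs to one server.
public class ThreadedFeeder {
  public static void main(String[] args) throws Exception {
    final CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr"); // hypothetical
    final BlockingQueue<SolrInputDocument> queue =
        new LinkedBlockingQueue<SolrInputDocument>();

    ExecutorService pool = Executors.newFixedThreadPool(4); // thread count is a guess
    for (int i = 0; i < 4; i++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            SolrInputDocument doc;
            // exit once the queue has been idle for 5 seconds
            while ((doc = queue.poll(5, TimeUnit.SECONDS)) != null) {
              server.add(doc); // one HTTP request per doc in this naive sketch
            }
          } catch (Exception e) {
            e.printStackTrace(); // real code would collect failures for polling
          }
        }
      });
    }

    // producer: queue up some documents
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      queue.put(doc);
    }

    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    server.commit();
  }
}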




Re: Dates in Solr

2008-12-10 Thread Tricia Williams

Hi Otis,

   Absolutely, I missed that nugget.  I didn't think of using prefix
filters/queries.  This works really well with how we had already stored
dates as YYYYMMDD strings.  Thanks for pointing me in the right direction.
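For example (field name hypothetical), with dates stored as YYYYMMDD strings:

  q=date:2008*                          (everything in 2008, as a prefix query)
  fq=date:[20080101 TO 20081231]        (the same year, as a range)
  facet.field=date&facet.prefix=200812  (facet only within December 2008)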


Tricia

Otis Gospodnetic wrote:

Tricia,

I think you might have missed the key nugget at the bottom of 
http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Tricia Williams [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 12:12:11 PM
Subject: Dates in Solr

Hi All,

  I'm curious about what people have done with dates.

We Require:

1. multiple granularities to query and facet on: by year, by
year/month, by year/month/day
2. sortability: sort/order by date
3. time typically isn't important to us
4. some of these items don't have a day or month associated with them
5. possibly consider seasonal publications, with e.g. "FALL" as a date

This is the bulk of what I found documented in the mailing list and wiki:

  * http://www.nabble.com/dates---times-td10417533.html#a10421952
  * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
  * 
http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd 



o 
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html


o 
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html


  * any queries on those fields (typically range queries) should use
either the complete ISO 8601 date syntax that field supports, or
the DateMath syntax
to get relative dates

This is great and valuable.  I would like to be able to use the existing 
functionality but I'm not sure how I can use the DateField to specify a year 
without a time (what I guess would actually be a range of time) for a document.  
Any ideas?


Tricia




  




Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
I'm trying to see if anyone has any recommendations on the maximum number of
cores that should be used within Solr. Is there significant overhead for each
core? Should it be 10 or fewer, or are 100 or 1,000 cores acceptable?

Thanks,

Ryan


Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan McKinley

it depends!

yes there is overhead to each core -- how much it matters will depend  
entirely on your setup and typical usage pattern.


sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain
logic needs more than hardware.  If you are able to put things into a
single index and get the performance you need, it will just be easier
to deal with.


ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

I'm trying to see if anyone has any recommendations on the maximum  
number of cores that should be used within Solr. Is there  
significant overhead to each core? Should it be 10 or less, or is  
100 or 1,000 cores acceptable.


Thanks,

Ryan




Re: multiValued multiValued fields

2008-12-10 Thread Chris Hostetter

: I want to index a field with an array of arrays, is that possible in Solr?

Not out of the box ... you can implement custom FieldTypes that store any
data you want using a byte[], but you'd still need to do some tricks
with your FieldType to get the ResponseWriter to write it out in a
meaningful way.

the simplest approach would be to just encode the multiple values into a
String in some way (comma separated, or something)
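For instance (the comma delimiter is an arbitrary assumption):

  // index time: flatten the inner list into one value of the multiValued field
  String encoded = "red,green,blue";
  // query/display time: split the stored value back into the inner list
  String[] inner = encoded.split(",");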

-Hoss



Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
We are considering a migration to SOLR from a home-grown Lucene solution.
Currently we have 27,000 separate Lucene indexes that are separated based on
business logic. Collectively the indexes are about 1.5 terabytes in size. We
have some very small indexes and some that are quite large (up to 15GB). My
hesitation about grouping all this data across, say, 4 SOLR instances is that
each individual index will still be about 400GB in size. How big is too big
for a single Lucene index? Each SOLR instance will be on a dual/dual-core Xeon
box with 6 SAS 15k drives in a RAID 5 config and 16GB of RAM.

If a 400GB instance is too much, I figured I could reduce the size of each
individual index further by using multiple cores, but again how many would
depend on what size index is too big.

Any suggestions would be greatly appreciated, thank you for your time.

-Ryan





From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 1:26:07 PM
Subject: Re: Multi Core - Max Core Count Recommendation

it depends!

yes there is overhead to each core -- how much it matters will depend  
entirely on your setup and typical usage pattern.

sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain  
logic needs more than hardware.  If you are able to put things into a  
single index and get the performance you need, it will just be easier  
to deal with.

ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

 I'm trying to see if anyone has any recommendations on the maximum  
 number of cores that should be used within Solr. Is there  
 significant overhead to each core? Should it be 10 or fewer, or are  
 100 or 1,000 cores acceptable?

 Thanks,

 Ryan

RE: Dealing with field values as key/value pairs

2008-12-10 Thread Chris Hostetter

: This is really cool. U... How does it integrate with the Data Import
: Handler?

my DIH knowledge is extremely limited, but i'm guessing approach #1 is 
trivial (there is an easy way to concat DB values to build up solr field 
values, right?); approach #2 would probably be possible using multiple root 
entities (assuming multiple root entities means what i think it means)

: I've taken two approaches in the past...
: 
: 1) encode the id and the label in the field value; facet on it; require
: clients to know how to decode.  This works really well for simple things
: where the id=label mappings don't ever change, and are easy to encode
: (ie 01234:Chris Hostetter).  This is a horrible approach when id=label
: mappings do change with any frequency.
: 
: 2) have a separate type of metadata document, one per thing that you are
: faceting on containing fields for id and the label (and probably a doc_type
: field so you can tell it apart from your main docs) then once you've done
: your main query and gotten the results back faceted on id, you can query
: for those ids to get the corresponding labels.  this works really well if the
: labels ever change (just reindex the corresponding metadata document) and
: has the added bonus that you can store additional metadata in each of those
: docs, and in many use cases for presenting an initial browse interface,
: you can sometimes get away with a cheap search for all metadata docs (or all
: metadata docs meeting a certain
: criteria) instead of an expensive facet query across all of your main
: documents.



-Hoss
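
Hedged sketches of both approaches (all field names here are illustrative
assumptions, not an existing schema):

  Approach #1 -- id and label encoded in one indexed value:

    <field name="author_facet">01234:Chris Hostetter</field>

  Facet on author_facet; clients split each constraint on the first ':'.

  Approach #2 -- one metadata document per facet value:

    <doc>
      <field name="doc_type">author_meta</field>
      <field name="author_id">01234</field>
      <field name="author_label">Chris Hostetter</field>
    </doc>

  Facet the main documents on author_id, then run a second query such as
  doc_type:author_meta AND author_id:(01234 02345) to resolve the ids
  returned by faceting into display labels.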



Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the same  
data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to show  
the result that matched with the query, and then to also show that  
this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process that  
runs and does multiple queries: get the list of items and  
then query against each item.  But those don't seem elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John




SolrConfig.xml Replication

2008-12-10 Thread Jeff Newburn
I am curious as to whether there is a solution to be able to replicate
solrconfig.xml with the 1.4 replication.  The obvious problem is that the
master would replicate the solrconfig, turning all slaves into masters with
its config.  I have also tried on a whim to configure the master and slave
on the master so that the slave points to the same server, but that seems to
break the replication completely.  Please let me know if anybody has any
ideas.

-Jeff


Re: Sum of Fields and Record Count

2008-12-10 Thread Grant Ingersoll

Hi John,

What is your process for determining that #1 is part of the other  
result set?  My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant
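
A hedged example of what that facet request could look like, assuming the
shared field is called category (an illustrative name):

  http://localhost:8983/solr/select?q=biology&facet=true&facet.field=category&facet.mincount=1

Each facet count then tells you how many documents share that category with
the matching result.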

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the same  
data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of items  
and then query against each item.  But those don't seem elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Grant,

Basically I have created a text field that has the grouping value.   
All of the records would have the same value in this text field.  This  
is accomplished with some pre-processing when I capture the data, but  
before it is submitted into the index.



-John

On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED]  
wrote:



Hi John,

What is your process for determining that #1 is part of the other  
result set?  My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the  
same data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of  
items and then query against each item.  But those don't seem  
elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to  
write additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Grant,

For the More Like This approach that would show the grouped results, once you  
have clicked on the item (so basically making another query), would it  
show a count of the More Like This results?


Something like cxxc and a collection of 10 other items.

-John

On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED]  
wrote:



Hi John,

What is your process for determining that #1 is part of the other  
result set? My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the  
same data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of  
items and then query against each item.  But those don't seem  
elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to  
write additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Ordinal Field value and exact value for date.

2008-12-10 Thread amit rohatgi
Hi All,

I am trying to use the ord() function query on created_date.  I am
concerned by the warning about ord() behaviour, as it uses the actual entry
position in the index instead of the created_date value.

Will all entries created initially with different created_date values have
the same or nearly the same ordinal value? If yes, then how does the age
calculation for the document work?
Does index creation need to know the creation date of a document
while adding it to the index?

Thanks
amit
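
For context: ord()/rord() rank documents by the position of their field value
in the sorted term index, so they do follow created_date order, but the
absolute ordinal values shift as documents are added or deleted. A hedged
sketch of the usual age-boosting recipe, wrapping rord() in recip() as a
dismax boost function (created_date assumed to be the field from the
question):

  bf=recip(rord(created_date),1,1000,1000)

This yields values near 1 for the newest documents and decays toward 0 for
older ones.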


RE: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Lance Norskog
1) Our limit is: how big a file do we want to copy around? 
We switched to multiple indexes because of the logistics of
replicating/backing up giant Lucene index files.

2) Searching takes a little memory, sorting takes a lot of memory, and
faceting eats like a black hole.

There is an unwritten wiki page of practical experiences.

-Original Message-
From: Ryan Peterson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Multi Core - Max Core Count Recommendation

We are considering a migration to SOLR from a home grown Lucene solution.
Currently we have 27,000 separate Lucene indexes that are separated based on
business logic. Collectively the indexes are about 1.5 terabytes in size.
We have some very small indexes and some that are quite large (up to 15GB).
My hesitation with grouping all this data across say 4 SOLR instances is that
each individual index will still be about 400GB in size. How big is too big
for a single Lucene index? Each SOLR instance will be on a dual/dual core
xeon box with 6 SAS 15k drives in Raid 5 config and 16GB of RAM.

If a 400GB instance is too much, I figured I could reduce the size of each
individual index further by using multiple CORES, but again how many would
depend on what size index is too big.

Any suggestions would be greatly appreciated, thank you for your time.

-Ryan





From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 1:26:07 PM
Subject: Re: Multi Core - Max Core Count Recommendation

it depends!

yes there is overhead to each core -- how much it matters will depend
entirely on your setup and typical usage pattern.

sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain logic
needs more than hardware.  If you are able to put things into a single index
and get the performance you need, it will just be easier to deal with.

ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

 I'm trying to see if anyone has any recommendations on the maximum 
 number of cores that should be used within Solr. Is there significant 
 overhead to each core? Should it be 10 or fewer, or are 100 or 1,000 
 cores acceptable?

 Thanks,

 Ryan



Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John,

This sounds a lot like field collapsing functionality that a few people are 
working on in SOLR-236:

https://issues.apache.org/jira/browse/SOLR-236

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: John Martyniak [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 6:16:21 PM
 Subject: Sum of Fields and Record Count
 
 Hi,
 
 I am a new solr user.
 
 I have an application in which I would like to show the results, but one result 
 may 
 be part of a larger set of results.  So for example result #1 might also 
 have 
 10 other results that are part of the same data set.
 
 Hopefully this makes sense.
 
 What I would like to find out is if there is a way within Solr to show the 
 result that matched with the query, and then to also show that this result is 
 part of a collection of 10 items.
 
 I have thought about doing it using some sort of external process that runs, 
 and 
 with doing multiple queries, so get the list of items and then query against 
 each item.  But those don't seem elegant.
 
 So I would like to find out if there is a way to do it within Solr that is a 
 little more elegant, and hopefully without having to write additional code.
 
 Thank you in advance for the help.
 
 -John



Re: SolrConfig.xml Replication

2008-12-10 Thread Otis Gospodnetic
Jeff,

Are you using Solr 1.3 replication scripts?  If so, I think it would be pretty 
simple to:

1) put all additional files to replicate to slaves in a specific location (or 
use a special naming scheme) on the master
2) write another script that uses scp or rsync to look for those additional 
files and copy them
3) run this new script whenever snappuller + snapinstaller run: snappuller && 
snapinstaller && my-file-copying-script


It's not a part of Solr, but it's trivial to add.
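
For step 2, a hedged sketch: the extra script could be as small as a single
rsync call (both paths here are hypothetical),

  rsync -av master:/var/solr/conf-extras/ /var/solr/solr/conf/

chained as in step 3: snappuller && snapinstaller && my-file-copying-script.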

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Jeff Newburn [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 7:00:30 PM
 Subject: SolrConfig.xml Replication
 
 I am curious as to whether there is a solution to be able to replicate
 solrconfig.xml with the 1.4 replication.  The obvious problem is that the
 master would replicate the solrconfig turning all slaves into masters with
 its config.  I have also tried on a whim to configure the master and slave
 on the master so that the slave points to the same server but that seems to
 break the replication completely.  Please let me know if anybody has any
 ideas
 
 -Jeff



ExtractingRequestHandler and XmlUpdateHandler

2008-12-10 Thread Jacob Singh
Hey folks,

I'm looking at implementing ExtractingRequestHandler in the Apache_Solr_PHP
library, and I'm wondering what we can do about adding meta-data.

I saw the docs, which suggest you use different post headers to pass field
values along with ext.literal.  Is there any way to use the XmlUpdateHandler
instead along with a document?  I'm not sure how this would work; perhaps it
would require 2 trips, or perhaps the XML would be in the post content and
the file in something else?  The thing is, we would need to refactor the
class pretty heavily in this case when indexing RichDocs, and we were hoping
to avoid it.

Thanks,
Jacob
-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]
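
For reference, a hedged sketch of the post-header style the docs describe,
assuming the handler is registered at /update/extract and that id and
category exist in the schema (both assumptions):

  curl "http://localhost:8983/solr/update/extract?ext.literal.id=doc1&ext.literal.category=reports" \
       -F "myfile=@document.pdf"

The file travels as multipart form data while the ext.literal.* parameters
supply the field values, so there is no XML document in the request.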


Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Otis,

Thanks for the information.  It looks like the field collapsing is  
similar to what I am looking for.  But is that in the current release?  Is  
it stable?


Is there any way to do it in Solr 1.3?

-John

On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote:


Hi John,

This sounds a lot like field collapsing functionality that a few  
people are working on in SOLR-236:


https://issues.apache.org/jira/browse/SOLR-236

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: John Martyniak [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 6:16:21 PM
Subject: Sum of Fields and Record Count

Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may
be part of a larger set of results.  So for example result #1  
might also have

10 other results that are part of the same data set.

Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the
result that matched with the query, and then to also show that this  
result is

part of a collection of 10 items.

I have thought about doing it using some sort of external process  
that runs, and
with doing multiple queries, so get the list of items and then  
query against

each item.  But those don't seem elegant.

So I would like to find out if there is a way to do it within Solr  
that is a
little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John






RE: Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi Grant,

Thanks for the help.
So now I can have multiple components configured as last-components
of the standard request handler.

Best Regards,
Mukta

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Setting Request Handler

Inline below...

Also, though, you should note that the /spellCheckCompRH that is
packaged with the example is not necessarily the best way to actually  
use the SpellCheckComponent.   It is intended to be used as a  
component in whatever your MAIN request handler is; it merely shows
how to hook it in.


On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote:

 Hi,

 I have a request handler in my solrconfig.xml : /spellCheckCompRH It 
 utilizes the search component spellcheck.

 When I specify following query in browser, I get correct spelling 
 suggestions from the file dictionary.

 http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file

 Now I write a java program to achieve the same result:

 Code snippet
 
 .
 .

 server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 .
 .
 SolrQuery query = new SolrQuery();
 query.setQuery("solr");
 query.setFields("*,score");
 query.set("qt", "spellCheckCompRH");

Is spellCheckCompRH a variable?  Does it equal /spellCheckCompRH?


 query.set("spellcheck", true);
 query.set(SpellingParams.SPELLCHECK_DICT, "file");
 query.set(SpellingParams.SPELLCHECK_Q, "solt");
 .
 QueryResponse rsp = server.query(query);
 SolrDocumentList docs = rsp.getResults();
 SpellCheckResponse srsp = rsp.getSpellCheckResponse();

 I get documents for my query but I do not get any spelling 
 suggestions.
 I think that the request handler is not getting set for the query 
 correctly.

 Can someone please help.

 Best Regards,
 Mukta

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
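
A hedged SolrJ sketch of the likely fix Grant is hinting at: pass the handler
name exactly as registered in solrconfig.xml, including the leading slash
(handler name and test terms taken from the thread):

  SolrQuery query = new SolrQuery();
  query.setQuery("solr");
  query.setFields("*,score");
  query.set("qt", "/spellCheckCompRH");  // note the leading slash
  query.set("spellcheck", true);
  query.set(SpellingParams.SPELLCHECK_DICT, "file");
  query.set(SpellingParams.SPELLCHECK_Q, "solt");
  QueryResponse rsp = server.query(query);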












Re: Dealing with field values as key/value pairs

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Dec 11, 2008 at 4:41 AM, Chris Hostetter
[EMAIL PROTECTED] wrote:

 : This is really cool. U... How does it integrate with the Data Import
 : Handler?

 my DIH knowledge is extremely limited, but i'm guessing approach #1 is
 trivial (there is an easy way to concat DB values to build up solr field
 values, right?);
Yes, TemplateTransformer can help you here.

 approach #2 would probably be possible using multiple root
 entities (assuming multiple root entities means what i think it means)

Yes, multiple root entities can do the trick (with a separate doc_type).



 : I've taken two approaches in the past...
 :
 : 1) encode the id and the label in the field value; facet on it; require
 : clients to know how to decode.  This works really well for simple things
  : where the id=label mappings don't ever change, and are easy to encode
  : (ie 01234:Chris Hostetter).  This is a horrible approach when id=label
  : mappings do change with any frequency.
  :
  : 2) have a separate type of metadata document, one per thing that you are
  : faceting on containing fields for id and the label (and probably a doc_type
  : field so you can tell it apart from your main docs) then once you've done
  : your main query and gotten the results back faceted on id, you can query
  : for those ids to get the corresponding labels.  this works really well if the
  : labels ever change (just reindex the corresponding metadata document) and
 : has the added bonus that you can store additional metadata in each of those
 : docs, and in many use cases for presenting an initial browse interface,
 : you can sometimes get away with a cheap search for all metadata docs (or all
 : metadata docs meeting a certain
 : criteria) instead of an expensive facet query across all of your main
 : documents.



 -Hoss





-- 
--Noble Paul
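
A hedged DIH sketch of approach #1 with TemplateTransformer (entity, table,
and field names are made up for illustration):

  <entity name="author" transformer="TemplateTransformer"
          query="select id, name from author">
    <!-- concatenate id and label into one facetable value -->
    <field column="author_facet" template="${author.id}:${author.name}"/>
  </entity>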


Re: SolrConfig.xml Replication

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
This is a known issue and I was planning to take it up soon.
https://issues.apache.org/jira/browse/SOLR-821


On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn [EMAIL PROTECTED] wrote:
 I am curious as to whether there is a solution to be able to replicate
 solrconfig.xml with the 1.4 replication.  The obvious problem is that the
 master would replicate the solrconfig turning all slaves into masters with
 its config.  I have also tried on a whim to configure the master and slave
 on the master so that the slave points to the same server but that seems to
 break the replication completely.  Please let me know if anybody has any
 ideas

 -Jeff




-- 
--Noble Paul


Re: Solr Newbie question

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Dec 10, 2008 at 11:00 PM, Rakesh Sinha [EMAIL PROTECTED] wrote:
 Hi -
  I am a new user of the Solr tool and came across the introductory
 tutorial here - http://lucene.apache.org/solr/tutorial.html .
 I am planning to use Solr in one of my projects. I see that the
 tutorial mentions a REST API / interface to add documents and to
 query the same.

 I would like to create the indices locally, where the web server (or
 pool of servers) will have access to the database directly, but use
 the query REST API to query for the results.

If your data resides in DB consider using DIH.
http://wiki.apache.org/solr/DataImportHandler


  I am curious how this could be possible without using the HTTP REST
 API submission to add to the indices. (For the sake of simplicity we can
 assume it would be just one node to store the index but multiple
 readers / query machines that could potentially connect to the Solr
 web service and retrieve the query results. Also the index might be
 locally present on the same machine as the Solr host or at
 least accessible through NFS etc.)
I guess you are thinking of using a master/slave setup.
see this http://wiki.apache.org/solr/CollectionDistribution
or http://wiki.apache.org/solr/SolrReplication



 Thanks for helping out with some starting pointers regarding the same.




-- 
--Noble Paul
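
A minimal data-config.xml sketch for the DIH suggestion (driver, URL, and
table are placeholders):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="user" password="pass"/>
    <document>
      <entity name="item" query="select id, title from item">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
      </entity>
    </document>
  </dataConfig>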


jboss and solr

2008-12-10 Thread Neha Bhardwaj
I am trying to configure JBoss with Solr.

As stated in the wiki docs I copied the solr.war, but there is no web-apps
folder currently present in JBoss.

So should I create web-apps manually and paste the war file there?

I tried configuring Solr with Tomcat as well. I pasted the war file in
Tomcat's webapps folder. Now when I set the system property solr.solr.home,
it raises a class not found exception.

Can anyone help me with that?




Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John,

It's not in the current release, but the chances are it will make it into 1.4.  
You can try one of the recent patches and apply it to your Solr 1.3 sources.  
Check the list archives for more discussion; this field collapsing was just 
discussed again today/yesterday.  markmail.org is a good one.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: John Martyniak [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 10:51:57 PM
 Subject: Re: Sum of Fields and Record Count
 
 Otis,
 
 Thanks for the information.  It looks like the field collapsing is similar to 
 what I am looking for.  But is that in the current release?  Is it stable?
 
 Is there any way to do it in Solr 1.3?
 
 -John
 
 On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote:
 
  Hi John,
  
  This sounds a lot like field collapsing functionality that a few people are 
 working on in SOLR-236:
  
  https://issues.apache.org/jira/browse/SOLR-236
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
  From: John Martyniak 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, December 10, 2008 6:16:21 PM
  Subject: Sum of Fields and Record Count
  
  Hi,
  
  I am a new solr user.
  
  I have an application in which I would like to show the results, but one result 
 may
  be part of a larger set of results.  So for example result #1 might also 
 have
  10 other results that are part of the same data set.
  
  Hopefully this makes sense.
  
  What I would like to find out is if there is a way within Solr to show the
  result that matched with the query, and then to also show that this result 
  is
  part of a collection of 10 items.
  
  I have thought about doing it using some sort of external process that 
  runs, 
 and
  with doing multiple queries, so get the list of items and then query 
  against
  each item.  But those don't seem elegant.
  
  So I would like to find out if there is a way to do it within Solr that is 
  a
  little more elegant, and hopefully without having to write additional code.
  
  Thank you in advance for the help.
  
  -John
  



Re: jboss and solr

2008-12-10 Thread Akshay
On Thu, Dec 11, 2008 at 11:21 AM, Neha Bhardwaj 
[EMAIL PROTECTED] wrote:

 I am trying to configure JBoss with Solr.



 As stated in the wiki docs I copied the solr.war, but there is no web-apps
 folder currently present in JBoss.

 So should I create web-apps manually and paste the war file there?

For JBoss, war files are deployed to this location:
$JBOSS_HOME/server/default/deploy
Please look up resources on the net for more information on running
applications in JBoss.





 I tried configuring Solr with Tomcat as well. I pasted the war file in
 Tomcat's webapps folder. Now when I set the system property solr.solr.home,
 it raises a class not found exception.

Probably something is missing in the environment settings.
One way to get Solr running in Tomcat is to start the Tomcat server from the
directory where solr home is present. E.g. if solr home is at
/home/users/test-solr/solr, then start the Tomcat server from the
/home/users/test-solr directory. This assumes that you have $TOMCAT_HOME/bin
in your PATH env variable.





 Can anyone help me with that?






-- 
Regards,
Akshay Ukey.
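
Alternatively, a hedged sketch of pointing Tomcat at solr home explicitly
instead of relying on the working directory (paths match the example above):

  export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/users/test-solr/solr"
  $TOMCAT_HOME/bin/startup.sh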


minimum match issue with dismax

2008-12-10 Thread vinay kumar kaku

Hi,
  does anyone know how to make sure minimum match in dismax is working? I change 
the values and try doing solrCtl restart indexname, but I don't see it taking 
effect. Does anybody have an idea on this?


thank you
vinay
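
For reference, mm normally lives in the dismax handler's defaults in
solrconfig.xml, and a quick way to check whether an edit took effect is to
override it per request. A hedged sketch, with the handler name as in the
stock example config:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    </lst>
  </requestHandler>

  http://localhost:8983/solr/select?qt=dismax&q=foo+bar+baz&mm=100%25&debugQuery=true

Note that a value set in an invariants block (rather than defaults) cannot be
overridden at request time.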

_
You live life online. So we put Windows on the web. 
http://clk.atdmt.com/MRT/go/127032869/direct/01/

Newbie Question boosting

2008-12-10 Thread ayyanar

I have read many articles on boosting but am still not so clear on it. Can
anyone explain the following questions with examples?

1) Can you give an example of field level boosting and document level
boosting, and the difference between the two?

2) If we set the boost at field level (index time), should the query
contain that particular field?
For example, if we set the boost for the title field, should we create the
term query for the title field?

Also, based on your experience, can you explain why you need boosting.

Thanks,
Ayyanar. A
-- 
View this message in context: 
http://www.nabble.com/Newbie-Question-boosting-tp20950268p20950268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Nwebie Question on boosting

2008-12-10 Thread ayyanar

I have read many articles on boosting but am still not so clear on it. Can
anyone explain the following questions with examples? 

1) Can you give an example of field level boosting and document level
boosting, and the difference between the two? 

2) If we set the boost at field level (index time), should the query
contain that particular field? 
For example, if we set the boost for the title field, should we create the
term query for the title field? 

Also, based on your experience, can you explain why you need boosting. 

Thanks, 
Ayyanar. A
-- 
View this message in context: 
http://www.nabble.com/Nwebie-Question-on-boosting-tp20950286p20950286.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Nwebie Question on boosting

2008-12-10 Thread Robert Young
On Thu, Dec 11, 2008 at 6:49 AM, ayyanar
[EMAIL PROTECTED]wrote:

 1) Can you given an example for field level boosting and document level
 boosting and the difference between two?

Field level boosting is used when one field is considered more or less
important than another. For example, you may want the title field of a
document to be considered more important, so that if a term appears in the
title this is considered more important than if it appears in the body. On the
other hand, document level boosting applies when a document is more or less
important than another. For example, an FAQ is often considered a very
important page and as such may be required to appear higher in results than
it otherwise would have.



 2) If we set the boost at field level (index time), should the query
 contain that particular field?
 For example, if we set the boost for the title field, should we create the
 term query for the title field?

 Yes, if you want it to make any difference.


Rob
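
A hedged sketch of both boost types in Solr's XML update format (field names
are illustrative):

  <add>
    <!-- document-level boost: raises this whole document's score -->
    <doc boost="2.0">
      <!-- field-level boost: only affects matches against this field -->
      <field name="title" boost="3.0">Biology</field>
      <field name="body">Introductory notes on biology.</field>
    </doc>
  </add>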