Re: Performance optimization of Proximity/Wildcard searches
Only a couple of thousand documents are added daily, so the old OS cache should still be useful since the old documents remain the same, right? Also, can you please comment on my other thread related to Term Vectors? Thanks!

On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic wrote:
> Yes, the OS cache mostly remains (obviously index files that are no longer
> around are going to remain in the OS cache for a while, but will be useless
> and gradually replaced by new index files).
> How long warmup takes is not relevant here, but what queries you use to
> warm up the index and how much you auto-warm the caches.
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> > - Original Message
> > From: Salman Akram
> > To: solr-user@lucene.apache.org
> > Sent: Sat, February 5, 2011 4:06:54 AM
> > Subject: Re: Performance optimization of Proximity/Wildcard searches
> >
> > Correct me if I am wrong.
> >
> > A commit on the index flushes the SOLR cache, but of course the OS cache
> > would still be useful? If an index is updated every hour, then a warm-up
> > that takes less than 5 mins should be more than enough, right?
> >
> > On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic
> > <otis_gospodne...@yahoo.com> wrote:
> > > Salman,
> > >
> > > Warming up may be useful if your caches are getting decent hit ratios.
> > > Plus, you are warming up the OS cache when you warm up.
> > >
> > > Otis
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > >
> > > > - Original Message
> > > > From: Salman Akram
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Fri, February 4, 2011 3:33:41 PM
> > > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > > >
> > > > I know, so we are not really using it for regular warm-ups (in any
> > > > case the index is updated on an hourly basis). I just tried it a few
> > > > times to compare results. The issue is I am not even sure whether
> > > > warming up is useful with such regular updates.
> > > >
> > > > On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic
> > > > <otis_gospodne...@yahoo.com> wrote:
> > > > > Salman,
> > > > >
> > > > > I only skimmed your email, but wanted to say that this part sounds
> > > > > a little suspicious:
> > > > >
> > > > > > Our warm up script currently executes all distinct queries in
> > > > > > our logs having count > 5. It was run yesterday (with all the
> > > > > > indexing update every
> > > > >
> > > > > It sounds like this will make warmup take a long time, assuming
> > > > > you have more than a handful of distinct queries in your logs.
> > > > >
> > > > > Otis
> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > > Lucene ecosystem search :: http://search-lucene.com/
> > > > >
> > > > > > - Original Message
> > > > > > From: Salman Akram
> > > > > > To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
> > > > > > Sent: Tue, January 25, 2011 6:32:48 AM
> > > > > > Subject: Re: Performance optimization of Proximity/Wildcard
> > > > > > searches
> > > > > >
> > > > > > By warmed index do you mean warming only the SOLR cache, or the
> > > > > > OS cache? As I said, our index is updated every hour, so I am not
> > > > > > sure how much the SOLR cache would help, but the OS cache should
> > > > > > still be helpful, right?
> > > > > >
> > > > > > I haven't compared the results with a proper script, but from
> > > > > > manual testing here are some of the observations.
> > > > > >
> > > > > > 'Recent' queries which are in the cache of course return
> > > > > > immediately (only if they are exactly the same - even if they
> > > > > > took 3-4 mins the first time). I will need to test how many
> > > > > > recent queries stay in the cache, but still, this would work only
> > > > > > for very common queries. Users can run different queries, and I
> > > > > > want at least those to be at an 'acceptable' level (5-10 secs)
> > > > > > even if not very fast.
> > > > > >
> > > > > > Our warm-up script currently executes all distinct queries in
> > > > > > our logs having count > 5. It was run yesterday (with all the
> > > > > > indexing updates every hour after that), and today when I
> > > > > > executed some of the same queries again their time seemed a
> > > > > > little less (around 15-20%); I am not sure if this means
> > > > > > anything. However, their time is still not acceptable.
> > > > > >
> > > > > > What do you think is the best way to compare results? First run
> > > > > > all the warm up queries and then
Re: AND operator and dismax request handler
Hi Bagesh,

I think Hossman and Erick have given you the path you can choose to get the desired result. Try setting the mm value to 0 so that dismax works with your "AND", "OR" and "NOT" operators.

Thanx:
Grijesh
Lucid Imagination Inc.

On Sat, Feb 5, 2011 at 8:17 PM, Bagesh Sharma [via Lucene] wrote:
> Hi friends, please suggest how I can set the query operator to AND for the
> dismax request handler case.
>
> My problem is that I am searching for the string "water treatment plant"
> using the dismax request handler. The query formed is of this type:
>
> http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true
>
> My handling for the dismax request handler in solrConfig.xml is:
>
> default="true">
> true
> explicit
> 0.2
> TDR_SUBIND_SUBTDR_SHORT^3
> TDR_SUBIND_SUBTDR_DETAILS^2
> TDR_SUBIND_COMP_NAME^1.5
> TDR_SUBIND_LOC_STATE^3
> TDR_SUBIND_PROD_NAMES^2.5
> TDR_SUBIND_LOC_CITY^3
> TDR_SUBIND_LOC_ZIP^2.5
> TDR_SUBIND_NAME^1.5
> TDR_SUBIND_TENDER_NO^1
> TDR_SUBIND_SUBTDR_SHORT^15
> TDR_SUBIND_SUBTDR_DETAILS^10
> TDR_SUBIND_COMP_NAME^20
> 1
> 0
> 20%
>
> In the final parsed query it is like:
>
> +((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
> TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
> TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
> TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
> TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
> TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
> TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
> TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
> | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
> (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
> TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
> TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
> TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
> TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
> plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
> TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2
>
> Now it gives me results if any of the words from the text "water treatment
> plant" is found. I think the OR operator is working here, which finally
> combines the results.
>
> Now I want only those results in which the complete text "water treatment
> plant" matches.
>
> 1. I do not want to make any change to the solrConfig.xml dismax handler.
> If possible, suggest any other handler to deal with it.
>
> 2. Is the OR operator really working in the query? Basically, when I query
> like this:
>
> q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> OR
>
> q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> then it gives different results. Can you suggest what the difference
> between the above two queries is?
>
> Please suggest for the full-text search "water treatment plant".
>
> Thanks for your response.
>
> This email was sent by Bagesh Sharma (via Nabble).
> Your replies will appear at
> http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html
> To receive all replies by email, subscribe to this discussion.

-
Thanx: Grijesh
http://lucidimagination.com
--
View this message in context: http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2441363.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP ERROR 400 undefined field: *
Yup, here it is, warning about needing to reindex: http://twitter.com/#!/lucene/status/28694113180192768

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
> From: Erick Erickson
> To: solr-user@lucene.apache.org
> Sent: Sun, February 6, 2011 9:43:00 AM
> Subject: Re: HTTP ERROR 400 undefined field: *
>
> I *think* that there was a post a while ago saying that if you were using
> trunk 3_x, one of the recent changes required re-indexing, but don't quote
> me on that. Have you tried that?
>
> Best
> Erick
>
> On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner wrote:
> > Sorry for the lack of details.
> >
> > It's all clear in my head.. :)
> >
> > We checked out the head revision from the 3.x branch a few weeks ago
> > (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
> > picked up r1058326.
> >
> > We upgraded from a previous checkout (r960098). I am using our
> > customized schema.xml and the solrconfig.xml from the old revision with
> > the new checkout.
> >
> > After upgrading I just copied the data folders from each core into the
> > new checkout (hoping I wouldn't have to re-index the content, as this
> > takes days). Everything seems to work fine, except that now I can't get
> > the score to return.
> >
> > The stack trace is attached. I also saw this warning in the logs; not
> > sure exactly what it's talking about:
> >
> > Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
> > WARNING: the luceneMatchVersion is not specified, defaulting to
> > LUCENE_24 emulation. You should at some point declare and reindex to at
> > least 3.0, because 2.4 emulation is deprecated and will be removed in
> > 4.0. This parameter will be mandatory in 4.0.
> >
> > Here is my request handler. The actual fields here are different than
> > what is in mine, but I'm a little uncomfortable publishing how our
> > company's search service works to the world:
> >
> > explicit
> > edismax
> > true
> > field_a^2 field_b^2 field_c^4
> > field_d^10
> > 0.1
> > tvComponent
> >
> > Anyway, hopefully this is enough info; let me know if you need more.
> >
> > Jed.
> >
> > On 02/03/2011 10:29 PM, Chris Hostetter wrote:
> >> : I was working on a checkout of the 3.x branch from about 6 months
> >> : ago. Everything was working pretty well, but we decided that we
> >> : should update and get what was at the head. However after upgrading,
> >> : I am now getting this
> >>
> >> FWIW: please be specific. "head" of what? the 3x branch? or trunk? What
> >> revision in svn does that correspond to? (the "svnversion" command will
> >> tell you)
> >>
> >> : HTTP ERROR 400 undefined field: *
> >> :
> >> : If I clear the fl parameter (default is set to *, score) then it
> >> : works fine, with one big problem: no score data. If I try and set
> >> : fl=score I get the same error, except it says undefined field:
> >> : score?!
> >> :
> >> : This works great in the older version; what changed? I've googled
> >> : for about an hour now and I can't seem to find anything.
> >>
> >> i can't reproduce this using either trunk (r1067044) or 3x (r1067045)
> >>
> >> all of these queries work just fine...
> >>
> >> http://localhost:8983/solr/select/?q=*
> >> http://localhost:8983/solr/select/?q=solr&fl=*,score
> >> http://localhost:8983/solr/select/?q=solr&fl=score
> >> http://localhost:8983/solr/select/?q=solr
> >>
> >> ...you'll have to provide us with a *lot* more details to help
> >> understand why you might be getting an error (like: what your configs
> >> look like, what the request looks like, what the full stack trace of
> >> your error is in the logs, etc...)
> >>
> >> -Hoss
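For reference, the luceneMatchVersion warning quoted in this thread goes away once the parameter is declared near the top of solrconfig.xml. A minimal sketch; the exact version constant depends on which branch revision you built, so LUCENE_31 below is illustrative only:

```xml
<!-- solrconfig.xml: declare which Lucene version's behavior to emulate.
     Moving this forward across analysis changes generally requires a
     full reindex. The constant below is illustrative. -->
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
```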
Re: Optimize seaches; business is progressing with my Solr site
Hmmm, my default distance for geospatial was excluding the results, I believe. I have to check whether I was actually looking at the desired return result for 'ballroom' alone. Maybe I wasn't. But I saw a lot to learn when I applied the techniques you gave me. Thank you :-)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

From: Erick Erickson
To: solr-user@lucene.apache.org
Sent: Sun, February 6, 2011 8:21:15 AM
Subject: Re: Optimize seaches; business is progressing with my Solr site

What does &debugQuery=on give you? Second, what optimizations are you doing? What shows up in the analysis page? Does your admin page show the terms you expect in your copyField?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon wrote:
> Thanks to LOTS of information from you guys, my site is up and working.
> It's only an API now; I need to work on my OWN front end, LOL!
>
> I have my second customer. I'm finding my general-purpose repository API
> very useful. I will soon be in the business of optimizing the search
> engine part.
>
> For example: I have a copy field that contains the words 'boogie woogie
> ballroom' on lots of records. I cannot find those records using
> 'boogie/boogi/boog', or the woogie versions of those, but I can with
> 'ballroom'. For my VERY first lesson in search optimization, what might
> be causing that, and where are the places to read about this on the Solr
> site?
>
> All the best on a Sunday, guys and gals.
>
> Dennis Gearon
>
> Signature Warning
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others' mistakes, so you do not have to make
> them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life, otherwise we all die.
Re: Separating Index Reader and Writer
Hi Peter,

I must jump into this discussion: from a logical point of view, what you are saying only makes sense if both instances do not run on the same machine, or at least not on the same drive. When both run on the same machine and the same drive, the overall memory used should be equal; plus, I do not understand why this setup should affect cache warming etc., since the process of rewarming should be the same. Well, my knowledge of the internals is not very deep, but from a purely logical point of view, to me the same thing is happening as if I did it in a single Solr instance. So what is the difference; what am I overlooking?

Another thing: while W is committing and writing to the index, is there any inconsistency in R, or is there none because W is writing a new segment, so nothing changes for R until the commit finishes? Are there problems while optimizing an index? How do you inform R about the finished commit?

Thank you for your explanation; it's a really interesting topic!

Regards,
Em

Peter Sturge-2 wrote:
> Hi,
>
> We use this scenario in production where we have one write-only Solr
> instance and 1 read-only, pointing to the same data.
> We do this so we can optimize caching/etc. for each instance for
> write/read. The main performance gain is in cache warming and associated
> parameters.
> For your Index W, it's worth turning off cache warming altogether, so
> commits aren't slowed down by warming.
>
> Peter
>
> On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia wrote:
> > Hi all,
> > I have set up two indexes, one for reading (R) and the other for
> > writing (W). Index R refers to the same data dir as W (defined in
> > solrconfig via ).
> > To make sure the R index sees the indexed documents of W, I am firing
> > an empty commit on R.
> > With this, I am getting a performance improvement compared to using
> > the same index for reading and writing.
> > Can anyone help me understand why this performance improvement is
> > taking place even though both indexes are pointing to the same data
> > directory?
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.

--
View this message in context: http://lucene.472066.n3.nabble.com/Separating-Index-Reader-and-Writer-tp2437666p2438730.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Separating Index Reader and Writer
Hi Peter,

Can you elaborate a little on how the performance gain comes from cache warming? I am getting a good improvement in search time.

On 6 February 2011 23:29, Peter Sturge wrote:
> Hi,
>
> We use this scenario in production where we have one write-only Solr
> instance and 1 read-only, pointing to the same data.
> We do this so we can optimize caching/etc. for each instance for
> write/read. The main performance gain is in cache warming and associated
> parameters.
> For your Index W, it's worth turning off cache warming altogether, so
> commits aren't slowed down by warming.
>
> Peter
>
> On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia wrote:
> > Hi all,
> > I have set up two indexes, one for reading (R) and the other for
> > writing (W). Index R refers to the same data dir as W (defined in
> > solrconfig via ).
> > To make sure the R index sees the indexed documents of W, I am firing
> > an empty commit on R.
> > With this, I am getting a performance improvement compared to using
> > the same index for reading and writing.
> > Can anyone help me understand why this performance improvement is
> > taking place even though both indexes are pointing to the same data
> > directory?

--
Thanks & Regards,
Isan Fulia.
Re: Separating Index Reader and Writer
Hi,

We use this scenario in production where we have one write-only Solr instance and 1 read-only, pointing to the same data. We do this so we can optimize caching/etc. for each instance for write/read. The main performance gain is in cache warming and associated parameters. For your Index W, it's worth turning off cache warming altogether, so commits aren't slowed down by warming.

Peter

On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia wrote:
> Hi all,
> I have set up two indexes, one for reading (R) and the other for writing
> (W). Index R refers to the same data dir as W (defined in solrconfig
> via ).
> To make sure the R index sees the indexed documents of W, I am firing an
> empty commit on R.
> With this, I am getting a performance improvement compared to using the
> same index for reading and writing.
> Can anyone help me understand why this performance improvement is taking
> place even though both indexes are pointing to the same data directory?
>
> --
> Thanks & Regards,
> Isan Fulia.
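Peter's suggestion of disabling warming on the write-side instance would, as a rough sketch, look something like the following in the writer's solrconfig.xml. The cache sizes and classes below are illustrative placeholders, not values from this thread; the read-only instance would keep non-zero autowarmCount and its warming queries.

```xml
<!-- Write-only instance: autowarmCount="0" means commits are not
     delayed by copying entries into the new searcher's caches.
     Sizes/classes here are illustrative. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"/>
<!-- Serve from the new searcher immediately rather than waiting for warming. -->
<useColdSearcher>true</useColdSearcher>
```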
Re: Optimize seaches; business is progressing with my Solr site
What does &debugQuery=on give you? Second, what optimizations are you doing? What shows up in the analysis page? Does your admin page show the terms you expect in your copyField?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon wrote:
> Thanks to LOTS of information from you guys, my site is up and working.
> It's only an API now; I need to work on my OWN front end, LOL!
>
> I have my second customer. I'm finding my general-purpose repository API
> very useful. I will soon be in the business of optimizing the search
> engine part.
>
> For example: I have a copy field that contains the words 'boogie woogie
> ballroom' on lots of records. I cannot find those records using
> 'boogie/boogi/boog', or the woogie versions of those, but I can with
> 'ballroom'. For my VERY first lesson in search optimization, what might
> be causing that, and where are the places to read about this on the Solr
> site?
>
> All the best on a Sunday, guys and gals.
>
> Dennis Gearon
>
> Signature Warning
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others' mistakes, so you do not have to make
> them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life, otherwise we all die.
Separating Index Reader and Writer
Hi all,

I have set up two indexes, one for reading (R) and the other for writing (W). Index R refers to the same data dir as W (defined in solrconfig via ). To make sure the R index sees the indexed documents of W, I am firing an empty commit on R. With this, I am getting a performance improvement compared to using the same index for reading and writing. Can anyone help me understand why this performance improvement is taking place even though both indexes are pointing to the same data directory?

--
Thanks & Regards,
Isan Fulia.
Re: AND operator and dismax request handler
Try attaching &debugQuery=on to your queries. The results will show you exactly what the query is after it gets parsed, and the difference should stand out.

About dismax: try looking at the "minimum should match" parameter; that might do what you're looking for. Or think about edismax if you're on trunk or 3_x...

Best
Erick

On Sat, Feb 5, 2011 at 9:47 AM, Bagesh Sharma wrote:
> Hi friends, please suggest how I can set the query operator to AND for the
> dismax request handler case.
>
> My problem is that I am searching for the string "water treatment plant"
> using the dismax request handler. The query formed is of this type:
>
> http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true
>
> My handling for the dismax request handler in solrConfig.xml is:
>
> default="true">
> true
> explicit
> 0.2
> TDR_SUBIND_SUBTDR_SHORT^3
> TDR_SUBIND_SUBTDR_DETAILS^2
> TDR_SUBIND_COMP_NAME^1.5
> TDR_SUBIND_LOC_STATE^3
> TDR_SUBIND_PROD_NAMES^2.5
> TDR_SUBIND_LOC_CITY^3
> TDR_SUBIND_LOC_ZIP^2.5
> TDR_SUBIND_NAME^1.5
> TDR_SUBIND_TENDER_NO^1
> TDR_SUBIND_SUBTDR_SHORT^15
> TDR_SUBIND_SUBTDR_DETAILS^10
> TDR_SUBIND_COMP_NAME^20
> 1
> 0
> 20%
>
> In the final parsed query it is like:
>
> +((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
> TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
> TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
> TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
> TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
> TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
> TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
> TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
> | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
> (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
> TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
> TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
> TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
> TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
> plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
> TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2
>
> Now it gives me results if any of the words from the text "water treatment
> plant" is found. I think the OR operator is working here, which finally
> combines the results.
>
> Now I want only those results in which the complete text "water treatment
> plant" matches.
>
> 1. I do not want to make any change to the solrConfig.xml dismax handler.
> If possible, suggest any other handler to deal with it.
>
> 2. Is the OR operator really working in the query? Basically, when I query
> like this:
>
> q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> OR
>
> q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> then it gives different results. Can you suggest what the difference
> between the above two queries is?
>
> Please suggest for the full-text search "water treatment plant".
>
> Thanks for your response.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html
> Sent from the Solr - User mailing list archive at Nabble.com.
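One hedged illustration of the "minimum should match" suggestion above: dismax accepts mm as a per-request parameter, so all terms can be required without touching solrConfig.xml. The host, port and other parameters below are taken from the original post; mm=100%25 is the URL-encoded form of "100%", meaning every term must match:

```
http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&qt=dismax&mm=100%25&start=0&rows=5&sort=score%20desc&omitHeader=true
```

Note this still matches the three terms anywhere in the boosted fields; requiring the exact phrase would instead mean quoting the query ("water treatment plant") so it is parsed as a phrase.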
Re: HTTP ERROR 400 undefined field: *
I *think* that there was a post a while ago saying that if you were using trunk 3_x, one of the recent changes required re-indexing, but don't quote me on that. Have you tried that?

Best
Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner wrote:
> Sorry for the lack of details.
>
> It's all clear in my head.. :)
>
> We checked out the head revision from the 3.x branch a few weeks ago
> (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
> picked up r1058326.
>
> We upgraded from a previous checkout (r960098). I am using our customized
> schema.xml and the solrconfig.xml from the old revision with the new
> checkout.
>
> After upgrading I just copied the data folders from each core into the
> new checkout (hoping I wouldn't have to re-index the content, as this
> takes days). Everything seems to work fine, except that now I can't get
> the score to return.
>
> The stack trace is attached. I also saw this warning in the logs; not
> sure exactly what it's talking about:
>
> Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
> WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
> emulation. You should at some point declare and reindex to at least 3.0,
> because 2.4 emulation is deprecated and will be removed in 4.0. This
> parameter will be mandatory in 4.0.
>
> Here is my request handler. The actual fields here are different than
> what is in mine, but I'm a little uncomfortable publishing how our
> company's search service works to the world:
>
> explicit
> edismax
> true
> field_a^2 field_b^2 field_c^4
> field_d^10
> 0.1
> tvComponent
>
> Anyway, hopefully this is enough info; let me know if you need more.
>
> Jed.
>
> On 02/03/2011 10:29 PM, Chris Hostetter wrote:
>> : I was working on a checkout of the 3.x branch from about 6 months ago.
>> : Everything was working pretty well, but we decided that we should
>> : update and get what was at the head. However after upgrading, I am now
>> : getting this
>>
>> FWIW: please be specific. "head" of what? the 3x branch? or trunk? What
>> revision in svn does that correspond to? (the "svnversion" command will
>> tell you)
>>
>> : HTTP ERROR 400 undefined field: *
>> :
>> : If I clear the fl parameter (default is set to *, score) then it works
>> : fine, with one big problem: no score data. If I try and set fl=score I
>> : get the same error, except it says undefined field: score?!
>> :
>> : This works great in the older version; what changed? I've googled for
>> : about an hour now and I can't seem to find anything.
>>
>> i can't reproduce this using either trunk (r1067044) or 3x (r1067045)
>>
>> all of these queries work just fine...
>>
>> http://localhost:8983/solr/select/?q=*
>> http://localhost:8983/solr/select/?q=solr&fl=*,score
>> http://localhost:8983/solr/select/?q=solr&fl=score
>> http://localhost:8983/solr/select/?q=solr
>>
>> ...you'll have to provide us with a *lot* more details to help
>> understand why you might be getting an error (like: what your configs
>> look like, what the request looks like, what the full stack trace of
>> your error is in the logs, etc...)
>>
>> -Hoss
Re: keepword file with phrases
Hi Chris,

Yes, you've identified the problem :-) I've tried using the keyword tokeniser, but that seems to merge each comma-separated list of synonyms into one. The pattern tokeniser would seem to be a candidate, but can you pass the pattern attribute through the tokeniser attribute in the synonym filter?

Example synonym lines which are problematic:

termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA
termB1,termB2,termB3 => normalisedTermB

When the synonym filter uses the keyword tokeniser, only "phrase termA" ends up being matched as a synonym :-)

lee

On 6 February 2011 12:58, lee carroll wrote:
> Hi Bill,
>
> Quoting in the synonyms file did not produce the correct expansion :-(
>
> Looking at Chris's comments now.
>
> cheers
> lee
>
> On 5 February 2011 23:38, Bill Bell wrote:
>> OK that makes sense.
>>
>> If you double quote the synonyms file will that help for white space?
>>
>> Bill
>>
>> On 2/5/11 4:37 PM, "Chris Hostetter" wrote:
>> >
>> >: You need to switch the order. Do synonyms and expansion first, then
>> >: shingles..
>> >
>> >except then he would be building shingles out of all the permutations
>> >of "words" in his synonyms -- including the multi-word synonyms. i
>> >don't *think* that's what he wants based on his example (but i may be
>> >wrong)
>> >
>> >: Have you tried using analysis.jsp ?
>> >
>> >he already mentioned he has, in his original mail, and that's how he
>> >can tell it's not working.
>> >
>> >lee: based on your follow-up post about seeing problems in the synonyms
>> >output, i suspect the problem you are having is with how the
>> >synonymfilter "parses" the synonyms file -- by default it assumes it
>> >should split on certain characters to create multi-word synonyms --
>> >but in your case the tokens you are feeding the synonym filter (the
>> >output of your shingle filter) really do have whitespace in them
>> >
>> >there is a "tokenizerFactory" option that Koji added a while back to
>> >the SynonymFilterFactory that lets you specify the classname of a
>> >TokenizerFactory to use when parsing the synonym rules -- that may be
>> >what you need to get your synonyms with spaces in them (so they work
>> >properly with your shingles)
>> >
>> >(assuming of course that i really understand your problem)
>> >
>> >-Hoss
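To make the tokenizerFactory suggestion above concrete: the attribute controls how the rules *file* is tokenized, so multi-word synonyms survive with their spaces intact. A hedged sketch for schema.xml; the file name, filter order and surrounding field type are placeholders, not lee's actual analyzer chain:

```xml
<!-- schema.xml: parse synonyms.txt with the keyword tokenizer so a rule
     like "phrase termA => normalisedTermA" keeps its whitespace and can
     match shingled tokens. File name and options are illustrative. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```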
Re: keepword file with phrases
Hi Bill,

Quoting in the synonyms file did not produce the correct expansion :-(

Looking at Chris's comments now.

cheers
lee

On 5 February 2011 23:38, Bill Bell wrote:
> OK that makes sense.
>
> If you double quote the synonyms file will that help for white space?
>
> Bill
>
> On 2/5/11 4:37 PM, "Chris Hostetter" wrote:
> >
> >: You need to switch the order. Do synonyms and expansion first, then
> >: shingles..
> >
> >except then he would be building shingles out of all the permutations of
> >"words" in his synonyms -- including the multi-word synonyms. i don't
> >*think* that's what he wants based on his example (but i may be wrong)
> >
> >: Have you tried using analysis.jsp ?
> >
> >he already mentioned he has, in his original mail, and that's how he can
> >tell it's not working.
> >
> >lee: based on your follow-up post about seeing problems in the synonyms
> >output, i suspect the problem you are having is with how the
> >synonymfilter "parses" the synonyms file -- by default it assumes it
> >should split on certain characters to create multi-word synonyms -- but
> >in your case the tokens you are feeding the synonym filter (the output
> >of your shingle filter) really do have whitespace in them
> >
> >there is a "tokenizerFactory" option that Koji added a while back to the
> >SynonymFilterFactory that lets you specify the classname of a
> >TokenizerFactory to use when parsing the synonym rules -- that may be
> >what you need to get your synonyms with spaces in them (so they work
> >properly with your shingles)
> >
> >(assuming of course that i really understand your problem)
> >
> >-Hoss
Re: UIMA Error
Hi,

How do I apply the AlchemyAPIAnnotator? Will this help me with the NamedEntityExtractionAnnotator?

Thanx a lot, Tommaso, for your time.