Re: Read past EOF error due to broken connection
Commit first and then try the search again. You can also use Lucene's CheckIndex tool to check and fix your index (it may remove some corrupt segments from your index). Thanks, Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3094334.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem in accessing a variable's changed value outside of if block in javascript code
$("#submit").click(function () {
  var query = getquerystring(); // get the query string entered by the user
  var newquery = query;
  // get the JSON response from the Solr server
  $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&start=0&rows=100&q=" + query + "&json.wrf=?", function (result) {
    //$.each(result.response.docs, function(result){
    if (result.response.numFound == 0) {
      $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&start=0&rows=100&q=" + query + "&spellcheck=true&json.wrf=?", function (result) {
        $.each(result.spellcheck.suggestions, function (i, item) {
          newquery = item.suggestion;
        });
      });
    }
  });
});
In the JavaScript code above, the variable newquery initially holds the value of query. When the if condition is true, its value is changed. My problem is that I do not see the changed value outside the if block, and I need it there. How can I do this? Thanks & Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-accessing-a-variable-s-changed-value-outside-of-if-block-in-javascript-code-tp3094342p3094342.html Sent from the Solr - User mailing list archive at Nabble.com.
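A likely explanation, sketched here under assumptions: $.getJSON is asynchronous, so any code placed after the if block runs before the inner callback has assigned newquery. The sketch below does not talk to Solr. fakeGetJSON, flush and resolveQuery are hypothetical helpers, and the canned responses only mimic the fields the posted code reads (response.numFound, spellcheck.suggestions[i].suggestion). The point is that the changed value is handed to a callback once it is known, rather than read right after the request is started.

```javascript
// Hypothetical stand-in for $.getJSON: it queues the callback instead of
// calling it immediately, the way a real network request would.
var pending = [];
function fakeGetJSON(url, callback) {
  var data = url.indexOf("spellcheck=true") !== -1
    ? { spellcheck: { suggestions: [{ suggestion: "earrings" }] } }
    : { response: { numFound: 0 } };
  pending.push(function () { callback(data); });
}

// Hypothetical helper that plays the role of the browser's event loop.
function flush() {
  while (pending.length > 0) {
    pending.shift()();
  }
}

// Work out which query to use and pass it to `done` when it is known.
function resolveQuery(query, done) {
  fakeGetJSON("select?q=" + query, function (result) {
    if (result.response.numFound !== 0) {
      done(query); // the original query found something
      return;
    }
    fakeGetJSON("select?q=" + query + "&spellcheck=true", function (spell) {
      var newquery = query;
      spell.spellcheck.suggestions.forEach(function (item) {
        newquery = item.suggestion;
      });
      done(newquery); // the changed value is only available here
    });
  });
}

var seen;
resolveQuery("erings", function (q) { seen = q; });
var beforeCallbacks = seen; // still undefined: no callback has run yet
flush();
var afterCallbacks = seen;  // now holds the suggested query
```

With real jQuery the same idea applies: whatever needs the changed newquery should be called from inside the innermost callback (or from a function passed in, like done above).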
Re: Solr Clustering For Multiple Pages
Thanks a lot. I was worried that I was not doing it the correct way. Regards, Nilay Tiwari -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Clustering-For-Multiple-Pages-tp3085507p3094379.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Clustering For Multiple Pages
Can you please tell me how I can apply a filter to clustered data in Solr? Currently I store the doc id and topic name in a Map, look up the ids for a topic in that Map, and then pass them to Solr joined with OR. Is there any other way to do this? Regards, Nilay Tiwari -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Clustering-For-Multiple-Pages-tp3085507p3094390.html Sent from the Solr - User mailing list archive at Nabble.com.
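For reference, the approach described above (look up the ids for a topic, then OR them together) could be sketched like this. The topicToIds map, the id field name and the buildIdFilter helper are all hypothetical, and the ids are assumed to be plain numbers or letters that need no query escaping.

```javascript
// Build a Solr filter such as  id:(12250 OR 12254)  from a topic -> ids map.
// Returns null when the topic is unknown or has no documents.
function buildIdFilter(topicToIds, topic, idField) {
  var known = Object.prototype.hasOwnProperty.call(topicToIds, topic);
  var ids = known ? topicToIds[topic] : [];
  if (ids.length === 0) {
    return null;
  }
  return idField + ":(" + ids.join(" OR ") + ")";
}

// Hypothetical clustering output: topic label -> document ids.
var topicToIds = {
  "Earrings": ["12250", "12254"],
  "Rings": ["13001"]
};

var filter = buildIdFilter(topicToIds, "Earrings", "id");
// The filter can be sent as an fq parameter next to the main query.
var requestPath = "select?q=*:*&fq=" + encodeURIComponent(filter);
```

Sending the ids as a filter query (fq) rather than inside q keeps the main query and its scoring unchanged. Very long id lists can run into request-size or boolean-clause limits, so this is only a sketch for small clusters.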
Re: Parse solr json object
Try this mailing list: http://docs.jquery.com/Discussion or this doc: http://api.jquery.com/jQuery.each/
On 21 June 2011 07:32, Romi romijain3...@gmail.com wrote:
Hi, to enable highlighting I want to parse a JSON object. For readability I have included the XML format of that JSON object. Please tell me how I should parse this object using $.each(, function(i,item){ so that I can get the highlighted result.
<lst name="highlighting">
  <lst name="12250">
    <arr name="description">
      <str>These <em>elegant</em> and fluid earrings have six round prong-set and twenty-six faceted briolette</str>
    </arr>
  </lst>
  <lst name="12254">
    <arr name="description">
      <str>These <em>elegant</em> and fluid earrings have six round prong-set and twenty-six faceted briolette</str>
    </arr>
  </lst>
</lst>
Thanks & Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Parse-solr-json-object-tp3089470p3089470.html Sent from the Solr - User mailing list archive at Nabble.com.
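A minimal sketch of walking that structure, assuming the JSON response (wt=json) mirrors the XML above: an object keyed by document id, where each value maps a field name to an array of snippets. The sample data is copied from the question and shortened, and collectSnippets is a hypothetical helper name.

```javascript
// Sample shaped like the "highlighting" part of a Solr JSON response.
var result = {
  highlighting: {
    "12250": { description: ["These <em>elegant</em> and fluid earrings"] },
    "12254": { description: ["These <em>elegant</em> and fluid earrings"] }
  }
};

// Collect every highlighted snippet of one field, together with its doc id.
function collectSnippets(highlighting, field) {
  var out = [];
  Object.keys(highlighting).forEach(function (docId) {
    var snippets = highlighting[docId][field] || [];
    snippets.forEach(function (snippet) {
      out.push({ id: docId, snippet: snippet });
    });
  });
  return out;
}

var snippets = collectSnippets(result.highlighting, "description");
```

With jQuery the outer loop can be written as $.each(result.highlighting, function (docId, fields) { ... }), because $.each passes the key and the value when it iterates over a plain object.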
Re: Solr Clustering For Multiple Pages
I don't quite follow, I must admit. Maybe it's faceting you're after? http://wiki.apache.org/solr/SolrFacetingOverview Staszek
On Wed, Jun 22, 2011 at 08:40, nilay@gmail.com nilay@gmail.com wrote:
Can you please tell me how I can apply a filter to clustered data in Solr? Currently I store the doc id and topic name in a Map, look up the ids for a topic in that Map, and then pass them to Solr joined with OR. Is there any other way to do this? Regards, Nilay Tiwari -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Clustering-For-Multiple-Pages-tp3085507p3094390.html Sent from the Solr - User mailing list archive at Nabble.com.
strange utf-8 problem
I use Solr 4 trunk to index some sites with nutch 1-2-rc4. When I try to index 300k documents with Solr 4 I get an error, but with Solr 1.4.1 there is no such problem. I installed Solr 4 on Tomcat 5 and 6 and on Jetty 7 and 8, and nothing changed. I use apache-solr-core-1.4.0.jar and apache-solr-solrj-1.4.0.jar for Solr 1.4.1 because of javabin errors. Here are the problematic characters: Sao Tom���nd Princip���STP
SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0x at char #681112, byte #700315)
  at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
  at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:266)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:126)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1323)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:476)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:480)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:937)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:871)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:247)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
  at org.eclipse.jetty.server.Server.handle(Server.java:346)
  at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:589)
  at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1065)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:915)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
  at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:411)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:535)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.CharConversionException: Invalid UTF-8 character 0x at char #681112, byte #700315)
  at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
  at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:249)
  at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
  at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
  at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
  at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
  at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
  at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
  at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
  ... 32 more
-- View this message in context: http://lucene.472066.n3.nabble.com/strange-utf-8-problem-tp3094473p3094473.html Sent from the Solr - User mailing list archive at Nabble.com.
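One workaround that is often used for this kind of error is to drop characters that are not legal in XML 1.0 before the document is posted to Solr. The sketch below is only an illustration of that idea: the hex value in the log above is cut off, so which character is actually at fault is an assumption, and stripInvalidXmlChars is a hypothetical helper, not part of Nutch or Solr.

```javascript
// Matches anything outside the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The "u" flag makes the regex work on code points, so valid surrogate
// pairs are kept and lone surrogates are removed.
var INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper: remove characters an XML 1.0 parser would reject.
function stripInvalidXmlChars(text) {
  return text.replace(INVALID_XML_CHARS, "");
}

// U+FFFF is a non-character that XML 1.0 does not allow.
var cleaned = stripInvalidXmlChars("Sao Tom\uFFFF and Princip\uFFFF");
```

In a Nutch and Solr setup the equivalent filtering would have to happen on the indexing side, before the update request is built. Where exactly that belongs depends on the versions in use and is not shown here.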
Re: MultiValued facet behavior question
Hi Bill, yes, you absolutely do make sense. I posted the exact same question to this mailing list (subject: faceting on multivalued fields), but got no response. A friend of mine is now helping out. I hope someone on the list can give us some advice. I'll post our findings to this topic. Regards, Dennis
On Wed, Jun 22, 2011 at 5:37 AM, Bill Bell billnb...@gmail.com wrote:
Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to show only the one match, and not all the values of the multiValued specialties field in the matching documents. Example:
Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist
When I use facet.field=specialties I get: Cardiologist: 3, Internist: 1, Family Doctor: 1. I only want it to return: Cardiologist: 3, because this matches exactly... Facet on the value that matches and only return the number for that. It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3. If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be: Internist: 1. Does this all make sense?
On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:
So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now, with what you want, to help clarify.
On 06/21/2011 09:47 PM, Bill Bell wrote:
I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone searches for Cardiologist, I use q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties. What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see: Cardiologist: 700, Internist: 45, Family Doctor: 20. This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internists and Family Doctors. How do I send a query to Solr with a condition? facet.query=specialties:Cardiologist&facet.field=specialties. Then if the query returns something use it, otherwise use the field one? Other ideas?
Re: MultiValued facet behavior question
Can your front-end app normalize the q parameter, either with a drop-down or with a type-ahead derived from the values in the specialties field? That way q will match the value(s) in your facet results. I'm not sure what you are trying to achieve, though, so maybe I'm off the mark.
On 22 June 2011 04:37, Bill Bell billnb...@gmail.com wrote:
Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to show only the one match, and not all the values of the multiValued specialties field in the matching documents. Example:
Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist
When I use facet.field=specialties I get: Cardiologist: 3, Internist: 1, Family Doctor: 1. I only want it to return: Cardiologist: 3, because this matches exactly... Facet on the value that matches and only return the number for that. It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3. If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be: Internist: 1. Does this all make sense?
On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:
So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now, with what you want, to help clarify.
On 06/21/2011 09:47 PM, Bill Bell wrote:
I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone searches for Cardiologist, I use q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties. What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see: Cardiologist: 700, Internist: 45, Family Doctor: 20. This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internists and Family Doctors. How do I send a query to Solr with a condition? facet.query=specialties:Cardiologist&facet.field=specialties. Then if the query returns something use it, otherwise use the field one? Other ideas?
Re: MultiValued facet behavior question
Oh sorry, I forgot to add: facet fields are often not stemmed or heavily analysed. The facet values come from the index.
On 22 June 2011 08:21, lee carroll lee.a.carr...@googlemail.com wrote:
Can your front-end app normalize the q parameter, either with a drop-down or with a type-ahead derived from the values in the specialties field? That way q will match the value(s) in your facet results. I'm not sure what you are trying to achieve, though, so maybe I'm off the mark.
On 22 June 2011 04:37, Bill Bell billnb...@gmail.com wrote:
Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to show only the one match, and not all the values of the multiValued specialties field in the matching documents. Example:
Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist
When I use facet.field=specialties I get: Cardiologist: 3, Internist: 1, Family Doctor: 1. I only want it to return: Cardiologist: 3, because this matches exactly... Facet on the value that matches and only return the number for that. It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3. If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be: Internist: 1. Does this all make sense?
On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:
So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now, with what you want, to help clarify.
On 06/21/2011 09:47 PM, Bill Bell wrote:
I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone searches for Cardiologist, I use q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties. What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see: Cardiologist: 700, Internist: 45, Family Doctor: 20. This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internists and Family Doctors. How do I send a query to Solr with a condition? facet.query=specialties:Cardiologist&facet.field=specialties. Then if the query returns something use it, otherwise use the field one? Other ideas?
Re: MultiValued facet behavior question
On 22.06.2011 05:37, Bill Bell wrote:
It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3
You would never get Cardiologist: 3 as the facet result, because if Cardiologist were in your index, it would be impossible to find it when searching for cardiology (unless you manage to write some strange tokenizer that translates cardiology to Cardiologist at query time, including the upper-case letter). Facets are always taken from the index, so a query usually either matches them exactly or not at all. -Kuli
Re: Problem with field collapsing of patched Solr 1.4
Hi, I am using Solr field collapsing. It works perfectly with the default sorting (score), but when we sort on more than one field and use pagination, it returns incorrect results. Could anyone help to solve this? Thanks in advance... Regards, Thalaiselvam N -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-field-collapsing-of-patched-Solr-1-4-tp2678850p3094555.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MultiValued facet behavior question
Here is an example using exampledocs and trunk 4.0:
http://localhost:8983/solr/select/?q=cat:%22hard%20drive%22&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=cat&facet.query={!lucene}cat:%22hard%20drive%22&facet.mincount=1
Results:
<result name="response" numFound="2" start="0"> Etc
<lst name="facet_queries">
  <int name="{!lucene}cat:&quot;hard drive&quot;">2</int>
</lst>
<lst name="facet_fields">
  <lst name="cat">
    <int name="electronics">2</int>
    <int name="hard drive">2</int>
  </lst>
</lst>
Notice that the facet_queries count of 2 is the same as numFound=2. But I have no way to use facet.field to count only the matches. The algorithm: loop through the multiValued field and match on hard drive. Ignore the other values in there when building the facet list.
On 6/22/11 1:19 AM, Dennis de Boer datdeb...@gmail.com wrote:
Hi Bill, yes, you absolutely do make sense. I posted the exact same question to this mailing list (subject: faceting on multivalued fields), but got no response. A friend of mine is now helping out. I hope someone on the list can give us some advice. I'll post our findings to this topic. Regards, Dennis
On Wed, Jun 22, 2011 at 5:37 AM, Bill Bell billnb...@gmail.com wrote:
Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to show only the one match, and not all the values of the multiValued specialties field in the matching documents. Example:
Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist
When I use facet.field=specialties I get: Cardiologist: 3, Internist: 1, Family Doctor: 1. I only want it to return: Cardiologist: 3, because this matches exactly... Facet on the value that matches and only return the number for that. It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3. If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be: Internist: 1. Does this all make sense?
On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:
So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now, with what you want, to help clarify.
On 06/21/2011 09:47 PM, Bill Bell wrote:
I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone searches for Cardiologist, I use q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties. What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see: Cardiologist: 700, Internist: 45, Family Doctor: 20. This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internists and Family Doctors. How do I send a query to Solr with a condition? facet.query=specialties:Cardiologist&facet.field=specialties. Then if the query returns something use it, otherwise use the field one? Other ideas?
Re: MultiValued facet behavior question
You can type q=cardiology and match on cardiologist. If stemming did not work, you can just add a synonym: cardiology,cardiologist. But that is not the issue. The issue is around multiValued fields and facets. You would expect a user who is searching on the multiValued field to match on some values in there. For example, they type Cardiologist and it matches the value Cardiologist. So it matches in the multiValued field, and that part works. Then, when I output the facet, I need a different behavior than the default. I need the facet to output only the value that matches (scored), NOT ALL VALUES in the multiValued field. I think it makes sense?
On 6/22/11 1:42 AM, Michael Kuhlmann s...@kuli.org wrote:
On 22.06.2011 05:37, Bill Bell wrote:
It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3
You would never get Cardiologist: 3 as the facet result, because if Cardiologist were in your index, it would be impossible to find it when searching for cardiology (unless you manage to write some strange tokenizer that translates cardiology to Cardiologist at query time, including the upper-case letter). Facets are always taken from the index, so a query usually either matches them exactly or not at all. -Kuli
Re: MultiValued facet behavior question
Hi Bill, can you explain a little more about why you need this? Knowing the motivation might suggest a different solution that does not only involve faceting.
On 22 June 2011 08:49, Bill Bell billnb...@gmail.com wrote:
You can type q=cardiology and match on cardiologist. If stemming did not work, you can just add a synonym: cardiology,cardiologist. But that is not the issue. The issue is around multiValued fields and facets. You would expect a user who is searching on the multiValued field to match on some values in there. For example, they type Cardiologist and it matches the value Cardiologist. So it matches in the multiValued field, and that part works. Then, when I output the facet, I need a different behavior than the default. I need the facet to output only the value that matches (scored), NOT ALL VALUES in the multiValued field. I think it makes sense?
On 6/22/11 1:42 AM, Michael Kuhlmann s...@kuli.org wrote:
On 22.06.2011 05:37, Bill Bell wrote:
It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3
You would never get Cardiologist: 3 as the facet result, because if Cardiologist were in your index, it would be impossible to find it when searching for cardiology (unless you manage to write some strange tokenizer that translates cardiology to Cardiologist at query time, including the upper-case letter). Facets are always taken from the index, so a query usually either matches them exactly or not at all. -Kuli
Re: MultiValued facet behavior question
Hi Bill, as far as I understand it now, with the help of my friend, you can't. Multivalued fields don't work that way. You can, however, always filter the facet results manually in the JSP. You know what the user chose as a facet. The issue I ran into arises when you have additional facet fields, for example when you also have country as a facet field. When you search for Cardiologist, it also returns Internist and Family Doctor, as you described. What Solr then also returns in the country list are the countries for Cardiologist, but also those for Internist and Family Doctor. This is not what you want. I don't think what we want here is supported out of the box by Solr. Regards, Dennis
On Wed, Jun 22, 2011 at 9:49 AM, Bill Bell billnb...@gmail.com wrote:
You can type q=cardiology and match on cardiologist. If stemming did not work, you can just add a synonym: cardiology,cardiologist. But that is not the issue. The issue is around multiValued fields and facets. You would expect a user who is searching on the multiValued field to match on some values in there. For example, they type Cardiologist and it matches the value Cardiologist. So it matches in the multiValued field, and that part works. Then, when I output the facet, I need a different behavior than the default. I need the facet to output only the value that matches (scored), NOT ALL VALUES in the multiValued field. I think it makes sense?
On 6/22/11 1:42 AM, Michael Kuhlmann s...@kuli.org wrote:
On 22.06.2011 05:37, Bill Bell wrote:
It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value matches Cardiologist perfectly. Again, I only want it to return: Cardiologist: 3
You would never get Cardiologist: 3 as the facet result, because if Cardiologist were in your index, it would be impossible to find it when searching for cardiology (unless you manage to write some strange tokenizer that translates cardiology to Cardiologist at query time, including the upper-case letter). Facets are always taken from the index, so a query usually either matches them exactly or not at all. -Kuli
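The manual filtering of facet results mentioned in this message could look roughly like the sketch below, written in JavaScript rather than JSP. It assumes the facet counts arrive in Solr's default JSON layout for facet_fields, a flat array that alternates value and count, and it only handles a case-insensitive exact match on the term the user typed. keepMatchingFacets is a hypothetical helper name. The stemming case (cardiology versus Cardiologist) and the extra facet fields such as country are not addressed.

```javascript
// Keep only the facet values that equal the user's term (ignoring case).
// flatCounts alternates value and count: ["Cardiologist", 3, "Internist", 1]
function keepMatchingFacets(flatCounts, queryTerm) {
  var wanted = queryTerm.toLowerCase();
  var kept = {};
  for (var i = 0; i + 1 < flatCounts.length; i += 2) {
    var value = String(flatCounts[i]);
    if (value.toLowerCase() === wanted) {
      kept[value] = flatCounts[i + 1];
    }
  }
  return kept;
}

// Counts taken from the example in the thread.
var specialties = ["Cardiologist", 3, "Internist", 1, "Family Doctor", 1];
var shown = keepMatchingFacets(specialties, "cardiologist");
```

For the thread's example this leaves a single entry, Cardiologist with a count of 3, which is the output Bill asks for in the exact-match case.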
Re: MultiValued facet behavior question
On 22.06.2011 09:49, Bill Bell wrote:
You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist
Okay, synonyms are the only way I can think of to get a realistic match. Stemming won't work on a facet field; you wouldn't get Cardiologist: 3 as the result but cardiolog: 3 or something like that instead. Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense. And the short answer is: No, that's not possible. -Kuli
Understanding query explain information
Hi guys,

I have some doubts about how to read the debugQuery output correctly. I have a field named itemName in my index. It is a plain text field. When I run a simple query, ?q=itemName:iPad, I get the result below.

I am trying to understand why these strings got these scores. As far as I can tell, the only difference between them is the fieldNorm, since all the other factors are the same. So how are these fieldNorm values derived?

The field norm is the result of this formula, right? 1/sqrt(terms), where terms is the number of terms in my field after it is indexed.

If that is true, the field norm for my first document should be 0.5 (1/sqrt(4)), because "Livro - IPAD - O Guia do Profissional" ends up with the tokens livro|ipad|guia|profissional. What am I forgetting to take into account?

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="rows">10</str>
    <arr name="indent"><str>on</str><str>on</str></arr>
    <str name="fl">itemName,score</str>
    <str name="version">2.2</str>
    <str name="q">itemName:ipad</str>
  </lst>
</lst>
<result name="response" numFound="161" start="0" maxScore="3.6808658">
  <doc><float name="score">3.6808658</float><str name="itemName">Livro - IPAD - O Guia do Profissional</str></doc>
  <doc><float name="score">3.1550279</float><str name="itemName">Leitor de Cartão para Ipad - Mobimax</str></doc>
  <doc><float name="score">3.1550279</float><str name="itemName">Sleeve para iPad</str></doc>
  <doc><float name="score">3.1550279</float><str name="itemName">Sleeve de Neoprene para iPad</str></doc>
  <doc><float name="score">3.1550279</float><str name="itemName">Carregador de parede para iPad</str></doc>
  <doc><float name="score">2.6291897</float><str name="itemName">Case Envelope para iPad - Black - Built NY</str></doc>
  <doc><float name="score">2.6291897</float><str name="itemName">Case Protetora p/ IPad de Silicone Duo - Browm - Iskin</str></doc>
  <doc><float name="score">2.6291897</float><str name="itemName">Case Protetora p/ IPad de Silicone Duo - Clear - Iskin</str></doc>
  <doc><float name="score">2.6291897</float><str name="itemName">Case p/ iPad Sleeve - Black - Built NY</str></doc>
  <doc><float name="score">2.6291897</float><str name="itemName">Bolsa de Proteção p/ iPad Preta - Geonav</str></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">itemName:ipad</str>
  <str name="querystring">itemName:ipad</str>
  <str name="parsedquery">itemName:ipad</str>
  <str name="parsedquery_toString">itemName:ipad</str>
  <lst name="explain">
    <str name="7369507">
3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.4375 = fieldNorm(field=itemName, doc=102507)
</str>
    <str name="739">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226401)
</str>
    <str name="7356941">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226409)
</str>
    <str name="7356931">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226447)
</str>
    <str name="7360321">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226583)
</str>
    <str name="7428354">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223178), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223178)
</str>
    <str name="7366074">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223196), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223196)
</str>
    <str name="7366068">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223831), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223831)
</str>
    <str name="7428358">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223856), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223856)
</str>
    <str name="7422680">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223908), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223908)
</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">3.0</double>
    <lst name="prepare">
      <double name="time">1.0</double>
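A note on the fieldNorm arithmetic in the message above, offered as a sketch rather than a definitive answer. As far as I know, Lucene 3.x does start from 1/sqrt(numTerms), but it then stores the norm in a single byte, so the value read back at query time is rounded down to a coarse grid. The class and method names below are hypothetical, and the bit manipulation is my reading of Lucene's SmallFloat "byte315" encoding, not code copied from the poster's setup.

```java
// Sketch only: class and method names are hypothetical. The encode/decode
// steps follow my understanding of Lucene's SmallFloat byte315 format
// (3 mantissa bits, zero exponent 15), which Lucene 3.x uses for norms.
final class FieldNormSketch {

    // Raw length norm before encoding: 1 / sqrt(number of indexed terms).
    static float rawNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Lossy one-byte encoding of a positive float.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1;                          // overflow
        }
        return (byte) (small - zero);
    }

    // Inverse of encode(); returns the value the explain output would show.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Norm as stored and read back, assuming no index-time boost.
    static float fieldNorm(int numTerms) {
        return decode(encode(rawNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 10; n++) {
            System.out.println(n + " terms -> " + fieldNorm(n));
        }
    }
}
```

Under these assumptions, 4 terms give 0.5, 5 terms give 0.4375, 6 or 7 terms give 0.375, and 8 to 10 terms give 0.3125. The 0.4375 reported for the first document would therefore be consistent with five indexed terms rather than four. Any index-time boost would also be folded into the stored norm.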
Re: char sets accepted via xml
Hi,

I also have this issue with Solr 3.2.0. It is probably this: https://issues.apache.org/jira/browse/SOLR-2381

Tom

On 06/15/2011 02:09 PM, Mark Cunningham wrote:
> Hi,
>
> If you submit information to Solr using XML, does the server assume you're using Unicode encoded in UTF-8? And does it accept the whole range of possible Unicode characters (for example, characters that require multiple bytes when encoded in UTF-8)?
>
> I'm getting quite a few "Invalid UTF-8 middle byte 0x20 (at char #408, byte #-1)" errors (with different bytes/characters) that seem to be coming from characters such as the trademark symbol, the registered symbol, or some characters that look like normal characters (such as a dash). It comes out as UTF-8 code units (E2 80 93) using this very handy website: http://rishida.net/tools/conversion/
>
> I tried inserting <?xml version="1.0" encoding="utf-8"?> at the start of the XML; however, this didn't seem to make much difference.
>
> Anyone else have these issues or know what they might be coming from?
>
> Mark

--
Author of the book Plone 3 Multimedia - http://amzn.to/dtrp0C
Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de
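For anyone hitting the same error, a quick way to see what is actually being sent is to dump the bytes and run them through a strict UTF-8 decoder before posting. This is only a sketch with hypothetical class and method names; it does not touch Solr and does not reproduce SOLR-2381. It shows that the en dash encodes to E2 80 93 as Mark found, and that a lead byte followed by 0x20 is rejected, which is the shape of the "middle byte 0x20" complaint.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch only: hypothetical helper for inspecting payload bytes.
final class Utf8CheckSketch {

    // Hex dump of the UTF-8 encoding of a string, bytes separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    // True if the bytes decode as strict UTF-8, with no silent replacement.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u2013"));
        System.out.println(isValidUtf8(new byte[] {(byte) 0xE2, 0x20}));
    }
}
```

If the payload validates on the client but Solr still rejects it, the problem is more likely somewhere between the client and the XML parser than in the document itself.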
Conflict in wildcard query and spellchecker in solr search
When I search for rin* in Solr, it runs a wildcard query and I get the result for "ring". But when I search for Rin*, it runs the spellchecker and then gives the result for "ring". Why is that? Please explain.

Thanks & Regards
Romi

-- View this message in context: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tp3095198p3095198.html
Re: Conflict in wildcard query and spellchecker in solr search
Wildcard queries are not analyzed. Lowercase your query beforehand.

On Wednesday 22 June 2011 14:08:48 Romi wrote:
> When I search for rin* in Solr, it runs a wildcard query and I get the result for "ring". But when I search for Rin*, it runs the spellchecker and then gives the result for "ring". Why is that? Please explain.
>
> Thanks & Regards
> Romi
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tp3095198p3095198.html

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: upgrading to Tika 0.9 on Solr 1.4.1
Hi Chris, Andreas,

I have upgraded to Solr 3.2 and everything seems fine now. I will have to integrate this into my application and see whether any further issues come up. Again, thanks for your patience and time.

--Surendra
Re: MultiValued facet behavior question
Hi Bill,

> So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?

Why do you need this? If your use case is faceted navigation, then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match, etc.

I may be missing the point here, as you have not said why you need to do this.

cheers lee c

On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
> On 22.06.2011 09:49, Bill Bell wrote:
>> You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist
>
> Okay, synonyms are the only way I can think of for a realistic match. Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3" as the result but "cardiolog: 3" or something like that instead.
>
> Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense.
>
> And the short answer is: No, that's not possible.
>
> -Kuli
Re: Conflict in wildcard query and spellchecker in solr search
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="6" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am using this field type with these filters. For wildcard searches, do I need to include more filters, or what other configuration is needed?

Thanks & Regards
Romi

-- View this message in context: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tp3095198p3095290.html
Search is taking long-long time.
I am running two Solr shards. I have indexed 100 million docs in each shard (each is 50 GB and only 'id' is stored). My search has become very slow; it takes around 2-3 seconds. Below is my query:

http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc

QUERY and FilterQuery are as follows:

QUERY = Online Shopping AND ( Amex OR Am ex OR American express OR americanexpress )
FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND lang:English

How can I improve the query performance? The default search field is title (text).

-- Thanks and Regards Mohammad Shariq
Re: Conflict in wildcard query and spellchecker in solr search
No, wildcard queries are not analyzed. They are not _passed_ through your analyzers. If you lowercase at index time, you must lowercase the query outside of Solr before sending it.

On Wednesday 22 June 2011 14:35:12 Romi wrote:
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="6" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I am using this field type with these filters. For wildcard searches, do I need to include more filters, or what other configuration is needed?
>
> Thanks & Regards
> Romi
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tp3095198p3095290.html

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: MultiValued facet behavior question
Hi Lee,

since I have the same problem, I might as well try to answer this question. You want this behaviour to make things clear for your users. If they select cardiologists, does it make sense to also show family doctors as a facet value to the user? The same thing goes for the facets that are related to family doctors. They are returned as well, making it even more unclear for the end user.

On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:
> Hi Bill,
>
>> So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?
>
> Why do you need this? If your use case is faceted navigation, then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match, etc.
>
> I may be missing the point here, as you have not said why you need to do this.
>
> cheers lee c
>
> On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
>> On 22.06.2011 09:49, Bill Bell wrote:
>>> You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist
>>
>> Okay, synonyms are the only way I can think of for a realistic match. Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3" as the result but "cardiolog: 3" or something like that instead. Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense. And the short answer is: No, that's not possible.
>>
>> -Kuli
Re: Conflict in wildcard query and spellchecker in solr search
How can I lowercase the query outside of Solr before sending it?

Thanks & Regards
Romi

-- View this message in context: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tp3095198p3095345.html
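One possible way to do this on the client, as a minimal sketch and not the only approach: lowercase just the whitespace-separated terms that contain a wildcard, so that operators such as AND/OR and any field-name prefix are left alone. The class and method names are hypothetical, and the sketch assumes the field is lowercased at index time, as in the field type posted earlier in this thread. It does not handle quoted phrases or escaped characters.

```java
import java.util.Locale;

// Sketch only: hypothetical client-side helper, not part of Solr or SolrJ.
final class WildcardLowercaseSketch {

    // Lowercases terms containing '*' or '?'; leaves everything else as is.
    static String lowercaseWildcardTerms(String query) {
        String[] tokens = query.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String token = tokens[i];
            boolean wildcard = token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
            if (!wildcard) {
                out.append(token);
                continue;
            }
            // Keep a "field:" prefix untouched, since field names are case sensitive.
            int colon = token.indexOf(':');
            String prefix = colon >= 0 ? token.substring(0, colon + 1) : "";
            String term = colon >= 0 ? token.substring(colon + 1) : token;
            out.append(prefix).append(term.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseWildcardTerms("Rin*"));
        System.out.println(lowercaseWildcardTerms("itemName:Rin* AND Gold"));
    }
}
```

The same idea carries over to JavaScript with toLowerCase() if the query is built in the browser.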
Re: Understanding query explain information
Hi,

are you using synonyms?

On 22 June 2011 10:30, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote:
> Hi guys,
>
> I have some doubts about how to read the debugQuery output correctly. I have a field named itemName in my index. It is a plain text field. When I run a simple query, ?q=itemName:iPad, I get the result below.
>
> I am trying to understand why these strings got these scores. As far as I can tell, the only difference between them is the fieldNorm, since all the other factors are the same. So how are these fieldNorm values derived?
>
> The field norm is the result of this formula, right? 1/sqrt(terms), where terms is the number of terms in my field after it is indexed.
>
> If that is true, the field norm for my first document should be 0.5 (1/sqrt(4)), because "Livro - IPAD - O Guia do Profissional" ends up with the tokens livro|ipad|guia|profissional. What am I forgetting to take into account?
Re: MultiValued facet behavior question
On 06/22/2011 04:01 AM, Dennis de Boer wrote:
> Hi Bill,
>
> as far as I understand now, with the help of my friend, you can't. Multivalued fields don't work that way. You can however always filter the facet results manually in the JSP. You know what the user chose as a facet.

Yes - that is the most sensible suggestion: if you want to display the facets the user chose, and only those, regardless of what was found in the index, then I think you know what to do!

> The issue I ran into is when you have additional facet fields. For example when you also have country as a facet field. Now when you search for Cardiologist, it also returns Internist and family doctor as you described. What Solr now also returns for the country list are the countries for Cardiologist, but also for Internist and family doctor. This is not what you want.

I don't think this is accurate. Your query matches some set of documents - the facet values shown will only be those that occur in that set. If some internists' countries are shown when the user selects Cardiologist, that is because those internists are also cardiologists, right?

-Mike
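The "filter the facet results manually" suggestion could look roughly like the sketch below on the front-end side. It assumes the facet counts have already been read out of the Solr response into an ordered map; the class name, method name and sample values are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch only: hypothetical front-end helper, independent of Solr's API.
final class FacetFilterSketch {

    // Keeps only the facet values the user actually selected, in original order.
    static Map<String, Integer> keepSelected(Map<String, Integer> facetCounts,
                                             Set<String> selected) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : facetCounts.entrySet()) {
            if (selected.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Cardiologist", 3);
        counts.put("Family Doctor", 2);
        counts.put("Internist", 1);
        System.out.println(keepSelected(counts, Collections.singleton("Cardiologist")));
    }
}
```

This only changes what is displayed. The counts Solr returns for the other values are still correct for the matched document set, which is the point made above.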
Tika Jax-RS and DIH
Mattmann, Chris A (388J chris.a.mattmann at jpl.nasa.gov writes: Hi Jo, You may consider checking out Tika trunk, where we recently have a Tika JAX-RS web service [1] committed as part of the tika-server module. You could probably wire DIH into it and accomplish the same thing. Cheers, Chris [1] https://issues.apache.org/jira/browse/TIKA-593 Chris - could you elaborate on using Tika Jax-RS and DIH? How production ready is it? Could you summarize the steps necessary to get it to work? Any examples yet? I'd be happy to work with you to get something out to the group. Thanks - Tod
Re: Search is taking long-long time.
> I am running two Solr shards. I have indexed 100 million docs in each shard (each is 50 GB and only 'id' is stored). My search has become very slow; it takes around 2-3 seconds. Below is my query:
>
> http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc
>
> QUERY and FilterQuery are as follows:
>
> QUERY = Online Shopping AND ( Amex OR Am ex OR American express OR americanexpress )
> FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND lang:English
>
> How can I improve the query performance? The default search field is title (text).

If the fieldType of time is not trie-based, you can change it to tdate, tint etc., which helps with range queries.

If you don't update your index frequently, you can use separate filter queries (fq) for your clauses, to benefit from caching:

fq=category:news&fq=lang:English

http://wiki.apache.org/solr/SolrPerformanceFactors
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
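To make the separate-fq advice concrete, here is a rough sketch of building the request with one fq parameter per clause. The class and method names are hypothetical, and the host and parameter values are simply the ones from this thread; only plain JDK calls are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical URL builder, not SolrJ.
final class SolrQueryUrlSketch {

    // URL-encodes one parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Emits one fq parameter per filter clause so each can be cached separately.
    static String buildSelectUrl(String base, String q, List<String> filterQueries) {
        StringBuilder url = new StringBuilder(base).append("/select?q=").append(enc(q));
        for (String fq : filterQueries) {
            url.append("&fq=").append(enc(fq));
        }
        url.append("&fl=id&start=0&rows=100");
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl(
                "http://solrHost1:8080/solr",
                "title:amex",
                Arrays.asList("category:news", "lang:English",
                        "time:[1308659371 TO 1308745771]")));
    }
}
```

The category and lang clauses repeat across requests, so they are the ones likely to benefit from the filter cache. The time range presumably changes on every request, so keeping it in its own fq at least stops it from defeating the caching of the other two.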
Re: MultiValued facet behavior question
Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for Cardiologists and Family Doctors independently. The Family Doctor count will be all Family Doctors who are also Cardiologists. This allows users to further filter Cardiologists who are also Family Doctors (this could be of use to them?).

If your front-end app implements the filtering as a list of fq=xxx, then that would make for consistent results?

I don't see how not showing that some cardiologists are also Family Doctors is a better user experience... But again, you might have a very specific use case?

On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:
> Hi Lee,
>
> since I have the same problem, I might as well try to answer this question. You want this behaviour to make things clear for your users. If they select cardiologists, does it make sense to also show family doctors as a facet value to the user? The same thing goes for the facets that are related to family doctors. They are returned as well, making it even more unclear for the end user.
>
> On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:
>> Hi Bill,
>>
>>> So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?
>>
>> Why do you need this? If your use case is faceted navigation, then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match, etc.
>>
>> I may be missing the point here, as you have not said why you need to do this.
>>
>> cheers lee c
>>
>> On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
>>> On 22.06.2011 09:49, Bill Bell wrote:
>>>> You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist
>>>
>>> Okay, synonyms are the only way I can think of for a realistic match. Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3" as the result but "cardiolog: 3" or something like that instead. Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense. And the short answer is: No, that's not possible.
>>>
>>> -Kuli
Re: MultiValued facet behavior question
Well, the use case is rather simple. It is not so much a use case as a user experience. If I have a list of values I can facet on, for example: A B C D E, and I click on B, does it make sense to display B C E to the user after the selection, just because items in B are C and E items as well?

As a user I chose B because I'm interested in B items. I do not care if they are also C and E items. Technically this is correct, but functionally the user doesn't care, because it is not what they searched for.

In this case they were searching for a cardiologist. Do I care that a cardiologist is also a family doctor? No. So I also do not want to see this presented to me as a facet value in the front end. In the item details you can show that the cardiologist is also a family doctor. That is fine, but not as an available facet option if you have just chosen a speciality you want to filter on.

Does it make sense?

On Wed, Jun 22, 2011 at 3:31 PM, lee carroll lee.a.carr...@googlemail.com wrote:
> Hi Dennis,
>
> I think maybe I just disagree. You're not showing facet counts for Cardiologists and Family Doctors independently. The Family Doctor count will be all Family Doctors who are also Cardiologists. This allows users to further filter Cardiologists who are also Family Doctors (this could be of use to them?).
>
> If your front-end app implements the filtering as a list of fq=xxx, then that would make for consistent results?
>
> I don't see how not showing that some cardiologists are also Family Doctors is a better user experience... But again, you might have a very specific use case?
>
> On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:
>> Hi Lee,
>>
>> since I have the same problem, I might as well try to answer this question. You want this behaviour to make things clear for your users. If they select cardiologists, does it make sense to also show family doctors as a facet value to the user? The same thing goes for the facets that are related to family doctors. They are returned as well, making it even more unclear for the end user.
>>
>> On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:
>>> Hi Bill,
>>>
>>>> So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?
>>>
>>> Why do you need this? If your use case is faceted navigation, then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match, etc.
>>>
>>> I may be missing the point here, as you have not said why you need to do this.
>>>
>>> cheers lee c
>>>
>>> On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
>>>> On 22.06.2011 09:49, Bill Bell wrote:
>>>>> You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist
>>>>
>>>> Okay, synonyms are the only way I can think of for a realistic match. Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3" as the result but "cardiolog: 3" or something like that instead. Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense. And the short answer is: No, that's not possible.
>>>>
>>>> -Kuli
Re: Search is taking long-long time.
This is how my 'time' field looks in the schema:

<field name="time" type="tint" indexed="true" stored="false"/>

Also, I am doing frequent updates to Solr (every 5 minutes).

On 22 June 2011 18:41, Ahmet Arslan iori...@yahoo.com wrote:
>> I am running two Solr shards. I have indexed 100 million docs in each shard (each is 50 GB and only 'id' is stored). My search has become very slow; it takes around 2-3 seconds. Below is my query:
>>
>> http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc
>>
>> QUERY and FilterQuery are as follows:
>>
>> QUERY = Online Shopping AND ( Amex OR Am ex OR American express OR americanexpress )
>> FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND lang:English
>>
>> How can I improve the query performance? The default search field is title (text).
>
> If the fieldType of time is not trie-based, you can change it to tdate, tint etc., which helps with range queries.
>
> If you don't update your index frequently, you can use separate filter queries (fq) for your clauses, to benefit from caching:
>
> fq=category:news&fq=lang:English
>
> http://wiki.apache.org/solr/SolrPerformanceFactors
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

-- Thanks and Regards Mohammad Shariq
RE: MultiValued facet behavior question
Hi, Bill (and others).

I post this for what it's worth - it's a very specialized resolution we wrote for a similar requirement, and it may help with yours (and similar ones). Caveats abound [1]

We're running 3.1.

We wanted to be able to return facets which matched our actual search, rather than all facets from the entire result set. For example, if a user searches for author 'Twain', we present to them a list of facets which match 'Twain', and exclude facets where 'Twain' is not found. (Now - we don't tell our users that these are 'facet' values - we just present an alpha-sorted list of author names with a count of associated documents.)

So, we search our Author search field to identify matching documents, get all the facets (i.e. normal Solr processing to this point), and then filter that facet set to include only those that match our original search.

We added our own extra facet parameter (facet.sirsidynix.filter.facets) to instruct Solr when to do this special facet filtering. We modified the SimpleFacets method getTermCounts right before the final "return counts;" like this:

    // Custom SirsiDynix code.
    if (params.getBool(FacetParams.FACET_SIRSIDYNIX_FILTER_FACETS, false)) {
      counts = filterCounts(field, counts);
    }
    return counts;

and added a method 'filterCounts()' to this class, basically wrapping things up to run the search against each facet value: setting up MemoryIndex instances based on our schema, inserting the facet value, and running our original query against the MemoryIndex. Anything that matches has a score > 0, and those are the only ones we keep:

    /**
     * Custom SirsiDynix code:
     * Filters counts down to only those entries that match the original
     * query. Does this by using Lucene's MemoryIndex - a very fast, in-memory,
     * single document index that can have queries run against it.
     * For each string value in counts, we create a MemoryIndex and run the
     * original query against it. Anything with a score > 0 means a 'hit', so
     * the value matches the original query, and we'll retain it. Score 0 means
     * no hit (i.e. it was a facet value associated with a document that matched
     * the query, but the facet value itself didn't match the query).
     * @param field name of the field that the facet values came from.
     * @param counts Lucene's list of facet values.
     * @return filtered set, only those matching the original query.
     */
    private NamedList filterCounts(String field, NamedList counts) {
      if (!field.endsWith("_facet")) {
        return counts;
      }

      // Trim off "_facet" (6 characters).
      String fieldBase = field.substring(0, field.length() - 6);

      // Builds fields to search against.
      // Note that original came from (e.g.) AUTHOR_facet.
      // And, original search would have been for INITIAL_AUTHOR_SRCH_boost as well as
      // SUBSEQUENT_AUTHOR_SRCH_boost (and fuzzy's). However, we're only searching
      // one string at a time, so we'll shove it into the single-valued INITIAL_xxx
      // fields. That will be good enough for the Query to be able to correctly
      // evaluate against the document.
      String fieldBoost = "INITIAL_" + fieldBase + "_SRCH_boost";
      String fieldFuzzy = "INITIAL_" + fieldBase + "_SRCH_fuzzy";

      NamedList newCounts = new NamedList();
      IndexSchema schema = searcher.getSchema();

      // Analyzers for the two searchable fields, taken from the schema.
      SchemaField schemaField = schema.getField(fieldBoost);
      FieldType fieldType = schemaField.getType();
      Analyzer fieldAnalyzer = fieldType.getAnalyzer();

      SchemaField schemaFuzzyField = schema.getField(fieldFuzzy);
      FieldType fuzzyFieldType = schemaFuzzyField.getType();
      Analyzer fuzzyFieldAnalyzer = fuzzyFieldType.getAnalyzer();

      for (int i = 0; i < counts.size(); i++) {
        String testValue = counts.getName(i);

        // One throwaway single-document index per facet value.
        MemoryIndex index = new MemoryIndex();
        index.addField(fieldBoost, testValue, fieldAnalyzer);
        index.addField(fieldFuzzy, testValue, fuzzyFieldAnalyzer);

        float score = index.search(rb.getQuery());
        if (score > 0.0f) {
          newCounts.add(testValue, counts.getVal(i));
        }
      }
      return newCounts;
    }

A bit of explanation on our schema will be in order here.

1) We've suffixed all our facet fields with _facet - hence that first if statement.

2) We have matching 'searchable' and 'facet' fields, whose names basically differ only in the suffix. So, we strip off '_facet' and append '_boost' and '_fuzzy' (our two field types for searching against (and possibly applying boosts), and doing fuzzy matching against). (You'll see it's not exactly that - but you can hopefully modify your version to match your schema.)

Basically the idea is that we can derive the field name(s) against which the original search was issued from the
Re: upgrading to Tika 0.9 on Solr 1.4.1
Glad it worked out! Cheers, Chris On Jun 22, 2011, at 5:14 AM, Surendra wrote: Hi Chris ,Andreas I have upgraded to solr 3.2 ... everything seems fine now. I will have to integrate this to my application and observe if any further issues...again thanks for your patience and time... --Surendra ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Weird optimize performance degradation
Thanks for your answers, Erick & Mohammad! I'll get back to the list if I have more specific info about this issue; so far the index is performing normally again.

Best,
Santiago

On Mon, Jun 20, 2011 at 9:29 AM, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, that is odd, anyone else want to chime in here? But optimizing isn't going to help with the strange commit times, it'll only make it worse. It's not doing you much if any good, so I'd think about not optimizing.

About the commit times in general: depending upon when the merge happens, lots of work can go on under the covers. Here's a detailed look at merging... http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

But the short form is that, depending upon the number of segments and the merge policy, you may periodically hit a commit that copies, perhaps, #all# the current segments into a single segment, which will create a large pause.

But it's always possible that something's wonky with documents that have a very large number of fields. There's some interesting work being done on trunk to flatten out this curve, but that's not going to do you much good in the 3.x code line...

Best
Erick

On Sun, Jun 19, 2011 at 10:32 AM, Santiago Bazerque sbazer...@gmail.com wrote:

Hello Erick, thanks for your answer! Yes, our over-optimization is mainly due to paranoia over these strange commit times. The long optimize time persisted in all the subsequent commits, and this is consistent with what we are seeing in other production indexes that have the same problem. Once the anomaly shows up, it never commits quickly again.

I combed through the last 50k documents that were added before the first slow commit. I found one with a larger than usual number of fields (I didn't write down the number, but it was a few thousand). I deleted it, and the following optimize was normal again (110 seconds). So I'm pretty sure a document with lots of fields is the cause of the slowdown.
If that would be useful, I can do some further testing to confirm this hypothesis and send the document to the list. Thanks again for your answer. Best, Santiago On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson erickerick...@gmail.comwrote: First, there's absolutely no reason to optimize this often, if at all. Older versions of Lucene would search faster on an optimized index, but this is no longer necessary. Optimize will reclaim data from deleted documents, but is generally recommended to be performed fairly rarely, often at off-peak hours. Note that optimize will re-write your entire index into a single new segment, so following your pattern it'll take longer and longer each time. But the speed change happening at 500,000 documents is suspiciously close to the default mergeFactor of 10 X 50,000. Do subsequent optimizes (i.e. on the 750,000th document) still take that long? But this doesn't make sense because if you're optimizing instead of committing, each optimize should reduce your index to 1 segment and you'll never hit a merge. So I'm a little confused. If you're really optimizing every 50K docs, what I'd expect to see is successively longer times, and at the end of each optimize I'd expect there to be only one segment in your index. Are you sure you're not just seeing successively longer times on each optimize and just noticing it after 10? Best Erick On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque sbazer...@gmail.com wrote: Hello! Here is a puzzling experiment: I build an index of about 1.2MM documents using SOLR 3.1. The index has a large number of dynamic fields (about 15.000). Each document has about 100 fields. I add the documents in batches of 20, and every 50.000 documents I optimize the index. The first 10 optimizes (up to exactly 500k documents) take less than a minute and a half. But the 11th and all subsequent commits take north of 10 minutes. 
The commit logs look identical (in the INFOSTREAM.txt file), but what used to be

Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 50 docs
Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2

now eats a lot of time:

Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 55 docs
Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2

What could be happening between those two lines that takes 10 minutes at full CPU (when, with 50k fewer docs, it used to take so much less)?

Thanks in advance,
Santiago
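Editor's note: Erick's remark that a commit may periodically copy many (or all) segments into one can be pictured with a toy model. The sketch below is not Lucene's merge policy code. It assumes, for illustration only, that every flush writes one segment of a fixed size and that whenever mergeFactor segments of the same level exist they are merged into one segment of the next level. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeCascadeSketch {

    /**
     * Simplified model (an assumption, not Lucene's real policy):
     * returns, for each flush, how many documents are copied by the
     * merges that this flush triggers.
     */
    static long[] docsCopiedPerFlush(int flushes, int docsPerFlush, int mergeFactor) {
        // segmentsAtLevel.get(k) = number of live segments at level k;
        // a level-k segment holds docsPerFlush * mergeFactor^k documents.
        List<Integer> segmentsAtLevel = new ArrayList<>();
        long[] copied = new long[flushes];
        for (int f = 0; f < flushes; f++) {
            addSegment(segmentsAtLevel, 0);
            long segmentSize = docsPerFlush;
            for (int level = 0; level < segmentsAtLevel.size(); level++) {
                if (segmentsAtLevel.get(level) < mergeFactor) {
                    break; // nothing to merge at this level, so no cascade
                }
                // Merge mergeFactor segments of this level into one of the next level.
                segmentsAtLevel.set(level, segmentsAtLevel.get(level) - mergeFactor);
                addSegment(segmentsAtLevel, level + 1);
                copied[f] += segmentSize * mergeFactor;
                segmentSize *= mergeFactor;
            }
        }
        return copied;
    }

    private static void addSegment(List<Integer> levels, int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }

    public static void main(String[] args) {
        long[] copied = docsCopiedPerFlush(12, 50000, 10);
        for (int i = 0; i < copied.length; i++) {
            System.out.println("flush " + (i + 1) + ": " + copied[i] + " docs copied by merges");
        }
    }
}
```

In this model, with 50,000 documents per flush and a mergeFactor of 10, the first nine flushes copy nothing and the tenth copies 500,000 documents at once. That is the kind of periodic pause Erick describes; it does not by itself explain a slowdown that persists on every later commit.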
SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x
Hi,

Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08) produces the exception below on startup. The same exception, with a very similar stack trace, occurs on commit and on add. The example schema and docs will reproduce the error.

Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: core
    at org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
    at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
size of synonyms.txt
While trying out some synonyms.txt files I noticed a huge increase in heap usage.

synonyms_1.txt -- 6645 lines (2826104 bytes in size) results in 66364 entries in SynonymMap with 730MB heap usage. Startup time about 2 minutes.

synonyms_2.txt -- 6645 lines (5384884 bytes in size) results in 115168 entries in SynonymMap with 3.3GB heap usage. Startup time about 4 minutes.

How large is your synonyms.txt? Are there any limitations (e.g. file size, number of synonyms, ...)? How do you deal with _really_ large numbers of synonyms?

To the experts: why not read synonyms from the file on demand? Are they held in memory only because memory is faster?

Regards,
Bernd
Re: rename a core to same name of existing core
Koji,

the description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

*quote* If a core with the same name exists, while the new created core is initalizing, the old one will continue to accept requests. Once it has finished, all new request will go to the new core, and the old core will be unloaded. */quote*

I guess the same handling applies to other actions, like rename.

Regards
Stefan

2011/6/21 Koji Sekiguchi k...@r.email.ne.jp:

I accidentally renamed a core to the name of an existing core, e.g. using example-DIH:

http://localhost:8983/solr/admin/cores?action=RENAME&core=db&other=tika

I expected Solr to throw an exception, but it worked, and the existing core (tika) is gone. Is this a known bug (I couldn't find an open issue in jira) or intended behavior?

koji
-- http://www.rondhuit.com/en/
Re: commit time and lock
Dear all, Kindly help me.. thanks

On Tuesday 21 June 2011 11:46 AM, Jonty Rhods wrote:

I am using solrj to index the data. I have around 5 docs indexed. Because the server stops responding at commit time due to the lock, I was measuring the commit time:

double starttemp = System.currentTimeMillis();
server.add(docs);
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure whether the index lock is held from the server.add(docs) call onward or only during server.commit(). If I change the above to the following

server.add(docs);
double starttemp = System.currentTimeMillis();
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
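Editor's note: the two measurements above answer different questions. The first includes the time spent in server.add(docs); the second covers only server.commit(). A minimal sketch of timing each phase separately follows. It assumes nothing about SolrJ itself: the Step interface and the PhaseTimer class are hypothetical helpers, and the sleeps in main are placeholders for the real add and commit calls.

```java
import java.util.concurrent.TimeUnit;

final class PhaseTimer {

    /** A unit of work that may throw a checked exception, as SolrJ calls do. */
    interface Step {
        void run() throws Exception;
    }

    /** Runs the step and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Step step) {
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        try {
            step.run();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholders: in real code these would be server.add(docs) and server.commit().
        long addMs = timeMillis(() -> Thread.sleep(30));
        long commitMs = timeMillis(() -> Thread.sleep(10));
        System.out.println("add: " + addMs + " ms, commit: " + commitMs
                + " ms, total: " + (addMs + commitMs) + " ms");
    }
}
```

Reporting both numbers side by side shows how much of the 9 seconds goes to sending the documents and how much to the commit itself, which is the distinction the question turns on.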
Re: Read past EOF error due to broken connection
Hi Pravesh,

Thanks for your reply. I tried both approaches.

Commit fails with this exception:

Exception in thread "main" org.apache.solr.common.SolrException: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
java.lang.RuntimeException: java.io.IOException: read past EOF
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1091)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:585)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org

And CheckIndex fails with this exception:

Opening index @ ./index/
ERROR: could not read any segments file in directory
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:40)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:71)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:268)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:358)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:753)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:592)
    at
Re: MultiValued facet behavior question
We always remove the facet filter when faceting: in other words, for a good user experience, you generally want to show facets based on the query excluding any restriction based on the facets. So in your example (facet B selected), we would continue to show *all* facets. Only if you performed a search using some other filter (proximity, gender, etc), would we restrict the facet list.

-Mike

On 06/22/2011 09:42 AM, Dennis de Boer wrote:

Well, the use case is rather simple. It is not so much a use case as a user experience. If I have a list of values I can facet on, for example: A B C D E, and I click on B, does it make sense to display B C E to the user after the selection, just because items in B are C and E items as well? As a user I chose B because I'm interested in B items. I do not care if they are also C and E items.

Technically this is correct, but functionally the user doesn't care, because it is not what they searched for. In this case they were searching for a cardiologist. Do I care that a cardiologist is also a family doctor? No. So I also do not want to see this as a facet value presented to me in frontend logic. In the item details you can show that the cardiologist is also a family doctor. That is fine, but not as an available facet option when you have just chosen a speciality you want to filter on. Does it make sense?

On Wed, Jun 22, 2011 at 3:31 PM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi Dennis, I think maybe I just disagree. You're not showing facet counts for cardiologists and Family Doctors independently. The Family Doctor count will be all Family Doctors who are also Cardiologists. This allows users to further filter Cardiologists who are also Family Doctors. (This could be of use to them??) If your front end app implements the filtering as a list of fq=xxx then that would make for consistent results? I don't see how not showing that some cardiologists are also Family Doctors is a better user experience...
But again you might have a very specific use case?

On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:

Hi Lee, since I have the same problem, I might as well try to answer this question. You want this behaviour to make things clear for your users. If they select cardiologists, does it make sense to also show family doctors as a facet value to the user? The same thing goes for the facets that are related to family doctors. They are returned as well, thus making it even more unclear for the end-user.

On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi Bill,

So that part works. Then when I output the facet, I need a different behavior than the default. I need The facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?

Why do you need this? If your use case is faceted navigation then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match etc. I may be missing the point here as you have not said why you need to do this?

cheers
lee c

On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:

Am 22.06.2011 09:49, schrieb Bill Bell:

You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist

Okay, synonyms are the only way I can think of to get a realistic match. Stemming won't work on a facet field; you wouldn't get Cardiologist: 3 as the result but cardiolog: 3 or something like that instead.
Normally, you declare facet fields explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense. And the short answer is: No, that's not possible.

-Kuli
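Editor's note: Mike's approach (computing facet counts as if the facet's own filter were not applied) corresponds to Solr's filter tagging and exclusion local parameters, {!tag=...} on the fq and {!ex=...} on the facet.field. The sketch below only builds such a query string. The field name specialties is taken from elsewhere in this thread, while the tag name "sel", the class name, and the sample query are assumptions chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FacetExcludeSketch {

    /**
     * Builds a query string in which the filter on facetField is tagged,
     * and the facet on the same field excludes that tagged filter.
     * Note: selectedValue is not escaped for embedded quotes in this sketch.
     */
    static String buildQuery(String userQuery, String facetField, String selectedValue) {
        String tag = "sel"; // arbitrary tag name, assumed for this example
        return "q=" + enc(userQuery)
                + "&fq=" + enc("{!tag=" + tag + "}" + facetField + ":\"" + selectedValue + "\"")
                + "&facet=true"
                + "&facet.field=" + enc("{!ex=" + tag + "}" + facetField);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("doctor", "specialties", "Cardiologist"));
    }
}
```

With a request of this shape the result list is restricted to the selected value, but the facet counts for that field are computed without the restriction, so the other values stay visible. This is the opposite of what Bill and Dennis ask for, which is why the thread treats it as a separate design choice.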
Re: rename a core to same name of existing core
Stefan,

I guess, same handling for other actions, like rename.

I agree. Thank you for the pointer!

koji

(11/06/22 23:16), Stefan Matheis wrote:

Koji, the description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

*quote* If a core with the same name exists, while the new created core is initalizing, the old one will continue to accept requests. Once it has finished, all new request will go to the new core, and the old core will be unloaded. */quote*

I guess, same handling for other actions, like rename.

Regards
Stefan

2011/6/21 Koji Sekiguchi k...@r.email.ne.jp:

I accidentally renamed a core to the name of an existing core, e.g. using example-DIH:

http://localhost:8983/solr/admin/cores?action=RENAME&core=db&other=tika

I expected Solr to throw an exception, but it worked, and the existing core (tika) is gone. Is this a known bug (I couldn't find an open issue in jira) or intended behavior?

koji
-- http://www.rondhuit.com/en/
-- http://www.rondhuit.com/en/
response time for pdf indexing
Hi!

We are using Zend Search, which is based on Lucene. Our queries against indexed PDF documents take longer than 2 seconds. We want to switch to Solr to try to solve this problem.

i. Can anyone tell me the response time for queries on PDF documents in Solr?
ii. Can anyone tell me some strategies to reduce this response time?

Note: the PDF is not indexed in a simple way. The PDF is first converted to text and then indexed together with some additional information we need.

Thank you.

---
Rode González
Re: Exception using Analyze from the Solr Admin app
any help on this would be really appreciated. i just setup a totally brand new setup of solr still got this exception .. I can see that this would be something to do with classpath, but not able to figure out exactly what is causing this issue. -- karthik On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote: Hi Everyone, I am new to the Solr world and just started playing around with it. I had everything up running and suddenly the Analyze functionality started throwing an exception when i tried using it. It was working a few days ago suddenly it stopped working started throwing this exception. This is a Solr 3.1 setup running on tomcat-7.0.11. The exception trace is: - Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log SEVERE: org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token; at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275) at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token; at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911) at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840) at org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:419) ... 
26 more Caused by: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token; at org.apache.jsp.admin.analysis_jsp.getTokens(analysis_jsp.java:118) at org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:696) ... 29 more - I verified that the webapps/app/WEB-INF/lib has all the latest 3.1.0 JAR files in them. Any pointers to fix this issue would be great. Thanks, Karthik
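Editor's note: a NoSuchMethodError means that code compiled against one version of a class is running against a different version of it, so a first step is to find out where the class is actually loaded from. The sketch below uses only standard JDK calls; the class name WhichJar is made up. To be meaningful for this problem it would have to run inside the web application's class loader (for example from a small JSP or servlet), which is an assumption about the setup, not something taken from the report.

```java
import java.security.CodeSource;

final class WhichJar {

    /** Returns the jar or directory a class was loaded from, if known. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Classes loaded by the bootstrap class loader have no code source.
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. org.apache.lucene.analysis.TokenStream when Lucene is on the classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

If the reported location is not the expected 3.1.0 jar, or if the jar is correct but the caller (here the compiled analysis_jsp class) was built earlier against an older version, the mismatch would explain the error. A stale compiled JSP in the container's work directory is one possibility worth checking, though the report does not confirm it.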
[Announce] Solr 3.2 with RankingAlgorithm
Hi! I would like to announce the availability of Solr 3.2 with RankingAlgorithm. Please download and give the new version a try. This version of RankingAlgorithm exposes a lucene compatible api so almost all of the Solr features should work as it is. Note: NRT support will be available by next week. Sincerely, - Nagendra Nagarajayya http://solr-ra.tgels.com http://rankingalgorithm.tgels.com
Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer
Yeah, I see your points. It's complicated. I'm not sure either. But the thing is:

in order to use a feature like that you'd have to really think hard about the query analysis of your fields, and which ones will produce which tokens in which situations

You need to think really hard about the (index and query) analysis of your fields and which ones will produce which tokens _now_, if you are using multiple fields in a 'qf' with differing analysis, and using a percent mm. (Or similarly an mm that varies depending on how many terms.)

That's what I've come to realize, that's the status quo. If your qf fields don't all have identical analysis, right _now_ you need to think really hard about the analysis and how it's possibly going to affect 'mm', including for edge case queries. If you don't, you likely have edge case queries (at least) which aren't behaving how you expected (whether you notice, or have it brought to your attention by users, or not).

Or you can just make sure all fields in your qf have identical analysis, and then you don't have to worry about it. But that's not always practical; a lot of the power of dismax qf ends up being combining fields with different analysis.

So I was trying to think of a way to make this less so, but still be able to take advantage of dismax, but I think you're right that maybe there isn't any, or at least nothing we've come up with yet.

Maybe what I really need is a query parser that does not do disjunction maximum at all, but somehow still combines different 'qf' type fields with different boosts on each field. I personally don't _necessarily_ need the actual disjunction max calculation, but I do need combining of multiple fields with different boosts. Of course, I'm not sure exactly how it would combine multiple fields if not by disjunction maximum, but perhaps one is conceivable that wouldn't be subject to this particular gotcha with differing analysis.
I also remain kind of confused about how the existing dismax figures out how many terms there are for the 'mm' type calculations. If someone wanted to explain that, I would find it enlightening and helpful for understanding what's going on.

Jonathan

On 6/21/2011 10:20 PM, Chris Hostetter wrote:

: not other) setups/intentions. It's counter-intuitive to me that adding
: a field to the 'qf' set results in _fewer_ hits than the same 'qf' set

agreed .. but that's where looking at the debug info comes in, to understand that the reason for that behavior is that your old qf treated part of your input as garbage and that new field respects it and uses it in the calculation.

mind you: the fewer hits behavior only happens when using a percentage value in mm ... if you had mm=2 you'd get more results, but you've asked for 66% (or whatever) and with that new qf there is a different number of clauses produced by query parsing.

: I wonder if it would be a good idea to have a parameter to (e)dismax
: that told it which of these two behaviors to use? The one where the
: 'term count' is based on the maximum number of terms from any field in
: the 'qf', and one where it's based on the minimum number of terms
: produced from any field in the qf? I am still not sure how feasible

even in your use case, i don't think you are fully considering what that would produce.

imagine that an mmType=min param existed and gave you what you're asking for. Now imagine that you have two fields, one named simple that strips all punctuation and one named complex that doesn't, and you have a query like this...

q=Foo & Bar
qf=simple complex
mm=100%
mmType=min

* Foo produces tokens for all qf
* & only produces tokens for some qf (complex)
* Bar produces tokens for all qf

your mmType would say there are only 2 tokens that we can query across all fields, so our computed minShouldMatch should be 100% of 2 == 2

sounds good so far right?

the problem is you still have a query clause coming from that & character ... you have 3 real clauses, one of which is that term query for complex:& which means that with your (computed) minShouldMatch of 2 you would see matches for any doc that happened to have indexed the & symbol in the complex field and also matched *either* of Foo or Bar (in either field)

So while a lot of your results would match both Foo and Bar, you'd still get a bunch of weird results.

: Or maybe a feature where you tell dismax, the number of tokens produced
: by field X, THAT's the one you should use for your 'term count' for mm,

Hmmm maybe. i'd have to see a patch in action and play with it, to really think it through ... hmmm ... honestly i really can't imagine how that would be helpful in general... in order to use a feature like that you'd have to really think hard about the query analysis of your fields, and which ones will produce which tokens in which situations in order to make sure you pick the *right* value for that param -- but once you've done that hard
Re: Exception using Analyze from the Solr Admin app
Karthik,

could you attach/pastebin your schema and also the text you're trying to analyze?

Regards
Stefan

On Wed, Jun 22, 2011 at 5:29 PM, karthik kmoha...@gmail.com wrote:

Any help on this would be really appreciated. I just set up a brand new installation of Solr and still got this exception. I can see that this has something to do with the classpath, but I am not able to figure out exactly what is causing the issue.

-- karthik

On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote:

Hi Everyone,

I am new to the Solr world and just started playing around with it. I had everything up and running, and suddenly the Analyze functionality started throwing an exception when I tried using it. It was working a few days ago, then suddenly it stopped working and started throwing this exception. This is a Solr 3.1 setup running on tomcat-7.0.11.

The exception trace is:

Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
    at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
    at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911)
    at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840)
    at org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:419)
    ... 26 more
Caused by: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
    at org.apache.jsp.admin.analysis_jsp.getTokens(analysis_jsp.java:118)
    at org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:696)
    ... 29 more

I verified that webapps/app/WEB-INF/lib has all the latest 3.1.0 JAR files in it. Any pointers to fix this issue would be great.

Thanks,
Karthik
Re: MultiValued facet behavior question
How is that different from doing a field search and just counting the results? If you only want the facet of the searched term (input), then why not just combine that with the result count and use that? Facets are more useful when you _don't_ know the distribution of values across a result set because they weren't included in the search criteria. Maybe this needs a different name or handler than facet. What am I missing?

On 06/22/2011 03:44 AM, Bill Bell wrote:

Here is an example using exampledocs and trunk 4.0:

http://localhost:8983/solr/select/?q=cat:%22hard%20drive%22&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=cat&facet.query={!lucene}cat:%22hard%20drive%22&facet.mincount=1

Results:

    <result name="response" numFound="2" start="0">
    Etc
    <lst name="facet_queries">
      <int name="{!lucene}cat:hard drive">2</int>
    </lst>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="electronics">2</int>
        <int name="hard drive">2</int>
      </lst>
    </lst>

Notice that the facet_queries count of 2 is the same as numFound=2. But I have no way to use facet.field to count the matches.

The algorithm: loop through the multiValued field and match on hard drive. Ignore the other values in there when setting the facet list.

On 6/22/11 1:19 AM, Dennis de Boer datdeb...@gmail.com wrote:

Hi Bill,

yes, you absolutely do make sense. I posted the exact same question to this mailing list (subject: faceting on multivalued fields), but got no response out of it. A friend of mine is now helping out. I hope someone on the list can give us some advice. I'll post our findings to this topic.

Regards,
Dennis

On Wed, Jun 22, 2011 at 5:37 AM, Bill Bell billnb...@gmail.com wrote:

Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to only show the one match, and not all the values of the multiValued specialties field that come along with it...

Example:

Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist

When I facet.field=specialties I get:

Cardiologist: 3
Family Doctor: 2
Internist: 1

I only want it to return:

Cardiologist: 3

Because this matches exactly... Facet on the field that matches and only return the number for that.

It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value in specialties matches perfectly. Again, I only want it to return:

Cardiologist: 3

If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be:

Internist: 1

Does this all make sense?

On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:

So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now alongside what you want, to help clarify.

On 06/21/2011 09:47 PM, Bill Bell wrote:

I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone does a search for Cardiologist, I use

q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties

What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see:

Cardiologist: 700
Internist: 45
Family Doctor: 20

This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internist or Family Doctor.

How do I send a query to Solr with a condition?

facet.query=specialties:Cardiologist&facet.field=specialties

Then if the query returns something use it, otherwise use the field one? Other ideas?
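The request parameters quoted in this thread need to be joined with & and URL-encoded before they reach Solr. As a minimal sketch of the combined request Bill asks about, the following builds one URL that carries both facet.field and facet.query, so the client can decide afterwards which count to show. The base URL and the helper name build_facet_url are assumptions for illustration; only the parameter names and the specialties field come from the messages above.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, term, field="specialties"):
    """Build a Solr select URL with both a field facet and a query facet.

    base_url and this helper are hypothetical; the parameters mirror the
    ones quoted in the thread (dismax query, facet.field, facet.query).
    """
    params = [
        ("q", term),
        ("defType", "dismax"),
        ("qf", field),
        ("facet", "true"),
        ("facet.field", field),
        # Quote the term so multi-word values such as "Family Doctor" work.
        ("facet.query", f'{field}:"{term}"'),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    # urlencode joins the pairs with "&" and percent-encodes the values.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_url("http://localhost:8983/solr/select", "Cardiologist"))
```

Sending both facets in one request avoids a second round trip; the choice between the two counts is then made entirely on the client.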
Re: size of synonyms.txt
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

While trying some synonyms.txt files I noticed a huge increase in heap usage.

synonyms_1.txt -- 6645 lines (2826104 bytes in size) results in 66364 entries in SynonymMap with 730MB heap usage. Startup time about 2 minutes.

synonyms_2.txt -- 6645 lines (5384884 bytes in size) results in 115168 entries in SynonymMap with 3.3GB heap usage. Startup time about 4 minutes.

What is the size of your synonyms.txt? Are there any limitations (e.g. file size, number of synonyms, ...)? How to deal with _really_ large numbers of synonyms?

To the experts: why not use the synonyms from a file? Just because memory is faster?

Hi,

I think we should look at implementing synonyms with an FST, to reduce the RAM usage. I also think this would make it easier for us to minimize the number of captureState/restoreState calls that it does, because it would just be a more natural way to handle all the multi-word cases... this could actually speed up the analysis time for this filter.
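To put Bernd's numbers in proportion, here is a small worked calculation of heap per SynonymMap entry for the two files. It is only a rough sketch: it assumes the reported heap figures are binary megabytes and gigabytes, and it attributes the whole heap to the synonym map, which overstates the per-entry cost.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def bytes_per_entry(heap_bytes, entries):
    """Average heap bytes per SynonymMap entry (whole heap, so an upper bound)."""
    return heap_bytes / entries


# Figures as reported in the message: (heap usage, SynonymMap entries).
runs = {
    "synonyms_1.txt": (730 * MIB, 66364),
    "synonyms_2.txt": (3.3 * GIB, 115168),
}

per_entry = {name: bytes_per_entry(heap, n) for name, (heap, n) in runs.items()}

if __name__ == "__main__":
    for name, value in per_entry.items():
        print(f"{name}: about {value / 1024:.1f} KiB of heap per entry")
```

Under these assumptions the per-entry cost is on the order of tens of kilobytes and grows between the two files, which is consistent with the suggestion that a more compact structure could help.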
Re: size of synonyms.txt
I once tried to load WordNet synsets as a synonym file and it was prohibitively slow and unusable. FYI.

On 06/22/2011 12:23 PM, Robert Muir wrote:

On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

While trying some synonyms.txt files I noticed a huge increase in heap usage.

synonyms_1.txt -- 6645 lines (2826104 bytes in size) results in 66364 entries in SynonymMap with 730MB heap usage. Startup time about 2 minutes.

synonyms_2.txt -- 6645 lines (5384884 bytes in size) results in 115168 entries in SynonymMap with 3.3GB heap usage. Startup time about 4 minutes.

What is the size of your synonyms.txt? Are there any limitations (e.g. file size, number of synonyms, ...)? How to deal with _really_ large numbers of synonyms?

To the experts: why not use the synonyms from a file? Just because memory is faster?

Hi,

I think we should look at implementing synonyms with an FST, to reduce the RAM usage. I also think this would make it easier for us to minimize the number of captureState/restoreState calls that it does, because it would just be a more natural way to handle all the multi-word cases... this could actually speed up the analysis time for this filter.
Re: MultiValued facet behavior question
Okay, so since you put cardiologist in the 'q', you only want facet values that have 'cardiologist' (or 'Cardiologist') to show up in the facet list. In general, there's no good way to do that.

But. If you want to do some client-side processing before you submit the query to Solr, and on the client side you can figure out exactly what you want, then you could try to play around with facet.filter or facet.query to see if you can make it do what you want. It may or may not work out, depending on your exact use pattern, which you still haven't articulated very well, but you can mess around with it and see what you can do.

I.e., if you KNOW (that is, your own app code knows, when creating the Solr request) that you only want the facet value for Cardiologist (including exact case), you can try facet.query=specialties:Cardiologist

Your app code would have to pull out the results specially too; they won't be in the Solr response in the same way ordinary facet.field results are. It also requires your query value to match _exactly_ (case, punctuation, etc.) the value in the index: cardiologist and Cardiologist are not the same.

I think Solr 3.1 has some regex-based facet.filter abilities that might be useful and help you get around the 'exact match' issues, but watch out for performance.

On 6/21/2011 11:37 PM, Bill Bell wrote:

Doing it with q=specialties:Cardiologist or q=Cardiologist&defType=dismax&qf=specialties does not matter; the issue is how I see facets. I want the facets to only show the one match, and not all the values of the multiValued specialties field that come along with it...

Example:

Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist

When I facet.field=specialties I get:

Cardiologist: 3
Family Doctor: 2
Internist: 1

I only want it to return:

Cardiologist: 3

Because this matches exactly... Facet on the field that matches and only return the number for that.

It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (cardiology and cardiologist share a stem)... But I don't really know which value in specialties matches perfectly. Again, I only want it to return:

Cardiologist: 3

If I searched on q=internist&defType=dismax&qf=specialties, I want the result to be:

Internist: 1

Does this all make sense?

On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:

So are you saying that for all results for cardiologist, you don't want facets not matching Cardiologist to be returned as facets? What happens when you make q=specialties:Cardiologist instead of just q=Cardiologist? It seems that if you make the query on the field, then all your results will necessarily qualify and you can discard any additional facets you don't want (e.g. those that don't match the initial query term). Maybe you can write what you see now alongside what you want, to help clarify.

On 06/21/2011 09:47 PM, Bill Bell wrote:

I have a field, specialties, that is multiValued. It indicates the doctor's specialties: cardiologist, internist, etc. When someone does a search for Cardiologist, I use

q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties

What I want to come out in the facet is Cardiologist (since it matches exactly) and the number that matches: 700. I don't want to see the other values that are not Cardiologist. Now I see:

Cardiologist: 700
Internist: 45
Family Doctor: 20

This means that several cardiologists are also internists and family doctors. When it matches exactly, I don't want to see Internist or Family Doctor.

How do I send a query to Solr with a condition?

facet.query=specialties:Cardiologist&facet.field=specialties

Then if the query returns something use it, otherwise use the field one? Other ideas?
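The client-side handling described in this reply (use the facet.query count when it matched, otherwise fall back to the ordinary field facet) can be sketched as follows. This assumes the response was requested with wt=json, where field facets arrive as a flat list of alternating values and counts by default; the function names and the sample data used with them are hypothetical.

```python
def flat_pairs(flat_list):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat_list[0::2], flat_list[1::2]))


def pick_facet(response, field, facet_query):
    """Prefer the facet.query count; fall back to the full field facet.

    response is the parsed JSON body of a Solr select request that sent
    both facet.field=<field> and facet.query=<facet_query>.
    """
    counts = response.get("facet_counts", {})
    query_count = counts.get("facet_queries", {}).get(facet_query, 0)
    if query_count > 0:
        # Use the value part of "field:value" as the display label.
        label = facet_query.split(":", 1)[-1].strip('"')
        return [(label, query_count)]
    return flat_pairs(counts.get("facet_fields", {}).get(field, []))
```

As noted in the reply, the facet.query value has to match the indexed value exactly, so a lower-case cardiologist would get a zero count here and fall through to the full field facet.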
Re: size of synonyms.txt
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

While trying some synonyms.txt files I noticed a huge increase in heap usage.

synonyms_1.txt -- 6645 lines (2826104 bytes in size) results in 66364 entries in SynonymMap with 730MB heap usage. Startup time about 2 minutes.

synonyms_2.txt -- 6645 lines (5384884 bytes in size) results in 115168 entries in SynonymMap with 3.3GB heap usage. Startup time about 4 minutes.

What is the size of your synonyms.txt? Are there any limitations (e.g. file size, number of synonyms, ...)? How to deal with _really_ large numbers of synonyms?

To the experts: why not use the synonyms from a file? Just because memory is faster?

Hi,

I think we should look at implementing synonyms with an FST, to reduce the RAM usage. I also think this would make it easier for us to minimize the number of captureState/restoreState calls that it does, because it would just be a more natural way to handle all the multi-word cases... this could actually speed up the analysis time for this filter.

Wow, you can read between the lines ;-) Exactly what I have on my mind.
RE: response time for pdf indexing
Hi Rode,

Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ?

Steve

-----Original Message-----
From: Rode González (libnova) [mailto:r...@libnova.es]
Sent: Wednesday, June 22, 2011 11:30 AM
To: solr-user@lucene.apache.org
Cc: dan...@silvereme.com; Gonzalo Iglesias; Leo; Marcos; Mario Crespo (Silvereme); 'Rode'
Subject: response time for pdf indexing

Hi!

We are using Zend Search, which is based on Lucene. Our queries against indexed PDF documents take longer than 2 seconds. We want to switch to Solr to try to solve this problem.

i. Can anyone tell me the response time for queries on PDF documents in Solr?

ii. Can anyone suggest some strategies to reduce this response time?

Note: the PDF is not indexed directly. It is first converted to text and then indexed together with some additional information we need.

Thank you.

--- Rode González
Re: Exception using Analyze from the Solr Admin app
Thanks for offering to help, Stefan. I just resolved the issue. It was some crazy thing within Tomcat (I still need to find out what it was). I backed up my old Tomcat installation, created a new instance of Tomcat, and deployed my Solr installation in there; everything started working fine, as it was before. I will compare the two Tomcat folders to see what was different and respond back with my findings.

-- karthik

On Wed, Jun 22, 2011 at 11:48 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:

Karthik,

could you attach/pastebin your schema and also the text you're trying to analyze?

Regards
Stefan

On Wed, Jun 22, 2011 at 5:29 PM, karthik kmoha...@gmail.com wrote:

Any help on this would be really appreciated. I just set up a brand new installation of Solr and still got this exception. I can see that this has something to do with the classpath, but I am not able to figure out exactly what is causing the issue.

-- karthik

On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote:

Hi Everyone,

I am new to the Solr world and just started playing around with it. I had everything up and running, and suddenly the Analyze functionality started throwing an exception when I tried using it. It was working a few days ago, then suddenly it stopped working and started throwing this exception. This is a Solr 3.1 setup running on tomcat-7.0.11.
The exception trace is:

Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
    at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
    at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911)
    at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840)
    at org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
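A NoSuchMethodError generally means that code compiled against one version of a class is running against a different version, which fits karthik's suspicion about the classpath. The thread does not establish the actual cause, so the following is only a diagnostic sketch: it walks a directory tree (for example a Tomcat installation) and reports jar artifacts that appear in more than one version. The function name and the file-name pattern are assumptions; jars that do not follow the name-version.jar convention are ignored.

```python
import os
import re
from collections import defaultdict

# Matches names such as lucene-core-3.1.0.jar -> ("lucene-core", "3.1.0").
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_conflicting_jars(root):
    """Return {artifact: [versions]} for artifacts found in several versions under root."""
    versions = defaultdict(set)
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            match = JAR_RE.match(filename)
            if match:
                versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    import sys

    for name, found in sorted(find_conflicting_jars(sys.argv[1]).items()):
        print(f"{name}: {', '.join(found)}")
```

Running it against both the old and the new Tomcat folders would be one way to start the comparison karthik mentions.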
Re: MultiValued facet behavior question
Yeah, I agree with that last statement. It seems to me that the use case where it _might_ matter is where you have a query for MORE than one value, e.g. q=cardiologist OR family, and in that case it MIGHT be useful to separate the facets in an XOR sense where you don't get cross-pollution. But the original poster didn't indicate this scenario. Maybe for that, Solr's grouping mechanism will help? Although I have not used it myself.

On 06/22/2011 09:31 AM, lee carroll wrote:

Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for Cardiologists and Family Doctors independently. The Family Doctor count will be all Family Doctors who are also Cardiologists. This allows users to further filter Cardiologists who are also Family Doctors. (This could be of use to them??) If your front-end app implements the filtering as a list of fq=xxx, then that would make for consistent results? I don't see how not showing that some cardiologists are also Family Doctors is a better user experience... But again, you might have a very specific use case?

On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:

Hi Lee,

since I have the same problem, I might as well try to answer this question. You want this behaviour to make things clear for your users. If they select cardiologists, does it make sense to also show family doctors as a facet value to the user? The same thing goes for the facets that are related to family doctors. They are returned as well, thus making it even more unclear for the end user.

On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi Bill,

So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense?

Why do you need this? If your use case is faceted navigation, then not showing all the facet terms which match your query would be misleading to your users. The fact is your data indicates Ben the cardiologist is also a GP etc. Is it not valid for your users to be able to further filter on cardiologists who are also specialists in x other disciplines? If the specialisms are mutually exclusive then your data will reflect this. The fact is x number of cardiologists match and x number of GPs match, etc. I may be missing the point here, as you have not said why you need to do this?

cheers
lee c

On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:

On 22.06.2011 09:49, Bill Bell wrote:

You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist

Okay, synonyms are the only way I can think of to get a realistic match. Stemming won't work on a facet field; you wouldn't get Cardiologist: 3 as the result but cardiolog: 3 or something like that instead. Normally, you declare a facet field explicitly for faceting, and not for searching, exactly because stemming and tokenizing on facet fields don't make sense.

And the short answer is: No, that's not possible.

-Kuli
Re: MultiValued facet behavior question
An interesting live scenario for this matter: http://www.bondfaro.com.br/ (Brazilian site)

The query ipad returns results spread across many categories (links on the left, teasers in the center). The Tablet category (facet) is one of them. The query tablet does exactly the same as clicking Tablet in the search for ipad. Note the breadcrumb (Início > Informática > Tablet > tablet).

In this case of a broad term that exactly matches a product facet, it totally makes sense for the user. In general, it tends to make more sense as the search shifts its bias from full-text toward structured metadata.

So, is it possible to turn q=cardiologist into q=specialties:Cardiologist by boosting an exact match on a facet label?
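One way to approximate this outside Solr is a client-side rewrite: if the user's query equals a known facet label, ignoring case, send a fielded query instead. This is a sketch of that idea only, not a Solr feature; the function name and the way the label list is obtained (for instance from an earlier facet.field request) are assumptions.

```python
def rewrite_query(user_query, facet_labels, field="specialties"):
    """Return a fielded query when user_query exactly matches a facet label.

    Matching ignores case and surrounding whitespace; anything else is
    passed through unchanged so the normal full-text search still applies.
    """
    by_lower = {label.lower(): label for label in facet_labels}
    label = by_lower.get(user_query.strip().lower())
    if label is None:
        return user_query
    # Quote the label so multi-word values stay one term.
    return f'{field}:"{label}"'
```

Because the rewrite uses the stored label rather than the user's spelling, the resulting query matches the indexed facet value exactly, which avoids the case-sensitivity issue raised earlier in the thread.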
Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x
I just tried branch_3x and couldn't reproduce this. Looks like maybe there is something wrong with your build, or some old class files left over somewhere are being picked up.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 22, 2011 at 10:15 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi,

Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08) produces the exception below on startup. The same exception, with a very similar stack trace, occurs on commit and add. The example schema and docs will reproduce the error.

Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: core
    at org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
    at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
sorting by termfreq on trunk doesn't work?
I am trying to sort by the termfreq function using the trunk code, since termfreq was added in the 4.0 code base. I run this query:

http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc

but I get:

HTTP ERROR 500

Problem accessing /solr/select/. Reason: null

java.lang.NullPointerException
    at org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
    at org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
    at org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
    at org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Is termfreq stable, and how can I run this query?

-- - sent from my mobile 6176064373
Re: sorting by termfreq on trunk doesn't work?
Thanks for the problem report. It turns out we didn't check for a null pointer when there were no terms in a field for a segment. I've just committed a fix to trunk.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 22, 2011 at 10:28 PM, Jason Toy jason...@gmail.com wrote:

I am trying to sort by the termfreq function using the trunk code, since termfreq was added in the 4.0 code base. I run this query:

http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc

but I get:

HTTP ERROR 500

Problem accessing /solr/select/. Reason: null

java.lang.NullPointerException
    at org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
    at org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
    at org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
    at org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Is termfreq stable, and how can I run this query?

-- - sent from my mobile 6176064373
Re: Search is taking long-long time.
Were your searches always slow, or only since you made some changes at the index/config/schema level? Is it due to the index updates every 5 minutes? Are you warming your searchers?

Thanx
Pravesh

-- View this message in context: http://lucene.472066.n3.nabble.com/Search-is-taking-long-long-time-tp3095306p3098552.html Sent from the Solr - User mailing list archive at Nabble.com.
Query time noun, verb boosting
Hi,

At query time, I want to build the Lucene query so that it boosts only the nouns in the query, or some concept that exists in the index. Is there any way to do this, or any ideas for a workaround?

Regards,
Pooja
Re: Query time noun, verb boosting
What do you mean by 'noun or some concept'? It would be better if you could give a concrete example. For detecting parts of speech you could use a lot of libraries, but I didn't understand the part about boosting terms from the index.

-- Anshum Gupta http://ai-cafe.blogspot.com

On Thu, Jun 23, 2011 at 11:02 AM, Pooja Verlani pooja.verl...@gmail.com wrote:

Hi,

At query time, I want to build the Lucene query so that it boosts only the nouns in the query, or some concept that exists in the index. Is there any way to do this, or any ideas for a workaround?

Regards,
Pooja
Re: Query time noun, verb boosting
Hi,

Say, for example, the query is manmohan singh dancing. I would like to make the nouns a compulsory condition of the search, but the verb isn't important to me: I want results for manmohan singh and not for dancing. If I can extract the nouns and verbs, or can find out that my index has a concept of manmohan singh (or an identity, if not a concept), I would like to define rules that do a strict (compulsory) match on the noun (concept) and a loose match (non-compulsory, boosting only) on the verb.

Basically, I want to avoid getting zero results from a compulsory match on all 3 tokens of the query (in this case manmohan singh dancing). Instead, I want a compulsory match on manmohan singh, since that exists in my index, while dancing shouldn't be compulsory, so that I still get a non-zero number of results.

Hope this explains. Any suggestions?

Regards,
Pooja

On Thu, Jun 23, 2011 at 11:07 AM, Anshum ansh...@gmail.com wrote:

What do you mean by 'noun or some concept'? It would be better if you could give a concrete example. For detecting parts of speech you could use a lot of libraries, but I didn't understand the part about boosting terms from the index.

-- Anshum Gupta http://ai-cafe.blogspot.com

On Thu, Jun 23, 2011 at 11:02 AM, Pooja Verlani pooja.verl...@gmail.com wrote:

Hi,

At query time, I want to build the Lucene query so that it boosts only the nouns in the query, or some concept that exists in the index. Is there any way to do this, or any ideas for a workaround?

Regards,
Pooja
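In Lucene's classic query syntax a leading + marks a term as required, while an unmarked term is optional and only contributes to the score once at least one required term is present. The compulsory-noun, optional-verb rule described above can therefore be sketched as a small query builder. The part-of-speech tags are assumed to come from some external tagger that is not shown, and the tag names and function name are hypothetical; tokens are not escaped for query syntax characters.

```python
# Tags treated as compulsory; the names are placeholders for whatever the tagger emits.
REQUIRED_TAGS = {"NOUN", "PROPN"}


def build_query(tagged_tokens, optional_boost=None):
    """Build a Lucene query string from (token, tag) pairs.

    Tokens tagged as nouns become required (+token); all other tokens stay
    optional and can carry a boost (token^boost) so they only affect ranking.
    """
    parts = []
    for token, tag in tagged_tokens:
        if tag in REQUIRED_TAGS:
            parts.append("+" + token)
        elif optional_boost is not None:
            parts.append(f"{token}^{optional_boost}")
        else:
            parts.append(token)
    return " ".join(parts)
```

For the example in the message, the tagged tokens manmohan/NOUN, singh/NOUN, dancing/VERB would yield a query in which the two name tokens must match and dancing merely raises the score of documents that contain it.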