Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
Thanks Eric. Here's the part which I'm not able to understand. I have, for example, Sources A, B, C and D in the index. Each source contains n number of documents. Now, out of these, a bunch of documents in A and B are tagged with MediaType. I took the following steps:

1. Delete all documents tagged with MediaType for A and B. Documents from C and D are not touched.
2. Re-index the documents which were tagged with MediaType.
3. Run optimization.

Still, I keep seeing this exception. Does this mean content from C and D is impacted even though it is not tagged with MediaType? I'll follow your recommendation of creating a new collection, doing a full index and deleting the original collection.

--
View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219127.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
I didn't use the REST API; instead I updated the schema manually. Can you be specific about removing the data directory content? I certainly don't want to wipe out the index. I have four Solr instances, 2 shards with a replica each. Are you suggesting clearing the index and re-indexing from scratch?
Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
) ... 37 more

Here's the current field definition:

<field name="MediaType" type="string" indexed="true" stored="true" multiValued="true" required="false" omitNorms="true" />

I've re-indexed the documents and ran optimization on all four instances, but I'm still seeing the same error. I'm a bit puzzled trying to figure out the root cause. Do I need to delete the documents tagged with MediaType and re-index? I'm getting results back if I don't use result grouping. Any pointers will be appreciated. - Thanks, Shamik
Combining two MLT queries
Just wondering if it's possible to combine two separate MLT queries (based on filtering conditions) into a single one. I'm trying to combine the results of these two queries:

http://localhost:8983/solr/collection1/mlt?q=title:ABC&fq=Source:(Test1 OR Test3 OR Test4)
http://localhost:8983/solr/collection1/mlt?q=title:ABC&fq=Source:(Test2)

The Source field filter differs in the two cases. What I'm looking for is to combine the top 4 from query 1 and the top 4 from query 2. I was exploring the option of combining them into a single query instead of two. Is it possible? Any pointer will be appreciated. -Thanks, Shamik
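As far as I know there's no single MLT request that returns the top N from each of two different filters, so one option is to issue both queries and merge client-side. A minimal sketch; the doc ids and the parsed-response shape are hypothetical, not Solr's actual wire format:

```python
def merge_top_n(results_a, results_b, n=4):
    """Take the top n docs from each MLT response and concatenate them,
    dropping any duplicate that happened to match both filters."""
    merged, seen = [], set()
    for doc in results_a[:n] + results_b[:n]:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            merged.append(doc)
    return merged

# Hypothetical parsed responses of the two MLT queries:
q1_docs = [{"id": "a%d" % i} for i in range(10)]  # fq=Source:(Test1 OR Test3 OR Test4)
q2_docs = [{"id": "b%d" % i} for i in range(10)]  # fq=Source:(Test2)
combined = merge_top_n(q1_docs, q2_docs)
```

The merge keeps the two per-filter top-4 lists intact, which a single OR'ed fq would not guarantee.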
Re: Issue with German search
Anyone ?
Re: Issue with German search
Thanks Doug. I'm using eDismax. Here's my Solr query:

http://localhost:8983/solr/testhandlerdeu?debugQuery=true&q=title_deu:Software%20und%20Downloads

Here's my request handler:

<requestHandler name="/testhandlerdeu" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="v.layout">layout</str>
    <str name="v.channel">testhandler</str>
    <str name="title">Test Request Handler German</str>
    <str name="defType">edismax</str>
    <str name="q.op">AND</str>
    <str name="q.alt">*:*</str>
    <str name="rows">15</str>
    <str name="fl">*,score</str>
    <str name="qf">name_deu^1.2 title_deu^10.0 description_deu^5.0</str>
    <str name="df">text_deu</str>
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">-1</str>
    <str name="facet.sort">index</str>
    <str name="facet.method">enum</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.field">content_type</str>
    <str name="facet.field">author</str>
    <str name="hl">true</str>
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
    <str name="hl.fl">name subject description_deu name_deu title_deu</str>
    <str name="hl.encoder">html</str>
    <str name="f.subject.hl.fragsize">20</str>
    <str name="f.description_fra.hl.fragsize">20</str>
    <str name="f.name_fra.hl.fragsize">20</str>
    <str name="hl.usePhraseHighlighter">false</str>
    <str name="hl.useFastVectorHighlighter">true</str>
    <str name="hl.boundaryScanner">breakIterator</str>
    <str name="hl.bs.type">SENTENCE</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Re: Issue with German search
Thanks a ton Doug, I should have figured this out myself; pretty stupid of me. Appreciate your help.
Issue with German search
Hi, I'm having an issue searching for a term in German. Here are the keyword(s) I'm trying to search: Software und Downloads. I have a document indexed in German with the same title, Software und Downloads. I'm expecting that the search on Software und Downloads will return this document; unfortunately it's not happening. Here's my sample test scenario from my local machine. In the schema, I've defined these three fields:

<field name="title_deu" type="adsktext_deu" indexed="true" stored="true" multiValued="true" />
<field name="name_deu" type="adsktext_deu" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="description_deu" type="adsktext_deu" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

Field type definition:

<!-- German language specific definitions -->
<fieldType name="adsktext_deu" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de.txt" />
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de.txt" />
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>

When I ran a sample analysis of Software und Downloads, the term is indexed as: softwar softoft download ad. During query, it's getting searched as: softwar download. Not sure why it's not returning the document. Here's the sample data, indexed through solr.xml under example docs:

<doc>
  <field name="id">12234!SOLR11092212</field>
  <field name="name_deu">Test Name</field>
  <field name="title_deu">Software und Downloads</field>
  <field name="description_deu">div#actcontain { width: 100%; min-width: 220px; display: block; float: left; padding: 0 8px 0 0; } div#actcopy { width: 48%; min-width: 230px; min-height: 120px; float: left; display: inline-block; padding: 0 28px 0 0; margin: 10px 0 0 0; overflow: hidden; } Häufige ThemenDownload-Verfahren im Autodesk-KontoDownload-Verfahren für Education Community (Schüler, Studenten und Lehrkräfte)Suchen von Service Packs, Hotfixes und SprachpaketenSoftware-Lizenzen im Autodesk-Store kaufenSuchen kostenloser Testversion-Downloads Download-VerfahrenHerunterladen von Software aus verschiedenen Speicherorten, abhängig von Ihrem Konto oder dem Subscription-TypenNutzung am Heimarbeitsplatz für AbonnentenDesktop Subscription können lizenzierte Software zur Verwendung auf ihrem Computer zu Hause erhaltenProdukterweiterungen für AbonnentenExklusiver Zugriff auf die neueste Software für einige Autodesk-ProdukteBestellen einer Software-DVDSo bestellen Sie eine DVD oder einen USB-Stick für Ihre SoftwareAktuelle Versionen für AbonnentenSubscription-Kunden haben Zugriff auf Produkt-Updates, die während der Vertragslaufzeit verfügbar sind.VorgängerversionenErfahren Sie, wie Sie eine Vorgängerversion Ihrer Autodesk-Software erhaltenSprachoptionenHerunterladen der lizenzierten Software in einer anderen Sprache oder Erhalten eines Sprachpakets.</field>
  <field name="author">Bob</field>
</doc>

Any pointers will be appreciated. -Thanks, Shamik
Re: Grouping Performance Optimation
You should look at CollapsingQParserPlugin. It's much faster compared to a grouping query. https://wiki.apache.org/solr/CollapsingQParserPlugin It has a limitation though; check the following JIRA to see if it might affect your use case: https://issues.apache.org/jira/browse/SOLR-6143
Re: Unable to update config file using zkcli or RELOAD
Ok, I figured out the steps in case someone needs a reference. It required both zkcli and RELOAD to pick up the changes.

1. Use zkcli to upload the changes. I ran it from the node which was used for bootstrapping:

sh zkcli.sh -cmd upconfig -zkhost zoohost1:2181 -confname myconf -solrhome /mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/

2. Use the same node to run the RELOAD:

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1
Re: Unable to update config file using zkcli or RELOAD
Thanks Shawn for the pointer, really appreciate it.
Unable to update config file using zkcli or RELOAD
Hi, I'm facing a weird issue. I have a Solr cloud cluster with 2 shards, each having a replica. I started the cluster using -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf. After the cluster was up and running, I added a new request handler (newhandler) and wanted to push it without restarting the server. First, I tried the RELOAD option. I ran:

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1

The command was successful, but when I logged in to the admin screen, the solrconfig didn't show the request handler. Next I tried the zkcli script on shard 1:

sh zkcli.sh -cmd upconfig -zkhost zoohost1:2181 -confname myconf -solrhome /mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/

The script ran successfully and I could see the updated solrconfig file in the Solr admin. But then, when I tried http://54.151.xx.xxx:8983/solr/collection1/newhandler I got a 404. Not sure what I'm doing wrong. Do I need to run the zkcli script on each node? I'm using Solr 5.0. Regards, Shamik
Uneven index distribution using composite router
Hi, I'm using a three-level composite router in a Solr cloud environment, primarily for multi-tenancy and field collapsing. The format is as follows: *language!topic!url*. Examples would be:

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shards, each having 3 replicas. After indexing around 10 million documents, I'm observing that the index size in shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is getting indexed in shard 1. Since 60% of the documents are English, I expect the index size to be higher on one shard, but the difference seems a little too high. The idea is to make sure that all ENU!12345 documents are routed to one shard so that distributed field collapsing works. Is there something I can do differently here to get a better distribution? Any pointers will be appreciated. Regards, Shamik
Re: Uneven index distribution using composite router
Thanks for your reply Eric. In my case, I have 14 languages, out of which 50% of the documents belong to English. German and CHS will probably constitute another 25%. I'm not using copyField; rather, each language has its dedicated fields, such as title_enu, text_enu, title_ger, text_ger, etc. Since I know the language prior to index time, this works for me. I've added one more sample key to the example:

ENU!12345!www.testurl.com/enu/doc1
ENU!12345!www.testurl.com/enu/doc10
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

As you can see, there are 2 documents in English having the same topic id (12345). I added the topic id as part of the key to make sure that they reside in the same shard, in order to make field collapsing on topic id work. I can perhaps remove the composite key and only have language and url, something like ENU!www.testurl.com/enu/doc1, but that'll probably not solve the distribution issue. You mentioned that when you take over routing, making sure the distribution is even is now your responsibility. I'm wondering, what's the best practice to make that happen? I can get away from the composite router and manually assign a bunch of languages to a dedicated shard, both at index and query time. But I'm not sure keeping a map is an efficient way of dealing with it.
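To see why routing on a language prefix skews the shards when one language dominates, here's a toy model. The hashing is deliberately simplified (md5 of the prefix instead of Solr's MurmurHash3 range slicing), so it only illustrates the co-location effect, not Solr's actual byte-for-byte routing:

```python
import hashlib

NUM_SHARDS = 2

def shard_for(doc_id):
    # Simplified stand-in for CompositeIdRouter: only the first
    # component of "language!topic!url" is hashed here, so every
    # document of a given language lands on one shard.
    prefix = doc_id.split("!")[0]
    h = int(hashlib.md5(prefix.encode("utf-8")).hexdigest(), 16)
    return h % NUM_SHARDS

# Corpus skew per the thread: half the documents are English.
docs = (["ENU!12345!www.testurl.com/enu/doc%d" % i for i in range(50)]
        + ["GER!12345!www.testurl.com/ger/doc%d" % i for i in range(25)]
        + ["CHS!67890!www.testurl.com/chs/doc%d" % i for i in range(25)])

counts = [0] * NUM_SHARDS
for d in docs:
    counts[shard_for(d)] += 1
```

With only two shards, whichever shard receives the ENU prefix necessarily holds at least half the corpus; no choice of topic or url component changes that.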
Problem with Terms Query Parser
Hi, I'm trying to use the Terms Query Parser for one of my use cases where I use an implicit filter on a bunch of sources. When I run the following query:

fq={!terms f=Source}help,documentation,sfdc

I'm getting the following error:

<lst name="error"><str name="msg">Unknown query parser 'terms'</str><int name="code">400</int></lst>

What am I missing here? I'm using the Solr 5.0 version. Any pointers will be appreciated. Regards, Shamik
Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping
Hi, I have a field which is being used for result grouping. Here's the field definition:

<field name="ADSKDedup" type="string" indexed="true" stored="true" multiValued="false" required="false" omitNorms="true" docValues="true"/>

This started once I did a rolling update from 4.7 to 5.0. I started getting the error on any group-by query:

SolrDispatchFilter null:java.lang.IllegalStateException: unexpected docvalues type NONE for field 'ADSKDedup' (expected=SORTED). Use UninvertingReader or index with docvalues.

Does this mean that I need to re-index the documents to get over this error? Regards, Shamik
Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping
Wow, optimize worked like a charm. This really addressed the docvalues issue. A follow-up question: is it recommended to run optimize on a production Solr index? Also, in Solr cloud mode, do we need to run optimize on each instance / each shard / any one instance? Appreciate your help Alex.
Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping
Well, I think I've narrowed down the issue. The error happens when I do a rolling update from Solr 4.7 (which is our current version) to 5.0. I've been able to reproduce this a couple of times. If I do a fresh index on 5.0, it works. Not sure if there's any other way to mitigate it. I'd appreciate it if someone can share their experience with the same.
Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping
Looks like it's happening for any field which is using docValues.

java.lang.IllegalStateException: unexpected docvalues type NONE for field 'title_sort' (expected=SORTED). Use UninvertingReader or index with docvalues.

Any idea ?
Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping
Thanks for your reply. Initially, I was under the impression that the issue was related to grouping, as group queries were failing. Later, when I looked further, I found that it's happening for any field for which docValues is turned on. The second example I gave was from another field. Here's a full stack trace for another field using docValues.

Field definition:

<field name="DocumentType" type="string" indexed="true" stored="true" multiValued="false" required="false" omitNorms="true" docValues="true" />

3/11/2015, 2:14:30 PM ERROR SolrDispatchFilter null:java.lang.IllegalStateException: unexpected docvalues type NONE for field 'DocumentType' (expected=SORTED). Use UninvertingReader or index with docvalues.

null:java.lang.IllegalStateException: unexpected docvalues type NONE for field 'DocumentType' (expected=SORTED). Use UninvertingReader or index with docvalues.
	at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
	at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
	at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:757)
	at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:762)
	at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
	at org.apache.lucene.search.TopFieldCollector$NonScoringCollector.getLeafCollector(TopFieldCollector.java:141)
	at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:99)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:583)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:284)
	at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:231)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1766)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1502)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:227)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
Re: How to start solr in solr cloud mode using external zookeeper ?
The other way you can do it is to specify the startup parameters in solr.in.sh. Example:

SOLR_MODE=solrcloud
ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181
SOLR_PORT=4567

You can then simply start Solr by running ./solr start
Re: Does DocValues improve Grouping performance ?
Joel, To give you some context, we are running queries against 6 million documents in a Solr cloud environment. The grouping is done to de-duplicate content based on a unique field. Unfortunately, due to some requirement constraints, the only way for us to run the de-duplication is at query time. The group numbers are pretty high in our case: the average distinct group is around 1000, and the total number of distinct groups for the field is around 10k. Phrase queries are especially bad, averaging a response time of 10-12 secs. Having said that, CollapsingQParserPlugin makes a huge difference in performance, the only caveat being the lack of support for a group.facet equivalent. I had this discussion earlier with you, where you had confirmed it: http://lucene.472066.n3.nabble.com/RE-SOLR-6143-Bad-facet-counts-from-CollapsingQParserPlugin-td4140455.html#a4146645 Are there any plans to address this? Not sure if it's a big change at your end, but if it's something we can contribute to, I'm more than happy to help. I know there are a bunch of people who are looking forward to this.
Include stopwords in phrase search
Hi, I'm having an issue running phrase queries with stopwords. It looks like Solr is ignoring the stopword during search. Here's my search term: cannot open device. When I execute title:"cannot open device", it brings back titles with Find Open Devices. Here's my field definition for title:

<field name="title" type="adsktext" indexed="true" stored="true" multiValued="true"/>

<fieldType name="adsktext" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Sample text:

<doc>
  <field name="id">111!SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="title">Find Open Devices</field>
</doc>
<doc>
  <field name="id">333!SOLR1002</field>
  <field name="name">ElasticSearch Server</field>
  <field name="title">Cannot open device</field>
</doc>

I have cannot as part of my stopword list. The weird part is, when I analyze the phrase in the Solr admin, it's getting indexed as the following three tokens: cannot open devic. I'm on Solr 4.7, so I'm not sure if enablePositionIncrements="true" is making any difference. Any feedback will be appreciated. Thanks, Shamik
Re: Include stopwords in phrase search
Well, I somehow made it work by using CommonGramsFilterFactory:

<filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>

Just wondering if it's the right approach ?
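For anyone landing here later: the idea behind CommonGramsFilter is to fuse each listed common word with its neighbor into an extra bigram token (e.g. cannot_open), so the stopword is preserved for phrase matching instead of being dropped. A rough sketch of that pairing; the word list is illustrative, and the real Lucene filter also manages token positions, which this does not:

```python
COMMON_WORDS = {"cannot", "the", "a", "an", "of"}

def common_grams(tokens):
    """Emit the original tokens plus a fused bigram for every adjacent
    pair that contains a common word, mimicking CommonGramsFilter."""
    out = list(tokens)
    for left, right in zip(tokens, tokens[1:]):
        if left in COMMON_WORDS or right in COMMON_WORDS:
            out.append(left + "_" + right)
    return out

emitted = common_grams(["cannot", "open", "device"])
```

Here only cannot_open is added, since neither open nor device is in the common-word list; a phrase query built the same way can then require the fused token and distinguish "cannot open device" from "find open devices".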
Issue with Solr multiple sort
Hi, I'm facing a problem with a multiple-field sort in Solr. I'm using the following fields in sort: PublishDate asc,DocumentType asc. The sort is only happening on PublishDate; DocumentType seems to be completely ignored. Here are my field definitions:

<field name="PublishDate" type="tdate" indexed="true" stored="true" default="NOW"/>
<field name="DocumentType" type="string" indexed="true" stored="true" multiValued="false" required="false" omitNorms="true"/>

Here's the sample query:

http://localhost:8983/solr/select?sort=PublishDate+desc%2CDocumentType+desc&q=cat:search&fl=PublishDate,DocumentType&debugQuery=true

Here's the output:

<result name="response" numFound="8" start="0">
  <doc><date name="PublishDate">2015-01-17T00:00:00Z</date><str name="DocumentType">Hotfixes</str></doc>
  <doc><date name="PublishDate">2014-11-17T00:00:00Z</date><str name="DocumentType">Hotfixes</str></doc>
  <doc><date name="PublishDate">2013-01-17T00:00:00Z</date><str name="DocumentType">Tutorials</str></doc>
  <doc><date name="PublishDate">2012-10-17T00:00:00Z</date><str name="DocumentType">Service Packs</str></doc>
  <doc><date name="PublishDate">2012-01-17T00:00:00Z</date><str name="DocumentType">Tutorials</str></doc>
  <doc><date name="PublishDate">2011-01-17T00:00:00Z</date><str name="DocumentType">Tutorials</str></doc>
  <doc><date name="PublishDate">2006-01-17T00:00:00Z</date><str name="DocumentType">Object Enablers</str></doc>
  <doc><date name="PublishDate">2006-01-17T00:00:00Z</date><str name="DocumentType">Hotfixes</str></doc>
</result>

As you can see, the sorting happened only on PublishDate. I'm using Solr 4.7. Not sure what I'm missing here; any pointers will be appreciated. Thanks, Shamik
Re: Issue with Solr multiple sort
Thanks Hoss for clearing up my doubt. I was confused about the ordering. So the first field is always the primary sort field, followed by the secondary. Thanks again.
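That primary/secondary behavior is the same as sorting by a tuple key: the second field is consulted only when the first field ties. A small illustration with made-up dates and types:

```python
docs = [
    {"PublishDate": "2012-01-17", "DocumentType": "Tutorials"},
    {"PublishDate": "2015-01-17", "DocumentType": "Hotfixes"},
    {"PublishDate": "2012-01-17", "DocumentType": "Hotfixes"},
]

# Equivalent of sort=PublishDate asc,DocumentType asc:
# the first key is primary; the second only breaks ties.
ranked = sorted(docs, key=lambda d: (d["PublishDate"], d["DocumentType"]))
```

The two 2012 documents are ordered by DocumentType (Hotfixes before Tutorials), while the 2015 document sorts last regardless of its type.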
Re: Conditions in function query
This one worked:

if(termfreq(Source,'A'),sum(Likes,3),if(termfreq(Source,'B'),sum(Likes,3),0))
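The nested if(termfreq(...)) evaluates like a chained conditional: test Source A first, otherwise fall through to B, otherwise no boost. In Python terms (the field values are hypothetical, and this assumes a single-valued Source field where termfreq acts as an equality test):

```python
def boost(source, likes):
    # Mirrors if(termfreq(Source,'A'),sum(Likes,3),
    #            if(termfreq(Source,'B'),sum(Likes,3),0))
    if source == "A":
        return likes + 3
    if source == "B":
        return likes + 3
    return 0

values = [boost("A", 10), boost("B", 5), boost("C", 7)]
```

Extending to the three-tier scheme from the original question (A/B/C get +4, D gets +3, E gets +2) is just one more level of nesting in the else branch.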
Does DocValues improve Grouping performance ?
Hi, Does the use of DocValues provide any performance improvement for Grouping? I've looked at the blog post which mentions improving Grouping performance through DocValues: https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/ Right now, group-by queries (which I sadly can't avoid) have become a huge bottleneck. They have an overhead of 60-70% compared to the same query sans group-by. Unfortunately, I'm not able to use CollapsingQParserPlugin as it doesn't have support similar to the group.facet feature. My understanding of DocValues is that it's intended for faceting and sorting. Just wondering if anyone has tried DocValues for Grouping and seen any improvements? -Thanks, Shamik
Conditions in function query
Hi, Just wanted to know if it's possible to provide conditions within a function query. Right now, I'm using the following functions to boost on Likes data:

bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0 sum(Likes,2)

What I would like to do is to apply the boost on Likes based on the source. For e.g. if Source=A or B or C, then sum(Likes,4); if Source=D, then sum(Likes,3); if Source=E, then sum(Likes,2). Is it possible to do this using a function? Any pointers will be appreciated. Regards, Shamik
Re: Conditions in function query
Thanks Eric, I did take a look at the if condition earlier, but I'm not sure how it can be used for multiple conditions. It works for a single condition:

if(termfreq(Source2,'A'),sum(Likes,3),0)

But for multiple conditions, I'm struggling to find the right syntax. I tried using OR in conjunction, but it hasn't worked out so far.
Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?
Anyone?
Re: Has anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory)?
Ted, Here's the query I'm using and the debug info. It's still returning all 5 results back as if it's simply looking for either of the term with q.op set as OR (default). http://localhost:8983/solr/autophrase?q=text:seat+cushionswt=xmldebugQuery=true Debug lst name=debug str name=rawquerystringtext:seat cushions/str str name=querystringtext:seat cushions/str str name=parsedquerytext:seat text:cushion/str str name=parsedquery_toStringtext:seat text:cushion/str lst name=explain str name=2 0.430151 = (MATCH) sum of: 0.11124363 = (MATCH) weight(text:seat in 1) [DefaultSimilarity], result of: 0.11124363 = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 0.5085423 = queryWeight, product of: 1.0 = idf(docFreq=5, maxDocs=6) 0.5085423 = queryNorm 0.21875 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=5, maxDocs=6) 0.21875 = fieldNorm(doc=1) 0.31890735 = (MATCH) weight(text:cushion in 1) [DefaultSimilarity], result of: 0.31890735 = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 0.86103696 = queryWeight, product of: 1.6931472 = idf(docFreq=2, maxDocs=6) 0.5085423 = queryNorm 0.37037593 = fieldWeight in 1, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.6931472 = idf(docFreq=2, maxDocs=6) 0.21875 = fieldNorm(doc=1) /str str name=6 0.430151 = (MATCH) sum of: 0.11124363 = (MATCH) weight(text:seat in 5) [DefaultSimilarity], result of: 0.11124363 = score(doc=5,freq=1.0 = termFreq=1.0 ), product of: 0.5085423 = queryWeight, product of: 1.0 = idf(docFreq=5, maxDocs=6) 0.5085423 = queryNorm 0.21875 = fieldWeight in 5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=5, maxDocs=6) 0.21875 = fieldNorm(doc=5) 0.31890735 = (MATCH) weight(text:cushion in 5) [DefaultSimilarity], result of: 0.31890735 = score(doc=5,freq=1.0 = termFreq=1.0 ), product of: 0.86103696 = queryWeight, product of: 1.6931472 = idf(docFreq=2, maxDocs=6) 0.5085423 = queryNorm 0.37037593 = fieldWeight in 
5, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.6931472 = idf(docFreq=2, maxDocs=6) 0.21875 = fieldNorm(doc=5) /str str name=1 0.06356779 = (MATCH) product of: 0.12713557 = (MATCH) sum of: 0.12713557 = (MATCH) weight(text:seat in 0) [DefaultSimilarity], result of: 0.12713557 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 0.5085423 = queryWeight, product of: 1.0 = idf(docFreq=5, maxDocs=6) 0.5085423 = queryNorm 0.25 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=5, maxDocs=6) 0.25 = fieldNorm(doc=0) 0.5 = coord(1/2) /str str name=3 0.06356779 = (MATCH) product of: 0.12713557 = (MATCH) sum of: 0.12713557 = (MATCH) weight(text:seat in 2) [DefaultSimilarity], result of: 0.12713557 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.5085423 = queryWeight, product of: 1.0 = idf(docFreq=5, maxDocs=6) 0.5085423 = queryNorm 0.25 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=5, maxDocs=6) 0.25 = fieldNorm(doc=2) 0.5 = coord(1/2) /str str name=5 0.055621814 = (MATCH) product of: 0.11124363 = (MATCH) sum of: 0.11124363 = (MATCH) weight(text:seat in 4) [DefaultSimilarity], result of: 0.11124363 = score(doc=4,freq=1.0 = termFreq=1.0 ), product of: 0.5085423 = queryWeight, product of: 1.0 = idf(docFreq=5, maxDocs=6) 0.5085423 = queryNorm 0.21875 = fieldWeight in 4, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=5, maxDocs=6) 0.21875 = fieldNorm(doc=4) 0.5 = coord(1/2) /str /lst str name=QParserLuceneQParser/str Sample data add doc field name=id1/field field name=nameDoc 1/field field name=textThis has a rear window defroster and really cool bucket seats./field /doc doc field name=id2/field field name=nameDoc 2/field field name=textThis one has rear seat cushions and air conditioning – what a ride!/field /doc doc field name=id3/field field name=nameDoc 3/field field name=textThis one has gold seat belts 
front and rear./field /doc doc field name=id4/field field name=nameDoc 4/field field name=textThis one has front and side air bags and a heated seat.The fan belt never breaks./field /doc doc field name=id5/field field name=nameDoc 5/field field name=textThis one has big rear wheels and a seat cushion.It doesn't have a timing belt./field /doc
Re: Has anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory)?
Jim, Thanks for your response. I've tried including AutoPhrasingTokenFilterFactory as part of the query analyzer, but it didn't make any difference. fieldType name=text_autophrase class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory phrases=autophrases.txt includeTokens=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.PorterStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory phrases=autophrases.txt includeTokens=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.PorterStemFilterFactory/ /analyzer /fieldType I'll try out your version and post my observations. Just curious, what version of Solr are you using?
Re: Has anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory)?
Ted, Thanks a lot. I had gone through your blogs but the whitespace issue slipped my mind. replaceWhitespaceWith addressed the issue. I think it's a great filter to have; it surely takes care of an important use case. Appreciate your help. -Shamik
Has anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory)?
Hi, I'm trying to use AutoPhrasingTokenFilterFactory, which seems to be a great solution to our phrase query issues, but it doesn't seem to work as mentioned in the blog: https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/ The tokenizer is working as expected during index time, where it's preserving the phrases as a single token based on the text file. Here's my field definition : fieldType name=text_autophrase class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory phrases=autophrases.txt includeTokens=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.KStemFilterFactory / /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.KStemFilterFactory / /analyzer /fieldType On analyzing, I can see the phrase seat cushions (defined in autophrases.txt) is being indexed as seat, seat cushions and cushion. The problem is during query time. As per the blog, the request handler needs to use a custom query parser to achieve the result. Here's my entry in solrconfig. 
requestHandler name=/autophrase class=solr.SearchHandler lst name=defaults str name=wtvelocity/str str name=v.templatebrowse/str str name=v.layoutlayout/str str name=titleSolritas/str str name=echoParamsexplicit/str int name=rows10/int str name=dftext/str str name=defTypeautophrasingParser/str /lst /requestHandler queryParser name=autophrasingParser class=com.lucidworks.analysis.AutoPhrasingQParserPlugin str name=phrasesautophrases.txt/str /queryParser But if I query seat cushions using this request handler, it seems to be treating the query as two separate terms and returning all results matching seat and cushion. Not sure what I'm missing here. I'm using Solr 4.10. The other question I had is whether com.lucidworks.analysis.AutoPhrasingQParserPlugin supports the edismax features; edismax is my default parser. I'd appreciate it if anyone could provide feedback. -Thanks, Shamik
Highlighting simple.pre and simple.post values getting ignored
Hi, I'm facing a weird issue where the specified hl.simple.pre and hl.simple.post values for highlighting is getting ignored. In my test handler, I've the following entry: !-- Highlighting defaults -- str name=hltrue/str str name=hl.simple.pre![CDATA[span class=vivbold qt0]]/str str name=hl.simple.post![CDATA[/span]]/str str name=hl.flname subject/str str name=hl.encoderhtml/str str name=f.subject.hl.fragsize200/str str name=hl.usePhraseHighlighterfalse/str str name=hl.useFastVectorHighlightertrue/str str name=hl.boundaryScannerbreakIterator/str searchComponent class=solr.HighlightComponent name=highlight highlighting fragmenter name=gap default=true class=solr.highlight.GapFragmenter lst name=defaults int name=hl.fragsize100/int /lst /fragmenter fragmenter name=regex class=solr.highlight.RegexFragmenter lst name=defaults int name=hl.fragsize70/int float name=hl.regex.slop0.5/float str name=hl.regex.pattern[-\w ,/\n\quot;apos;]{20,200}/str /lst /fragmenter formatter name=html default=true class=solr.highlight.HtmlFormatter lst name=defaults str name=hl.simple.pre![CDATA[span class=vivbold qt0]]/str str name=hl.simple.post![CDATA[/span]]/str /lst /formatter encoder name=html class=solr.highlight.HtmlEncoder / fragListBuilder name=simple class=solr.highlight.SimpleFragListBuilder/ fragListBuilder name=single class=solr.highlight.SingleFragListBuilder/ fragListBuilder name=weighted default=true class=solr.highlight.WeightedFragListBuilder/ !-- default tag FragmentsBuilder -- fragmentsBuilder name=default default=true class=solr.highlight.ScoreOrderFragmentsBuilder /fragmentsBuilder !-- multi-colored tag FragmentsBuilder -- fragmentsBuilder name=colored class=solr.highlight.ScoreOrderFragmentsBuilder lst name=defaults str name=hl.tag.pre![CDATA[ b style=background:yellow,b style=background:lawgreen, b style=background:aquamarine,b style=background:magenta, b style=background:palegreen,b style=background:coral, b style=background:wheat,b style=background:khaki, b 
style=background:lime,b style=background:deepskyblue]]/str str name=hl.tag.post![CDATA[/b]]/str /lst /fragmentsBuilder boundaryScanner name=default default=false class=solr.highlight.SimpleBoundaryScanner lst name=defaults str name=hl.bs.maxScan10/str str name=hl.bs.chars.,!? #9;#10;#13;/str /lst /boundaryScanner boundaryScanner name=breakIterator class=solr.highlight.BreakIteratorBoundaryScanner lst name=defaults !-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -- str name=hl.bs.typeSENTENCE/str !-- language and country are used when constructing Locale object. -- !-- And the Locale object will be used when getting instance of BreakIterator -- str name=hl.bs.languageen/str str name=hl.bs.countryUS/str /lst /boundaryScanner /highlighting /searchComponent As you can see, I've specified the simple.pre and simple.post values in the request handler as well as under standard formatter. But, search result is always wrapping the term with em/em, not sure where is this value coming from. There's no reference of it in solrconfig file. Looks like it's ignoring the value from solrconfig and defaulting it to em. Can someone provide any pointer ? I'm using Solr 4.7. Thanks, Shamik
Re: Highlighting simple.pre and simple.post values getting ignored
Looks like this has to do with the selection of the fast vector highlighter and breakIterator as the boundary scanner. I'm using them to make sure that the highlighted snippet starts from the beginning of a sentence and not from the middle. str name=hl.usePhraseHighlighterfalse/str str name=hl.useFastVectorHighlightertrue/str str name=hl.boundaryScannerbreakIterator/str Now, if I don't use them, I'm getting the right pre and post tags. str name=hlon/str str name=hl.fltitle name/str str name=hl.encoderhtml/str str name=hl.simple.pre/str str name=hl.simple.post/str str name=f.title.hl.fragsize0/str str name=f.title.hl.alternateFieldmanu/str str name=f.name.hl.fragsize0/str str name=f.name.hl.alternateFieldname/str str name=f.content.hl.snippets3/str str name=f.content.hl.fragsize200/str Do I need any separate setting for breakIterator to support custom pre and post tags?
Re: Highlighting simple.pre and simple.post values getting ignored
Found the issue: to use FastVectorHighlighter, the pre and post tag syntax is different str name=hl.tag.pre/str str name=hl.tag.post/str This worked as expected.
Re: Boost Query (bq) syntax/usage
Thanks a lot Jack, it makes total sense. I checked the config and the default q.op was set to OR, which was influencing the query.
Boost Query (bq) syntax/usage
Hi, I'm a little confused about the right syntax for defining boost queries. If I use them in the following way: http://localhost:8983/solr/testhandler?q=Application+Manager&bq=(Source2:sfdc^6 Source2:downloads^5 Source2:topics^3)&debugQuery=true it gets translated to -- arr name=parsed_boost_queries str +Source2:sfdc^6.0 +Source2:downloads^5.0 +Source2:topics^3.0 /str /arr Now, if I use the following query: http://localhost:8983/solr/testhandler?q=Application+Manager&bq=Source2:sfdc^6&bq=Source2:downloads^5&bq=Source2:topics^3&debugQuery=true it gets translated as -- arr name=parsed_boost_queries strSource2:sfdc^6.0/str strSource2:downloads^5.0/str strSource2:topics^3.0/str /arr The two queries generate different results in terms of relevancy. Just wondering, what is the right way of using bq? -Thanks
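The difference visible in the parsed output comes down to how the clauses are combined (the follow-up in this thread confirms the default q.op was the culprit); a sketch of the two forms with the parameter separators spelled out:

```
# One bq parameter: the three clauses parse as a single boolean query,
# so q.op (AND vs OR) decides whether each clause becomes mandatory (+).
bq=(Source2:sfdc^6 Source2:downloads^5 Source2:topics^3)

# Three bq parameters: each is parsed independently as its own optional
# boost query, so q.op has no effect across them.
bq=Source2:sfdc^6&bq=Source2:downloads^5&bq=Source2:topics^3
```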
Re: Boost Query (bq) syntax/usage
Thanks a lot Jack, makes sense. Just curious, if we used the following bq entry in solrconfig.xml str name=bqSource2:sfdc^6 Source2:downloads^5 Source2:topics^3/str will it always be treated as an AND query? Some of my local results suggest otherwise.
Solr query field (qf) conditional boost
Hi, I'm trying to find out if it's possible to include conditional boosting in the Solr qf field. For e.g. I have the following entry in the qf parameter: str name=qftext^0.5 title^10.0 ProductLine^5/str What I'm looking for is to add the ProductLine boosting only for a given Author field, something along the lines of: boost ProductLine^5 if Author:Tom. I've been using similar filtering in the appends section, but I'm not sure how to do it in qf, or whether it's possible at all. lst name=appends str name=fqAuthor:(Tom +Solution:yes) /str /lst Any pointers will be appreciated. Thanks, Shamik
RE: Solr query field (qf) conditional boost
Thanks Markus. Well, I tried using a conditional if-else function, but it doesn't seem to work for field boosting. What I'm trying to do is boost the ProductLine field by 5 if the result documents contain Author = 'Tom'.
RE: Solr query field (qf) conditional boost
Thanks Markus, let me play around with the functions and see if I can achieve the results.
How to query certain fields filtered by a condition
Hi, Just wanted to understand if it's possible to limit a searchable field only to specific documents during query time. Following are my searchable fields: str name=qftext^0.5 title^10.0 country^1.0/str What I want is to make country a searchable field only for documents which contain author:Robert. For the remaining documents, country should not be considered a searchable field; only text and title will come into play. So if I search for usa, it should bring results from documents where author=Robert (by matching the country field), but not from the remaining authors, even if they have a country field with the value usa. I don't know how it can be done during query time, or whether it's possible at all through function queries. The other option is to add the country value as part of title or text for documents containing Author:Robert during index time. But I would like to know if it's possible during query time. Appreciate your feedback. -Thanks, Shamik
Re: How to query certain fields filtered by a condition
Thanks Jack for your reply. I'm sorry, but I'm not too clear on the solution you proposed. Can you please provide a sample of what you suggested?
Re: Czech stemmer
Lucas, Thanks for the information. I took the dictionary and used the hunspell stemmer. It worked for the use case I had mentioned, i.e. posunout and posunulo. But it had an impact on other search terms. For example, a search for ukončit or ukončí is not returning any results, though they work with CzechStemFilterFactory. I know there'll be trade-offs with various stemmers, but I'm not sure which one fits the bill. Being an alien to the Czech language doesn't help the cause either. Thanks, Shamik
Czech stemmer
Hi, I'm facing stemming issues with the Czech language search. Solr/Lucene currently provides CzechStemFilterFactory as the sole option. Snowball Porter doesn't seem to be available for Czech. Here's the issue. I'm trying to search for posunout (means move in English) which returns result, but fails if I use ''posunulo (means moved in English). I used the following text as field for search. Pomocí multifunkčních uzlů je možné odkazy mnoha způsoby upravovat. Můžete přidat a odstranit odkazy, přidat a odstranit vrcholy, prodloužit nebo přesunout prodloužení čáry nebo přesunout text odkazu. Přístup k požadované možnosti získáte po přesunutí ukazatele myši na uzel. Z uzlu prodloužení čáry můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout prodloužení odkazové čáry. Délka prodloužení čáry: Umožňuje prodloužit prodloužení čáry. Přidat odkaz: Umožňuje přidat jednu nebo více odkazových čar. Z uzlu koncového bodu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout koncový bod odkazové čáry. Přidat vrchol: Umožňuje přidat vrchol k odkazové čáře. Odstranit odkaz: Umožňuje odstranit vybranou odkazovou čáru. Z uzlu vrcholu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout vrchol. Přidat vrchol: Umožňuje přidat vrchol na odkazovou čáru. Odstranit vrchol: Umožňuje odstranit vrchol. Just wondering if there's a different stemmer available or a way to address this. 
Schema : fieldType name=text_csy class=solr.TextField positionIncrementGap=100 autoGeneratePhraseQueries=true analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=lang/stopwords_cz.txt / filter class=solr.SynonymFilterFactory synonyms=synonyms_csy.txt ignoreCase=true expand=true/ filter class=solr.CzechStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=lang/stopwords_cz.txt / filter class=solr.CzechStemFilterFactory/ /analyzer /fieldType Any pointers will be appreciated. - Thanks, Shamik
Re: solr query gives different numFound upon refreshing
I've noticed similar behavior with our SolrCloud cluster for a while, though it's random. We've 2 shards with 3 replicas each. At times, I've observed that the same query on refresh will fetch different results (numFound) as well as different content. The only way to mitigate it is to re-index the documents until the nodes are in sync. I always use SolrJ, which talks to Solr through ZooKeeper; even with that it seemed unavoidable at times. We are committing every 10 mins. I'm pretty sure there's a minor glitch which creates a sync issue at times.
Re: SOLR-6143 Bad facet counts from CollapsingQParserPlugin
Are there any plans to release this feature anytime soon? I think this is pretty important, as a lot of search use cases depend on the facet counts returned with the search result. This issue renders the CollapsingQParserPlugin pretty much unusable. I'm now reverting back to the old group query (painfully slow) since I can't use the facet counts anymore.
Re: Does solrj support partial update for solr cloud?
Yes it does, and it's pretty straightforward. Refer to the following URLs: http://heliosearch.org/solr/atomic-updates/ http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/
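Whichever client you use, an atomic (partial) update boils down to a small JSON message in which each field carries an operation such as set or inc instead of a bare value. A minimal sketch of that payload format in Python (the document id and field names here are made up for illustration, not taken from a real schema):

```python
import json

def atomic_update_body(doc_id, set_fields=None, inc_fields=None):
    """Build the JSON body for a Solr atomic-update request."""
    doc = {"id": doc_id}
    for field, value in (set_fields or {}).items():
        doc[field] = {"set": value}   # replace the stored value
    for field, amount in (inc_fields or {}).items():
        doc[field] = {"inc": amount}  # increment a numeric field
    return json.dumps([doc])          # Solr's /update endpoint accepts a list of docs

body = atomic_update_body("doc1", set_fields={"Source": "Blog"}, inc_fields={"Likes": 1})
print(body)  # [{"id": "doc1", "Source": {"set": "Blog"}, "Likes": {"inc": 1}}]
```

SolrJ expresses the same thing with a SolrInputDocument whose field values are maps like {"set": value}; the links above walk through that API.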
Re: How to get related facets using Solr query ?
Thanks for the pointer Eric. You are right, I forgot to include IJK under AB. Also, the facet field names are different. Unfortunately, I'm using SolrCloud, and facet.pivot doesn't seem to work in distributed mode. I get some results back if I use distrib=false, but then it's not the right data.
Re: MLT weird behaviour in Solrcloud
Anyone?
How to get related facets using Solr query ?
Hi, I've been trying to construct a facet query that organizes related facets in the response. Let me illustrate with a sample. Let's say I've got the following documents indexed in Solr: 1. Doc A -- Facet:AB Facet:MNO 2. Doc B -- Facet:CD Facet:XYZ 3. Doc C -- Facet:AB,CD Facet:IJK, XYZ Now, I want the result organized as: AB MNO,XYZ CD IJK,XYZ Is there a way to do this? Thanks, Shamik
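The nesting described here is what pivot faceting produces, though the follow-up in this thread notes it didn't work in distributed/SolrCloud mode at the time; a sketch assuming the two facet levels live in separate fields (FacetTop and FacetSub are hypothetical names):

```
facet=true&facet.pivot=FacetTop,FacetSub
```

Each FacetTop value (e.g. AB) then carries its own nested counts for the FacetSub values that co-occur with it.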
MLT weird behaviour in Solrcloud
Hi, I'm trying to use the mlt request handler in a SolrCloud cluster, and it's showing some weird behavior: it returns results only intermittently for the same query. I'm using a SolrJ client, which communicates with the cluster through a ZooKeeper ensemble. Here's my mlt request handler: !-- mlt request handler -- requestHandler name=/mlt class=solr.MoreLikeThisHandler lst name=defaults str name=omitHeadertrue/str str name=echoParamsexplicit/str str name=wtvelocity/str str name=v.templatebrowse/str str name=v.contentTypetext/html;charset=UTF-8/str str name=v.layoutlayout/str str name=v.channelmlt/str str name=titleProject Sunshine - Mlt/str str name=mlt.fltitle,text,language,caaskey/str int name=mlt.mintf2/int int name=mlt.mindf1/int int name=mlt.minwl3/int int name=mlt.maxwl1000/int int name=mlt.maxqt50/int int name=mlt.maxntp5000/int str name=rows4/str bool name=mlt.boosttrue/bool str name=mlt.qftitle,textlanguage,caaskey/str !--str name=mlt.interestingTermsdetails/str-- !-- Shard Tolerant -- str name=shards.toleranttrue/str lst name=appends str name=fqSource2:(TestSource OR help/str /lst str name=shards.qt/mlt/str /lst /requestHandler Here's a sample query: http://stage-int***.com/solr/mlt?fq=language:englishfq={!collapse field=dedup}q=caaskey:caas/documentation/files/GUID-EDC69C3shards.qt=/mltshard.keys=enu/8!wt=xml I've tried removing the collapsing and composite key from the query, but it didn't make any difference. I've 2 shards with a replica each. The weird part is, the same shard/replica which returns results for a given request behaves differently next time, i.e. doesn't return data at all. If I use any other request handler, I get a response back for the given query. So something is not right with the mlt request handler. Is this a known issue with SolrCloud? Any pointers will be appreciated. Thanks, Shamik
Re: MLT weird behaviour in Solrcloud
Sorry, that was a typo when I copied the mlt definition from my solrconfig; there is a comma in my test environment. That's not the issue.
Can we do conditional boosting using edismax ?
Hi, I'm using the edismax parser to perform runtime boosting. Here's my sample request handler entry: str name=qftext^2 title^3/str str name=bqSource:Blog^3 Source2:Videos^2/str str name=bfrecip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0/str As you can see, I'm adding weights to text and title, as well as boosting on source. What I'm trying to see is if there's a way to change the weights based on Source. E.g. for source Blog, I would like to have the boost text^3 title^2, while for source Videos, I prefer text^2 title^3. Any pointers will be appreciated. Thanks, Shamik
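One technique worth exploring (a sketch only, not necessarily what was suggested in the replies) is to keep a single qf and add a conditional boost that re-scores the user query under the alternate weights, using the query() function with embedded local params (line breaks added for readability; verify the behavior with debugQuery=true):

```
bf=if(termfreq(Source,'Blog'),
      query({!edismax qf='text^3 title^2' v=$q}),
      query({!edismax qf='text^2 title^3' v=$q}))
```

This adds the Blog-weighted score for Blog documents and the Videos-style weighting for everything else, on top of the base qf score rather than replacing it.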
Re: Can we do conditional boosting using edismax ?
Thanks Ahmet, I'll give it a shot.
Problem with French stopword filter
centre du cercle dont l'arc fait partie. Point de départ Spécifiez le point de départ de l'arc. Extrémité Trace un arc dans le sens trigonométrique depuis le point de départ (2) jusqu'au point situé sur une demi-droite imaginaire tracée du centre (1) jusqu'au point final (3). Angle Dessine un arc dans le sens trigonométrique à partir du point de départ (2), en utilisant un centre (1), avec un angle décrit spécifié. Si l'ange est négatif, un arc est tracé dans le sens horaire. Longueur de corde Trace un grand ou un petit arc en respectant la distance en ligne droite entre le point de départ et le point d'arrivée. Si la longueur de corde est positive, le petit arc est tracé dans le sens trigonométrique à partir du point de départ. Si la longueur de corde est négative, le grand arc est tracé dans le sens trigonométrique. Tangente à la dernière ligne, à l'arc ou à la polyligne Dessine un arc tangent à la dernière ligne, à la polyligne ou à l'arc dessiné lorsque vous appuyez sur ENTREE à la première invite. Extrémité de l'arc Spécifiez un point (1). /field /doc Query = http://localhost:8983/solr/browse?q=arc de cercle When I ran through the query term in admin analysis, the stopword filter seemed to be working , but not when the actual search is happening. Any pointers will be appreciated. Thanks, Shamik
Re: Problem with French stopword filter
Turned out to be a weird issue. Apparently, the comments in stopwords_fr.txt disrupt the stop filter factory. After I stripped out the comments, it worked as expected. Referred to this thread: http://mail-archives.apache.org/mod_mbox/lucene-dev/201309.mbox/%3CJIRA.12668581.1379112889603.133757.1379118831671@arcas%3E -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-French-stopword-filter-tp4138545p4138550.html Sent from the Solr - User mailing list archive at Nabble.com.
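For anyone hitting the same thing: the shipped stopwords_fr.txt uses Snowball-style "|" comments, which a plain word-per-line stop filter reads as literal text (declaring format="snowball" on the StopFilterFactory is the other common fix). A minimal pre-processing sketch, assuming the Snowball comment convention:

```python
# Hypothetical helper: strip Snowball-style "|" comments and blank lines
# from a stopwords file before handing it to a plain word-per-line filter.
def strip_comments(lines):
    out = []
    for line in lines:
        word = line.split("|", 1)[0].strip()  # drop everything after "|"
        if word:                              # skip blank / comment-only lines
            out.append(word)
    return out

sample = ["au  | a + le", "", "| this whole line is a comment", "aux"]
print(strip_comments(sample))  # ['au', 'aux']
```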
Re: Problem with French stopword filter
I found the issue. It had to do with the edismax qf entry in the request handler. I had the following entry: str name=qfname_fra^1.2 title_fra^10.0 description_fra^5.0 author^1/str Except for author, all the fields are of type adsktext_fra, while author is of type text_general, which uses the English stop filter. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-French-stopword-filter-tp4138545p4138561.html Sent from the Solr - User mailing list archive at Nabble.com.
Question on 3-level composite-id routing
Hi, I need some clarification on multilevel composite-id routing in SolrCloud. I'm currently doing composite-id routing using the pattern *topic!url*. This is aimed at run-time de-duplication based on the topic field. As I'm adding support for language search, I felt the need to include a language parameter for better multi-tenancy. Here's the new key structure I'm thinking of -- *language!topic!url*. An example would be: english!12345!www.testurl.com Now, during query time, I'll always have the language parameter at my disposal. I was thinking of leveraging the shard.keys parameter to specify *shard.keys=language!*, which will route the request to the right shard and bring back English content. Is this a valid assumption? Also, as per my understanding, the three fields will default to 8, 8 and 16 bits of the routing hash. What would be a valid scenario for providing a custom allocation of bits for these fields? I was referring to the following article http://searchhub.org/2014/01/06/10590/ , but was not entirely sure about this section. *At query time:* *To query all records for myapp: shard.keys=myapp/8!* *Note the explicit mention of 8 bits in case of querying by component 1 only i.e. app level. This is required because the usage of the router as 2 or 3 level isn’t implicit. Specifying ’8′ bits for the component highlights the use of ’3′ level router*. Any feedback will be much appreciated. Thanks, Shamik
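The id and routing-prefix construction being described can be sketched with plain string handling (the field values are the examples from this mail, and the /8 bit-count syntax follows the article quoted above):

```python
# Build a 3-level composite id: language!topic!url
def composite_id(*parts):
    return "!".join(parts)

doc_id = composite_id("english", "12345", "www.testurl.com")
print(doc_id)  # english!12345!www.testurl.com

# Query-time routing prefixes:
by_language_topic = "shard.keys=english!12345!"  # components 1 + 2
by_language_only = "shard.keys=english/8!"       # component 1 only, explicit 8 bits
```

The explicit /8 in the single-component form is what tells Solr the key is 3-level rather than 2-level, per the article's note.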
Re: Question on 3-level composite-id routing
Awesome, thanks a lot Anshum, makes total sense now. Appreciate your help. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-3-level-composite-id-routing-tp4137044p4137071.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What are the best practices on Multiple Language support in Solr Cloud ?
Thanks Nicole. Leveraging dynamic field definitions is a great idea. It'll probably work for me, as I've a bunch of fields which are indexed as String. Just curious about the sharding -- are you using SolrCloud? I thought of taking the dedicated shard / core route, but since I'm using a composite key (for dedup), managing dedicated cores can cause issues at times. As far as the single-field representation goes, thanks for validating my concern. It's probably best used when you have to address multi-lingual search. -- View this message in context: http://lucene.472066.n3.nabble.com/What-are-the-best-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html Sent from the Solr - User mailing list archive at Nabble.com.
What are the best practices on Multiple Language support in Solr Cloud ?
Hi, I'm trying to implement multiple language support in SolrCloud (4.7). Although we've different languages in the index, we were only supporting English for indexing and querying. To provide some context, our current index size is 35 GB with close to 15 million documents. We've two shards with two replicas per shard. I'm using a composite id to support de-duplication, which puts documents having the same field (dedup) value on a specific shard. The language is known up-front for every document being indexed, which saves the need for runtime language detection. Similarly, during query time, the language will be known as well. Beyond that, there's no need for multi-lingual support. Based on my understanding so far, there are three widely adopted approaches: multi-field indexing, multi-core indexing, and multiple languages in one field (based on Solr in Action). The first option seems easy to implement. But then, I've around 40 fields getting indexed currently, though a majority of them are type=string and not analyzed. I'm planning to support around 10 languages, which translates to 400 field definitions in the same schema. And this is poised to grow with the addition of languages and fields. My apprehension is whether this approach becomes a maintenance nightmare. Does it affect overall scalability? Does it affect any existing features like Suggester, Spellcheck, etc.? I was thinking of including language as part of the id key. It'll look like Language!Dedup_id!url so that documents are spread across the two shards. The second option of a dedicated core sounds easy in terms of maintaining config files. Also, routing requests will be fairly easy as the language will always be known up-front, both during indexing and query time. But, as I looked into the documents, 60% of our total index will be in English, while the remaining 40% will constitute the remaining 10-14 languages. Some languages have only a few thousand documents, which perhaps doesn't merit a dedicated core.
On top of that, this approach has the potential of turning into a complex infrastructure which might be hard to maintain. I read about the use of multiple languages in a single field in Trey Grainger's book. It looks like a great approach, but I'm not sure it's meant to address my scenario. My first impression is that it's more geared towards supporting multi-lingual content, but I may be completely wrong. Also, this is not supported by Solr / Lucene out of the box. I know there are a lot of people in this group who have excelled at supporting multiple languages in Solr. I'm trying to gather their inputs / experience on the best practices to help me decide the right approach. Any pointer on this will be highly appreciated. Thanks, Shamik
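One way to keep the multi-field option from exploding into hundreds of hand-written definitions is a per-language dynamicField rule plus one analyzer fieldType per language. A sketch only -- the suffixes here are assumptions, not your schema, though text_en / text_fr / text_de ship in the Solr example schema:

```xml
<!-- One rule per language covers every *_eng / *_fra / *_deu field. -->
<dynamicField name="*_eng" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_fra" type="text_fr" indexed="true" stored="true"/>
<dynamicField name="*_deu" type="text_de" indexed="true" stored="true"/>
<!-- Unanalyzed string fields can share a single rule across languages. -->
<dynamicField name="*_s"   type="string"  indexed="true" stored="true"/>
```

With this shape, 10 languages cost roughly 10 rules plus 10 fieldTypes instead of 400 field definitions; the unanalyzed string fields need no per-language copies at all.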
Solr 4.7 not showing parsedQuery / parsedquery_toString information
,f.ADSKDocumentType.facet.mincount=1,f.ADSKAudience.facet.limit=-1,isShard=true,f.ADSKProductLine.facet.limit=-1}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={ADSKProductLine={},ADSKContentGroup={},ADSKReleaseYear={},ADSKHelpTopic={},ADSKDocumentType={},ADSKAudience={}},facet_dates={},facet_ranges={}},debug={}}/str /lst /lst /lst lst name=explain / /lst Here's the sample query : http://localhost:8983/solr/adskhelpportal?q=How%20can%20I%20obtain%20local%20offline%20Helpwt=xmldebugQuery=truerows=1 I'm using SolrCloud with 2 shards and a replica each. I'm getting parsedQuery / parsedQueryString information if I use the earlier version. Do I change something in the configuration ? Any pointers will be appreciated. Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-7-not-showing-parsedQuery-parsedquery-toString-information-tp4132964.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CollapsingQParserPlugin returning different result set
Joel, I had a discussion with you earlier related to the inconsistent ngroups number, when you suggested using the composite id to make sure that identical (ADSKDedup) fields are available in the same shard. Here's the thread -- http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-td4111331.html After making that change, the number of results returned matched the numFound parameter. I'm using the same setup after I upgraded to Solr 4.7 and started using the CollapsingQParserPlugin API. I took a quick look at some of the ids, and the composite ids look to be correct. One thing I've noticed is that the difference in relevance and number seems to be directly proportional to the number of documents in the result. I'll try to create a small set of documents in a local SolrCloud and see if I can replicate the problem. That way, it'll probably be easier for you to look into. Regards, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716p4125290.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CollapsingQParserPlugin returning different result set
Hi Joel, Thanks for taking a look into this. Here's the information you had requested. *ADSKDedup:* I've attached separate files with the debug information for each query. Let me know if you need any other information. Regards, Shamik CollapsingQParserPlugin_Query_Debug.txt http://lucene.472066.n3.nabble.com/file/n4124968/CollapsingQParserPlugin_Query_Debug.txt Group_Query_Debug.txt http://lucene.472066.n3.nabble.com/file/n4124968/Group_Query_Debug.txt -- View this message in context: http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716p4124968.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud - inconsistent result for the same query
Hi, I'm using SolrCloud 4.4 with 2 shards having 2 replicas each. Lately, I'm observing issues where an obsolete document will suddenly show up in search results. I'm crawling a bunch of source systems on a daily basis and updating the Solr index. Now, when I search for specific content based on the url, it suddenly returns an old version of content that was updated by the last crawl. This behavior is inconsistent; it seems to randomly pick the old or new content. Here's the field signature in question. field name=ADSKCaasContent type=string indexed=false stored=true multiValued=true required=false/ The field is not indexed and is only used for storing the data. I'm using a composite key which distributes documents among shards based on a specific field. I can't think of any possible reason except for the Solr cache. Based on the Solr logs, it looks like one of the shards/replicas is holding on to the old value for some reason. I'm using haproxy to perform round-robin requests to any of the 6 servers (2 shards, 4 replicas). Ideally, a full crawl should have updated the cache with the new set of data. I even re-started the instance, but the problem seems to persist. I'd appreciate it if someone can provide their feedback. Regards, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-inconsistent-result-for-the-same-query-tp4125005.html Sent from the Solr - User mailing list archive at Nabble.com.
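One way to pin down which replica is serving the stale copy is to query each core directly with distrib=false (a real Solr parameter that disables the distributed fan-out) and compare the stored field and _version_ across cores. A sketch that just builds the check URLs -- the host names and core name are made up:

```python
# Hypothetical hosts for 2 shards x 2 replicas; replace with your own.
HOSTS = ["shard1a:8983", "shard1b:8983", "shard2a:8983", "shard2b:8983"]

def replica_check_urls(core, doc_id):
    """URLs that query each core directly, bypassing distributed search."""
    return [
        f"http://{host}/solr/{core}/select"
        f"?q=id:{doc_id}&fl=id,ADSKCaasContent,_version_&distrib=false"
        for host in HOSTS
    ]

for url in replica_check_urls("collection1", "doc123"):
    print(url)
```

If one core reports an older _version_ or field value than its peers, that replica is the one holding stale data, independent of caches or the load balancer.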
CollapsingQParserPlugin returning different result set
Hi, I recently upgraded to 4.7, with the aim of replacing group queries with CollapsingQParserPlugin. As I'm comparing results between the two APIs, CollapsingQParserPlugin seems to be way off, in terms of relevancy and result count. Here's an example : *Group query* http://test-dev.mydomain.com/solr/adskhelpportal?fq=language:(english)wt=xmlrows=40start=0fq=(ContentGroup-local:Learn Explore OR ContentGroup-local:Getting Started OR ContentGroup-local:Troubleshooting)fq=Product:PRDq=linesort=score descgroup=truegroup.field=ADSKDedupgroup.ngroups=truefl=title,ADSKDedup,scoredebugQuery=true /Top 4 results/ lst name=grouped lst name=ADSKDedup int name=matches14593/int int name=ngroups*13648*/int arr name=groups lst str name=groupValuefbfef4647e68c2300eba99028f2598a9/str result name=doclist numFound=1 start=0 doc str name=ADSKDedupfbfef4647e68c2300eba99028f2598a9/str arr name=title strLINE/str /arr float name=score8.517085/float /doc /result /lst lst str name=groupValueGUID-E8C1190C-A26C-484C-ADDD-DDF81666F69F/str result name=doclist numFound=3 start=0 doc arr name=title strLINE (Command)/str /arr str name=ADSKDedupGUID-E8C1190C-A26C-484C-ADDD-DDF81666F69F/str /doc /result /lst lst str name=groupValueGUID-695722CD-A131-48DB-9AB8-162F0832FE04/str result name=doclist numFound=4 start=0 doc str name=ADSKDedupGUID-695722CD-A131-48DB-9AB8-162F0832FE04/str arr name=title strAbout Controlling Extension Lines/str /arr float name=score5.1433907/float /doc /result /lst lst str name=groupValueGUID-9084DAC2-D5B7-4727-A443-205007A79440/str result name=doclist numFound=4 start=0 doc arr name=title strAbout Controlling Dimension Lines/str /arr str name=ADSKDedupGUID-9084DAC2-D5B7-4727-A443-205007A79440/str float name=score5.1361656/float /doc /result /lst *CollapsingQParserPlugin query* http://test-dev.mydomain.com/solr/adskhelpportal?fq=language:(english)wt=xmlrows=15start=0fq=(ContentGroup-local:Learn Explore OR ContentGroup-local:Getting Started OR 
ContentGroup-local:Troubleshooting)fq=ProductLine:PRDq=linesort=score descfq={!collapse field=ADSKDedup}fl=title,ADSKDedup,scoredebugQuery=true /Top 4 results/ result name=response numFound=27142 start=0 maxScore=8.517085 doc str name=ADSKDedupfbfef4647e68c2300eba99028f2598a9/str arr name=title strLINE/str /arr float name=score8.517085/float /doc doc str name=ADSKDedupGUID-57CDDB6C-B12B-46CE-B9C5-22EFC17258FF/str arr name=title strTo Draw Lines/str /arr float name=score6.276938/float /doc doc arr name=title strDraw Lines/str /arr str name=ADSKDedup98b4a0e39400f0a216ff51a89922ce82/str float name=score6.224089/float /doc doc str name=ADSKDedup4e51abdc0e8d30e77069505d93c1d4d4/str arr name=title strLines Tab/str /arr float name=score6.210026/float /doc As you can see, the results are completely off, except for the first one. Moreover, the number of results returned is different as well. The group query has 13648 results, while CollapsingQParserPlugin returns 27142, almost twice as many. I'm a little baffled as to why the two APIs return different results for the same query. Are they fundamentally different? Any pointers will be appreciated. -Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716.html Sent from the Solr - User mailing list archive at Nabble.com.
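For what it's worth, a roughly doubled numFound is the classic symptom of a collapse key being spread across shards: each shard keeps its own head document per key, so a key present on both shards is counted once per shard. A toy simulation of that arithmetic (plain Python, not Solr internals):

```python
# Each list holds the collapse-key values of the documents on one shard.
shard1 = ["A", "A", "B", "C"]
shard2 = ["A", "B", "D"]

def distributed_collapse_count(shards):
    # Collapse runs per shard: one head document per key, per shard.
    return sum(len(set(shard)) for shard in shards)

def colocated_group_count(shards):
    # What the count looks like when each key lives on exactly one shard.
    return len(set().union(*shards))

print(distributed_collapse_count([shard1, shard2]))  # 6
print(colocated_group_count([shard1, shard2]))       # 4
```

Keys A and B exist on both shards, so the per-shard collapse counts them twice (6 vs the true 4), which is why composite-id co-location of the collapse field matters for either API.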
Weird behavior of stopwords in search query
Hi, I'm observing weird behavior while using stopwords as part of the search query. I'm able to replicate it in a standalone Solr instance as well. The issue pops up when I use the stopwords other and and together in a query string. The query doesn't return any result, but it works with any other combination. For e.g. 1. this query yields no result -- http://localhost:8983/solr/collection1/browse?q=AWS+other+and+SearchdebugQuery=truewt=xml Debug Query : str name=rawquerystringAWS other and Search/str str name=querystringAWS other and Search/strstr name=parsedquery(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5/no_coord/str str name=parsedquery_toString+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5) +(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))/str 2.
this query yields results -- http://localhost:8983/solr/collection1/browse?q=AWS+other+an+SearchdebugQuery=truewt=xml Debug Query - str name=rawquerystringAWS other an Search/str str name=querystringAWS other an Search/strstr name=parsedquery(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 | cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5/no_coord/str str name=parsedquery_toString+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5) (id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))/str Both other and and are part of the stopwords list. I ran an analysis on the text_general field; both stopwords were shown as ignored during indexing and query time, but that's not happening during the actual search. Not sure what I'm missing here; any pointers will be appreciated. - Thanks, Shamik
Re: Weird behavior of stopwords in search query
Jack, thanks for the pointer. I should have checked this more closely. I'm using edismax and here's my qf entry : str name=qf id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 /str As you can see, I was boosting id and cat, which are of type string and of course don't go through the stopword filter. Removing them returned one result, which is based on the AND operator. The part I'm not clear on is how and is being treated, even though it's a stopword and the default operator is OR. Shouldn't it be ignored? -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-behavior-of-stopwords-in-search-query-tp4118156p4118188.html Sent from the Solr - User mailing list archive at Nabble.com.
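The mechanics above can be sketched as a toy model (an illustration of the described behavior, not Solr's actual edismax code): a term only disappears from the parsed query if every qf field's analysis chain drops it, and unanalyzed string fields drop nothing:

```python
STOPWORDS = {"other", "and"}
STRING_FIELDS = {"id", "cat", "sku"}            # not analyzed: keep every term
TEXT_FIELDS = {"title", "text", "description"}  # analyzed: stopwords removed

def surviving_terms(query, qf):
    terms = []
    for term in query.lower().split():
        # The term keeps a clause if at least one qf field retains it.
        fields = [f for f in qf if f in STRING_FIELDS or term not in STOPWORDS]
        if fields:
            terms.append(term)
    return terms

full_qf = sorted(STRING_FIELDS | TEXT_FIELDS)
print(surviving_terms("AWS other and Search", full_qf))          # ['aws', 'other', 'and', 'search']
print(surviving_terms("AWS other and Search", sorted(TEXT_FIELDS)))  # ['aws', 'search']
```

With the string fields in qf, "other" and "and" each survive as a required clause that can only match in id/cat/sku, which is why the full query found nothing.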
Re: Fault Tolerant Technique of Solr Cloud
As Shawn had pointed out, if you are using the CloudSolrServer client, then you are protected when individual nodes go down, since the client gets cluster state from ZooKeeper rather than pointing at specific Solr servers. One thing you need to make sure of is to add the shards.tolerant parameter, so that if an entire shard is unreachable the query still returns results from the shards which are alive, though it'll fetch a partial result set. -- View this message in context: http://lucene.472066.n3.nabble.com/Fault-Tolerant-Technique-of-Solr-Cloud-tp4118003p4118196.html Sent from the Solr - User mailing list archive at Nabble.com.
Weird issue with q.op=AND
Hi, I'm facing a weird problem while using q.op=AND condition. Looks like it gets into some conflict if I use multiple appends condition in conjunction. It works as long as I've one filtering condition in appends. lst name=appends str name=fqSource:TestHelp/str /lst Now, the moment I add an additional parameter, search stops returning any result. lst name=appends str name=fqSource:TestHelp | Source:TestHelp2/str /lst If I remove q.op=AND from request handler, I get results back. Data is present for both the Source I'm using, so it's not a filtering issue. Even a blank query fails to return data. Here's my request handler. requestHandler name=/testhandler class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str float name=tie0.01/float str name=wtvelocity/str str name=v.templatebrowse/str str name=v.contentTypetext/html;charset=UTF-8/str str name=v.layoutlayout/str str name=v.channeltesthandler/str str name=defTypeedismax/str str name=q.opAND/str str name=q.alt*:*/str str name=rows15/str str name=flid,url,Source2,text/str str name=qftext^1.5 title^2/str str name=bqSource:TestHelp^3 Source:TestHelp2^0.85/str str name=bfrecip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0/str str name=dftext/str !-- facets -- str name=faceton/str str name=facet.mincount1/str str name=facet.limit100/str str name=facet.fieldlanguage/str str name=facet.fieldSource/str !-- Highlighting defaults -- str name=hltrue/str str name=hl.fltext title/str str name=f.text.hl.fragsize250/str str name=f.text.hl.alternateFieldShortDesc/str !-- Spell check settings -- str name=spellchecktrue/str str name=spellcheck.dictionarydefault/str str name=spellcheck.collatetrue/str str name=spellcheck.onlyMorePopularfalse/str str name=spellcheck.extendedResultsfalse/str str name=spellcheck.count1/str !-- Shard Tolerant -- str name=shards.toleranttrue/str /lst lst name=appends str name=fqSource:TestHelp | Source2:TestHelp2/str /lst arr name=last-components strspellcheck/str /arr /requestHandler Not 
sure what's going wrong. I'm using a SolrCloud environment with 2 shards having a replica each. Any pointers will be appreciated. Thanks, Shamik
Re: Weird issue with q.op=AND
Thanks a lot Shawn. Changing the appends filtering based on your suggestion worked. The part which confused me big time is that I've been using the following syntax so far without an issue (barring the q.op part): lst name=appends str name=fqSource:TestHelp | Source:downloads | -AccessMode:internal | -workflowparentid:[* TO *]/str /lst This has been working as expected and applies the filter correctly. Just curious: if it's invalid syntax, how is Solr handling it? -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117022.html Sent from the Solr - User mailing list archive at Nabble.com.
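The fixed syntax isn't quoted in this thread, so the following is a sketch only, assuming the suggestion was to spell the alternation out with explicit Boolean operators: the bare | is not an operator in the Lucene syntax, so under q.op=OR the clauses happened to be OR'd anyway, while q.op=AND turned them into an unsatisfiable conjunction. A hypothetical corrected appends section:

```xml
<!-- Sketch: explicit OR instead of bare "|", grouped so q.op=AND
     cannot reinterpret the clauses as a conjunction. -->
<lst name="appends">
  <str name="fq">(Source:TestHelp OR Source2:TestHelp2)</str>
</lst>
```

Written this way, the filter keeps the same meaning regardless of what q.op is set to in the handler defaults.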
Re: Weird issue with q.op=AND
Thanks, I'll take a look at the debug data. -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117047.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing question on individual field update
Eric, Thanks for your reply. I should have given better context. I'm currently running an incremental crawl daily on this particular source and indexing the documents. The incremental crawl looks for any change since the last crawl date, based on the document publish date. But there's no way for me to know if a document has been deleted. To ensure that, I ran a full crawl over a weekend, which basically re-indexes the entire content. After the full index is over, I call a purge script, which deletes any content that is more than 24 hours old, based on the indextimestamp field. The issue with atomic updates is that they don't alter the indextimestamp field. So even if I run a full crawl with atomic updates, the timestamp will stick to its old value. Unfortunately, I can't rely on another date field coming from the source, as they are not consistent. That translates to the fact that I can't remove stale content. Let me know if I'm missing something here. - Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116757.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing question on individual field update
Ok, I was wrong here. I can always set the indextimestamp field to the current time (NOW) on every atomic update. On a similar note, is there any performance constraint with atomic updates compared to adds? -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116772.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing question on individual field update
Thanks Eric and Shawn, appreciate your help. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116831.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing question on individual field update
Hi, I'm currently indexing a bunch of fields for a given document. For e.g., let's assume there's a field called rating. The rating field is not part of the original document during indexing, so the value is blank. The field gets updated by an external service when the document is rated by users. The service makes a partial Solr update and sets the appropriate rating value. But when I re-index the same document, the rating field gets overwritten and reset to blank. I understand that indexing in Solr is a delete and add, but is there a way to do conditional indexing at the field level, which will keep the value if it's already present in the index for a given id? Any pointers will be appreciated. Thanks, Shamik
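The thread settles on atomic updates: instead of re-adding the whole document, the crawl sends only the fields it owns as {"set": ...} operations, and Solr preserves the rest (this requires the other fields to be stored). A sketch of the payload shape -- the field names come from this thread, and building it in Python is just for illustration:

```python
import json

def atomic_update(doc_id, **changed):
    # {"set": value} replaces just that field; untouched stored fields survive.
    return {"id": doc_id, **{f: {"set": v} for f, v in changed.items()}}

# A crawl update that refreshes content and the index timestamp
# without touching the externally-maintained rating field.
doc = atomic_update("doc123", text="new crawled body", indextimestamp="NOW")
print(json.dumps([doc]))
```

Because rating never appears in the payload, the external service's value is kept across re-crawls, which answers the conditional-indexing question without schema changes.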
Re: SolrCloud Result Grouping vs CollapsingQParserPlugin
Thanks Joel, really appreciate your help. I'll keep an eye on the 4.6.1 release. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111486.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud Result Grouping vs CollapsingQParserPlugin
Hi, I'm planning to upgrade to Solr 4.6 to move from Result Grouping to the CollapsingQParserPlugin. I'm currently using SolrCloud; a couple of issues with Result Grouping are: 1. Slow performance 2. Incorrect result count from ngroups My understanding is that the CollapsingQParserPlugin is aimed at addressing the performance issue with Result Grouping. Based on the available documentation, I'm not sure if the CollapsingQParserPlugin addresses the result count when the collapse field is spread across shards. The Result Grouping ngroups count currently works only if the groups are not distributed, i.e. confined to a single shard. Just wondering if this applies to the CollapsingQParserPlugin as well? Will result name=response *numFound=6* start=0 be incorrect if the collapse field is distributed? I'd really appreciate it if someone can provide pointers on this. Thanks, Shamik
Re: SolrCloud Result Grouping vs CollapsingQParserPlugin
Joel, Thanks for the pointer. I went through your blog on document routing, very informative. I do need some clarification on the implementation; I'll try to run through it based on my use case. I'm indexing documents from multiple source systems, out of which a bunch contain duplicate content. I'm trying to remove the duplicates by applying Result Grouping / CollapsingQParserPlugin. For e.g., let's say I've sources ABC, MNO and XYZ. Now, ABC and MNO contain the duplicate documents, which are identified by a field, say adskdedup. I've a couple of shards, the id being the url of the documents. Now, to make field collapsing work, I need to update the id field to adskdedup!url. Documents having identical adskdedup values should route to a single shard, e.g. shard1. The ones which are not identical will be routed to either shard1 or shard2. After the indexing is done, shard1 should have all the documents on which grouping needs to be applied. During query time, depending on the query, results can be returned from both shards. For e.g. a query q=solrgroup=truegroup.field=adskdedupgroup.ngroups=true would ideally return data from both shards and apply the grouping on shard1 based on the adskdedup field. This will also ensure that group.ngroups=true returns the right count. The other clarification I wanted was based on this statement: When a tenant is too large to fit on a single shard it can be spread across multiple shards be specifying the number of bits to use from the shard key. If we split shards, will Result Grouping / CollapsingQParserPlugin and the number of results still work? Last but not least, when are you planning to release 4.6.1? Again, appreciate your help on this. - Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111375.html Sent from the Solr - User mailing list archive at Nabble.com.
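The co-location property relied on above can be illustrated with a toy router (Solr's real compositeId router uses MurmurHash3 over hash ranges, so this is an analogy, not its implementation): the shard is chosen from the part of the id before the !, so identical adskdedup prefixes always land together:

```python
import hashlib

def shard_for(composite_id, num_shards=2):
    # Route on the prefix before "!" only, like a composite-id router.
    prefix = composite_id.split("!", 1)[0]
    digest = int(hashlib.md5(prefix.encode()).hexdigest(), 16)
    return digest % num_shards

# Two duplicates with the same adskdedup prefix but different urls:
s1 = shard_for("dedup42!http://abc.example/doc")
s2 = shard_for("dedup42!http://mno.example/doc")
print(s1 == s2)  # True: identical prefixes are always co-located
```

Since only the prefix feeds the hash, duplicates from ABC and MNO with the same adskdedup value cannot end up on different shards, which is what makes the per-shard group counts add up correctly.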
Re: Questionon CollapsingQParserPlugin
Thanks Joel, I found the issue. It had to do with the schema definition for the adskdedup field. I had defined it as text_general, which was tokenizing the value on the - character. After I changed the type to string, it worked as expected. Thanks for looking into this. -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Questionon-CollapsingQParserPlugin-tp4111357p4111376.html Sent from the Solr - User mailing list archive at Nabble.com.
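For reference, the fix described above amounts to declaring the collapse key as an unanalyzed type, so values like ABCD-XYZ stay whole instead of being split into two tokens. A sketch of the schema line (the indexed/stored attribute values are assumptions):

```xml
<!-- Collapse/group keys must not be tokenized; the string type keeps
     the value verbatim, hyphens and all. -->
<field name="adskdedup" type="string" indexed="true" stored="true"/>
```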
Questionon CollapsingQParserPlugin
Hi, I'm looking for some clarification on the CollapsingQParserPlugin feature. Here's what I tried: I downloaded 4.6, updated solr.xml under the exampledocs folder and added the following entries. I've added a new field adskdedup on which I'm planning to test field collapsing. As you can see, out of the four documents, three have identical adskdedup values while the last one is different. doc field name=idSOLR1000/field field name=nameSolr, the Enterprise Search Server/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field field name=adskdedupABCD-XYZ/field /doc doc field name=idSOLR1001/field field name=nameSolr, the Enterprise Search Server/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field field name=adskdedupABCD-XYZ/field /doc doc field name=idSOLR1002/field field name=nameSolr, the Enterprise Search Server/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field field name=adskdedupABCD-XYZ/field /doc doc field name=idSOLR1003/field field name=nameSolr, the Enterprise Search Server/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field field name=adskdedupMNOP-QRS/field /doc Here's my query : http://localhost:8983/solr/collection1/select?q=solrwt=xmlfq={!collapse%20field=adskdedup} Based on my understanding of grouping, I was expecting a couple of results from the query: one with id=SOLR1000 and a second with id=SOLR1003. Instead, it's returning only 1 result based on the field collapsing, i.e. id=SOLR1000. Am I missing something here? Any pointer will be appreciated. -Thanks
Re: Solr grouping performance porblem
Thanks for the update Shawn, will look forward to the release. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-performance-porblem-tp4098565p4101314.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr grouping performance porblem
Thanks Joel, appreciate your help. Is Solr 4.6 due this year ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-performance-porblem-tp4098565p4100358.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr grouping performance problem
Hi, I've recently upgraded to SolrCloud (4.4) from Master-Slave mode. One of the changes I made to the queries was to add grouping to remove duplicate results. The grouping is done on a specific field, but the change seems to have had a huge effect on query performance: the group option made queries about 10 times slower. For example, this query takes 1 sec to execute; the number of results is around 105387.

http://localhost:8083/solr/browse?fq=language:(english)&wt=xml&rows=10&start=0&fq=(ContentGroup-local:Learn Explore OR ADSKContentGroup-local:Getting Started)&q=line&sort=score desc&group=true&group.field=dedup&group.ngroups=true

If I exclude the group options, it comes down to 190ms:

http://localhost:8083/solr/browse?fq=language:(english)&wt=xml&rows=10&start=0&fq=(ContentGroup-local:Learn Explore OR ADSKContentGroup-local:Getting Started)&q=line

I'm running this query against an 8 million doc index. I have 2 shards with 1 replica each, running on m1.xlarge EC2 instances, each with 8 GB of allocated memory. Is this a known issue, or am I missing something that is making this query expensive? I bumped into this JIRA -- https://issues.apache.org/jira/browse/SOLR-5027 -- which talks about CollapsingQParserPlugin as an alternative to grouping, but that seems to be available only in 4.6. Just wondering if it can be an alternative in my case, and whether it's possible to apply it as a patch to 4.4. Any pointer will be appreciated. - Thanks, Shamik
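The query URLs above lost their `&` separators in the archive; as a sanity check, here is a hedged sketch of how the slow grouping request can be built with properly encoded parameters (host and handler path taken from the post). One commonly cited cost in the distributed case is group.ngroups=true, since computing the total group count requires extra work on every shard:

```python
from urllib.parse import urlencode

# Parameter names and values are those quoted in the post above.
params = [
    ("q", "line"),
    ("fq", "language:(english)"),
    ("fq", "(ContentGroup-local:Learn Explore OR ADSKContentGroup-local:Getting Started)"),
    ("wt", "xml"),
    ("rows", "10"),
    ("start", "0"),
    ("sort", "score desc"),
    ("group", "true"),
    ("group.field", "dedup"),
    # ngroups forces per-shard group counting; dropping it is a cheap experiment
    ("group.ngroups", "true"),
]
url = "http://localhost:8083/solr/browse?" + urlencode(params)
print(url)
```

Comparing timings with and without the last parameter would show how much of the slowdown is the group count rather than grouping itself.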
Re: Grouping performance problem
Bumping up this thread as I'm facing a similar issue. Any solutions? -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4098566.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shards.tolerant throwing null pointer exception when spellcheck is on
Thanks for the information. I think it's good to have this issue fixed, especially for cases where the spellcheck feature is on. I'll check out the source code and take a look; even simply suppressing the null pointer exception might make a difference. -- View this message in context: http://lucene.472066.n3.nabble.com/shards-tolerant-throwing-null-pointer-exception-when-spellcheck-is-on-tp4097133p4097234.html Sent from the Solr - User mailing list archive at Nabble.com.
shards.tolerant throwing null pointer exception when spellcheck is on
Hi, I'm trying to simulate a fault tolerance test where a shard and its replica(s) go down, leaving the other shard(s) running. To test it, I added <str name="shards.tolerant">true</str> to the defaults section of my request handler, to make sure the condition is added to every query running against that handler. In my test environment, I have 2 shards with a replica each. I brought down Shard 1 and Replica 1, then fired a query using the SolrJ CloudSolrServer, which internally talks to the zookeeper ensemble. In my request handler, the spellcheck option is turned on, and because of this the servers are throwing a null pointer exception. Here's the stack trace:

[2013-10-22 20:24:43,875] INFO 482886 [qtp1783079124-15] - org.apache.solr.core.SolrCore.execute(SolrCore.java:1909) - [collection1] webapp=/solr path=/testhtmlhelp params={spellcheck=on&q=xref&wt=xml&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english} hits=157 status=500 QTime=70
[2013-10-22 20:24:43,876] ERROR 482887 [qtp1783079124-15] - org.apache.solr.common.SolrException.log(SolrException.java:119) - null:java.lang.NullPointerException
  at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:619)

Here's the query detail from the server log; as you can see, spellcheck is on.
[collection1] webapp=/solr path=/testhtmlhelp params={facet=on&f.TestCategory.facet.limit=160&tie=0.01&shards.qt=/testhtmlhelp&fl=id,score&facet.field=Source2&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english&rows=150&defType=edismax&start=0&spellcheck=on&shards.tolerant=true&shard.url=localhost:8984/solr/collection1/|localhost:8983/solr/collection1/&q=xref&isShard=true} hits=157 status=0 QTime=15

Now, if I comment out the spellcheck option in the request handler, the query works as expected even when the other shard and its replica are down. Is this a known bug in Solr 4.4? What would be the recommended work-around for this issue? Any pointers will be appreciated. Thanks, Shamik
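The stack trace points at SpellCheckComponent.finishStage, the stage that merges per-shard spellcheck suggestions; a shard that was down under shards.tolerant=true contributes no response, and dereferencing that missing response is the apparent source of the NPE. A hedged Python sketch of the missing null guard, using a hypothetical response structure:

```python
def merge_spellcheck(shard_responses):
    """Merge per-shard spellcheck suggestions, skipping shards that
    returned nothing (e.g. a downed shard tolerated via shards.tolerant=true)."""
    merged = {}
    for resp in shard_responses:
        if resp is None:  # the null guard a tolerant merge needs
            continue
        for term, suggestions in resp.get("suggestions", {}).items():
            merged.setdefault(term, []).extend(suggestions)
    return merged

alive = {"suggestions": {"xref": ["xrefs"]}}
dead = None  # shard 1 and its replica are down
print(merge_spellcheck([alive, dead]))  # {'xref': ['xrefs']}
```

As a stop-gap that needs no source change, passing spellcheck=false on individual requests (overriding the handler default) should also avoid the failing merge stage.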
Re: SolrCloud Performance Issue
Thanks Primoz, I was suspecting that too. But then, it's hard to imagine that the query cache alone is contributing to such a big performance hit. The same settings apply to the old configuration, and it works pretty well even with a low query cache hit rate. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095971p4096123.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Performance Issue
I tried commenting out NOW in the bq, but it didn't make any difference in performance. I do see a minor entry in the queryFilterCache hit rate, a meager 0.02. I'm really struggling to figure out the bottleneck; are there any known pain points I should be checking? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095971p4096277.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud Performance Issue
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>
</requestHandler>

One thing I've noticed is that the queryResultCache hit rate is really low; I'm not sure our queries are really that unique. I'm using edismax and there's a <str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str>, can this contribute? Sorry about the long post, but I'm struggling to nail down the issue here, especially when the same queries run fine in a master-slave environment with similar hardware and network. Any pointers will be highly appreciated. Regards, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095940.html Sent from the Solr - User mailing list archive at Nabble.com.
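On that boost function: Solr's recip(x,m,a,b) computes a/(m*x+b), and with x = ms(NOW,PublishDate) and m = 3.16e-11 (roughly 1/milliseconds-per-year) the boost starts at 1.0 for a brand-new document and halves after about a year. Because NOW changes on every request, any cache keyed on the full query can never hit, which is one common explanation for a low queryResultCache rate; date rounding such as ms(NOW/DAY,PublishDate) is the usual mitigation. A small Python sketch of the decay curve:

```python
def recip(x, m, a, b):
    # Solr's recip(x,m,a,b) = a / (m*x + b)
    return a / (m * x + b)

MS_PER_DAY = 86_400_000
# Freshness boost for documents 0, 30, and 365 days old
for days in (0, 30, 365):
    print(days, round(recip(days * MS_PER_DAY, 3.16e-11, 1, 1), 3))
```

A new document scores 1.0 and a year-old document roughly 0.5, so the `^2.0` weighting above gives recent content a meaningful but bounded edge.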
RE: How to achieve distributed spelling check in SolrCloud ?
James, Thanks for your reply. The shards.qt did the trick. I read the documentation earlier but was not clear on the implementation, now it totally makes sense. Appreciate your help. Regards, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/RE-How-to-achieve-distributed-spelling-check-in-SolrCloud-tp4094113p4094137.html Sent from the Solr - User mailing list archive at Nabble.com.