Re: faceting performance on fields with high-cardinality
Hi Tang,

I don't see any query (q) given for execution in the firstSearcher and newSearcher event listeners. Can you add a query term? Check your logs: they will show that the firstSearcher event executed and print a message with the inverted index and the number of facet items loaded.

Thanks
Shyamsunder

On Friday, June 13, 2014 8:02 PM, "Tang, Rebecca" wrote:

Hi Toke,

Thank you for the reply! Both the single-value-with-semicolon-tokenizer and multi-value-untokenized configurations have static warming queries in place. In fact, that was the first thing I did to improve performance. Below are my warming queries in solrconfig.xml (the XML tags were stripped by the archive; only the field names and parameter values survive):

    au_facet per_facet org_facet dt brd industry,source_facet availability,availability_status search true 5 5 5 5 5
    au_facet per_facet org_facet dt brd industry,source_facet availability,availability_status search true 5 5 5 5 5

As for cardinality: the per_facet field (person facet), for example, has 4,627,056 unique terms for 14,000,000 documents.

Maybe my warming queries are not correct? I just don't get why the multi-valued untokenized field yielded such a performance improvement. I guess it doesn't make sense to you either :)

I will definitely give docValues a try to see if it further improves performance.

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library
E: rebecca.t...@ucsf.edu

On 6/13/14 1:24 PM, "Toke Eskildsen" wrote:

>Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
>> I have a Solr index with 14+ million records. We facet on quite a few
>> fields with very high cardinality, such as author, person, organization,
>> brand and document type. Some of the records contain thousands of
>> persons and organizations, so the person and organization fields can
>> be very large.
>
>How many unique values per field in the full index are we talking? Just
>approximately.
>
>> After this change, the performance improved drastically. But I can't
>> understand why building these fields as a multi-valued field vs. a
>> single-valued field with a semicolon tokenizer can have such a dramatic
>> performance difference.
>
>It should not. I suspect something else is happening. 10 minutes does not
>sound unrealistic if it is your first query after an index update. Maybe
>your measurement for tokenized was unwarmed and your measurement for
>un-tokenized warmed? Could you give an example of a full query?
>
>Anyway, you should definitely be using DocValues for such high-cardinality
>facet fields.
>
>Depending on your usage pattern and where the bottleneck is,
>https://issues.apache.org/jira/browse/SOLR-5894 might also help.
>
>- Toke Eskildsen
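For reference, facet-warming queries like the ones whose markup was lost above are normally registered in solrconfig.xml as QuerySenderListener entries. A minimal sketch follows; the exact parameters in Rebecca's config are unrecoverable, so the query, limits, and handler name here are illustrative only:

```
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">au_facet</str>
      <str name="facet.field">per_facet</str>
      <str name="facet.field">org_facet</str>
      <str name="facet.field">dt</str>
      <str name="facet.field">brd</str>
      <str name="facet.limit">5</str>
    </lst>
  </arr>
</listener>
<!-- A second, identical listener with event="newSearcher" re-warms
     the caches after each commit, not just at startup. -->
```

A warming query only helps if it exercises the same facet fields and facet method as the live queries, which is why Toke asks for an example of a full query.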
Surround query with Boolean queries
Hi,

I have two fields in the index: company and year. The following surround query, which finds "comput*" and "appli*" within 5 words of each other, works fine with the surround query parser:

    {!surround maxBasicQueries=10}company:5N(comput*, appli*)

Now if I add another Boolean clause, +year:[2005 TO *], it throws a query parser exception:

    {!surround maxBasicQueries=10}company:5N(comput*, appli*) +year:[2005 TO *]

    msg: "org.apache.solr.search.SyntaxError:
    org.apache.lucene.queryparser.surround.parser.ParseException:
    Encountered " "year "" at line 1, column 30.
    Was expecting one of: ... ... ... ... ... "^" ... ",

I couldn't figure out the syntax from the SurroundQParserPlugin code. How do I combine other term and/or Boolean queries with surround queries? I am also looking for the syntax to add more than one surround query on different fields.

Thanks
Shyamsunder
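One way to approach this (a sketch, not an answer from the thread): the surround parser tries to parse the entire q string, so the year clause has to be handed to a different parser. That can be done either by moving it into a filter query, or by nesting each surround clause via the `_query_` hook of the default lucene parser. The field `title` below is hypothetical, included only to show several surround clauses on different fields:

```
# Range clause as a separate filter query:
q={!surround maxBasicQueries=10}company:5N(comput*, appli*)
fq=year:[2005 TO *]

# Or nested queries under the lucene parser, combining
# multiple surround clauses with a Boolean range clause:
q=_query_:"{!surround}company:5N(comput*, appli*)"
  AND _query_:"{!surround}title:3W(comput*, appli*)"
  AND year:[2005 TO *]
```

The fq form is usually preferable when the range restriction is reused across requests, since it is cached independently of the main query.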
facet counting using SimpleFacets
We have a large index where each document has a stored multi-valued string field called products. We also have a lot of customization of search requests: each request goes through a pre-defined custom search handler, and doc ids are collected for facet calculation. The following method is called to get facets for the products field, where docSet is the set of document ids gathered in the searcher chain:

    SimpleFacets f = new SimpleFacets(rb.req, docSet, msparams, rb);

I found this entry in the logs:

    Dec 1, 2013 10:42:51 AM org.apache.solr.request.UnInvertedField uninvert
    INFO: UnInverted multi-valued field {field=products,memSize=158633430,tindexSize=1582000,time=414797,phase1=414140,nTerms=4660858,bigTerms=0,termInstances=32608058,uses=0}

Subsequent calls for the same request are fast. Is there any way to improve the facet counting using other methods?

Thanks
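The time=414797 in that log line is UnInvertedField spending roughly 415 seconds building its in-memory structure for the 4.6M-term field, and that structure is rebuilt for every new searcher, which is why only the first call is slow. One common remedy (an assumption on my part, not something stated in the thread, and it requires Solr 4.2+ plus a full reindex) is to declare the field with docValues so faceting reads a memory-mapped on-disk structure instead of uninverting at search time:

```
<!-- schema.xml sketch: field/type names match the post, the rest is
     illustrative. docValues requires a reindex to take effect. -->
<field name="products" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```

Short of that, a static warming query that facets on products will at least move the uninvert cost out of the first user request.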
Re: ConcurrentModificationException from XMLResponseWriter
Shawn,

We have custom search handlers that use built-in components - result and facet - to generate the results. I see that our facet generation is using the LinkedHashMap. I will revisit my code. Thanks for the advice! We are migrating to Solr 4 soon :)

Thanks

On Monday, November 25, 2013 11:28 AM, Shawn Heisey wrote:

On 11/25/2013 8:43 AM, Shyamsunder R Mutcha wrote:
> The following exception is found in the Solr logs. We are using Solr 3.2.
> As the stack trace does not refer to any application classes, I couldn't
> figure out the piece of code that throws this exception. Is there any way
> to debug this issue?
>
> Is it related to the issue "ConcurrentModificationException from
> BinaryResponseWriter"?
>
> Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log
> SEVERE: java.util.ConcurrentModificationException
>     at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
>     at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392)
>     at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391)
>     at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644)

The exception is coming from LinkedHashMap, a built-in Java object type.

http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html

The code that made the call that's failing is line 644 of this source code file:

solr/core/src/java/org/apache/solr/response/XMLWriter.java

I looked at the 3.2 source code. What's going on here is fairly normal - it's iterating through a Map and outputting the data contained there to the writer. The actual problem is occurring elsewhere; it only shows up in XMLWriter because of the way LinkedHashMap objects work. Another thread has modified the Map while the iterator is being used. This is something you're not allowed to do with this object type, so it throws the exception.

I can't find any existing Solr bugs, so the question is: are you using any custom code with Solr? Perhaps something you downloaded or purchased, or something you wrote in-house? If so, then that code has some bugs.

If this *is* a bug in Solr 3.x, it is highly unlikely that it will get fixed, at least in a 3.x version. If it still exists in version 4.x (which is unlikely), then it will get fixed there. Version 3.2 is two years old, and the entire 3.x branch is in maintenance mode, meaning that only EXTREMELY severe bugs will be fixed.

Thanks,
Shawn
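The fail-fast behavior Shawn describes is easy to reproduce outside Solr. A minimal, self-contained sketch (it uses single-threaded modification, which trips the same modCount check that a second thread would):

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CmeDemo {
    // Returns true if structurally modifying a LinkedHashMap while
    // iterating over it triggers ConcurrentModificationException.
    static boolean triggersCme() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // Adding a new key during iteration changes the map's
                // modCount; the iterator's next() detects the mismatch
                // and throws, just as in the XMLWriter stack trace.
                map.put("c", 3);
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersCme());
    }
}
```

In Solr's case the iterator lives in XMLWriter while some other component mutates the shared response Map, so the fix belongs in whichever custom code mutates a Map after handing it to the response writer.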
ConcurrentModificationException from XMLResponseWriter
The following exception is found in the Solr logs. We are using Solr 3.2. As the stack trace does not refer to any application classes, I couldn't figure out the piece of code that throws this exception. Is there any way to debug this issue?

Is it related to the issue "ConcurrentModificationException from BinaryResponseWriter"?

Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log
SEVERE: java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391)
    at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644)
    at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:591)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
    at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:662)

Thanks