Re: faceting performance on fields with high-cardinality

2014-06-19 Thread Shyamsunder R Mutcha
Hi Tang,

I don't see any query (q) given for execution in the firstSearcher and 
newSearcher event listeners. Can you add a query term:
<>>

Check your logs: they should show that the firstSearcher event executed, along 
with a message about the inverted index and the number of facet items loaded.


Thanks
Shyamsunder 



On Friday, June 13, 2014 8:02 PM, "Tang, Rebecca"  wrote:
 


Hi Toke,

Thank you for the reply!

Both single-value-with-semi-colon-tokenizer and multi-value-untokenized
have static warming queries in place.  In fact, that was the first thing I
did to improve performance.

Below are my warming queries in solrconfig.xml.


            
                 
                    au_facet
                    per_facet
                    org_facet
                    dt
                    brd
                    industry,source_facet
                    availability,availability_status
                    search
                    true
                    5
                    5
                    5
                    5
                    5
                 
            
        
        
            
                 
                    au_facet
                    per_facet
                    org_facet
                    dt
                    brd
                    industry,source_facet
                    availability,availability_status
                    search
                    true
                    5
                    5
                    5
                    5
                    5
                 
            
        


As for cardinality, for example, the per_facet field (person facet) has
4,627,056 unique terms for 14,000,000 documents.

Maybe my warming queries are not correct?  I just don't get why the
multi-valued untokenized field yielded such a performance improvement. I
guess it doesn't make sense to you either :)

I will definitely give the docValues a try to see if it further improves
the performance.


Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library 
E: rebecca.t...@ucsf.edu





On 6/13/14 1:24 PM, "Toke Eskildsen"  wrote:

>Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
>> I have a Solr index with 14+ million records.  We facet on quite a
>> few fields with very high cardinality, such as author, person,
>> organization, brand and document type.  Some of the records contain
>> thousands of persons and organizations.  So the person and
>> organization fields can be very large.
>
>How many unique values per field in the full index are we talking? Just
>approximately.
>
>> After this change, the performance improved drastically. But I can't
>> understand why building these fields as multi-valued field vs.
>> single-valued field with semicolon tokenizer can have such a dramatic
>> performance difference.
>
>It should not. I suspect something else is happening. 10 minutes does not
>sound unrealistic if it is your first query after an index update. Maybe
>your measurement for tokenized was unwarmed and your measurement for
>un-tokenized warmed? Could you give an example of a full query?
>
>Anyway, you should definitely be using DocValues for such high
>cardinality facet-fields.
>
>Depending on your usage pattern and where the bottleneck is,
>https://issues.apache.org/jira/browse/SOLR-5894 might also help.
>
>- Toke Eskildsen

Surround query with Boolean queries

2014-06-19 Thread Shyamsunder R Mutcha


Hi,

I have two fields in the index, company and year. The following surround query, 
which finds comput* and appli* within 5 words of each other, works fine with the 
surround query parser.
{!surround maxBasicQueries=10}company:5N(comput*, appli*)

Now, if I add another boolean clause, +year:[2005 TO *], it throws a query 
parser exception.
{!surround maxBasicQueries=10}company:5N(comput*, appli*) +year:[2005 TO *]

* msg: "org.apache.solr.search.SyntaxError: 
org.apache.lucene.queryparser.surround.parser.ParseException: Encountered " 
 "year "" at line 1, column 30. Was expecting one of:   ... 
 ...  ...  ...  ... "^" ... ",

I couldn't figure out the syntax from the SurroundQParserPlugin code. 
How can I combine other term and/or boolean queries with surround queries? I am 
also looking for the syntax to add more than one surround query on different fields.
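
One commonly suggested way to do this (not confirmed anywhere in this thread, so 
treat it as an assumption to verify against your Solr version) is either to keep 
the surround query in q and move the range restriction into a separate fq 
parameter, which is parsed by the default query parser, or to embed each surround 
query as a clause of a lucene-parser query through the _query_ nested-query hook. 
The sketch below only builds the parameter strings in plain Java; the class and 
method names are hypothetical, and nothing here talks to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds request parameter values only.
final class SurroundQuerySketch {

    private static final String SURROUND = "{!surround maxBasicQueries=10}";

    // Option 1: surround query in q, range restriction in fq.
    static Map<String, String> withFilter() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", SURROUND + "company:5N(comput*, appli*)");
        params.put("fq", "year:[2005 TO *]");
        return params;
    }

    // Wraps one surround expression as a nested clause for the lucene parser.
    // Assumes the _query_:"{!parser}expr" nested-query syntax is available.
    static String nested(String surroundExpr) {
        String escaped = surroundExpr.replace("\\", "\\\\").replace("\"", "\\\"");
        return "_query_:\"" + SURROUND + escaped + "\"";
    }

    // Option 2: every surround expression becomes a required nested clause,
    // followed by one ordinary required clause such as a range query.
    static String combined(String extraClause, String... surroundExprs) {
        StringBuilder q = new StringBuilder();
        for (String expr : surroundExprs) {
            q.append('+').append(nested(expr)).append(' ');
        }
        return q.append('+').append(extraClause).toString();
    }

    public static void main(String[] args) {
        System.out.println(withFilter());
        System.out.println(combined("year:[2005 TO *]", "company:5N(comput*, appli*)"));
    }
}
```

Passing several expressions to combined(...) would cover the case of more than one 
surround query on different fields. The values still need URL encoding when sent 
over HTTP.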

Thanks
Shyamsunder

facet counting using SimpleFacets

2013-12-01 Thread Shyamsunder R Mutcha
We have a large index where each document has a stored, multi-valued string field 
called products. We also have a lot of customization of search requests. Each 
request goes through a pre-defined custom search handler, and the docIds are stored 
for facet calculation. 

The following constructor is called to get facets for the products field, where 
docSet holds the document ids gathered in the searcher chain. 
SimpleFacets f = new SimpleFacets(rb.req, docSet, msparams, rb);


I found this entry in the logs:
Dec 1, 2013 10:42:51 AM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field 
{field=products,memSize=158633430,tindexSize=1582000,time=414797,phase1=414140,nTerms=4660858,bigTerms=0,termInstances=32608058,uses=0}

Subsequent calls for the same request return quickly. 
Is there any way to improve the facet counting using other methods?
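
To put the log entry above in perspective, here is a minimal sketch that parses 
the statistics and converts them to friendlier units. It assumes the time value 
is in milliseconds and memSize is in bytes; under those assumptions the uninvert 
step took about 6.9 minutes and the structure occupies roughly 151 MiB. The class 
name is hypothetical and the code does not depend on Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading an UnInvertedField statistics string.
final class UnInvertedStatsSketch {

    // Parses "{key=value,key=value,...}" into an ordered map of strings.
    static Map<String, String> parse(String stats) {
        String body = stats.trim();
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "{field=products,memSize=158633430,tindexSize=1582000,"
                + "time=414797,phase1=414140,nTerms=4660858,bigTerms=0,"
                + "termInstances=32608058,uses=0}";
        Map<String, String> s = parse(line);

        long timeMs = Long.parseLong(s.get("time"));       // assumed milliseconds
        long memBytes = Long.parseLong(s.get("memSize"));  // assumed bytes
        long nTerms = Long.parseLong(s.get("nTerms"));
        long instances = Long.parseLong(s.get("termInstances"));

        System.out.printf("uninvert time: %.1f min%n", timeMs / 60000.0);
        System.out.printf("memory: %.1f MiB%n", memBytes / (1024.0 * 1024.0));
        System.out.printf("avg instances per term: %.1f%n", (double) instances / nTerms);
    }
}
```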

Thanks

Re: ConcurrentModificationException from XMLResponseWriter

2013-11-25 Thread Shyamsunder R Mutcha
Shawn,

We have custom search handlers that use the built-in components (result and facet) 
to generate the results. I see that our facet generation uses a 
LinkedHashMap. I will revisit my code. Thanks for the advice!

We are migrating to Solr4 soon :)

Thanks



On Monday, November 25, 2013 11:28 AM, Shawn Heisey  wrote:
 
On 11/25/2013 8:43 AM, Shyamsunder R Mutcha wrote:
> 
> 
> The following exception was found in the Solr logs. We are using Solr 3.2. As the 
> stack trace does not refer to any application classes, I couldn't figure 
> out which piece of code throws this exception. Is there any way to debug 
> this issue?
> 
> Is it related to the issue "ConcurrentModificationException from 
> BinaryResponseWriter"?
> 
> Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log
> SEVERE: java.util.ConcurrentModificationException
>         at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
>         at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392)
>         at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391)
>         at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644)

The exception is coming from LinkedHashMap, a built-in Java object type.

http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html

The code that made the call that's failing is line 644 of this source
code file:

solr/core/src/java/org/apache/solr/response/XMLWriter.java

I looked at the 3.2 source code.  What's going on here is fairly normal
- it's iterating through a Map and outputting the data contained there
to the writer.

The actual problem is occurring elsewhere, it's only showing up in
XMLWriter due to the way LinkedHashMap objects work.  Another thread has
modified the Map while the iterator is being used. This is something
you're not allowed to do with this object type, so it throws the exception.
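
The behaviour described above can be reproduced without Solr. The sketch below is 
only an illustration with hypothetical class and method names: the first method 
modifies a LinkedHashMap while iterating over it (standing in for a second thread) 
and reports whether the iterator threw; the second shows one possible mitigation, 
which is to iterate over a private copy. It is not the code path Solr uses.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo of fail-fast iteration on LinkedHashMap.
final class MapIterationSketch {

    // Returns true if modifying the map mid-iteration made the iterator throw.
    static boolean failsWhenModifiedDuringIteration() {
        Map<String, Integer> facets = new LinkedHashMap<>();
        facets.put("a", 1);
        facets.put("b", 2);
        facets.put("c", 3);
        try {
            for (Map.Entry<String, Integer> e : facets.entrySet()) {
                // Stand-in for another thread adding an entry while we iterate.
                facets.put("added-" + e.getKey(), 0);
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    // Possible mitigation: iterate over a snapshot, so later changes to the
    // shared map cannot invalidate the iterator.
    static int sumOverSnapshot(Map<String, Integer> shared) {
        Map<String, Integer> snapshot = new LinkedHashMap<>(shared);
        int sum = 0;
        for (Map.Entry<String, Integer> e : snapshot.entrySet()) {
            shared.put("added-" + e.getKey(), 0); // touches the original only
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("threw CME: " + failsWhenModifiedDuringIteration());
    }
}
```

With real threads, taking the snapshot itself still has to be coordinated with 
the writers (for example by synchronizing on the map), otherwise the copy can hit 
the same exception.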

I can't find any existing Solr bugs, so the question is: Are you using
any custom code with Solr?  Perhaps something you downloaded or
purchased, or something you wrote in-house?  If so, then that code has
some bugs.

If this *is* a bug in Solr 3.x, it is highly unlikely that it will get
fixed, at least in a 3.x version.  If it still exists in version 4.x
(which is unlikely), then it will get fixed there.  Version 3.2 is two
years old, and the entire 3.x branch is in maintenance mode, meaning
that only EXTREMELY severe bugs will be fixed.


Thanks,
Shawn

ConcurrentModificationException from XMLResponseWriter

2013-11-25 Thread Shyamsunder R Mutcha


The following exception was found in the Solr logs. We are using Solr 3.2. As the 
stack trace does not refer to any application classes, I couldn't figure out which 
piece of code throws this exception. Is there any way to debug this issue?

Is it related to the issue "ConcurrentModificationException from 
BinaryResponseWriter"?

Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log
SEVERE: java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391)
        at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644)
        at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:591)
        at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
        at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:662)

Thanks