Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field
We are harvesting and indexing bibliographic data, so we have many distinct author names in our index. While testing Solr 4, I believe I had pushed a single core to 100 million records (91GB of data), and everything was working fine and fast. After adding a little more to the index, the following started to happen:

17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – Approaching too many values for UnInvertedField faceting on field 'author_exact' : bucket size=16726546
17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – UnInverted multi-valued field {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field author_exact
    at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:181)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)

I can see that we have reached a limit on the bucket size. Is there a way to adjust this?

The index also seems to have exploded in size (217GB). Thinking that I had reached the limit of what a single core could handle in terms of faceting, I deleted records from the index, but even now, at 1/3 of the size (32 million records), it still fails with the above error. I have optimised with expungeDeleted=true. The index is somewhat larger (76GB) than I would have expected.

While we can still use the index and get facets back using the enum method on that field, I would still like a way to fix the index if possible. Any suggestions?

cheers,
:-Dennis
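A minimal sketch of the enum workaround mentioned above, applied only to the problem field through Solr's per-field parameter form f.<field>.<param>. The base URL, the core name "biblio" and the class name EnumFacetQuery are hypothetical; the code only builds the request string and does not call Solr. As an aside, the logged bucket size of 16,726,546 is just under 2^24 = 16,777,216, which suggests (but does not confirm) a fixed 24-bit limit inside UnInvertedField rather than a configurable one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a /select query string that facets on one field
// with facet.method=enum for that field only (per-field override), leaving
// any other facet fields on the default method.
final class EnumFacetQuery {

    static String build(String baseUrl, String query, String field, int limit) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", query);
        params.put("rows", "0");                             // only facet counts are wanted
        params.put("facet", "true");
        params.put("facet.field", field);
        params.put("f." + field + ".facet.method", "enum");  // per-field override
        params.put("f." + field + ".facet.limit", Integer.toString(limit));

        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/select?" + qs;
    }

    // Form-encodes a single key or value as UTF-8.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With enum, Solr walks the field's terms and by default consults the filterCache per term, so on a field with about 12 million terms (nTerms in the log) raising facet.enum.cache.minDf may be worth testing to keep that cache from being flooded.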
Re: Slow first searcher with facet on bibliographic data in Master - Slave
I do have a firstSearcher, but coldSearcher is currently set to true. But doesn't this just mean that any searches will block while the first searcher is running? That is how the comment describes the first searcher. It would have almost the same effect: some searches would take a long time. What I am looking for is, after receiving replicated data, to run the first searcher and then switch to the new index.

I will try with coldSearcher set to false, but I actually think I have already tried this.

cheers,
:-Dennis

On Mar 29, 2012, at 13:57, fbrisbart wrote:

> If you add your query to the firstSearcher and/or newSearcher event listeners in the slave 'solrconfig.xml' (http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners), each new search instance will wait before accepting queries.
>
> Example to load the FieldCache for the 'your_facet_field' field:
>
> ...
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*:*</str>
>       <str name="facet">true</str>
>       <str name="facet.field">your_facet_field</str>
>     </lst>
>   </arr>
> </listener>
> ...
>
> Franck
>
> On Thursday 29 March 2012 at 13:30 +0200, Dennis Schafroth wrote:
>> Hi,
>>
>> I am running indexing and faceted searching on bibliographic data, which is known not to perform too well due to the high facet count. Actually, it's just the first search that is horribly slow, 200+ seconds. After that, I am getting okay times (1 second), at least in the few-users scenario we have now.
>>
>> The current index is 54 million records with approx. 10 million unique authors. The facet fields (… _exact) use the string type.
>>
>> I had hoped that a master (indexing) and slave (searching) setup would solve the issue, but I am still seeing it on the slave, so I guess I must have misunderstood (or perhaps misconfigured) something. I had thought that the slave would not switch to the new index until autowarming was completed. Is such behavior possible?
>>
>> I guess an alternative solution could be to have multiple slaves and take a slave offline when doing replication, but if it is possible to do it more simply (and use 1/3 less space), that would be great. Then again, we might need multiple slaves as requests grow.
>>
>> Attached are the configuration files. Let me know if any information is missing.
>>
>> cheers,
>> :-Dennis Schafroth
Re: Slow first searcher with facet on bibliographic data in Master - Slave
On Mar 29, 2012, at 14:49, fbrisbart wrote:

> Arf, I didn't see your attached tgz. In your slave solrconfig.xml, only the 'firstSearcher' contains the query. Add it to the 'newSearcher' as well, so that new search instances will also wait after a new index is replicated.

Did that now, but I believe my case is mostly a first-searcher issue. Anyway, it didn't seem to change anything.

> The first request is long because the default faceting method uses the FieldCache for your facet fields.

Yup, I know.

> You may also choose to use facet.method=enum. The performance is globally worse than the 'fc' method, but you will avoid the very slow first request.

As you say. With it, every search with facets now takes 20 seconds instead of 2. Then I prefer the field cache with one bad first search.

> Btw, it's far better to use the default 'enum' facet method.
>
> Hope this helps,
> Franck

Thanks for the input so far.
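Franck's fix amounts to registering the same warming query for both the firstSearcher and the newSearcher events. The sketch below is a hypothetical helper (WarmingListeners is not part of Solr) that prints that pair of listener blocks for a list of facet fields, mirroring the XML from his first reply; it assumes the field names need no XML escaping.

```java
import java.util.List;

// Hypothetical helper: emits one QuerySenderListener per event
// (firstSearcher and newSearcher), each running the same *:* facet query,
// so both startup and post-replication searchers warm the facet fields.
final class WarmingListeners {

    static String xml(List<String> facetFields) {
        StringBuilder sb = new StringBuilder();
        for (String event : List.of("firstSearcher", "newSearcher")) {
            sb.append("<listener event=\"").append(event)
              .append("\" class=\"solr.QuerySenderListener\">\n");
            sb.append("  <arr name=\"queries\">\n");
            sb.append("    <lst>\n");
            sb.append("      <str name=\"q\">*:*</str>\n");
            sb.append("      <str name=\"facet\">true</str>\n");
            for (String field : facetFields) {
                // field names are written as-is (no XML escaping)
                sb.append("      <str name=\"facet.field\">").append(field).append("</str>\n");
            }
            sb.append("    </lst>\n");
            sb.append("  </arr>\n");
            sb.append("</listener>\n");
        }
        return sb.toString();
    }
}
```

Placing the output in the <query> section of the slave's solrconfig.xml gives both events the same query, so a replicated index should be warmed before its searcher takes over from the old one.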
Re: Slow first searcher with facet on bibliographic data in Master - Slave
I was wrong! It does seem to work! Thanks a bunch!

cheers,
:-Dennis

On Mar 29, 2012, at 15:52, fbrisbart wrote:

> I had the same issue months ago. 'newSearcher' fixed the problem for me. I also remember that I had to upgrade Solr (3.1) because it didn't work with release 1.4. But I suppose you already have Solr 3.x or later, so I'm afraid I can't help you more :o(
>
> Franck
>
> On Thursday 29 March 2012 at 15:41 +0200, Dennis Schafroth wrote:
>> On Mar 29, 2012, at 14:49, fbrisbart wrote:
>>> Btw, it's far better to use the default 'enum' facet method.
>
> I meant the default 'fc' method, of course :o)
Re: Solr memory consumption
I ran out of memory on some big indexes when using Solr 1.4. I found that increasing termInfosIndexDivisor in solrconfig.xml could help a lot, though it may slow down searching your index.

cheers,
:-Dennis

On 02/06/2011, at 01.16, Alexey Serba wrote:

> Hey Denis,
>
> * How big is your index in terms of number of documents and index size?
> * Is it a production system where you have many search requests?
> * Is there any pattern to the OOM errors? I.e. right after you start your Solr app, after some search activity, or after specific Solr queries, etc.?
> * What are your 1) cache settings, 2) facet and sort-by fields, 3) commit frequency and warmup queries?
> etc.
>
> Generally you might want to connect to your JVM using the jconsole tool and monitor your heap usage (and other JVM/Solr numbers):
>
> * http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
> * http://wiki.apache.org/solr/SolrJmx#Remote_Connection_to_Solr_JMX
>
> HTH,
> Alexey
>
> 2011/6/1 Denis Kuzmenok forward...@ukr.net:
>> There were no parameters at all, and Java ran out of memory almost every day; then I tried adding parameters, but nothing changed. Xms/Xmx did not solve the problem either. Now I am trying MaxPermSize, because it's the last thing I haven't tried yet :(
>>
>> Wednesday, June 1, 2011, 9:00:56 PM, you wrote:
>>> Could be related to your crazy high MaxPermSize, like Marcus said.
>>>
>>> I'm no JVM tuning expert either. Few people are; it's confusing. So if you don't understand it either, why are you trying to throw in very non-standard parameters you don't understand? Just start with whatever the Solr example Jetty has, and only change things if you have a reason to (that you understand).
>>>
>>> On 6/1/2011 1:19 PM, Denis Kuzmenok wrote:
>>>> Overall memory on the server is 24G, plus 24G of swap. Almost all the time the swap is free and not used at all, which is why "no free swap" sounds strange to me..
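A back-of-the-envelope sketch of why termInfosIndexDivisor saves memory, assuming the Lucene 3.x-era term dictionary used by Solr 1.4/3.x: the terms index keeps every termIndexInterval-th term (128 by default), and a divisor of N loads only every N-th of those entries into RAM. The class and method names are hypothetical, and the numbers are estimates, not measurements.

```java
// Hypothetical estimator for the termInfosIndexDivisor trade-off
// (Lucene 3.x-style terms index; not applicable to later codecs).
final class TermIndexEstimate {

    /** Approximate number of term-index entries held in RAM (rounded up). */
    static long termsInRam(long uniqueTerms, int indexInterval, int divisor) {
        long step = (long) indexInterval * divisor; // only every step-th term is loaded
        return (uniqueTerms + step - 1) / step;
    }

    /** Approximate upper bound on terms scanned on disk per lookup. */
    static long maxScanPerLookup(int indexInterval, int divisor) {
        return (long) indexInterval * divisor;
    }
}
```

For example, 12.8 million unique terms need about 100,000 in-memory entries at divisor 1 and about 25,000 at divisor 4, at the cost of scanning up to roughly 512 terms per lookup instead of 128, which is the search slowdown mentioned above.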
Re: solrj issue: SocketTimeout: read timed out, but commit succed on server.
It also happens when adding records. Putting a proxy between the client and the server revealed that the server writes zero bytes back on the update, so what the client says is correct. So I guess I have to dig into the server code. Limiting the number of updates before each commit does seem to increase the chance of success. Any input would be greatly appreciated.

cheers,
:-Dennis

On 17/05/2011, at 14.43, Dennis Schafroth wrote:

> Hi,
>
> I can see others are having the same issue, but I haven't seen any fixes or workarounds. I am mixing adds and deletes, in batches of up to 1000 records. On the commit I see the following in the client:
>
> 2011-05-17 13:42:41 ERROR - harvester [main/com.indexdata.masterkey.localindices.harvest.storage.SolrRecordStorage] - Commit failed when adding 39900 and deleting 11666.
> org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
>     at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
>     at com.indexdata.masterkey.localindices.harvest.storage.SolrRecordStorage.commit(SolrRecordStorage.java:47)
>     at com.indexdata.masterkey.localindices.harvest.storage.BulkSolrRecordStorage.commit(BulkSolrRecordStorage.java:101)
>     at com.indexdata.masterkey.localindices.harvest.job.OAIRecordHarvestJob.run(OAIRecordHarvestJob.java:146)
>     at com.indexdata.masterkey.localindices.harvest.job.TestOAIRecordHarvestJob.TestCleanFullBulkHarvestJob(TestOAIRecordHarvestJob.java:65)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:164)
>     at junit.framework.TestCase.runBare(TestCase.java:130)
>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>     at junit.framework.TestResult.run(TestResult.java:109)
>     at junit.framework.TestCase.run(TestCase.java:120)
>     at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:129)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>     at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>     at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>     at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>     at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>     at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
>     ... 24 more
>
> But the server seems pretty happy anyway:
>
> 17-05-2011 13:42:40 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> 17-05-2011 13:42
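Since fewer updates between commits seemed to help, here is a minimal, JDK-only sketch of splitting pending records into fixed-size batches. In the harvester each batch would presumably be sent with SolrServer.add(...) and followed by a commit; those SolrJ calls are left out so the sketch compiles on its own, and Batches is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits records into consecutive batches of at most
// batchSize items, preserving order; the last batch may be smaller.
final class Batches {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to))); // copy, not a view
        }
        return batches;
    }
}
```

Note that the server log shows waitSearcher=true: the commit response is only written once the new searcher has been opened and warmed, so warming time counts against the client's read timeout. Raising that timeout with CommonsHttpSolrServer.setSoTimeout(int) may be an alternative worth trying.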
Import Handler for tokenizing facet string into multi-valued solr.StrField..
Hi,

I'm pretty new to Solr coding, but I'm looking for hints on how (if it's not already done) to implement a PatternTokenizer that would index this into multi-valued solr.StrField fields for faceting. For example, "Water -- Irrigation ; Water -- Sewage" should be tokenized into "Water", "Irrigation" and "Sewage" in multi-valued, non-tokenized fields, for performance reasons. I could do it from the outside, but I would like to use this as an opportunity to learn about Solr.

It works as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to what I have read, facet performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-valued fields (25 million documents, three multi-valued facets).

I would also need a filter that filters out identical values, as the feeds contain redundant data, as shown above.

Can anyone point me in the right direction?

cheers,
:-Dennis
Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..
Thanks for the hints! Sorry about hijacking the thread "query range in multivalued date field"; I replied to it by mistake.

cheers,
:-Dennis

On 27/01/2011, at 16.48, Erik Hatcher wrote:

> Beyond what Erick said, I'll add that it is often better to do this from the outside and send in multiple actual end-user-displayable facet values. When you send in a field like "Water -- Irrigation ; Water -- Sewage", that is what will get stored (if you have it set to stored), but what you probably want is each individual value stored, which can only be done by the indexer sending in multiple values, not through tokenization alone.
>
> Erik
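Following Erik's suggestion to split the values on the client side, here is a minimal sketch that turns the example string into distinct facet values in first-seen order, which also covers the de-duplication asked for in the original question. It assumes ";" and "--" are the only separators, as in the example; SubjectSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: splits a subject string on ";" or "--", trims each
// part, drops empty parts and repeats, and keeps first-seen order.
final class SubjectSplitter {

    static List<String> split(String raw) {
        Set<String> values = new LinkedHashSet<>(); // ordered, no duplicates
        for (String part : raw.split(";|--")) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return new ArrayList<>(values);
    }
}
```

Each returned value would then be added separately to the multi-valued string field, for instance by calling SolrInputDocument.addField once per value in SolrJ.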
Garbled facets even in a zero hit search
Hi,

I am running on a Debian 5.0.5 64-bit box, using solr-1.4.1 with Java version "1.6.0_20". I am seeing weird facet results along with the "right"-looking ones: garbled data, stuff that looks like a buffer overflow / index off by ... And I even get them when I do a zero-hit search, where I wouldn't expect any facets:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">56</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="shards">satay:8985/solr</str>
      <str name="start">0</str>
      <str name="q">title:xzyzx</str>
      <str name="f.date.facet.limit">10</str>
      <str name="f.subject_exact.facet.limit">10</str>
      <arr name="facet.field">
        <str>author_exact</str>
        <str>date</str>
        <str>subject_exact</str>
      </arr>
      <str name="f.author_exact.facet.limit">10</str>
      <str name="rows">20</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="author_exact">
        <int name=" ">0</int>
        <int name=" !;;!">0</int>
        <int name=" (Domingo, Juan); Imprenta Tormentaria (Córdoba)">0</int>
        <int name=" (Supervisor)">0</int>
        <int name=" *">0</int>
        <int name=" * ">0</int>
        <int name=" * (μτφρ.)">0</int>
        <int name=" * * * ">0</int>
        <int name=" * * * (μτφρ.)">0</int>
        <int name=" * * * *">0</int>
      </lst>
      <lst name="date">
        <int name="">0</int>
        <int name="0001">0</int>
        <int name="0002">0</int>
        <int name="0003">0</int>
        <int name="0004">0</int>
        <int name="0005">0</int>
        <int name="0006">0</int>
        <int name="0007">0</int>
        <int name="0008">0</int>
        <int name="0009">0</int>
      </lst>
      <lst name="subject_exact">
        <int name=" ">0</int>
        <int name=" ! ! R P R">0</int>
        <int name=" !!rrqqyyhqhqwwllrqrqdd!!vvddvv">0</int>
        <int name=" !&quot;&quot;$%&quot;( )*+,($&quot;(">0</int>
        <int name=" !()+, -./01 23456">0</int>
        <int name=" !-decidable and decidable deductive procedures for a restricted FTL with Unless">0</int>
        <int name=" !&lt;f87.03...">0</int>
        <int name=" &quot;)338-8570">0</int>
        <int name=" &quot;-Optimization Schemes and L-Bit Precision: Alternative Perspectives in Combinatorial Optimization">0</int>
        <int name=" &quot;A picture is worth 1K words&quot;">0</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>

(Attached: response_formated.xml)

I tried to look for a bug report, but haven't been able to find one that matches. I will try to set up a debug session to get closer, but would love to get feedback if this is a known issue.

cheers,
:-Dennis Schafroth
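One possible explanation, not raised in the post: facet.mincount defaults to 0, so when nothing matches, Solr can still fill each facet.limit slot with zero-count terms taken in index (sort) order. That would fit lists that start with " ", "0001", "0002" and punctuation-led strings, which would then be real but odd index terms rather than corrupted memory. A minimal sketch of a request that sets facet.mincount=1 to suppress them; the base URL and the class name MinCountFacetQuery are hypothetical, and the shards parameter from the original request is omitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds a /select query string whose facets exclude
// terms with a count of zero (facet.mincount=1).
final class MinCountFacetQuery {

    static String build(String baseUrl, String query, List<String> fields) {
        StringJoiner qs = new StringJoiner("&");
        qs.add("q=" + enc(query));
        qs.add("facet=true");
        qs.add("facet.mincount=1"); // drop terms that match no documents
        for (String f : fields) {
            qs.add("facet.field=" + enc(f)); // one facet.field per field
        }
        return baseUrl + "/select?" + qs;
    }

    // Form-encodes a single value as UTF-8.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```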