Re: Dead node, but clusterstate.json says active, won't sync on restart
> If you removed the tlog and index and restart it should resync, or
> something is really crazy.

It doesn't, or at least if it tries, it's somehow failing. I'd be ok with the sync failing for some reason if the node wasn't also serving queries.

-Greg

On Tue, Jan 28, 2014 at 11:10 AM, Mark Miller markrmil...@gmail.com wrote:
> Sounds like a bug. 4.6.1 is out any minute - you might try that. There was
> a replication bug that may be involved.
>
> If you removed the tlog and index and restart it should resync, or
> something is really crazy.
>
> The clusterstate.json is a red herring. You have to merge the live nodes
> info with the state to know the real state.
>
> - Mark
> http://www.about.me/markrmiller
>
> On Jan 28, 2014, at 12:31 PM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]
Re: Dead node, but clusterstate.json says active, won't sync on restart
I've attached the log of the downed node (truffle-solr-4). This is the relevant log entry from the node it should replicate from (truffle-solr-5):

[29 Jan 2014 19:31:29] [qtp1614415528-74] ERROR (org.apache.solr.common.SolrException) - org.apache.solr.common.SolrException: I was asked to wait on state recovering for truffle-solr-4:8983_solr but I still do not see the requested state. I see state: active live:true
        at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:966)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:191)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)

You can see that 4 is serving queries. It appears that 4 tries to recover from 5, but 5 is confused about the state of 4? 4 had an empty index and tlog when it was started.

We will eventually upgrade to 4.6.x or 4.7.x, but we've got a pretty extensive regression testing cycle, so there is some delay in upgrading versions.

-Greg

On Wed, Jan 29, 2014 at 9:08 AM, Mark Miller markrmil...@gmail.com wrote:
> What's in the logs of the node that won't recover on restart after
> clearing the index and tlog?
>
> - Mark
>
> On Jan 29, 2014, at 11:41 AM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]

[29 Jan 2014 19:28:57] [main] INFO (org.eclipse.jetty.server.Server) - jetty-8.1.10.v20130312
[29 Jan 2014 19:28:57] [main] INFO (org.eclipse.jetty.deploy.providers.ScanningAppProvider) - Deployment monitor /home/solr/solr/solr-4.4.0/example/contexts at interval 0
[29 Jan 2014 19:28:57] [main] INFO (org.eclipse.jetty.deploy.DeploymentManager) - Deployable added: /home/solr/solr/solr-4.4.0/example/contexts/solr-jetty-context.xml
[29 Jan 2014 19:28:58] [main] INFO
Dead node, but clusterstate.json says active, won't sync on restart
** Using solrcloud 4.4.0 **

I had to kill a running solrcloud node. There is still a replica for that shard, so everything is functional. We've done some indexing while the node was killed.

I'd like to bring back up the downed node and have it resync from the other replica. But when I restart the downed node, it joins back up as active immediately, and doesn't resync. I even wiped the data directory on the downed node, hoping that would force it to sync on restart, but it doesn't.

I'm assuming this is related to the state still being listed as active in clusterstate.json for the downed node? Since it comes back as active, it's serving queries and giving old results.

How can I force this node to do a recovery on restart?

Thanks.

-Greg
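For later readers of the thread: one way to kick a core into recovery without relying on restart behavior is the CoreAdmin REQUESTRECOVERY action in the 4.x admin API. A sketch only; the host is the node name from this thread, the core name "marin" appears in logs later in the thread, and the port is the stock 8983 - substitute your own values:

```shell
# Build the CoreAdmin REQUESTRECOVERY URL for the stuck node.
# Host/port/core are examples from this thread, not guaranteed to match
# your deployment. This only prints the URL; against a live cluster you
# would uncomment the curl line to actually issue the request.
HOST="truffle-solr-4"
PORT=8983
CORE="marin"
URL="http://${HOST}:${PORT}/solr/admin/cores?action=REQUESTRECOVERY&core=${CORE}"
echo "${URL}"
# curl "${URL}"
```

The request returns immediately and recovery proceeds in the background, so watch the node's log to confirm it actually enters the recovering state.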
Re: Dead node, but clusterstate.json says active, won't sync on restart
Thanks for the idea. I tried it, and the state for the bad node, even after an orderly shutdown, is still active in clusterstate.json. I see this in the logs on restart:

[28 Jan 2014 18:25:29] [RecoveryThread] ERROR (org.apache.solr.common.SolrException) - Error while trying to recover. core=marin:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state recovering for truffle-solr-4:8983_solr but I still do not see the requested state. I see state: active live:true
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)

-Greg

On Tue, Jan 28, 2014 at 9:53 AM, Shawn Heisey s...@elyograg.org wrote:
> On 1/28/2014 10:31 AM, Greg Preston wrote:
> [...]
>
> This might be completely wrong, but hopefully it will help you: Perhaps a
> graceful stop of that node will result in the proper clusterstate so it
> will work the next time it's started? That may already be what you've
> done, so this may not help at all ... but you did say "kill" which might
> mean that it wasn't a clean shutdown of Solr.
>
> Thanks,
> Shawn
Re: Possible memory leak after segment merge? (related to DocValues?)
That was it. Setting omitNorms=true on all fields fixed my problem. I left it indexing all weekend, and heap usage still looks great.

I'm still not clear why bouncing the solr instance freed up memory, unless the in-memory structure for this norms data is lazily loaded somehow. Anyway, thank you very much for the suggestion.

-Greg

On Fri, Dec 27, 2013 at 4:25 AM, Michael McCandless luc...@mikemccandless.com wrote:
> Likely this is for field norms, which use doc values under the hood.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]
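For reference, the fix Greg describes looks roughly like this in schema.xml (the field names below are made up for illustration). In Lucene 4.2+ norms are written through the doc-values codec machinery, which is why Lucene42DocValuesProducer instances show up even when no field declares docValues:

```xml
<!-- schema.xml sketch: omitNorms="true" drops the per-document
     length/boost norms that Lucene 4.2+ stores via doc values.
     Field names here are hypothetical. -->
<field name="clientId"    type="string"       indexed="true" stored="true" omitNorms="true"/>
<field name="description" type="text_general" indexed="true" stored="true" omitNorms="true"/>
```

Note that omitting norms disables length normalization and index-time boosts for scoring on those fields, which is harmless here since every search in this setup uses an explicit sort rather than relevance score.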
Re: Possible memory leak after segment merge? (related to DocValues?)
Interesting. I'm not using score at all (all searches have an explicit sort defined). I'll try setting omit norms on all my fields and see if I can reproduce. Thanks.

-Greg

On Fri, Dec 27, 2013 at 4:25 AM, Michael McCandless luc...@mikemccandless.com wrote:
> Likely this is for field norms, which use doc values under the hood.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]
Re: Possible memory leak after segment merge? (related to DocValues?)
Does anybody with knowledge of solr internals know why I'm seeing instances of Lucene42DocValuesProducer when I don't have any fields that are using DocValues? Or am I misunderstanding what this class is for?

-Greg

On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]
Possible memory leak after segment merge? (related to DocValues?)
Hello,

I'm loading up our solr cloud with data (from a solrj client) and running into a weird memory issue. I can reliably reproduce the problem.

- Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
- 24 solr nodes (one shard each), spread across 3 physical hosts, each host has 256G of memory
- index and tlogs on ssd
- Xmx=7G, G1GC
- Java 1.7.0_25
- schema and solrconfig.xml attached

I'm using composite routing to route documents with the same clientId to the same shard.

After several hours of indexing, I occasionally see an IndexWriter go OOM. I think that's a symptom. When that happens, indexing continues, and that node's tlog starts to grow. When I notice this, I stop indexing, and bounce the problem node. That's where it gets interesting.

Upon bouncing, the tlog replays, and then segments merge. Once the merging is complete, the heap is fairly full, and forced full GC only helps a little. But if I then bounce the node again, the heap usage goes way down, and stays low until the next segment merge. I believe segment merges are also what causes the original OOM.

More details: Index on disk for this node is ~13G, tlog is ~2.5G. See attached mem1.png. This is a jconsole view of the heap during the following: (Solr cloud node started at the left edge of this graph)

A) One CPU core pegged at 100%. Thread dump shows:

Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
        at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
        at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

B) One CPU core pegged at 100%. Manually triggered GC. Lots of memory freed. Thread dump shows:

Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
        at org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

C) One CPU core pegged at 100%. Manually triggered GC. No memory freed. Thread dump shows:

Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
        at org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

D) One CPU core pegged at 100%. Thread dump shows:

Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
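The composite routing mentioned in the post is, presumably, Solr's compositeId router, where a route key is prefixed to the document id with a "!" separator; Solr hashes the prefix, so every document sharing it lands on the same shard. A minimal sketch with made-up ids:

```shell
# compositeId routing sketch: ids take the form "<routeKey>!<docId>".
# Solr hashes the part before "!", so all docs for one client map to
# the same shard. The values below are hypothetical.
CLIENT_ID="client42"
DOC_ID="order-1001"
ROUTED_ID="${CLIENT_ID}!${DOC_ID}"
echo "${ROUTED_ID}"
# prints: client42!order-1001
```

One side effect worth noting for this thread: route keys with very different document counts can leave shards unevenly sized, which changes how much merging (and merge-time memory) each node sees.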
Re: Possible memory leak after segment merge? (related to DocValues?)
Hi Joel,

Thanks for the suggestion. I could see how decreasing autoCommit time would reduce tlog size, and how that could possibly be related to the original OOM error. I'm not seeing how that would make any difference once a tlog exists, though?

I have a saved off copy of my data dir that has the 13G index and 2.5G tlog. So I can reproduce the replay -> merge -> memory usage issue very quickly. Changing the autoCommit to possibly avoid the initial OOM will take a good bit longer to try to reproduce. I may try that later in the week.

-Greg

On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com wrote:
> Hi Greg,
>
> I have a suspicion that the problem might be related to or exacerbated by
> overly large tlogs. Can you adjust your autoCommits to 15 seconds? Leave
> openSearcher = false. I would remove the maxDoc as well. If you try
> rerunning under those commit settings it's possible the OOM errors will
> stop occurring.
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston gpres...@marinsoftware.com wrote:
> [...]
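Joel's suggested commit settings would look something like this in solrconfig.xml. A sketch only: the 15-second interval and openSearcher=false come from his message; the surrounding element layout follows the stock 4.x example config:

```xml
<!-- solrconfig.xml sketch: hard-commit every 15s so the tlog stays
     small, without opening a new searcher on each commit. No maxDocs
     clause, per the suggestion above. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```

The hard commit truncates the transaction log, so a node that dies or is bounced only ever has a few seconds of updates to replay instead of a multi-gigabyte tlog.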
Re: adding a node to SolrCloud
Yes, I'm well aware of the performance implications, many of which are mitigated by 2TB of SSD and 512GB RAM I've got a very similar setup in production. 2TB SSD, 256G RAM (128G heaps), and 1 - 1.5 TB of index per node. We're in the process of splitting that to multiple JVMs per host. GC pauses were causing ZK timeouts (you should up that in solr.xml). And resync's after the timeouts took long enough that a large tlog built up (we have near continuous indexing), and we couldn't replay the tlog fast enough to catch up to current. If you're going to have a mostly static index, then it may be less of an issue. -Greg On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro david.santama...@gmail.com wrote: On 12/22/2013 09:48 PM, Shawn Heisey wrote: On 12/22/2013 2:10 PM, David Santamauro wrote: My goal is to have a redundant copy of all 8 currently running, but non-redundant shards. This setup (8 nodes with no replicas) was a test and it has proven quite functional from a performance perspective. Loading, though, takes almost 3 weeks so I'm really not in a position to redesign the distribution, though I can add nodes. I have acquired another resource, a very large machine that I'd like to use to hold the replicas of the currently deployed 8-nodes. I realize I can run 8 jetty/tomcats and accomplish my goal but that is a maintenance headache and is really a last resort. I really would just like to be able to deploy this big machine with 'numShards=8'. Is that possible or do I really need to have 8 other nodes running? You don't want to run more than one container or Solr instance per machine. Things can get very confused, and it's too much overhead. With existing collections, you can simply run the CoreAdmin CREATE action on the new node with more resources. 
So you'd do something like this, once for each of the 8 existing parts: http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1 It will automatically replicate the shard from its current leader. Fantastic! Clearly my understanding of collection vs. core vs. shard was lacking, but now I see the relationship better. One thing to be aware of: With 1.4TB of index data, it might be impossible to keep enough of the index in RAM for good performance, unless the machine has a terabyte or more of RAM. Yes, I'm well aware of the performance implications, many of which are mitigated by 2TB of SSD and 512GB RAM. Thanks for the nudge in the right direction. The first node/shard1 is replicating right now. David
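The archive tends to strip the `&` separators out of these admin URLs. A small Python sketch shows the properly encoded form of the CoreAdmin CREATE call (the host `newnode:8983` and the `collname` names are placeholders from the thread, not real endpoints):

```python
from urllib.parse import urlencode

# Build the CoreAdmin CREATE request for one replica of shard1.
# Repeat with shard2..shard8 for the other existing parts.
params = {
    "action": "CREATE",
    "name": "collname_shard1_replica2",
    "collection": "collname",
    "shard": "shard1",
}
url = "http://newnode:8983/solr/admin/cores?" + urlencode(params)
print(url)
```

Issuing that URL against the new node registers the core with ZooKeeper, and SolrCloud then replicates the shard from its current leader.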
Re: Possible memory leak after segment merge? (related to DocValues?)
Interesting. In my original post, the memory growth (during restart) occurs after the tlog is done replaying, but during the merge. -Greg On Mon, Dec 23, 2013 at 2:06 PM, Joel Bernstein joels...@gmail.com wrote: Greg, There is a memory component to the tlog, which supports realtime gets. This memory component grows until there is a commit, so it will appear like a leak. I suspect that replaying a tlog that was big enough to possibly cause OOM is also problematic. One thing you might want to try is going to 15 second commits, and then kill the Solr instance between the commits. Then watch the memory as the replaying occurs with the smaller tlog. Joel Joel Bernstein Search Engineer at Heliosearch On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston gpres...@marinsoftware.com wrote: Hi Joel, Thanks for the suggestion. I could see how decreasing autoCommit time would reduce tlog size, and how that could possibly be related to the original OOM error. I'm not seeing how that would make any difference once a tlog exists, though? I have a saved off copy of my data dir that has the 13G index and 2.5G tlog. So I can reproduce the replay - merge - memory usage issue very quickly. Changing the autoCommit to possibly avoid the initial OOM will take a good bit longer to try to reproduce. I may try that later in the week. -Greg On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com wrote: Hi Greg, I have a suspicion that the problem might be related to or exacerbated by overly large tlogs. Can you adjust your autoCommits to 15 seconds? Leave openSearcher = false. I would remove the maxDoc as well. If you try rerunning under those commit settings it's possible the OOM errors will stop occurring. Joel Joel Bernstein Search Engineer at Heliosearch On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston gpres...@marinsoftware.com wrote: Hello, I'm loading up our solr cloud with data (from a solrj client) and running into a weird memory issue. I can reliably reproduce the problem.
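Joel's suggestion of 15-second hard commits with openSearcher=false corresponds to an update handler block like the following in solrconfig.xml (a sketch in Solr 4.x syntax; the exact values are Joel's recommendation, not the poster's verified config):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 15 seconds; truncates the tlog, no maxDocs clause -->
    <maxTime>15000</maxTime>
    <!-- hard commits only flush to disk; searchers are opened by soft commits -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Frequent hard commits keep the tlog small, so a restart replays at most ~15 seconds of updates instead of gigabytes.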
- Using Solr Cloud 4.4.0 (also replicated with 4.6.0) - 24 solr nodes (one shard each), spread across 3 physical hosts, each host has 256G of memory - index and tlogs on ssd - Xmx=7G, G1GC - Java 1.7.0_25 - schema and solrconfig.xml attached I'm using composite routing to route documents with the same clientId to the same shard. After several hours of indexing, I occasionally see an IndexWriter go OOM. I think that's a symptom. When that happens, indexing continues, and that node's tlog starts to grow. When I notice this, I stop indexing, and bounce the problem node. That's where it gets interesting. Upon bouncing, the tlog replays, and then segments merge. Once the merging is complete, the heap is fairly full, and forced full GC only helps a little. But if I then bounce the node again, the heap usage goes way down, and stays low until the next segment merge. I believe segment merges are also what causes the original OOM. More details: Index on disk for this node is ~13G, tlog is ~2.5G. See attached mem1.png. This is a jconsole view of the heap during the following: (Solr cloud node started at the left edge of this graph) A) One CPU core pegged at 100%. 
Thread dump shows: Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.util.fst.Builder.add(Builder.java:397) at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000) at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112) at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) B) One CPU core pegged at 100%. Manually triggered GC. Lots of memory freed. Thread dump shows: Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800 nid=0x7a74 runnable [0x7f5a41c5f000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField
Re: adding a node to SolrCloud
I believe you can just define multiple cores: <core default="true" instanceDir="shard1/" name="collectionName_shard1" shard="shard1"/> <core default="true" instanceDir="shard2/" name="collectionName_shard2" shard="shard2"/> ... (this is the old style solr.xml. I don't know how to do it in the newer style) Also, make sure you don't define a non-relative dataDir in solrconfig.xml, or you may run into issues with cores trying to use the same data dir. -Greg On Mon, Dec 23, 2013 at 2:16 PM, David Santamauro david.santama...@gmail.com wrote: On 12/23/2013 05:03 PM, Greg Preston wrote: Yes, I'm well aware of the performance implications, many of which are mitigated by 2TB of SSD and 512GB RAM I've got a very similar setup in production. 2TB SSD, 256G RAM (128G heaps), and 1 - 1.5 TB of index per node. We're in the process of splitting that to multiple JVMs per host. GC pauses were causing ZK timeouts (you should up that in solr.xml). And resyncs after the timeouts took long enough that a large tlog built up (we have near continuous indexing), and we couldn't replay the tlog fast enough to catch up to current. GC pauses are a huge issue in our current production environment (monolithic index) and general performance was meager, hence the move to a distributed design. We will have 8 nodes with ~ 200GB per node, one shard each, and performance for single and most multi-term queries has become sub-second and throughput has increased 10-fold. Larger boolean queries can still take 2-3s but we can live with that. At any rate, I still can't figure out what my solr.xml is supposed to look like on the node with all 8 redundant shards. David On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro david.santama...@gmail.com wrote: On 12/22/2013 09:48 PM, Shawn Heisey wrote: On 12/22/2013 2:10 PM, David Santamauro wrote: My goal is to have a redundant copy of all 8 currently running, but non-redundant shards.
This setup (8 nodes with no replicas) was a test and it has proven quite functional from a performance perspective. Loading, though, takes almost 3 weeks so I'm really not in a position to redesign the distribution, though I can add nodes. I have acquired another resource, a very large machine that I'd like to use to hold the replicas of the currently deployed 8-nodes. I realize I can run 8 jetty/tomcats and accomplish my goal but that is a maintenance headache and is really a last resort. I really would just like to be able to deploy this big machine with 'numShards=8'. Is that possible or do I really need to have 8 other nodes running? You don't want to run more than one container or Solr instance per machine. Things can get very confused, and it's too much overhead. With existing collections, you can simply run the CoreAdmin CREATE action on the new node with more resources. So you'd do something like this, once for each of the 8 existing parts: http://newnode:port/solr/admin/cores?action=CREATEname=collname_shard1_replica2collection=collnameshard=shard1 It will automatically replicate the shard from its current leader. Fantastic! Clearly my understanding of collection, vs core vs shard was lacking but now I see the relationship better. One thing to be aware of: With 1.4TB of index data, it might be impossible to keep enough of the index in RAM for good performance, unless the machine has a terabyte or more of RAM. Yes, I'm well aware of the performance implications, many of which are mitigated by 2TB of SSD and 512GB RAM. Thanks for the nudge in the right direction. The first node/shard1 is replicating right now. David
How to always tokenize on underscore?
[Using SolrCloud 4.4.0] I have a text field where the data will sometimes be delimited by whitespace, and sometimes by underscore. For example, both of the following are possible input values: Group_EN_1000232142_blah_1000232142abc_foo Group EN 1000232142 blah 1000232142abc foo What I'd like to do is have underscores treated as spaces for tokenization purposes. I've tried using a PatternReplaceFilterFactory with: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/> </analyzer> </fieldType> but that seems to do the pattern replacement on each token, rather than splitting tokens into multiple tokens based on the pattern. So with the input Group_EN_1000232142_blah_1000232142abc_foo I end up with a single token of "group en 1000232142 blah 1000232142abc foo" rather than what I want, which is 6 tokens: group, en, 1000232142, blah, 1000232142abc, foo. Is there a way to configure for the behavior I'm looking for, or would I need to write a custom tokenizer? Thanks! -Greg
Re: How to always tokenize on underscore?
This is exactly what I needed. Thank you! -Greg On Wed, Sep 25, 2013 at 2:48 PM, Jack Krupansky j...@basetechnology.com wrote: Use the char filter instead: http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html -- Jack Krupansky -Original Message- From: Greg Preston Sent: Wednesday, September 25, 2013 5:43 PM To: solr-user@lucene.apache.org Subject: How to always tokenize on underscore? [Using SolrCloud 4.4.0] I have a text field where the data will sometimes be delimited by whitespace, and sometimes by underscore. For example, both of the following are possible input values: Group_EN_1000232142_blah_1000232142abc_foo Group EN 1000232142 blah 1000232142abc foo What I'd like to do is have underscores treated as spaces for tokenization purposes. I've tried using a PatternReplaceFilterFactory with: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/> </analyzer> </fieldType> but that seems to do the pattern replacement on each token, rather than splitting tokens into multiple tokens based on the pattern. So with the input Group_EN_1000232142_blah_1000232142abc_foo I end up with a single token of "group en 1000232142 blah 1000232142abc foo" rather than what I want, which is 6 tokens: group, en, 1000232142, blah, 1000232142abc, foo. Is there a way to configure for the behavior I'm looking for, or would I need to write a custom tokenizer? Thanks! -Greg
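The reason the char filter works where the token filter didn't comes down to ordering: a char filter rewrites the raw text before the tokenizer runs, while a token filter only sees one already-formed token at a time. The contrast can be sketched outside Solr with plain Python string operations (this is an illustration of the ordering, not Solr's actual analysis code):

```python
import re

text = "Group_EN_1000232142_blah_1000232142abc_foo"

# Token-filter order: tokenize first. There is no whitespace, so the
# tokenizer emits ONE token; the replacement then happens inside it,
# leaving a single token that merely contains spaces.
tokens = text.split()
filtered = [re.sub("_", " ", t).lower() for t in tokens]

# Char-filter order: rewrite the character stream first, THEN tokenize.
# Underscores become spaces before the tokenizer sees the text, so it
# splits into the six tokens the poster wanted.
charfiltered = re.sub("_", " ", text)
split_tokens = [t.lower() for t in charfiltered.split()]

print(filtered)
print(split_tokens)
```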
Re: Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field
Our index is too large to uninvert on the fly, so we've been looking into using DocValues to keep a particular field uninverted at index time. See http://wiki.apache.org/solr/DocValues I don't know if this will solve your problem, but it might be worth trying it out. -Greg On Tue, Sep 3, 2013 at 7:04 AM, Dennis Schafroth den...@indexdata.com wrote: We are harvesting and indexing bibliographic data, thus having many distinct author names in our index. While testing Solr 4 I believe I had pushed a single core to 100 million records (91GB of data) and everything was working fine and fast. After adding a little more to the index, then following started to happen: 17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – Approaching too many values for UnInvertedField faceting on field 'author_exact' : bucket size=16726546 17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – UnInverted multi-valued field {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0} 18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field author_exact at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:181) at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664) I can see that we reached a limit of bucket size. Is there a way to adjust this? The index also seem to explode in size (217GB). Thinking that I had reached a limit for what a single core could handle in terms of facet, I deleted records in the index, but even now at 1/3 (32 million) it will still fails with above error. I have optimised with expungeDeleted=true. The index is somewhat larger (76GB) than I would have expected. 
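Enabling DocValues is a schema.xml change on the field being faceted (Solr 4.x syntax). A sketch for the `author_exact` field from this thread; the `string` type and `multiValued` flag are assumptions about the poster's schema, and the change requires a full reindex:

```xml
<!-- keep the field uninverted on disk at index time instead of
     building an UnInvertedField in heap at query time -->
<field name="author_exact" type="string" indexed="true" stored="false"
       docValues="true" multiValued="true"/>
```

With docValues in place, faceting reads the column-oriented on-disk structure and the "Too many values for UnInvertedField" bucket limit no longer applies.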
While we can still use the index and get facets back using enum method on that field, I would still like a way to fix the index if possible. Any suggestions? cheers, :-Dennis
Re: Question about SOLR-5017 - Allow sharding based on the value of a field
I don't know about SOLR-5017, but why don't you want to use parent_id as a shard key? So if you've got a doc with a key of abc123 and a parent_id of 456, just use a key of 456!abc123 and all docs with the same parent_id will go to the same shard. We're doing something similar and limiting queries to the single shard that hosts the relevant docs by setting shard.keys=456! on queries. -Greg On Wed, Aug 28, 2013 at 10:04 AM, adfel70 adfe...@gmail.com wrote: Hi I'm looking into allowing query joins in solr cloud. This has the limitation of having to index all the documents that are joineable together to the same shard. I'm wondering if SOLR-5017 https://issues.apache.org/jira/browse/SOLR-5017 would give me the ability to do so without implementing my own routing mechanism? If I add a field named parent_id and give that field the same value in all the documents that I want to join, it seems, theoretically, that it will be enough. Am I correct? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-SOLR-5017-Allow-sharding-based-on-the-value-of-a-field-tp4087050.html Sent from the Solr - User mailing list archive at Nabble.com.
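The composite-id scheme described above is just string construction on the client: the part before `!` is hashed to pick the shard, so every document sharing a parent_id co-locates, and `shard.keys` restricts a query to that shard. A sketch (the ids `456`/`abc123` are the thread's examples):

```python
from urllib.parse import urlencode

def composite_id(shard_key: str, doc_id: str) -> str:
    # The compositeId router hashes the part before '!' to choose the
    # shard, so all docs with the same parent_id land together.
    return f"{shard_key}!{doc_id}"

doc_key = composite_id("456", "abc123")

# Restrict a query to the single shard holding parent_id 456.
query = urlencode({"q": "*:*", "shard.keys": "456!"})
print(doc_key)
print(query)
```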
Shard splitting error: cannot uncache file=_1.nvm
I haven't been able to successfully split a shard with Solr 4.4.0. If I have an empty index, or all documents would go to one side of the split, I hit SOLR-5144. But if I avoid that case, I consistently get this error: 290391 [qtp243983770-60] INFO org.apache.solr.update.processor.LogUpdateProcessor – [marin_shard1_1_replica1] webapp=/solr path=/update params={waitSearcher=true&openSearcher=false&commit=true&wt=javabin&commit_end_point=true&version=2&softCommit=false} {} 0 2 290392 [qtp243983770-60] ERROR org.apache.solr.core.SolrCore – java.io.IOException: cannot uncache file=_1.nvm: it was separately also created in the delegate directory at org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:297) at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216) at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4109) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2809) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) I've seen LUCENE-4238, but that was closed as a test error. -Greg
Re: SOLR Prevent solr of modifying fields when update doc
But there is an API for sending a delta over the wire, and server side it does a read, overlay, delete, and insert. And only the fields you sent will be changed. *Might require your unchanged fields to all be stored, though. -Greg On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog goks...@gmail.com wrote: Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'. What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field? There is no partial update, Solr (Lucene) always rewrites the complete document. On 08/23/2013 09:03 AM, Greg Preston wrote: Perhaps an atomic update that only changes the fields you want to change? -Greg On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi thanks by the answer, but the uniqueId is generated by me. But when solr indexes and there is an update in a doc, it deletes the doc and creates a new one, so it generates a new UUID. It is not suitable for me, because i want that solr just updates some fields, because the UUID is the key that i use to map it to an user in my database. Right now i'm using information that comes from the source and never chages, as my uniqueId, like for example the guid, that exists in some rss feeds, or if it doesn't exists i use link. I think there is any simple solution for me, because for what i have read, when an update to a doc exists, SOLR deletes the old one and create a new one, right? On Aug 23, 2013, at 12:07 PM, Erick Erickson erickerick...@gmail.com wrote: Well, not much in the way of help because you can't do what you want AFAIK. I don't think UUID is suitable for your use-case. Why not use your uniqueId? Or generate something yourself... Best Erick On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, How can i prevent solr from update some fields when updating a doc? 
The problem is, i have an uuid with the field name uuid, but it is not an unique key. When a rss source updates a feed, solr will update the doc with the same link but it generates a new uuid. This is not the desired because this id is used by me to relate feeds with an user. Can someone help me? Many Thanks
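The atomic-update API Greg refers to takes a JSON body where each changed field carries a modifier (`set`, `add`, `inc`) instead of a plain value; untouched stored fields are carried over server-side. A sketch of building such a payload (the `id`/`title`/`views` field names are hypothetical, not from the poster's schema):

```python
import json

# Atomic update: only fields with modifiers change. The server reads the
# existing doc, overlays these fields, and rewrites the document, so all
# other fields must be stored for their values to survive.
update = [{
    "id": "feed-123",               # uniqueKey of the existing document
    "title": {"set": "New title"},  # overwrite just this field
    "views": {"inc": 1},            # increment a numeric field
}]
payload = json.dumps(update)
print(payload)
```

POSTing that body to `/update` with `Content-Type: application/json` performs the read-overlay-rewrite Greg describes.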
Re: SOLR Prevent solr of modifying fields when update doc
Perhaps an atomic update that only changes the fields you want to change? -Greg On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi thanks by the answer, but the uniqueId is generated by me. But when solr indexes and there is an update in a doc, it deletes the doc and creates a new one, so it generates a new UUID. It is not suitable for me, because i want that solr just updates some fields, because the UUID is the key that i use to map it to an user in my database. Right now i'm using information that comes from the source and never chages, as my uniqueId, like for example the guid, that exists in some rss feeds, or if it doesn't exists i use link. I think there is any simple solution for me, because for what i have read, when an update to a doc exists, SOLR deletes the old one and create a new one, right? On Aug 23, 2013, at 12:07 PM, Erick Erickson erickerick...@gmail.com wrote: Well, not much in the way of help because you can't do what you want AFAIK. I don't think UUID is suitable for your use-case. Why not use your uniqueId? Or generate something yourself... Best Erick On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, How can i prevent solr from update some fields when updating a doc? The problem is, i have an uuid with the field name uuid, but it is not an unique key. When a rss source updates a feed, solr will update the doc with the same link but it generates a new uuid. This is not the desired because this id is used by me to relate feeds with an user. Can someone help me? Many Thanks
Autosuggest on very large index
Using 4.4.0 - I would like to be able to do an autosuggest query against one of the fields in our index and have the results be limited by an fq. I can get exactly the results I want with a facet query using a facet.prefix, but the first query takes ~5 minutes to run on our QA env (~240M docs). I'm afraid to attempt it on prod (~2B docs). Subsequent queries are sufficiently fast (~500ms). I'm assuming the first query is uninverting the field. Is there any way to mark that field so that an uninverted copy is maintained as updates come in? We plan to soft commit every 5 minutes, and we'd prefer to not be continuously uninverting this one field. Or is there a better way to do what I'm trying to do? I've looked at the spellcheck component a little bit, but it looks like I can't filter results by fq. The fq I'm using is based on which client is logged in, and we can't autosuggest terms from one client to another. Thanks. -Greg
Re: Autosuggest on very large index
The filter query would be on a different field (clientId) than the field we want to autosuggest on (title). Or are you proposing we index a compound field that would be clientId+titleTokens so we would then prefix the suggester with clientId+userInput ? Interesting idea. -Greg On Tue, Aug 20, 2013 at 11:21 AM, Markus Jelsma markus.jel...@openindex.io wrote: I am not entirely sure but the Suggester's FST uses prefixes so you may be able to prefix the value you otherwise use for the filter query when you build the suggester. -Original message- From:Greg Preston gpres...@marinsoftware.com Sent: Tuesday 20th August 2013 20:00 To: solr-user@lucene.apache.org Subject: Autosuggest on very large index Using 4.4.0 - I would like to be able to do an autosuggest query against one of the fields in our index and have the results be limited by an fq. I can get exactly the results I want with a facet query using a facet.prefix, but the first query takes ~5 minutes to run on our QA env (~240M docs). I'm afraid to attempt it on prod (~2B docs). Subsequent queries are sufficiently fast (~500ms). I'm assuming the first query is uninverting the field. Is there any way to mark that field so that an uninverted copy is maintained as updates come in? We plan to soft commit every 5 minutes, and we'd prefer to not be continuously uninverting this one field. Or is there a better way to do what I'm trying to do? I've looked at the spellcheck component a little bit, but it looks like I can't filter results by fq. The fq I'm using is based on which client is logged in, and we can't autosuggest terms from one client to another. Thanks. -Greg
Re: Autosuggest on very large index
DocValues looks interesting, a non-inverted field. I'll play with it a bit and see how it works. Thanks for the suggestion. I don't know how many total terms we've got, but each document is only 2-5 words/terms on average, and there is a TON of overlap between docs. -Greg On Tue, Aug 20, 2013 at 11:38 AM, Jack Krupansky j...@basetechnology.com wrote: Sounds like a problem for DocValues - assuming the number of unique values fits reasonably in memory to avoid I/O. How many unique values do you have or contemplate for two your billion documents? Two possibilities: 1. You need a lot more hardware. 2. You need to scale back your ambitions. -- Jack Krupansky -Original Message- From: Greg Preston Sent: Tuesday, August 20, 2013 2:00 PM To: solr-user@lucene.apache.org Subject: Autosuggest on very large index Using 4.4.0 - I would like to be able to do an autosuggest query against one of the fields in our index and have the results be limited by an fq. I can get exactly the results I want with a facet query using a facet.prefix, but the first query takes ~5 minutes to run on our QA env (~240M docs). I'm afraid to attempt it on prod (~2B docs). Subsequent queries are sufficiently fast (~500ms). I'm assuming the first query is uninverting the field. Is there any way to mark that field so that an uninverted copy is maintained as updates come in? We plan to soft commit every 5 minutes, and we'd prefer to not be continuously uninverting this one field. Or is there a better way to do what I'm trying to do? I've looked at the spellcheck component a little bit, but it looks like I can't filter results by fq. The fq I'm using is based on which client is logged in, and we can't autosuggest terms from one client to another. Thanks. -Greg
Re: Getting the shard a document lives on in resultset
I know I've done this in a search via the admin console, but I can't remember/find the exact syntax right now... -Greg On Tue, Aug 20, 2013 at 12:56 PM, AdamP adamph...@gmail.com wrote: Hi, We have several shards which we're querying across using distributed search. This initial search only returns basic information to the user. When a user requests more information about a document, we do a separate query using only the uniqueID for that document. The problem is, I don't know how to tell which shard a document lives on which means I have to do another distributed search instead of going directly to the shard with the data. Is there a way to get the shardID as part of the resultset? I've found this old ticket (https://issues.apache.org/jira/browse/SOLR-705), but it's not clear what parameters you need to pass in to get the shardID. From a quick glance at the code, I'm not sure these changes are present in the current versions of Solr. We're currently on 4.3.0. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-the-shard-a-document-lives-on-in-resultset-tp4085731.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting the shard a document lives on in resultset
Found it. Add [shard] to your fl. -Greg On Tue, Aug 20, 2013 at 1:24 PM, Greg Preston gpres...@marinsoftware.com wrote: I know I've done this in a search via the admin console, but I can't remember/find the exact syntax right now... -Greg On Tue, Aug 20, 2013 at 12:56 PM, AdamP adamph...@gmail.com wrote: Hi, We have several shards which we're querying across using distributed search. This initial search only returns basic information to the user. When a user requests more information about a document, we do a separate query using only the uniqueID for that document. The problem is, I don't know how to tell which shard a document lives on which means I have to do another distributed search instead of going directly to the shard with the data. Is there a way to get the shardID as part of the resultset? I've found this old ticket (https://issues.apache.org/jira/browse/SOLR-705), but it's not clear what parameters you need to pass in to get the shardID. From a quick glance at the code, I'm not sure these changes are present in the current versions of Solr. We're currently on 4.3.0. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-the-shard-a-document-lives-on-in-resultset-tp4085731.html Sent from the Solr - User mailing list archive at Nabble.com.
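The `[shard]` pseudo-field goes into `fl` like any other field name; since the brackets and comma need URL encoding, a quick sketch of the encoded query string (the query `id:abc123` is illustrative):

```python
from urllib.parse import urlencode

# Ask Solr to return each document's id plus the shard it came from.
params = {"q": "id:abc123", "fl": "id,[shard]"}
qs = urlencode(params)
print(qs)
```

The response then carries a `[shard]` entry per document, which can be used to target follow-up requests at that shard directly.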
Re: Shard splitting at 23 million documents - OOM
Have you tried it with a smaller number of documents? I haven't been able to successfully split a shard with 4.4.0 with even a handful of docs. -Greg On Fri, Aug 16, 2013 at 7:09 AM, Harald Kirsch harald.kir...@raytion.com wrote: Hi all. Using the example setup of solr-4.4.0, I was able to easily feed 23 million documents from ClueWeb09. Then I tried to split the one shard into two. The size on disk is: % du -sh collection1 118G collection1 I started Solr with 8GB for the JVM: java -Xmx8000m -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar Then I asked for the split: http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1 After a while I got the OOM in the logs: 841168 [qtp614872954-17] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space My question: is it to be expected that the split needs huge amounts of RAM or is there a chance that some configuration or procedure change could get me past this? Regards, Harald. -- Harald Kirsch Raytion GmbH Kaiser-Friedrich-Ring 74 40547 Duesseldorf Fon +49-211-550266-0 Fax +49-211-550266-19 http://www.raytion.com
Re: Split Shard Error - maxValue must be non-negative
I'm running into the same issue using composite routing keys when all of the shard keys end up in one of the subshards. -Greg On Tue, Aug 13, 2013 at 9:34 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Scratch that. I obviously didn't pay attention to the stack trace. There is no workaround until 4.5 for this issue because we split the range by half and thus cannot guarantee that all segments will have numDocs 0. On Tue, Aug 13, 2013 at 9:25 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Aug 13, 2013 at 9:15 PM, Robert Muir rcm...@gmail.com wrote: On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: The splitting code calls commit before it starts the splitting. It creates a LiveDocsReader using a bitset created by the split. This reader is merged to an index using addIndexes. Shouldn't the addIndexes code then ignore all such 0-document segments? Not in 4.4: https://issues.apache.org/jira/browse/LUCENE-5116 Sorry, I didn't notice that. So 4.4 users must call commit/optimize with expungeDeletes=true until 4.5 is released if they run into this problem. -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
[4.4.0] Shard splitting failure (simplified case)
I've simplified things from my previous email, and I'm still seeing errors. Using Solr 4.4.0 with two nodes, starting with a single shard. The collection is named marin; the host names are dumbo and solrcloud1.

I bring up an empty cloud and index 50 documents. I can query them and everything looks fine. This is clusterstate.json at that point:

{"marin":{
    "shards":{"shard1":{
        "range":"8000-7fff",
        "state":"active",
        "replicas":{
          "dumbo:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}}},
    "router":"compositeId"}}

I attempt to split with

http://dumbo:8983/solr/admin/collections?action=SPLITSHARD&collection=marin&shard=shard1

After 127559 ms, that call returns with

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state active for solrcloud1:8983_solr but I still do not see the requested state. I see state: recovering live:true

clusterstate.json at this point:

{"marin":{
    "shards":{
      "shard1":{
        "range":"8000-7fff",
        "state":"active",
        "replicas":{
          "dumbo:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}},
      "shard1_0":{
        "range":"8000-",
        "state":"construction",
        "replicas":{
          "dumbo:8983_solr_marin_shard1_0_replica1":{
            "state":"active",
            "core":"marin_shard1_0_replica1",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin_shard1_0_replica2":{
            "state":"active",
            "core":"marin_shard1_0_replica2",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}},
      "shard1_1":{
        "range":"0-7fff",
        "state":"construction",
        "replicas":{
          "dumbo:8983_solr_marin_shard1_1_replica1":{
            "state":"active",
            "core":"marin_shard1_1_replica1",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin_shard1_1_replica2":{
            "state":"recovering",
            "core":"marin_shard1_1_replica2",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}}},
    "router":"compositeId"}}

In the logs on dumbo, I see several of these:

290391 [qtp243983770-60] INFO org.apache.solr.update.processor.LogUpdateProcessor – [marin_shard1_1_replica1] webapp=/solr path=/update params={waitSearcher=true&openSearcher=false&commit=true&wt=javabin&commit_end_point=true&version=2&softCommit=false} {} 0 2

290392 [qtp243983770-60] ERROR org.apache.solr.core.SolrCore – java.io.IOException: cannot uncache file=_1.nvm: it was separately also created in the delegate directory
  at org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:297)
  at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216)
  at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4109)
  at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2809)
  at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
  at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)

and then finally this:

406671 [qtp243983770-22] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: I was asked to wait on state active for solrcloud1:8983_solr but I still do not see the requested state. I see state: recovering live:true
  at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:966)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:191)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209) at
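The "I was asked to wait on state ... I see state: recovering" error above comes from a wait loop: the node handling the split polls the published replica state until it matches the expected one, and gives up after a timeout. A rough Python sketch of that polling pattern (an illustration of the mechanism, not Solr's actual CoreAdminHandler code; the timeout value is an assumption):

```python
# Sketch of a wait-for-state check like the one that produced the error
# above: poll a state source until it reports the expected value, raising
# once the deadline passes. timeout_s=120 is an illustrative default.
import time

def wait_for_state(get_state, expected: str, timeout_s: float = 120.0,
                   poll_s: float = 1.0) -> None:
    deadline = time.monotonic() + timeout_s
    state = get_state()
    while state != expected:
        if time.monotonic() >= deadline:
            raise RuntimeError(
                f"I was asked to wait on state {expected} "
                f"but I still do not see the requested state. "
                f"I see state: {state}")
        time.sleep(poll_s)
        state = get_state()
```

A replica stuck in "recovering" never satisfies expected="active", so the loop times out and the SPLITSHARD call surfaces the RemoteSolrException seen above.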
Re: What gets written to the other shards?
Are you manually setting the shard on each document? If not, documents will be hashed across all the shards. -Greg

On Mon, Aug 12, 2013 at 3:50 PM, Thierry Thelliez thierry.thelliez.t...@gmail.com wrote:

Hello, I am trying to set up a four shard system for the first time. I do not understand why all the shards' data are growing at about the same rate when I push the documents to only one shard. The four shards represent four calendar years, and for now, on a development machine, these four shards run on four different ports. The first shard is started with Zookeeper. The log of the other shards is filled with something like:

7882051 [qtp1154079020-1245] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://x.y.z.4:50121/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[14939-96467-304 (1443204912169091072), 14939-96467-308 (1443204912179576832), 14939-96467-310 (1443204912185868288), 14939-96467-311 (1443204912192159744), 14939-96467-313 (1443204912204742656), 14939-96467-314 (1443204912220471296), 14939-96467-318 (1443204912239345664), 14939-96467-319 (144320491225088), 14939-96467-322 (1443204912257171456), 14939-96467-324 (1443204912263462912)]} 0 282

What is getting written to the other shards? Is a separate index computed on all four shards? I thought that when pushing a document to one shard, only that shard would update its index. Thanks, Thierry
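Greg's point can be made concrete: with the default router, SolrCloud hashes each document's id onto a 32-bit ring and forwards it to whichever shard owns that hash range, regardless of which node received the update. A sketch of that routing, using zlib.crc32 as a stand-in for Solr's MurmurHash3 (so the exact shard assignments differ from a real cluster):

```python
# Hash-based routing sketch: documents indexed against one node still get
# forwarded to the shard owning their id's hash range, which is why all
# four shards grow. zlib.crc32 stands in for Solr's MurmurHash3.
import zlib

def to_signed32(v: int) -> int:
    """Map an unsigned 32-bit value into signed 32-bit range."""
    v &= 0xffffffff
    return v - 0x100000000 if v >= 0x80000000 else v

def route(doc_id: str, ranges) -> int:
    """Return the index of the shard whose range contains the id's hash."""
    h = to_signed32(zlib.crc32(doc_id.encode()))
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard
    raise ValueError("hash outside all ranges")

# Four equal shards over the signed 32-bit ring (like the four
# calendar-year shards in the question above).
step = 2**32 // 4
ranges = [(-2**31 + i * step, -2**31 + (i + 1) * step - 1) for i in range(4)]

doc_ids = [f"14939-96467-{n}" for n in range(300, 325)]
counts = [0] * 4
for d in doc_ids:
    counts[route(d, ranges)] += 1
print(counts)  # documents spread across shards, not concentrated in one
```

To pin documents to a specific shard by year, you would need explicit routing (e.g. the compositeId router with a shard key) rather than plain id hashing.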
Re: Shard splitting failure, with and without composite hashing
Oops, I somehow forgot to mention that. The errors I'm seeing are with the release version of Solr 4.4.0. I mentioned 4.1.0 as that's what we currently have in prod, and we want to upgrade to 4.4.0 so we can do shard splitting. Towards that end, I'm testing shard splitting in 4.4.0 and seeing these errors. -Greg

On Sun, Aug 11, 2013 at 7:51 AM, Erick Erickson erickerick...@gmail.com wrote:

The very first thing I'd do is go to Solr 4.4. There have been a lot of improvements in this code in the intervening 3 versions. If the problem still occurs in 4.4, it'll get a lot more attention than 4.1. FWIW, Erick

On Fri, Aug 9, 2013 at 7:32 PM, Greg Preston gpres...@marinsoftware.com wrote:

Howdy, I'm trying to test shard splitting, and it's not working for me. I've got a 4 node cloud with a single collection and 2 shards. I've indexed 170k small documents, and I'm using the compositeId router, with an internal client id as the shard key, with 4 distinct values across the data set. For my testing, the values of the shard keys are 1 through 4.

Before splitting, shard1 contains 100k docs (all of the docs for shard keys 1 and 4) and shard2 contains 70k docs (all of the docs for shard keys 2 and 3). In prod, we're going to have thousands of unique shard keys, but for now, I'm testing at a smaller scale.

I attempt to split shard2 with

http://host0:8983/solr/admin/collections?action=SPLITSHARD&collection=coll&shard=shard2

I understand the shard splitting is on hash range, not document count, and it shouldn't split up documents within a single shard key, so I'm ok with it if both shard keys end up in the same sub-shard.
I see the following in the logs:

689524 [qtp259549756-119] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.RuntimeException: java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:290)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run
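For context on why this setup can leave a sub-shard empty: with the compositeId router and ids of the form key!id, Solr's 4.x scheme takes the top 16 bits of the routing hash from the shard key and the bottom 16 bits from the rest of the id. Four distinct shard keys therefore yield at most four narrow 64K-wide hash bands, and a midpoint split can easily put every band on one side. A sketch of that bit-combining, with zlib.crc32 standing in for Solr's MurmurHash3 (so the band positions differ from a real cluster):

```python
# compositeId hashing sketch (4.x scheme): upper 16 bits from the shard
# key, lower 16 bits from the rest of the id. zlib.crc32 stands in for
# Solr's MurmurHash3.
import zlib

def h32(s: str) -> int:
    return zlib.crc32(s.encode()) & 0xffffffff

def composite_hash(composite_id: str) -> int:
    key, _, rest = composite_id.partition("!")
    return (h32(key) & 0xffff0000) | (h32(rest) & 0x0000ffff)

# Every document sharing a shard key lands in the same 64K-wide band,
# so 4000 docs under four keys occupy at most four distinct bands.
bands = sorted({composite_hash(f"{k}!doc{i}") >> 16
                for k in "1234" for i in range(1000)})
print(len(bands))
```

Since the bands cover a tiny fraction of the 32-bit ring, one half of a bisected range frequently contains no documents at all, which is exactly the case the 4.4 split code mishandles.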
Shard splitting failure, with and without composite hashing
Howdy, I'm trying to test shard splitting, and it's not working for me. I've got a 4 node cloud with a single collection and 2 shards. I've indexed 170k small documents, and I'm using the compositeId router, with an internal client id as the shard key, with 4 distinct values across the data set. For my testing, the values of the shard keys are 1 through 4.

Before splitting, shard1 contains 100k docs (all of the docs for shard keys 1 and 4) and shard2 contains 70k docs (all of the docs for shard keys 2 and 3). In prod, we're going to have thousands of unique shard keys, but for now, I'm testing at a smaller scale.

I attempt to split shard2 with

http://host0:8983/solr/admin/collections?action=SPLITSHARD&collection=coll&shard=shard2

I understand the shard splitting is on hash range, not document count, and it shouldn't split up documents within a single shard key, so I'm ok with it if both shard keys end up in the same sub-shard.

I see the following in the logs:

689524 [qtp259549756-119] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.RuntimeException: java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:290)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
  at org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1184)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:140)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
  at org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
  at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
  at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2488)
  at org.apache.solr.update.SolrIndexSplitter.split(SolrIndexSplitter.java:125)
  at org.apache.solr.update.DirectUpdateHandler2.split(DirectUpdateHandler2.java:766)
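The "Caused by" at the bottom of the trace is a precondition check in PackedInts.bitsRequired, which computes how many bits are needed to store a maximum value and rejects negative input. A hedged Python sketch of that check: when the split leaves a segment with no live documents for a sub-shard, the norms merge apparently ends up asking for the bits required by a max value of -1 and aborts the split.

```python
# Sketch of the precondition that fails in the trace above: computing
# the packed-int width for a maximum value rejects negatives, and an
# empty merged segment surfaces a max of -1.
def bits_required(max_value: int) -> int:
    if max_value < 0:
        raise ValueError(f"maxValue must be non-negative (got: {max_value})")
    return max(1, max_value.bit_length())

print(bits_required(255))   # 8
# bits_required(-1) raises: maxValue must be non-negative (got: -1)
```

This is why the expungeDeletes workaround earlier in the thread helps: removing deleted documents beforehand avoids producing the empty segments that feed -1 into this check.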