[jira] [Updated] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-4722: -- Attachment: PositionsSolrHighlighter.java Thanks a lot for the patch! I made some modifications to the original patch to meet our project's specific needs. The modified version returns each term's text as well as its position and offsets. We do not need Solr to do the highlighting, only to return the positions and offsets, so in schema.xml our field is not stored and only has termVectors="true" termPositions="true" termOffsets="true". Just sharing it.
> Highlighter which generates a list of query term position(s) for each item in > a list of documents, or returns null if highlighting is disabled. > --- > > Key: SOLR-4722 > URL: https://issues.apache.org/jira/browse/SOLR-4722 > Project: Solr > Issue Type: New Feature > Components: highlighter > Affects Versions: 4.3, 6.0 > Reporter: Tricia Jenkins > Priority: Minor > Attachments: PositionsSolrHighlighter.java, SOLR-4722.patch, > SOLR-4722.patch, solr-positionshighlighter.jar > > > As an alternative to returning snippets, this highlighter provides the (term) > position for query matches. One use case for this is to reconcile the term > position from the Solr index with 'word' coordinates provided by an OCR > process. In this way we are able to 'highlight' an image, like a page from a > book or an article from a newspaper, in the locations that match the user's > query. > This is based on the FastVectorHighlighter and requires that termVectors, > termOffsets and termPositions be stored.
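For reference, a minimal sketch of what such a field definition could look like in schema.xml; the field name "ocr_text" and type "text_general" are hypothetical, only the three term-vector flags come from the comment above:

<!-- Field for OCR'd text: not stored; term vectors with positions and
     offsets are enabled so the highlighter can return them. -->
<field name="ocr_text" type="text_general" indexed="true" stored="false"
       termVectors="true" termPositions="true" termOffsets="true"/>

With a field like this, a positions-style highlighter can read positions and offsets from the term vectors without the document text ever being stored.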
[jira] [Commented] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732316#comment-15732316 ] Forest Soup commented on SOLR-9829: --- Thanks all! I have a mail thread tracking this: http://lucene.472066.n3.nabble.com/Solr-cannot-provide-index-service-after-a-large-GC-pause-but-core-state-in-ZK-is-still-active-td4308942.html Could you please comment on the questions in it? Thanks! @Mark and Varun, are you sure this issue is a duplicate of https://issues.apache.org/jira/browse/SOLR-7956 ? If so, I'll try to backport it to 5.3.2. I also see Daisy created a similar JIRA: https://issues.apache.org/jira/browse/SOLR-9830 . Although her root cause is too many open files, could you confirm whether it is also a duplicate of SOLR-7956?
> Solr cannot provide index service after a large GC pause but core state in ZK > is still active > - > > Key: SOLR-9829 > URL: https://issues.apache.org/jira/browse/SOLR-9829 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: update > Affects Versions: 5.3.2 > Environment: Redhat enterprise server 64bit > Reporter: Forest Soup > > When Solr meets a large GC pause like > https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it > cannot provide service and never come back until a restart. > But in ZooKeeper, the cores on that server still show as active and the server > is also in live_nodes. > Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got > HTTP 400 due to "possible analysis error.", whose root cause is also > "IndexWriter is closed" and which we think should return 500 > instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). > Our questions in this JIRA are: > 1. Should Solr mark cores as down in ZK when it cannot provide index service? > 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? > Solr log snippets: > 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 > r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore > org.apache.solr.common.SolrException: Exception writing document id > Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C > to the index; possible analysis error. 
> at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at >
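Because the core can look active in ZK while its IndexWriter is closed, one workaround is an external health check that exercises the update path itself rather than trusting cluster state. A minimal SolrJ 5.x sketch under stated assumptions (the core URL is hypothetical and "id" is assumed to be the unique key; a plain query or ping would not catch this condition, since searches can still succeed while the IndexWriter is closed):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class UpdatePathCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL; ZK may still report this core as active.
        String coreUrl = "http://localhost:8983/solr/collection12_shard1_replica1";
        try (HttpSolrClient client = new HttpSolrClient(coreUrl)) {
            // Send a canary document through /update: if the IndexWriter is
            // closed, this add fails even though the core shows active in ZK.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "healthcheck-canary");
            client.add(doc);
            client.deleteById("healthcheck-canary"); // clean up the canary
            System.out.println("update path OK");
        } catch (Exception e) {
            System.out.println("update path broken: " + e);
        }
    }
}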
[jira] [Commented] (SOLR-9828) Very long young generation stop the world GC pause
[ https://issues.apache.org/jira/browse/SOLR-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732274#comment-15732274 ] Forest Soup commented on SOLR-9828: --- The mail thread: http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-td4308911.html
> Very long young generation stop the world GC pause > --- > > Key: SOLR-9828 > URL: https://issues.apache.org/jira/browse/SOLR-9828 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Affects Versions: 5.3.2 > Environment: Linux Redhat 64bit > Reporter: Forest Soup > > We are using Oracle JDK 8u92 64-bit. > The JVM memory-related options: > -Xms32768m > -Xmx32768m > -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/mnt/solrdata1/log > -XX:+UseG1GC > -XX:+PerfDisableSharedMem > -XX:+ParallelRefProcEnabled > -XX:G1HeapRegionSize=8m > -XX:MaxGCPauseMillis=100 > -XX:InitiatingHeapOccupancyPercent=35 > -XX:+AggressiveOpts > -XX:+AlwaysPreTouch > -XX:ConcGCThreads=16 > -XX:ParallelGCThreads=18 > -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/mnt/solrdata1/log > -verbose:gc > -XX:+PrintHeapAtGC > -XX:+PrintGCDetails > -XX:+PrintGCDateStamps > -XX:+PrintGCTimeStamps > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -Xloggc:/mnt/solrdata1/log/solr_gc.log > It usually works fine, but recently we met very long stop-the-world > young-generation GC pauses. Some snippets of the GC log are below: > 2016-11-22T20:43:16.436+: 2942054.483: Total time for which application > threads were stopped: 0.0005510 seconds, Stopping threads took: 0.894 > seconds > 2016-11-22T20:43:16.463+: 2942054.509: Total time for which application > threads were stopped: 0.0029195 seconds, Stopping threads took: 0.804 > seconds > {Heap before GC invocations=2246 (full 0): > garbage-first heap total 26673152K, used 4683965K [0x7f0c1000, > 0x7f0c108065c0, 0x7f141000) > region size 8192K, 162 young (1327104K), 17 survivors (139264K) > Metaspace used 56487K, capacity 57092K, committed 58368K, reserved > 59392K > 2016-11-22T20:43:16.555+: 2942054.602: [GC pause (G1 Evacuation Pause) > (young) > Desired survivor size 88080384 bytes, new threshold 15 (max 15) > - age 1: 28176280 bytes, 28176280 total > - age 2: 5632480 bytes, 33808760 total > - age 3: 9719072 bytes, 43527832 total > - age 4: 6219408 bytes, 49747240 total > - age 5: 4465544 bytes, 54212784 total > - age 6: 3417168 bytes, 57629952 total > - age 7: 5343072 bytes, 62973024 total > - age 8: 2784808 bytes, 65757832 total > - age 9: 6538056 bytes, 72295888 total > - age 10: 6368016 bytes, 78663904 total > - age 11: 695216 bytes, 79359120 total > , 97.2044320 secs] >[Parallel Time: 19.8 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: > 2942054612.7, Diff: 10.6] > [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: > 43.5] > [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0] > [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192] > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] > [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > 0.0] > [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: > 165.9] > [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1] > [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: > 0.6] > [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 
19.7, Diff: 10.6, Sum: > 308.7] > [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: > 2942054621.8, Diff: 0.0] >[Code Root Fixup: 0.1 ms] >[Code Root Purge: 0.0 ms] >[Clear CT: 0.2 ms] >[Other: 97184.3 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 8.5 ms] > [Ref Enq: 0.2 ms] > [Redirty Cards: 0.2 ms] > [Humongous Register: 0.1 ms] > [Humongous Reclaim: 0.1 ms] > [Free CSet: 0.4 ms] >[Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: > 4574.2M(25.4G)->3450.8M(26.8G)] > Heap after GC invocations=2247 (full 0): > garbage-first heap total 28049408K, used 3533601K [0x7f0c1000, > 0x7f0c10806b00, 0x7f141000) > region size 8192K, 21 young (172032K), 21 survivors (172032K) > Metaspace used 56487K, capacity 57092K, committed 58368K, reserved > 59392K > } > [Times: user=0.00 sys=94.28, real=97.19 secs] > 2016-11-22T20:44:53.760+: 2942151.806: Total time for which application >
[jira] [Commented] (SOLR-9828) Very long young generation stop the world GC pause
[ https://issues.apache.org/jira/browse/SOLR-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732269#comment-15732269 ] Forest Soup commented on SOLR-9828: --- Thanks Shawn. I'll use the mail thread to discuss it instead of this JIRA. Could you please comment on the questions in the mail thread? Thanks! 1. As you can see in the GC log, the long GC pause is not a full GC; it's a young-generation GC. In our case, full GCs are fast and some young GCs got long STW pauses. Do you have any comments on that? We usually expect a full GC to cause the longer pauses, while young-generation GCs should be fine. 2. Will these JVM options make it better? -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 2016-11-22T20:43:16.463+: 2942054.509: Total time for which application threads were stopped: 0.0029195 seconds, Stopping threads took: 0.804 seconds {Heap before GC invocations=2246 (full 0): garbage-first heap total 26673152K, used 4683965K [0x7f0c1000, 0x7f0c108065c0, 0x7f141000) region size 8192K, 162 young (1327104K), 17 survivors (139264K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K 2016-11-22T20:43:16.555+: 2942054.602: [GC pause (G1 Evacuation Pause) (young) Desired survivor size 88080384 bytes, new threshold 15 (max 15) - age 1: 28176280 bytes, 28176280 total - age 2: 5632480 bytes, 33808760 total - age 3: 9719072 bytes, 43527832 total - age 4: 6219408 bytes, 49747240 total - age 5: 4465544 bytes, 54212784 total - age 6: 3417168 bytes, 57629952 total - age 7: 5343072 bytes, 62973024 total - age 8: 2784808 bytes, 65757832 total - age 9: 6538056 bytes, 72295888 total - age 10: 6368016 bytes, 78663904 total - age 11: 695216 bytes, 79359120 total , 97.2044320 secs] [Parallel Time: 19.8 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7, Diff: 10.6] [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5] [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0] [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9] [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1] [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6] [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7] [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 97184.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 8.5 ms] [Ref Enq: 0.2 ms] [Redirty Cards: 0.2 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.4 ms] [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)] Heap after GC invocations=2247 (full 0): garbage-first heap total 28049408K, used 3533601K [0x7f0c1000, 0x7f0c10806b00, 0x7f141000) region size 8192K, 21 young (172032K), 21 survivors (172032K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K } [Times: user=0.00 sys=94.28, real=97.19 secs] 2016-11-22T20:44:53.760+: 2942151.806: Total time for which application threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373 seconds
> Very long young generation stop the world GC pause > --- > > Key: SOLR-9828 > URL: https://issues.apache.org/jira/browse/SOLR-9828 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Affects Versions: 5.3.2 > Environment: Linux Redhat 64bit > Reporter: Forest Soup > > We are using Oracle JDK 8u92 64-bit. > The JVM memory-related options: > -Xms32768m > -Xmx32768m > -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/mnt/solrdata1/log > -XX:+UseG1GC > -XX:+PerfDisableSharedMem > -XX:+ParallelRefProcEnabled > -XX:G1HeapRegionSize=8m > -XX:MaxGCPauseMillis=100 > -XX:InitiatingHeapOccupancyPercent=35 > -XX:+AggressiveOpts > -XX:+AlwaysPreTouch > -XX:ConcGCThreads=16 > -XX:ParallelGCThreads=18 > -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/mnt/solrdata1/log > -verbose:gc > -XX:+PrintHeapAtGC > -XX:+PrintGCDetails > -XX:+PrintGCDateStamps >
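Regarding question 2: -XX:G1NewSizePercent is an experimental flag, so -XX:+UnlockExperimentalVMOptions must precede it on the command line. A sketch of the addition to the option list above (the value 10 raises the minimum young-generation size to 10% of the heap; the JDK 8 default is 5%, and G1 still sizes the young generation adaptively between this floor and -XX:G1MaxNewSizePercent to try to meet -XX:MaxGCPauseMillis):

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=10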
[jira] [Updated] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9829: -- Description: When Solr meets a large GC pause like https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it cannot provide service and never come back until a restart. But in ZooKeeper, the cores on that server still show as active and the server is also in live_nodes. Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got HTTP 400 due to "possible analysis error.", whose root cause is also "IndexWriter is closed" and which we think should return 500 instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). Our questions in this JIRA are: 1. Should Solr mark cores as down in ZK when it cannot provide index service? 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? Solr log snippets: 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Exception writing document id Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at
[jira] [Commented] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731274#comment-15731274 ] Forest Soup commented on SOLR-9829: --- Hi Erick, I'm sure the Solr node is still in the live_nodes list. The logs are from the Solr log. And the root cause I can see here is that the IndexWriter is closed.
> Solr cannot provide index service after a large GC pause but core state in ZK > is still active > - > > Key: SOLR-9829 > URL: https://issues.apache.org/jira/browse/SOLR-9829 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: update > Affects Versions: 5.3.2 > Environment: Redhat enterprise server 64bit > Reporter: Forest Soup > > When Solr meets a large GC pause like > https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it > cannot provide service and never come back until a restart. > But in ZooKeeper, the cores on that server still show as active. > Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got > HTTP 400 due to "possible analysis error.", whose root cause is also > "IndexWriter is closed" and which we think should return 500 > instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). > Our questions in this JIRA are: > 1. Should Solr mark cores as down in ZK when it cannot provide index service? > 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? > Solr log snippets: > 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 > r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore > org.apache.solr.common.SolrException: Exception writing document id > Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C > to the index; possible analysis error. 
> at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at >
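For reference, the live_nodes check mentioned above can also be done programmatically rather than through the ZooKeeper CLI. A minimal SolrJ 5.x sketch (the ZooKeeper ensemble address is hypothetical):

import java.util.Set;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class LiveNodesCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper ensemble address.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.connect();
            Set<String> liveNodes =
                    client.getZkStateReader().getClusterState().getLiveNodes();
            // As reported in this issue, a node can still be listed here after
            // a long GC pause even though its cores' IndexWriters stay closed.
            System.out.println("live_nodes: " + liveNodes);
        }
    }
}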
[jira] [Updated] (SOLR-9829) Solr cannot provide index service after a large GC pause but state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9829: -- Summary: Solr cannot provide index service after a large GC pause but state in ZK is still active (was: Solr cannot provide index service after a large GC pause)
> Solr cannot provide index service after a large GC pause but state in ZK is > still active > > > Key: SOLR-9829 > URL: https://issues.apache.org/jira/browse/SOLR-9829 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: update > Affects Versions: 5.3.2 > Environment: Redhat enterprise server 64bit > Reporter: Forest Soup > > When Solr meets a large GC pause like > https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it > cannot provide service and never come back until a restart. > But in ZooKeeper, the cores on that server still show as active. > Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got > HTTP 400 due to "possible analysis error.", whose root cause is also > "IndexWriter is closed" and which we think should return 500 > instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). > Our questions in this JIRA are: > 1. Should Solr mark it as down when it cannot provide index service? > 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? > Solr log snippets: > 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 > r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore > org.apache.solr.common.SolrException: Exception writing document id > Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C > to the index; possible analysis error. > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) > at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at >
[jira] [Updated] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9829: -- Description: When Solr meets a large GC pause like https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it cannot provide service and never come back until a restart. But in ZooKeeper, the cores on that server still show as active. Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got HTTP 400 due to "possible analysis error.", whose root cause is also "IndexWriter is closed" and which we think should return 500 instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). Our questions in this JIRA are: 1. Should Solr mark cores as down when it cannot provide index service? 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? Solr log snippets: 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Exception writing document id Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at
[jira] [Updated] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9829: -- Description: When Solr meets a large GC pause like https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it cannot provide service and never come back until a restart. But in ZooKeeper, the cores on that server still show as active. Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got HTTP 400 due to "possible analysis error.", whose root cause is also "IndexWriter is closed" and which we think should return 500 instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). Our questions in this JIRA are: 1. Should Solr mark cores as down in ZK when it cannot provide index service? 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? Solr log snippets: 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Exception writing document id Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at
[jira] [Updated] (SOLR-9829) Solr cannot provide index service after a large GC pause but core state in ZK is still active
[ https://issues.apache.org/jira/browse/SOLR-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9829: -- Summary: Solr cannot provide index service after a large GC pause but core state in ZK is still active (was: Solr cannot provide index service after a large GC pause but state in ZK is still active)
> Solr cannot provide index service after a large GC pause but core state in ZK > is still active > - > > Key: SOLR-9829 > URL: https://issues.apache.org/jira/browse/SOLR-9829 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: update > Affects Versions: 5.3.2 > Environment: Redhat enterprise server 64bit > Reporter: Forest Soup > > When Solr meets a large GC pause like > https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it > cannot provide service and never come back until a restart. > But in ZooKeeper, the cores on that server still show as active. > Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got > HTTP 400 due to "possible analysis error.", whose root cause is also > "IndexWriter is closed" and which we think should return 500 > instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). > Our questions in this JIRA are: > 1. Should Solr mark it as down when it cannot provide index service? > 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? > Solr log snippets: > 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 > r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore > org.apache.solr.common.SolrException: Exception writing document id > Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C > to the index; possible analysis error. 
> at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at >
[jira] [Created] (SOLR-9829) Solr cannot provide index service after a large GC pause
Forest Soup created SOLR-9829: - Summary: Solr cannot provide index service after a large GC pause Key: SOLR-9829 URL: https://issues.apache.org/jira/browse/SOLR-9829 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: update Affects Versions: 5.3.2 Environment: Redhat enterprise server 64bit Reporter: Forest Soup
When Solr meets a large GC pause like https://issues.apache.org/jira/browse/SOLR-9828 , the collections on it cannot provide service and never come back until a restart. But in ZooKeeper, the cores on that server still show as active. Some /update requests got HTTP 500 due to "IndexWriter is closed". Some got HTTP 400 due to "possible analysis error.", whose root cause is also "IndexWriter is closed" and which we think should return 500 instead (documented in https://issues.apache.org/jira/browse/SOLR-9825). Our questions in this JIRA are: 1. Should Solr mark it as down when it cannot provide index service? 2. Is it possible for Solr to re-open the IndexWriter to provide index service again? Solr log snippets: 2016-11-22 20:47:37.274 ERROR (qtp2011912080-76) [c:collection12 s:shard1 r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Exception writing document id Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20841350!270CE4F9C032EC26002580730061473C to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at
[jira] [Updated] (SOLR-9825) Solr should not return HTTP 400 for some cases
[ https://issues.apache.org/jira/browse/SOLR-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9825: -- Affects Version/s: (was: 5.3) 5.3.2
> Solr should not return HTTP 400 for some cases > -- > > Key: SOLR-9825 > URL: https://issues.apache.org/jira/browse/SOLR-9825 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Affects Versions: 5.3.2 > Reporter: Forest Soup > > In some cases, when Solr handles requests, it should not always return HTTP > 400. We have met several such cases; here are the two most recent: > Case 1: When adding a doc, if a runtime error happens, even when it's a > Solr-internal issue, Solr returns HTTP 400, which confuses the client. Actually > the request is good; it is the IndexWriter that is closed. > The exception stack is: > 2016-11-22 21:23:32.858 ERROR (qtp2011912080-83) [c:collection12 s:shard1 > r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore > org.apache.solr.common.SolrException: Exception writing document id > Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20824042!8918AB024CF638F685257DDC00074D78 > to the index; possible analysis error. > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) > at > org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) > at > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) > at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:499) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) > at >
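Until the status code is fixed, one client-side workaround is to avoid treating this particular 400 as a bad request. A hedged SolrJ sketch; matching on the exception message is fragile and shown for illustration only:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class AddWithErrorMapping {
    static void addDoc(SolrClient client, SolrInputDocument doc) throws Exception {
        try {
            client.add(doc);
        } catch (HttpSolrClient.RemoteSolrException e) {
            // As described above, Solr 5.3.2 can report a closed IndexWriter
            // as HTTP 400 ("possible analysis error") although the request is fine.
            if (e.code() == 400 && e.getMessage() != null
                    && e.getMessage().contains("possible analysis error")) {
                // Re-map to a server-side error so callers can retry elsewhere
                // instead of discarding the document as invalid.
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "server-side indexing failure misreported as 400", e);
            }
            throw e;
        }
    }
}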
[jira] [Updated] (SOLR-9828) Very long young generation stop the world GC pause
[ https://issues.apache.org/jira/browse/SOLR-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9828: -- Description: We are using Oracle JDK 8u92 64-bit. The JVM memory-related options: -Xms32768m -Xmx32768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/solrdata1/log -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:ConcGCThreads=16 -XX:ParallelGCThreads=18 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/solrdata1/log -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/mnt/solrdata1/log/solr_gc.log It usually works fine, but recently we met very long stop-the-world young-generation GC pauses. Some snippets of the GC log are below: 2016-11-22T20:43:16.436+: 2942054.483: Total time for which application threads were stopped: 0.0005510 seconds, Stopping threads took: 0.894 seconds 2016-11-22T20:43:16.463+: 2942054.509: Total time for which application threads were stopped: 0.0029195 seconds, Stopping threads took: 0.804 seconds {Heap before GC invocations=2246 (full 0): garbage-first heap total 26673152K, used 4683965K [0x7f0c1000, 0x7f0c108065c0, 0x7f141000) region size 8192K, 162 young (1327104K), 17 survivors (139264K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K 2016-11-22T20:43:16.555+: 2942054.602: [GC pause (G1 Evacuation Pause) (young) Desired survivor size 88080384 bytes, new threshold 15 (max 15) - age 1: 28176280 bytes, 28176280 total - age 2: 5632480 bytes, 33808760 total - age 3: 9719072 bytes, 43527832 total - age 4: 6219408 bytes, 49747240 total - age 5: 4465544 bytes, 54212784 total - age 6: 3417168 bytes, 57629952 total - age 7: 5343072 bytes, 62973024 total - age 8: 2784808 bytes, 65757832 total - age 9: 6538056 bytes, 72295888 total - age 10: 6368016 bytes, 78663904 total - age 11: 695216 bytes, 79359120 total , 97.2044320 secs] [Parallel Time: 19.8 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7, Diff: 10.6] [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5] [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0] [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9] [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1] [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6] [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7] [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 97184.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 8.5 ms] [Ref Enq: 0.2 ms] [Redirty Cards: 0.2 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.4 ms] [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)] Heap after GC invocations=2247 (full 0): garbage-first heap total 28049408K, used 3533601K [0x7f0c1000, 0x7f0c10806b00, 0x7f141000) region size 8192K, 21 young (172032K), 21 survivors (172032K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K } [Times: user=0.00 sys=94.28, real=97.19 secs] 2016-11-22T20:44:53.760+: 2942151.806: Total time for which application threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373 seconds 2016-11-22T20:44:53.762+: 2942151.809: Total time for which application threads were stopped: 0.0008138 seconds, Stopping threads took: 0.0001258 seconds CPU reached nearly 100% during the GC, while the load was normal at that time according to the stats of the Solr update/select/delete handlers and the Jetty request log. was: We are using oracle jdk8u92 64bit. The jvm memory related options: -Xms32768m -Xmx32768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/solrdata1/log -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:+AggressiveOpts -XX:+AlwaysPreTouch
[jira] [Created] (SOLR-9828) Very long young generation stop the world GC pause
Forest Soup created SOLR-9828: - Summary: Very long young generation stop the world GC pause Key: SOLR-9828 URL: https://issues.apache.org/jira/browse/SOLR-9828 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 5.3.2 Environment: Linux Redhat 64bit Reporter: Forest Soup We are using Oracle JDK 8u92 64-bit. The JVM memory-related options are: -Xms32768m -Xmx32768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/solrdata1/log -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:ConcGCThreads=16 -XX:ParallelGCThreads=18 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/solrdata1/log -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/mnt/solrdata1/log/solr_gc.log It usually works fine, but recently we have met a very long stop-the-world young-generation GC pause. Some snippets of the GC log are below: 2016-11-22T20:43:16.436+0000: 2942054.483: Total time for which application threads were stopped: 0.0005510 seconds, Stopping threads took: 0.0000894 seconds 2016-11-22T20:43:16.463+0000: 2942054.509: Total time for which application threads were stopped: 0.0029195 seconds, Stopping threads took: 0.0000804 seconds {Heap before GC invocations=2246 (full 0): garbage-first heap total 26673152K, used 4683965K [0x7f0c1000, 0x7f0c108065c0, 0x7f141000) region size 8192K, 162 young (1327104K), 17 survivors (139264K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K 2016-11-22T20:43:16.555+0000: 2942054.602: [GC pause (G1 Evacuation Pause) (young) Desired survivor size 88080384 bytes, new threshold 15 (max 15) - age 1: 28176280 bytes, 28176280 total - age 2: 5632480 bytes, 33808760 total - age 3: 9719072 bytes, 43527832 total - age 4: 6219408 bytes, 49747240 total - age 5: 4465544 bytes, 54212784 total - age 6: 3417168 bytes, 57629952 total - age 7: 5343072 bytes, 62973024 total - age 8: 2784808 bytes, 65757832 total - age 9: 6538056 bytes, 72295888 total - age 10: 6368016 bytes, 78663904 total - age 11: 695216 bytes, 79359120 total , 97.2044320 secs] [Parallel Time: 19.8 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7, Diff: 10.6] [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5] [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0] [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9] [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1] [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6] [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7] [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 97184.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 8.5 ms] [Ref Enq: 0.2 ms] [Redirty Cards: 0.2 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.4 ms] [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)] Heap after GC invocations=2247 (full 0): garbage-first heap total 28049408K, used 3533601K [0x7f0c1000, 0x7f0c10806b00, 0x7f141000) region size 8192K, 21 young (172032K), 21 survivors (172032K) Metaspace used 56487K, capacity 57092K, committed 58368K, reserved 59392K } [Times: user=0.00 sys=94.28, real=97.19 secs] 2016-11-22T20:44:53.760+0000: 2942151.806: Total time for which application threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373 seconds 2016-11-22T20:44:53.762+0000: 2942151.809: Total time for which application threads were stopped: 0.0008138 seconds, Stopping threads took: 0.0001258 seconds CPU usage reached nearly 100% during the GC. The load was not visibly high at that time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
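[Editor's note] The user=0.00 sys=94.28 split in the pause above means nearly the whole 97 seconds was spent in the kernel rather than in GC work proper, which usually points at swapping, page-fault storms, or blocked writes of the GC log itself rather than at G1. As a rough in-process check that needs no log parsing, the JVM's GC MXBeans can be polled for sudden jumps in accumulated collection time; this is a minimal sketch, not anything from the report (the 5-second alert window is an assumption):
{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Polls GC MXBeans and warns when accumulated collection time jumps sharply
// between samples, which would catch a pause like the 97s one above.
public class GcWatchdog {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> lastTime = new HashMap<>();
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long total = gc.getCollectionTime(); // cumulative millis spent in this collector
                long prev = lastTime.getOrDefault(gc.getName(), total);
                if (total - prev > 5000) { // assumption: >5s of GC time in one sample window
                    System.err.println(gc.getName() + " spent " + (total - prev)
                            + "ms in GC since last sample");
                }
                lastTime.put(gc.getName(), total);
            }
            Thread.sleep(1000);
        }
    }
}
{code}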
[jira] [Created] (SOLR-9825) Solr should not return HTTP 400 for some cases
Forest Soup created SOLR-9825: - Summary: Solr should not return HTTP 400 for some cases Key: SOLR-9825 URL: https://issues.apache.org/jira/browse/SOLR-9825 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 5.3 Reporter: Forest Soup In some cases, when Solr handles a request, it should not always return HTTP 400. We have met several such cases; here are the two most recent: Case 1: When adding a doc, if a runtime error happens, Solr returns HTTP 400 and confuses the client, even when it is a Solr-internal issue. The request itself is actually good; the real problem is that the IndexWriter is closed. The exception stack is: 2016-11-22 21:23:32.858 ERROR (qtp2011912080-83) [c:collection12 s:shard1 r:core_node1 x:collection12_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Exception writing document id Q049dXMxYjMtbWFpbDg4L089bGxuX3VzMQ==20824042!8918AB024CF638F685257DDC00074D78 to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.CloneFieldUpdateProcessorFactory$1.processAdd(CloneFieldUpdateProcessorFactory.java:231) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at
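[Editor's note] The fix being asked for here is essentially a classification of the root cause before choosing a status code: a closed IndexWriter is server state, not bad client input. Below is a minimal, hypothetical sketch of that decision in plain Java; it is not Solr's actual code, and the class and method names are invented for illustration (Lucene's real exception is org.apache.lucene.store.AlreadyClosedException, matched here only by name):
{code}
// Hypothetical helper: walk the cause chain and decide whether a failure is
// the client's fault (400) or the server's (500).
public class StatusCodeChooser {
    // Stand-in for org.apache.lucene.store.AlreadyClosedException, so this
    // sketch compiles without Lucene on the classpath.
    static class AlreadyClosedException extends IllegalStateException {
        AlreadyClosedException(String msg) { super(msg); }
    }

    public static int statusFor(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            // A closed IndexWriter ultimately surfaces as AlreadyClosedException;
            // that is server state, so report it as a server-side error.
            if (c.getClass().getName().endsWith("AlreadyClosedException")) {
                return 500;
            }
        }
        // Only genuine analysis/parse problems with the document should be 400.
        return 400;
    }

    public static void main(String[] args) {
        Exception e = new RuntimeException("possible analysis error",
                new AlreadyClosedException("this IndexWriter is closed"));
        System.out.println(statusFor(e)); // prints 500
    }
}
{code}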
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Description: When we run a batch of index and search operations against SolrCloud v5.3.2, we usually see a CPU% spike lasting about 10 minutes. We have 5 physical servers, with 2 Solr instances running on each server on different ports (8983 and 8984); all 8983 instances are in one SolrCloud, and all 8984 instances are in another. You can see the chart in the attached file screenshot-1.png. The thread dump is in the attached file threads.zip. During the spike, the thread dump shows that most of the threads have the call stack below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) at org.apache.solr.search.Grouping.execute(Grouping.java:370) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) was: When we doing a batch of index and search operations to SolrCloud v5.3.2, we usually met a CPU% spike lasting about 10 min. You can see the chart in the attach file screenshot-1.png.
During the spike, the thread dump shows most of the threads are with the call stacks below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
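[Editor's note] For context on why SolrQueryTimeoutImpl shows up at the top of these stacks: the timeAllowed machinery keeps a per-thread deadline in a ThreadLocal that every ExitableTermsEnum check consults. A stripped-down sketch of that pattern follows; it is illustrative only and not Solr's actual class:
{code}
// Minimal sketch of a per-thread query deadline, the pattern behind
// SolrQueryTimeoutImpl: set a deadline before a request, check it in hot loops.
public class QueryDeadline {
    private static final ThreadLocal<Long> DEADLINE_NANOS = new ThreadLocal<>();

    public static void set(long timeAllowedMillis) {
        DEADLINE_NANOS.set(System.nanoTime() + timeAllowedMillis * 1_000_000L);
    }

    // Called from tight loops; on long-lived pool threads a crowded ThreadLocal
    // map makes this get() walk collided slots (getEntryAfterMiss in the dump).
    public static boolean shouldExit() {
        Long deadline = DEADLINE_NANOS.get();
        return deadline != null && System.nanoTime() > deadline;
    }

    public static void clear() {
        DEADLINE_NANOS.remove(); // important on pooled threads such as Jetty's qtp threads
    }

    public static void main(String[] args) throws InterruptedException {
        set(50);
        Thread.sleep(60);
        System.out.println(shouldExit()); // true: the 50ms budget elapsed
        clear();
    }
}
{code}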
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Attachment: threads.zip > Solr has a CPU% spike when indexing a batch of data > --- > > Key: SOLR-9741 > URL: https://issues.apache.org/jira/browse/SOLR-9741 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.2 > Environment: Linux 64bit >Reporter: Forest Soup > Attachments: screenshot-1.png, threads.zip > > > When we doing a batch of index and search operations to SolrCloud v5.3.2, we > usually met a CPU% spike lasting about 10 min. > You can see the chart in the attach file screenshot-1.png. > During the spike, the thread dump shows most of the threads are with the call > stacks below: > "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 > runnable [0x7fb3ef1ef000] >java.lang.Thread.State: RUNNABLE > at > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) > at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) > at > java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) > at java.lang.ThreadLocal.get(ThreadLocal.java:163) > at > org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) > at > org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) > at org.apache.lucene.index.TermContext.build(TermContext.java:93) > at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) > at > org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) > at org.apache.solr.search.Grouping.execute(Grouping.java:370) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
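[Editor's note] When most threads share a stack like the one quoted above, counting frames across the whole dump confirms it quickly. Here is a small counter over a jstack-style text dump; the file name is an assumption, and threads.zip would need unpacking first:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Tallies "at package.Class.method(...)" frames in a jstack text dump and
// prints the most common ones, e.g. ThreadLocalMap.getEntryAfterMiss here.
public class HotFrames {
    public static void main(String[] args) throws IOException {
        String dump = args.length > 0 ? args[0] : "threads.txt"; // assumed: extracted from threads.zip
        Map<String, Integer> counts = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get(dump))) {
            String t = line.trim();
            if (t.startsWith("at ")) {
                counts.merge(t.substring(3), 1, Integer::sum);
            }
        }
        counts.entrySet().stream()
              .sorted((a, b) -> b.getValue() - a.getValue())
              .limit(20)
              .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
{code}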
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Description: When we run a batch of index and search operations against SolrCloud v5.3.2, we usually see a CPU% spike lasting about 10 minutes. You can see the chart in the attached file screenshot-1.png. During the spike, the thread dump shows that most of the threads have the call stack below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) at org.apache.solr.search.Grouping.execute(Grouping.java:370) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) was: When we doing a batch of index and search operations to SolrCloud v5.3.2, we usually met a CPU% spike lasting about 10 min.
You can see the chart in the attach file During the spike, the thread dump shows most of the threads are with the call stacks below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Attachment: screenshot-1.png > Solr has a CPU% spike when indexing a batch of data > --- > > Key: SOLR-9741 > URL: https://issues.apache.org/jira/browse/SOLR-9741 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.2 > Environment: Linux 64bit >Reporter: Forest Soup > Attachments: screenshot-1.png > > > When we doing a batch of index and search operations to SolrCloud v5.3.2, we > usually met a CPU% spike lasting about 10 min. > You can see the chart in the attach file > During the spike, the thread dump shows most of the threads are with the call > stacks below: > "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 > runnable [0x7fb3ef1ef000] >java.lang.Thread.State: RUNNABLE > at > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) > at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) > at > java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) > at java.lang.ThreadLocal.get(ThreadLocal.java:163) > at > org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) > at > org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) > at > org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) > at org.apache.lucene.index.TermContext.build(TermContext.java:93) > at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) > at > org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) > at > org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) > at > org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) > at > org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) > at org.apache.solr.search.Grouping.execute(Grouping.java:370) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Description: When we run a batch of index and search operations against SolrCloud v5.3.2, we usually see a CPU% spike lasting about 10 minutes. You can see the chart in the attached file. During the spike, the thread dump shows that most of the threads have the call stack below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) at org.apache.solr.search.Grouping.execute(Grouping.java:370) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) was: When we doing index a batch of data to SolrCloud v5.3.2, we usually met a CPU% spike lasting about 10 min. You can see the chart in the attach file.
During the spike, the thread dump shows most of the threads are with the call stacks below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at
[jira] [Updated] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
[ https://issues.apache.org/jira/browse/SOLR-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-9741: -- Description: When we index a batch of data to SolrCloud v5.3.2, we usually see a CPU% spike lasting about 10 minutes. You can see the chart in the attached file. During the spike, the thread dump shows that most of the threads have the call stack below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) at org.apache.solr.search.Grouping.execute(Grouping.java:370) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) was: When we doing index a batch of data to SolrCloud v5.3.2, we usually met a CPU% spike lasting about 10 min.
During the spike, the thread dump shows most of the threads are with the call stacks below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56) at
[jira] [Created] (SOLR-9741) Solr has a CPU% spike when indexing a batch of data
Forest Soup created SOLR-9741: - Summary: Solr has a CPU% spike when indexing a batch of data Key: SOLR-9741 URL: https://issues.apache.org/jira/browse/SOLR-9741 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 5.3.2 Environment: Linux 64bit Reporter: Forest Soup When we index a batch of data to SolrCloud v5.3.2, we usually see a CPU% spike lasting about 10 minutes. During the spike, the thread dump shows that most of the threads have the call stack below: "qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7 runnable [0x7fb3ef1ef000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49) at org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.<init>(ExitableDirectoryReader.java:157) at org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141) at org.apache.lucene.index.TermContext.build(TermContext.java:93) at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203) at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456) at org.apache.solr.search.Grouping.execute(Grouping.java:370) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
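[Editor's note] Every frame above sits under the timeAllowed machinery, and the per-thread deadline is only installed when a query actually sets that parameter. For reference, here is a hedged SolrJ 5.x example of the kind of grouped query that would take this code path; the host, collection, and field names are made up, and it requires the solr-solrj 5.x dependency:
{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Illustrative SolrJ 5.x query that exercises the path in the dump:
// grouping plus timeAllowed, which installs the per-thread deadline
// checked by SolrQueryTimeoutImpl/ExitableDirectoryReader.
public class GroupedQueryExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("subject:report AND body:quarterly"); // made-up fields
        q.setTimeAllowed(1000);           // 1s budget; triggers the exitable wrappers
        q.set("group", true);
        q.set("group.field", "threadId"); // made-up grouping field
        QueryResponse rsp = client.query(q);
        System.out.println(rsp.getGroupResponse().getValues().size() + " group commands");
        client.close();
    }
}
{code}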
[jira] [Commented] (SOLR-5724) Two node, one shard solr instance intermittently going offline
[ https://issues.apache.org/jira/browse/SOLR-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434430#comment-15434430 ] Forest Soup commented on SOLR-5724: --- I found a similar issue in Solr v5.3.2 - We have a SolrCloud with 3 Solr nodes; 80 collections are created on them with replicationFactor=1 and numShards=1 for each collection. After the collections are created, all cores are active, and we start the first batch of indexing with a SolrJ client. But we found issues on all collections of one of the 3 Solr nodes, and indexing failed due to HTTP 503: 2016-08-16 20:02:05.660 ERROR (qtp208437930-70) [c:collection4 s:shard1 r:core_node1 x:collection4_shard1_replica1] o.a.s.u.p.DistributedUpdateProcessor ClusterState says we are the leader, but locally we don't think so 2016-08-16 20:02:05.667 ERROR (qtp208437930-70) [c:collection4 s:shard1 r:core_node1 x:collection4_shard1_replica1] o.a.s.c.SolrCore org.apache.solr.common.SolrException: ClusterState says we are the leader (https://host1.domain1:8983/solr/collection4_shard1_replica1), but locally we don't think so. Request came from null at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:619) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:381) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:314) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:665) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:143) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:235) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:199) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) The collections on the other 2 Solr nodes work fine and indexing succeeded. > Two node, one shard solr instance intermittently going offline > --- > > Key: SOLR-5724 > URL: https://issues.apache.org/jira/browse/SOLR-5724 > Project: Solr > Issue Type: Bug >Affects Versions: 4.6.1 > Environment: Ubuntu 12.04.3 LTS, 64 bit, java version "1.6.0_45" > Java(TM) SE Runtime Environment (build 1.6.0_45-b06) > Java HotSpot(TM) 64-Bit
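[Editor's note] On the client side, a 503 with "ClusterState says we are the leader, but locally we don't think so" is often transient once leadership settles, so batch indexers commonly retry with backoff rather than failing the whole batch. A rough SolrJ 5.x sketch under that assumption; the ZK address, collection name, and field names are placeholders:
{code}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Retries an add a few times with backoff, on the assumption that the
// leader-mismatch 503 above clears once the local core catches up with ZK.
public class RetryingIndexer {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("collection4"); // placeholder collection
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("subject", "hello"); // made-up field
            for (int attempt = 1; ; attempt++) {
                try {
                    client.add(doc);
                    break; // success
                } catch (Exception e) {
                    if (attempt >= 5) throw e;        // give up after 5 tries
                    Thread.sleep(1000L * attempt);    // linear backoff
                }
            }
            client.commit();
        }
    }
}
{code}
This does not address the server-side bug; it only keeps a batch alive across a transient leadership mismatch.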
[jira] [Commented] (SOLR-7021) Leader will not publish core as active without recovering first, but never recovers
[ https://issues.apache.org/jira/browse/SOLR-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316424#comment-15316424 ] Forest Soup commented on SOLR-7021: --- Is there any plan to fix this? We found the same log in a v5.3.2 SolrCloud. > Leader will not publish core as active without recovering first, but never > recovers > --- > > Key: SOLR-7021 > URL: https://issues.apache.org/jira/browse/SOLR-7021 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.10 >Reporter: James Hardwick >Priority: Critical > Labels: recovery, solrcloud, zookeeper > > A little background: 1 core solr-cloud cluster across 3 nodes, each with its > own shard and each shard with a single replica hence each replica is itself a > leader. > For reasons we won't get into, we witnessed a shard go down in our cluster. > We restarted the cluster but our core/shards still did not come back up. > After inspecting the logs, we found this: > {code} > 2015-01-21 15:51:56,494 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - We are http://xxx.xxx.xxx.35:8081/solr/xyzcore/ and leader is > http://xxx.xxx.xxx.35:8081/solr/xyzcore/ > 2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - No LogReplay needed for core=xyzcore baseURL=http://xxx.xxx.xxx.35:8081/solr > 2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - I am the leader, no recovery necessary > 2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - publishing core=xyzcore state=active collection=xyzcore > 2015-01-21 15:51:56,497 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - numShards not found on descriptor - reading it from system property > 2015-01-21 15:51:56,498 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - publishing core=xyzcore state=down collection=xyzcore > 2015-01-21 15:51:56,498 [coreZkRegister-1-thread-2] INFO cloud.ZkController > - numShards not found on descriptor - reading it from system property > 2015-01-21 15:51:56,501 [coreZkRegister-1-thread-2] ERROR core.ZkContainer - > :org.apache.solr.common.SolrException: Cannot publish state of core 'xyzcore' > as active without recovering first! > at org.apache.solr.cloud.ZkController.publish(ZkController.java:1075) > {code} > And at this point the necessary shards never recover correctly and hence our > core never returns to a functional state. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
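[Editor's note] Comparing what ZooKeeper claims with what each node actually reports is the usual first step when a core is stuck like this; the Collections API CLUSTERSTATUS action (available since Solr 4.8) returns the cluster's view as JSON. A minimal fetch, with the host a placeholder:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Dumps the Collections API CLUSTERSTATUS response so the ZK-side replica
// states can be compared with what the cores themselves log at startup.
public class ClusterStatusDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
{code}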
[jira] [Created] (SOLR-9173) NullPointerException during recovery
Forest Soup created SOLR-9173: - Summary: NullPointerException during recovery Key: SOLR-9173 URL: https://issues.apache.org/jira/browse/SOLR-9173 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.3.2 Environment: Linux 64bit Reporter: Forest Soup We have a SolrCloud. One server suffered crashes. After restart, during core recovery, there was one error: 2016-05-03 18:30:17.200 WARN (recoveryExecutor-80-thread-1-processing-n:lltcl5solr05.swg.usma.ibm.com:8983_solr x:collection3_shard1_replica1 s:shard1 c:collection3 r:core_node2) [c:collection3 s:shard1 r:core_node2 x:collection3_shard1_replica1] o.a.s.u.UpdateLog Starting log replay tlog{file=/mnt/solrdata1/solr/home/collection3_shard1_replica1/data/tlog/tlog.879 refcount=2} active=false starting pos=0 2016-05-03 18:30:18.377 ERROR (qtp562345204-56) [c:collection3 s:shard1 r:core_node2 x:collection3_shard1_replica1] o.a.s.c.SolrCore java.lang.NullPointerException at org.apache.solr.update.UpdateLog.lookup(UpdateLog.java:735) at org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:165) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) 2016-05-03 18:30:18.378 ERROR (qtp562345204-56) [c:collection3 s:shard1 r:core_node2 x:collection3_shard1_replica1] o.a.s.s.SolrDispatchFilter null:java.lang.NullPointerException at org.apache.solr.update.UpdateLog.lookup(UpdateLog.java:735) at
org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:165) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:672) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at
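[Editor's note] The shape of this NPE is a realtime-get lookup racing with tlog replay: the request path dereferences state that recovery has not re-established yet. The generic guard for that shape of race looks like the sketch below; it is purely illustrative, the names are invented, and it is not UpdateLog's real code:
{code}
// Illustrative only: a lookup that tolerates state still being rebuilt,
// returning "not found" instead of dereferencing a field that replay
// has not initialized yet.
public class ReplaySafeLookup<K, V> {
    private volatile java.util.Map<K, V> map; // null until replay installs it

    public void install(java.util.Map<K, V> replayed) {
        this.map = replayed;
    }

    public V lookup(K key) {
        java.util.Map<K, V> m = map; // single volatile read; may be null mid-recovery
        return (m == null) ? null : m.get(key);
    }

    public static void main(String[] args) {
        ReplaySafeLookup<String, String> l = new ReplaySafeLookup<>();
        System.out.println(l.lookup("id1")); // null, no NPE, while replay is pending
        l.install(java.util.Collections.singletonMap("id1", "doc"));
        System.out.println(l.lookup("id1")); // "doc"
    }
}
{code}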
[jira] [Updated] (SOLR-8756) Need 4 config "zkDigestUsername"/"zkDigestPassword"/ solr.xml
[ https://issues.apache.org/jira/browse/SOLR-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-8756: -- Summary: Need 4 config "zkDigestUsername"/"zkDigestPassword"/ solr.xml (was: Need config "zkDigestUsername" and "zkDigestPassword" in /solr.xml) > Need 4 config "zkDigestUsername"/"zkDigestPassword"/ solr.xml > -- > > Key: SOLR-8756 > URL: https://issues.apache.org/jira/browse/SOLR-8756 > Project: Solr > Issue Type: Bug > Components: security, SolrCloud >Affects Versions: 5.3.1 > Environment: Linux 64bit >Reporter: Forest Soup > Labels: security > > Need 4 config in /solr.xml instead of -D parameter in solr.in.sh. > like below: > <solr> > <solrcloud> > <str name="zkDigestUsername">zkusername</str> > <str name="zkDigestPassword">zkpassword</str> > <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> > <str name="zkDigestReadonlyPassword">readonlypassword</str> > ... > Otherwise, any user can use the linux "ps" command showing the full command > line including the plain text zookeeper username and password. If we use file > store them, we can control the access of the file not to leak the > username/password. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8756) Need 4 config "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/"zkDigestReadonlyUsername" in solr.xml
[ https://issues.apache.org/jira/browse/SOLR-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-8756: -- Summary: Need 4 config "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/"zkDigestReadonlyUsername" in solr.xml (was: Need 4 config "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/ solr.xml) > Need 4 config > "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/"zkDigestReadonlyUsername" > in solr.xml > - > > Key: SOLR-8756 > URL: https://issues.apache.org/jira/browse/SOLR-8756 > Project: Solr > Issue Type: Bug > Components: security, SolrCloud >Affects Versions: 5.3.1 > Environment: Linux 64bit >Reporter: Forest Soup > Labels: security > > Need 4 config in /solr.xml instead of -D parameter in solr.in.sh. > like below: > <solr> > <solrcloud> > <str name="zkDigestUsername">zkusername</str> > <str name="zkDigestPassword">zkpassword</str> > <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> > <str name="zkDigestReadonlyPassword">readonlypassword</str> > ... > Otherwise, any user can use the linux "ps" command showing the full command > line including the plain text zookeeper username and password. If we use file > store them, we can control the access of the file not to leak the > username/password. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8756) Need 4 config "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/ solr.xml
[ https://issues.apache.org/jira/browse/SOLR-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-8756: -- Summary: Need 4 config "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/ solr.xml (was: Need 4 config "zkDigestUsername"/"zkDigestPassword"/ solr.xml) > Need 4 config > "zkDigestUsername"/"zkDigestPassword"/"zkDigestReadonlyUsername"/ solr.xml > - > > Key: SOLR-8756 > URL: https://issues.apache.org/jira/browse/SOLR-8756 > Project: Solr > Issue Type: Bug > Components: security, SolrCloud >Affects Versions: 5.3.1 > Environment: Linux 64bit >Reporter: Forest Soup > Labels: security > > Need 4 config in /solr.xml instead of -D parameter in solr.in.sh. > like below: > <solr> > <solrcloud> > <str name="zkDigestUsername">zkusername</str> > <str name="zkDigestPassword">zkpassword</str> > <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> > <str name="zkDigestReadonlyPassword">readonlypassword</str> > ... > Otherwise, any user can use the linux "ps" command showing the full command > line including the plain text zookeeper username and password. If we use file > store them, we can control the access of the file not to leak the > username/password. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8756) Need config "zkDigestUsername" and "zkDigestPassword" in /solr.xml
Forest Soup created SOLR-8756: - Summary: Need config "zkDigestUsername" and "zkDigestPassword" in /solr.xml Key: SOLR-8756 URL: https://issues.apache.org/jira/browse/SOLR-8756 Project: Solr Issue Type: Bug Components: security, SolrCloud Affects Versions: 5.3.1 Environment: Linux 64bit Reporter: Forest Soup Need 2 config in /solr.xml instead of -D parameters in solr.in.sh, like below: <solr> <solrcloud> <str name="zkDigestUsername">zkusername</str> <str name="zkDigestPassword">zkpassword</str> <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> <str name="zkDigestReadonlyPassword">readonlypassword</str> ... Otherwise, any user can use the Linux "ps" command to show the full command line, including the plain-text ZooKeeper username and password. If we store them in a file, we can control access to the file so as not to leak the username/password. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8756) Need config "zkDigestUsername" and "zkDigestPassword" in /solr.xml
[ https://issues.apache.org/jira/browse/SOLR-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-8756: -- Description: Need 4 config in /solr.xml instead of -D parameters in solr.in.sh, like below: <solr> <solrcloud> <str name="zkDigestUsername">zkusername</str> <str name="zkDigestPassword">zkpassword</str> <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> <str name="zkDigestReadonlyPassword">readonlypassword</str> ... Otherwise, any user can use the Linux "ps" command to show the full command line, including the plain-text ZooKeeper username and password. If we store them in a file, we can control access to the file so as not to leak the username/password. was: Need 2 config in /solr.xml instead of -D parameter in solr.in.sh. like below: <solr> <solrcloud> <str name="zkDigestUsername">zkusername</str> <str name="zkDigestPassword">zkpassword</str> <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> <str name="zkDigestReadonlyPassword">readonlypassword</str> ... Otherwise, any user can use the linux "ps" command showing the full command line including the plain text zookeeper username and password. If we use file store them, we can control the access of the file not to leak the username/password. > Need config "zkDigestUsername" and "zkDigestPassword" in /solr.xml > > > Key: SOLR-8756 > URL: https://issues.apache.org/jira/browse/SOLR-8756 > Project: Solr > Issue Type: Bug > Components: security, SolrCloud >Affects Versions: 5.3.1 > Environment: Linux 64bit >Reporter: Forest Soup > Labels: security > > Need 4 config in /solr.xml instead of -D parameter in solr.in.sh. > like below: > <solr> > <solrcloud> > <str name="zkDigestUsername">zkusername</str> > <str name="zkDigestPassword">zkpassword</str> > <str name="zkDigestReadonlyUsername">zkreadonlyusername</str> > <str name="zkDigestReadonlyPassword">readonlypassword</str> > ... > Otherwise, any user can use the linux "ps" command showing the full command > line including the plain text zookeeper username and password. If we use file > store them, we can control the access of the file not to leak the > username/password. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
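[Editor's note] Until solr.xml support exists, one workaround in the spirit of this request is to keep the credentials in a permission-restricted file and export them into the JVM as system properties from inside the process, so they never appear on the command line. A sketch under that assumption: the property names zkDigestUsername, zkDigestPassword, zkDigestReadonlyUsername, and zkDigestReadonlyPassword are the ones Solr's VM-params ZK ACL/credentials providers read, the file path is made up, and this must run in the same JVM before the ZK client is initialized:
{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Loads ZK digest credentials from a permission-restricted file (chmod 600)
// and sets the system properties Solr's VM-params providers read, instead of
// passing them as -D arguments visible to "ps".
public class ZkCredentialLoader {
    public static void load(String path) throws IOException {
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            p.load(in);
        }
        for (String key : new String[] {
                "zkDigestUsername", "zkDigestPassword",
                "zkDigestReadonlyUsername", "zkDigestReadonlyPassword" }) {
            if (p.getProperty(key) != null) {
                System.setProperty(key, p.getProperty(key));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        load("/etc/solr/zk-credentials.properties"); // made-up path
        System.out.println("zk user: " + System.getProperty("zkDigestUsername"));
    }
}
{code}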
[jira] [Updated] (SOLR-7982) SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection.
[ https://issues.apache.org/jira/browse/SOLR-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7982: -- Description: We have a SolrCloud with 3 ZooKeeper servers and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within a single collection there are duplicate coreNodeName values in the core.properties files in the core folders. For example: [tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties -rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties [tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties -rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties [tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties #Written by CorePropertiesLocator #Wed Jul 29 11:52:54 UTC 2015 numShards=80 name=collection1_shard66_replica1 shard=shard66 collection=collection1 coreNodeName=core_node19 [tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties #Written by CorePropertiesLocator #Wed Jul 29 11:52:53 UTC 2015 numShards=80 name=collection1_shard13_replica2 shard=shard13 collection=collection1 coreNodeName=core_node19 [tanglin@solr64 home]$ The consequence of the issue is that clusterstate.json in ZooKeeper also carries the wrong core_node numbers, and updating the state of one core sometimes changes the state of another core in another shard. Snippet from clusterstate: "shard13":{ "range":"a666-a998", "state":"active", "replicas":{ "core_node33":{ "state":"active", "base_url":"https://solr65.somesite.com:8443/solr", "core":"collection1_shard13_replica1", "node_name":"solr65.somesite.com:8443_solr"}, "core_node19":{ "state":"active", "base_url":"https://solr64.somesite.com:8443/solr", "core":"collection1_shard13_replica2", "node_name":"solr64.somesite.com:8443_solr", "leader":"true"}}}, ... "shard66":{ "range":"5000-5332", "state":"active", "replicas":{ "core_node105":{ "state":"active", "base_url":"https://solr63.somesite.com:8443/solr", "core":"collection1_shard66_replica2", "node_name":"solr63.somesite.com:8443_solr", "leader":"true"}, "core_node19":{ "state":"active", "base_url":"https://solr64.somesite.com:8443/solr", "core":"collection1_shard66_replica1", "node_name":"solr64.somesite.com:8443_solr"}}}, was: We have a 3 Zookeeper 5 solr server Solrcloud. We created collection1 and collection2 with 80 shards respectively in the cloud, replicateFactor is 2. But after created, we found in a same collection, the coreNodeName has some duplicate in core.properties in the core folder.
For example: [tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties -rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties [tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties -rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties [tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties #Written by CorePropertiesLocator #Wed Jul 29 11:52:54 UTC 2015 numShards=80 name=collection1_shard66_replica1 shard=shard66 collection=collection1 coreNodeName=core_node19 [tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties #Written by CorePropertiesLocator #Wed Jul 29 11:52:53 UTC 2015 numShards=80 name=collection1_shard13_replica2 shard=shard13 collection=collection1 coreNodeName=core_node19 [tanglin@solr64 home]$ The consequence of the issue is that the clusterstate.json in zookeeper is also with wrong core_node#, and updating state of a core sometimes changed the state of other core in other shard.. Snippet from clusterstate: shard13:{ range:a666-a998, state:active, replicas:{ core_node33:{ state:active, base_url:https://us1a3-solr65.a3.dal06.isc4sb.com:8443/solr;, core:collection1_shard13_replica1, node_name:us1a3-solr65.a3.dal06.isc4sb.com:8443_solr}, core_node19:{ state:active, base_url:https://us1a3-solr64.a3.dal06.isc4sb.com:8443/solr;, core:collection1_shard13_replica2, node_name:us1a3-solr64.a3.dal06.isc4sb.com:8443_solr, leader:true}}}, ... shard66:{ range:5000-5332, state:active, replicas:{ core_node105:{ state:active, base_url:https://us1a3-solr63.a3.dal06.isc4sb.com:8443/solr;, core:collection1_shard66_replica2, node_name:us1a3-solr63.a3.dal06.isc4sb.com:8443_solr, leader:true}, core_node19:{ state:active,
[jira] [Updated] (SOLR-7982) SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection.
[ https://issues.apache.org/jira/browse/SOLR-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7982: -- Summary: SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection. (was: SolrCloud: There are duplicate coreNodeName in core.properties in a same collection after the collection is created.) SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection. - Key: SOLR-7982 URL: https://issues.apache.org/jira/browse/SOLR-7982 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: linux redhat enterprise server 5.9 64bit Reporter: Forest Soup We have a SolrCloud with 3 ZooKeeper and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within the same collection some core.properties files in the core folders have duplicate coreNodeName values. For example:
[tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties
[tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties
[tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:54 UTC 2015
numShards=80
name=collection1_shard66_replica1
shard=shard66
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:53 UTC 2015
numShards=80
name=collection1_shard13_replica2
shard=shard13
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$
The consequence of the issue is that clusterstate.json in ZooKeeper also contains the wrong core_node numbers, and updating the state of one core sometimes changes the state of another core in another shard. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7982) SolrCloud: There are duplicate coreNodeName in core.properties in a same collection after the collection is created.
Forest Soup created SOLR-7982: - Summary: SolrCloud: There are duplicate coreNodeName in core.properties in a same collection after the collection is created. Key: SOLR-7982 URL: https://issues.apache.org/jira/browse/SOLR-7982 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: linux redhat enterprise server 5.9 64bit Reporter: Forest Soup We have a SolrCloud with 3 ZooKeeper and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within the same collection some core.properties files in the core folders have duplicate coreNodeName values. For example:
[tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties
[tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties
[tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:54 UTC 2015
numShards=80
name=collection1_shard66_replica1
shard=shard66
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:53 UTC 2015
numShards=80
name=collection1_shard13_replica2
shard=shard13
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$
The consequence of the issue is that clusterstate.json in ZooKeeper also contains the wrong core_node numbers, and updating the state of one core sometimes changes the state of another core in another shard. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
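Where this symptom is suspected, the duplicate coreNodeNames can be spotted directly from the core.properties files. The sketch below is a hypothetical standalone checker, not part of Solr: it walks a core root directory, loads each core.properties with java.util.Properties, and reports any collection/coreNodeName pair claimed by more than one core. The default path argument is an assumption; run it on every node.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;
import java.util.*;

/** Hypothetical checker for the symptom above: walk a Solr core root,
 *  read every core.properties, and report coreNodeNames that appear
 *  more than once within the same collection. */
public class DuplicateCoreNodeNameCheck {
  public static void main(String[] args) throws IOException {
    Path coreRoot = Paths.get(args.length > 0 ? args[0] : "/home");
    Map<String, List<String>> byKey = new TreeMap<>(); // "collection/coreNodeName" -> core names
    try (DirectoryStream<Path> cores = Files.newDirectoryStream(coreRoot)) {
      for (Path core : cores) {
        Path props = core.resolve("core.properties");
        if (!Files.isRegularFile(props)) continue;
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(props)) { p.load(in); }
        String key = p.getProperty("collection") + "/" + p.getProperty("coreNodeName");
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(p.getProperty("name"));
      }
    }
    byKey.forEach((key, names) -> {
      if (names.size() > 1) System.out.println("DUPLICATE " + key + " -> " + names);
    });
  }
}
{code}

For the example above it would print: DUPLICATE collection1/core_node19 -> [collection1_shard13_replica2, collection1_shard66_replica1].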
[jira] [Updated] (SOLR-7982) SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection.
[ https://issues.apache.org/jira/browse/SOLR-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7982: -- Description: We have a SolrCloud with 3 ZooKeeper and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within the same collection some core.properties files in the core folders have duplicate coreNodeName values. For example:
[tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties
[tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties
[tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:54 UTC 2015
numShards=80
name=collection1_shard66_replica1
shard=shard66
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:53 UTC 2015
numShards=80
name=collection1_shard13_replica2
shard=shard13
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$
The consequence of the issue is that clusterstate.json in ZooKeeper also contains the wrong core_node numbers, and updating the state of one core sometimes changes the state of another core in another shard. Snippet from clusterstate:
"shard13":{
  "range":"a666-a998",
  "state":"active",
  "replicas":{
    "core_node33":{
      "state":"active",
      "base_url":"https://us1a3-solr65.a3.dal06.isc4sb.com:8443/solr",
      "core":"collection1_shard13_replica1",
      "node_name":"us1a3-solr65.a3.dal06.isc4sb.com:8443_solr"},
    "core_node19":{
      "state":"active",
      "base_url":"https://us1a3-solr64.a3.dal06.isc4sb.com:8443/solr",
      "core":"collection1_shard13_replica2",
      "node_name":"us1a3-solr64.a3.dal06.isc4sb.com:8443_solr",
      "leader":"true"}}},
...
"shard66":{
  "range":"5000-5332",
  "state":"active",
  "replicas":{
    "core_node105":{
      "state":"active",
      "base_url":"https://us1a3-solr63.a3.dal06.isc4sb.com:8443/solr",
      "core":"collection1_shard66_replica2",
      "node_name":"us1a3-solr63.a3.dal06.isc4sb.com:8443_solr",
      "leader":"true"},
    "core_node19":{
      "state":"active",
      "base_url":"https://us1a3-solr64.a3.dal06.isc4sb.com:8443/solr",
      "core":"collection1_shard66_replica1",
      "node_name":"us1a3-solr64.a3.dal06.isc4sb.com:8443_solr"}}},

was: We have a SolrCloud with 3 ZooKeeper and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within the same collection some core.properties files in the core folders have duplicate coreNodeName values.
For example:
[tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard13_replica2/core.properties
[tanglin@solr64 home]$ ll collection1_shard66_replica1/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29 11:52 collection1_shard66_replica1/core.properties
[tanglin@solr64 home]$ cat collection1_shard66_replica1/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:54 UTC 2015
numShards=80
name=collection1_shard66_replica1
shard=shard66
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$ cat collection1_shard13_replica2/core.properties
#Written by CorePropertiesLocator
#Wed Jul 29 11:52:53 UTC 2015
numShards=80
name=collection1_shard13_replica2
shard=shard13
collection=collection1
coreNodeName=core_node19
[tanglin@solr64 home]$
The consequence of the issue is that clusterstate.json in ZooKeeper also contains the wrong core_node numbers, and updating the state of one core sometimes changes the state of another core in another shard. SolrCloud: collection creation: There are duplicate coreNodeName in core.properties in a same collection. - Key: SOLR-7982 URL: https://issues.apache.org/jira/browse/SOLR-7982 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: linux redhat enterprise server 5.9 64bit Reporter: Forest Soup We have a SolrCloud with 3 ZooKeeper and 5 Solr servers. We created collection1 and collection2, each with 80 shards, in the cloud; replicationFactor is 2. But after creation, we found that within the same collection some core.properties files in the core folders have duplicate coreNodeName values. For example:
[tanglin@solr64 home]$ ll collection1_shard13_replica2/core.properties
-rw-r--r-- 1 solr solr 173 Jul 29
[jira] [Updated] (SOLR-7947) SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.
[ https://issues.apache.org/jira/browse/SOLR-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7947: -- Description: A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. Logs are in the attachment. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more was: A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json. - Key: SOLR-7947 URL: https://issues.apache.org/jira/browse/SOLR-7947 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Redhat Linux Enterprise Server 5.9 64bit Reporter: Forest Soup A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. Logs are in the attachment. ERROR - 2015-07-24 09:40:34.887;
[jira] [Updated] (SOLR-7947) SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.
[ https://issues.apache.org/jira/browse/SOLR-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7947: -- Summary: SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json. (was: ZooKeeper /live_nodes shows the server is there, but all cores are down in /clusterstate.json.) SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json. - Key: SOLR-7947 URL: https://issues.apache.org/jira/browse/SOLR-7947 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Redhat Linux Enterprise Server 5.9 64bit Reporter: Forest Soup A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7947) ZooKeeper /live_nodes shows the server is there, but all cores are down in /clusterstate.json.
Forest Soup created SOLR-7947: - Summary: ZooKeeper /live_nodes shows the server is there, but all cores are down in /clusterstate.json. Key: SOLR-7947 URL: https://issues.apache.org/jira/browse/SOLR-7947 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Redhat Linux Enterprise Server 5.9 64bit Reporter: Forest Soup A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7947) SolrCloud: after a solr node restarted, all cores in the node are down in /clusterstate.json due to java.nio.channels.OverlappingFileLockException.
[ https://issues.apache.org/jira/browse/SOLR-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7947: -- Summary: SolrCloud: after a solr node restarted, all cores in the node are down in /clusterstate.json due to java.nio.channels.OverlappingFileLockException. (was: SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.) SolrCloud: after a solr node restarted, all cores in the node are down in /clusterstate.json due to java.nio.channels.OverlappingFileLockException. --- Key: SOLR-7947 URL: https://issues.apache.org/jira/browse/SOLR-7947 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Redhat Linux Enterprise Server 5.9 64bit Reporter: Forest Soup Attachments: solr.zip A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. Logs are in the attachment. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7947) SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.
[ https://issues.apache.org/jira/browse/SOLR-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7947: -- Attachment: solr.zip SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json. - Key: SOLR-7947 URL: https://issues.apache.org/jira/browse/SOLR-7947 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Redhat Linux Enterprise Server 5.9 64bit Reporter: Forest Soup Attachments: solr.zip A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VMs. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. Logs are in the attachment. ERROR - 2015-07-24 09:40:34.887; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1_shard1_replica1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:804) Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630) at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595) ... 8 more Caused by: java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:267) at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:164) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1078) at java.nio.channels.FileChannel.tryLock(FileChannel.java:1165) at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:217) at org.apache.lucene.store.NativeFSLock.isLocked(NativeFSLockFactory.java:319) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4510) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:485) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) ... 11 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
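The root cause in the traces above is the JDK's file-locking rule: within a single JVM, a second FileChannel.tryLock() on a file that this JVM already holds locked does not block or return null, it throws OverlappingFileLockException, which appears to be why the not-yet-released write lock of the restarted core kills core creation. A minimal JDK-only reproduction sketch (no Solr or Lucene classes involved; the temp file stands in for the index write.lock):

{code}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Minimal reproduction of the root cause above: within a single JVM,
 *  a second tryLock() on a file this JVM already holds locked throws
 *  OverlappingFileLockException instead of returning null. */
public class OverlappingLockDemo {
  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("write", ".lock"); // stand-in for the index write.lock
    try (RandomAccessFile raf1 = new RandomAccessFile(f, "rw");
         RandomAccessFile raf2 = new RandomAccessFile(f, "rw")) {
      FileLock first = raf1.getChannel().tryLock(); // succeeds
      try {
        raf2.getChannel().tryLock(); // same JVM, same file region
      } catch (OverlappingFileLockException e) {
        System.out.println("second lock attempt in the same JVM: " + e);
      }
      first.release();
    }
  }
}
{code}

This matches the trace: NativeFSLock.isLocked() ends up calling FileChannel.tryLock() while the same JVM still holds the old lock.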
[jira] [Commented] (SOLR-5692) StackOverflowError during SolrCloud leader election process
[ https://issues.apache.org/jira/browse/SOLR-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541343#comment-14541343 ] Forest Soup commented on SOLR-5692: --- I met the same issue with Solr 4.7.0. There are too many recursive calls through the lines below: at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:399) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:259) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:399) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:259) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) StackOverflowError during SolrCloud leader election process --- Key: SOLR-5692 URL: https://issues.apache.org/jira/browse/SOLR-5692 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1 Reporter: Bojan Smid Labels: difficulty-hard, impact-medium Attachments: recovery-stackoverflow.txt I have a SolrCloud cluster with 7 nodes, each with a few thousand cores. I got this StackOverflow a few times when starting one of the nodes (just a piece of the stack trace; the rest repeats, as the leader election process obviously got stuck in an infinite repetition of steps): [2/4/14 3:42:43 PM] Bojan: 2014-02-04 15:18:01,947 [localhost-startStop-1-EventThread] ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher java.lang.StackOverflowError at java.security.AccessController.doPrivileged(Native Method) at java.io.PrintWriter.<init>(PrintWriter.java:116) at java.io.PrintWriter.<init>(PrintWriter.java:100) at org.apache.solr.common.SolrException.toStr(SolrException.java:138) at org.apache.solr.common.SolrException.log(SolrException.java:113) [2/4/14 3:42:58 PM] Bojan: at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:377) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:184) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:162) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:106) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:272) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:380) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:184) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:162) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:106) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:272) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:380) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:184) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:162) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:106) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:272) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:380) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:184) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:162) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:106) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:272) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:380) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:184) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:162) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:106) at
[jira] [Commented] (SOLR-6213) StackOverflowException in Solr cloud's leader election
[ https://issues.apache.org/jira/browse/SOLR-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541349#comment-14541349 ] Forest Soup commented on SOLR-6213: --- Can we set a maximum retry count instead of always retrying until the stack overflows? StackOverflowException in Solr cloud's leader election -- Key: SOLR-6213 URL: https://issues.apache.org/jira/browse/SOLR-6213 Project: Solr Issue Type: Bug Affects Versions: 4.10, Trunk Reporter: Dawid Weiss Priority: Critical This is what's causing test hangs (at least on FreeBSD, LUCENE-5786), possibly on other machines too. The problem is stack overflow from looped calls in: {code} org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
[jira] [Updated] (SOLR-6213) StackOverflowException in Solr cloud's leader election
[ https://issues.apache.org/jira/browse/SOLR-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6213: -- Attachment: stackoverflow.txt The stackoverflow exception. StackOverflowException in Solr cloud's leader election -- Key: SOLR-6213 URL: https://issues.apache.org/jira/browse/SOLR-6213 Project: Solr Issue Type: Bug Affects Versions: 4.10, Trunk Reporter: Dawid Weiss Priority: Critical Attachments: stackoverflow.txt This is what's causing test hangs (at least on FreeBSD, LUCENE-5786), possibly on other machines too. The problem is stack overflow from looped calls in: {code} org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
[jira] [Comment Edited] (SOLR-6213) StackOverflowException in Solr cloud's leader election
[ https://issues.apache.org/jira/browse/SOLR-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541359#comment-14541359 ] Forest Soup edited comment on SOLR-6213 at 5/13/15 5:27 AM: The stackoverflow exception is in the attachment. was (Author: forest_soup): The stackoverflow exception. StackOverflowException in Solr cloud's leader election -- Key: SOLR-6213 URL: https://issues.apache.org/jira/browse/SOLR-6213 Project: Solr Issue Type: Bug Affects Versions: 4.10, Trunk Reporter: Dawid Weiss Priority: Critical Attachments: stackoverflow.txt This is what's causing test hangs (at least on FreeBSD, LUCENE-5786), possibly on other machines too. The problem is stack overflow from looped calls in: {code} org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212)
[jira] [Commented] (SOLR-6213) StackOverflowException in Solr cloud's leader election
[ https://issues.apache.org/jira/browse/SOLR-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541350#comment-14541350 ] Forest Soup commented on SOLR-6213: --- I met the same issue within Solr 4.7.0. Too many recursive calls with below lines: at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:399) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:259) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:399) at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:259) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) StackOverflowException in Solr cloud's leader election -- Key: SOLR-6213 URL: https://issues.apache.org/jira/browse/SOLR-6213 Project: Solr Issue Type: Bug Affects Versions: 4.10, Trunk Reporter: Dawid Weiss Priority: Critical This is what's causing test hangs (at least on FreeBSD, LUCENE-5786), possibly on other machines too. The problem is stack overflow from looped calls in: {code} org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221)
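Regarding the maximum-retry question raised in the comments above: the overflow comes from the mutual recursion joinElection -> checkIfIamLeader -> rejoinLeaderElection -> joinElection visible in the trace. The sketch below shows only the general shape of such a fix (a bounded loop instead of unbounded recursion); it is illustrative and is not Solr's actual code, and the cap value is an assumption.

{code}
/** Sketch of the retry cap asked about above: the mutual recursion
 *  joinElection -> checkIfIamLeader -> rejoinLeaderElection -> joinElection
 *  becomes a bounded loop, so a flapping election fails fast instead of
 *  overflowing the stack. Not Solr's actual implementation. */
public class BoundedElection {
  private static final int MAX_REJOIN_ATTEMPTS = 20; // hypothetical cap

  interface ElectionRound { boolean becameLeaderOrDone(); }

  static void joinElection(ElectionRound round) {
    for (int attempt = 1; attempt <= MAX_REJOIN_ATTEMPTS; attempt++) {
      if (round.becameLeaderOrDone()) {
        return; // leadership settled; no rejoin needed
      }
      // previously: a recursive rejoinLeaderElection(...) call, one stack frame per retry
    }
    throw new IllegalStateException(
        "gave up leader election after " + MAX_REJOIN_ATTEMPTS + " attempts");
  }
}
{code}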
[jira] [Commented] (SOLR-6156) Exception while using group with timeAllowed on SolrCloud.
[ https://issues.apache.org/jira/browse/SOLR-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506695#comment-14506695 ] Forest Soup commented on SOLR-6156: --- We have the same issue when we issue a request like this (I only paste the XML-format header here, with some replacements):
<lst name="responseHeader">
  <int name="status">500</int>
  <int name="QTime">11</int>
  <lst name="params">
    <str name="_route_">Q049Y2RsMi1tYWlsMDcvTz1zY24y12345678!</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">13</str>
    <str name="facet.range">date</str>
    <str name="facet.range.end">NOW/DAY+1DAY</str>
    <str name="facet.range.gap">+1DAY</str>
    <str name="wt">xml</str>
    <str name="rows">0</str>
    <str name="df">body</str>
    <str name="start">0</str>
    <str name="q">((owner:12345678) AND (servername:mail07)) AND (((funid:38D46BF5E8F08834852564B50129B2C)) (softdeletion:0))</str>
    <str name="facet.range.start">NOW/DAY-31DAY</str>
    <str name="q.op">AND</str>
    <str name="timeAllowed">6</str>
    <str name="group.field">tua0</str>
    <str name="group.sort">date desc</str>
    <str name="group">true</str>
    <arr name="facet.field">
      <str>inetfrom</str>
      <str>funid</str>
    </arr>
  </lst>
</lst>
If we remove timeAllowed=6, the issue does not occur. All cores are active according to /clusterstate.json and /live_nodes, both in the Admin UI and in ZooKeeper. We get the response:
{ "error": { "msg": "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]", "trace": "org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:308)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)\n\tat org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)\n\tat org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)\n\tat org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)\n\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)\n\tat java.lang.Thread.run(Thread.java:804)\nCaused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]\n\tat org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:387)\n\tat org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:205)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:161)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:273)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:273)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)\n\t... 1 more\nCaused by:
[jira] [Comment Edited] (SOLR-6156) Exception while using group with timeAllowed on SolrCloud.
[ https://issues.apache.org/jira/browse/SOLR-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506695#comment-14506695 ] Forest Soup edited comment on SOLR-6156 at 4/22/15 9:20 AM: We have the same issue on Solr 4.7 when we issue a request like this (I only paste the XML-format header here, with some replacements):
<lst name="responseHeader">
  <int name="status">500</int>
  <int name="QTime">11</int>
  <lst name="params">
    <str name="_route_">Q049Y2RsMi1tYWlsMDcvTz1zY24y12345678!</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">13</str>
    <str name="facet.range">date</str>
    <str name="facet.range.end">NOW/DAY+1DAY</str>
    <str name="facet.range.gap">+1DAY</str>
    <str name="wt">xml</str>
    <str name="rows">0</str>
    <str name="df">body</str>
    <str name="start">0</str>
    <str name="q">((owner:12345678) AND (servername:mail07)) AND (((funid:38D46BF5E8F08834852564B50129B2C)) (softdeletion:0))</str>
    <str name="facet.range.start">NOW/DAY-31DAY</str>
    <str name="q.op">AND</str>
    <str name="timeAllowed">6</str>
    <str name="group.field">tua0</str>
    <str name="group.sort">date desc</str>
    <str name="group">true</str>
    <arr name="facet.field">
      <str>inetfrom</str>
      <str>funid</str>
    </arr>
  </lst>
</lst>
If we remove timeAllowed=6, the issue does not occur. All cores are active according to /clusterstate.json and /live_nodes, both in the Admin UI and in ZooKeeper. We get the response:
{ "error": { "msg": "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]", "trace": "org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:308)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)\n\tat org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)\n\tat org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)\n\tat org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)\n\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)\n\tat java.lang.Thread.run(Thread.java:804)\nCaused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://hij2-solr1.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica2, https://hij2-solr2.fen.def2.cn.abc.com:8443/solr/collection1_shard2_replica1]\n\tat org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:387)\n\tat org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:205)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:161)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:273)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:273)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)\n\tat
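For anyone trying to reproduce this, below is a minimal SolrJ 4.x sketch of the same kind of request (grouping combined with a small timeAllowed). The ZooKeeper connect string and collection name are placeholders, and the field names are taken from the pasted header; this is an illustration, not a confirmed reproduction recipe.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupWithTimeAllowed {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
        server.setDefaultCollection("collection1");

        SolrQuery q = new SolrQuery("((owner:12345678) AND (servername:mail07))");
        q.setRows(0);
        q.set("group", "true");
        q.set("group.field", "tua0");
        q.set("group.sort", "date desc");
        q.setTimeAllowed(6); // per the report above, removing this makes the failure go away

        // On affected versions this throws "No live SolrServers available to handle this request".
        QueryResponse rsp = server.query(q);
        System.out.println("QTime=" + rsp.getQTime());
        server.shutdown();
    }
}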
[jira] [Updated] (SOLR-7434) Adding coreName to each log entry
[ https://issues.apache.org/jira/browse/SOLR-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7434: -- Affects Version/s: 5.1

Adding coreName to each log entry - Key: SOLR-7434 URL: https://issues.apache.org/jira/browse/SOLR-7434 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Forest Soup

Could you please add the [core name] to each log entry? Thanks! When a Solr node hosts many cores, it is hard to tell from the log which core an entry belongs to, or to reconstruct the sequence of events for that core.

This line is a good example:

2015-04-16 13:12:07.244; org.apache.solr.core.SolrCore; [collection3_shard5_replica2] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

This is a bad example (consecutive entries, none of which says which core was fetching):

WARN - 2015-04-16 13:12:11.136; org.apache.solr.handler.SnapPuller$DirectoryFileFetcher; Error in fetching packets
java.io.EOFException
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
WARN - 2015-04-16 13:12:11.287; org.apache.solr.handler.SnapPuller$DirectoryFileFetcher; Error in fetching packets
java.io.EOFException
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
WARN - 2015-04-16 13:12:11.465; org.apache.solr.handler.SnapPuller$DirectoryFileFetcher; Error in fetching packets
java.io.EOFException
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
WARN - 2015-04-16 13:12:11.586; org.apache.solr.handler.SnapPuller$DirectoryFileFetcher; Error in fetching packets
java.io.EOFException
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
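The general technique for this, sketched below with SLF4J's MDC (an illustration only, not the patch that eventually landed in Solr), is to push the core name into the logging context when a core starts handling a request and to reference it from the layout pattern. The handleRequest entry point and the "core" key are hypothetical names for the sketch.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class CoreAwareLogging {
    private static final Logger log = LoggerFactory.getLogger(CoreAwareLogging.class);

    // Hypothetical entry point; coreName would come from the SolrCore serving the request.
    void handleRequest(String coreName) {
        MDC.put("core", coreName); // exposed to the log layout as %X{core}
        try {
            log.warn("Error in fetching packets"); // now distinguishable per core
        } finally {
            MDC.remove("core"); // avoid leaking the value to the next request on this thread
        }
    }
}

With a log4j conversion pattern that includes [%X{core}], every entry, including the SnapPuller warnings above, would carry the core name.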
[jira] [Updated] (SOLR-7434) Adding coreName to each log entry
[ https://issues.apache.org/jira/browse/SOLR-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7434: -- Affects Version/s: (was: 5.1) 4.7
[jira] [Created] (SOLR-7434) Adding coreName to each log entry
Forest Soup created SOLR-7434: - Summary: Adding coreName to each log entry Key: SOLR-7434 URL: https://issues.apache.org/jira/browse/SOLR-7434 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Forest Soup
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381798#comment-14381798 ] Forest Soup commented on SOLR-6359: --- We have a SolrCloud of 5 Solr 4.7.0 servers hosting one collection with 80 shards (2 replicas per shard). We made a patch by merging the code of this fix into the 4.7.0 stream. After applying the patch to our servers and uploading the configuration change to ZooKeeper, we restarted one of the 5 Solr servers and hit the issues below.

The updateLog section of solrconfig.xml that we changed:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

After we restarted that one Solr server while the other 4 servers were not running, we saw the exceptions below on the restarted server:

ERROR - 2015-03-16 20:48:48.214; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Q049bGx0bWFpbDIxL089bGxwX3VzMQ==41703656!B68BF5EC5A4A650D85257E0A00724A3B to the index; possible analysis error.
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:703)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:857)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:556)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:804)
Caused by:
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381799#comment-14381799 ] Forest Soup commented on SOLR-6359: --- It looks like https://issues.apache.org/jira/browse/SOLR-4605, but I guess it's not the case... Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Ramkumar Aiyengar Priority: Minor Fix For: Trunk, 5.1 Attachments: SOLR-6359.patch Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to a full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381798#comment-14381798 ] Forest Soup edited comment on SOLR-6359 at 3/26/15 2:21 PM.
[jira] [Issue Comment Deleted] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6359: -- Comment: was deleted (was: It looks like https://issues.apache.org/jira/browse/SOLR-4605, but I guess it's not the case...) Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Ramkumar Aiyengar Priority: Minor Fix For: Trunk, 5.1 Attachments: SOLR-6359.patch Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to a full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7292) OutOfMemory happened in Solr, but /clusterstates.json shows cores active
[ https://issues.apache.org/jira/browse/SOLR-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7292: -- Attachment: OOM.txt failure.txt OutOfMemory happened in Solr, but /clusterstates.json shows cores active -- Key: SOLR-7292 URL: https://issues.apache.org/jira/browse/SOLR-7292 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.7 Environment: Redhat Linux 6.3 64bit Reporter: Forest Soup Labels: performance Attachments: OOM.txt, failure.txt One of our 5 Solr servers hit an OutOfMemoryError, but /clusterstates.json in ZK still shows it as active. The OOM exceptions are in the attached OOM.txt. Updates and commits to the collection that has cores on that Solr server fail; the logs are in the attached failure.txt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7292) OutOfMemory happened in Solr and Solr cannot do updates, but the /clusterstates.json on ZooKeeper still shows it as active
Forest Soup created SOLR-7292: - Summary: OutOfMemory happened in Solr and Solr cannot do updates, but the /clusterstates.json on ZooKeeper still shows it as active Key: SOLR-7292 URL: https://issues.apache.org/jira/browse/SOLR-7292 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.7 Environment: Redhat Linux 6.3 64bit Reporter: Forest Soup One of our 5 Solr servers hit an OutOfMemoryError, but /clusterstates.json in ZK still shows it as active. The OOM exceptions are in the attached OOM.txt. Updates and commits to the collection that has cores on that Solr server fail; the logs are in the attached failure.txt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7292) OutOfMemory happened in Solr, but /clusterstates.json shows cores active
[ https://issues.apache.org/jira/browse/SOLR-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7292: -- Summary: OutOfMemory happened in Solr, but /clusterstates.json shows cores active (was: OutOfMemory happened in Solr and Solr cannot do updates, but the /clusterstates.json on ZooKeeper still shows it as active) OutOfMemory happened in Solr, but /clusterstates.json shows cores active -- Key: SOLR-7292 URL: https://issues.apache.org/jira/browse/SOLR-7292 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.7 Environment: Redhat Linux 6.3 64bit Reporter: Forest Soup Labels: performance One of our 5 Solr servers hit an OutOfMemoryError, but /clusterstates.json in ZK still shows it as active. The OOM exceptions are in the attached OOM.txt. Updates and commits to the collection that has cores on that Solr server fail; the logs are in the attached failure.txt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7292) OutOfMemory happened in Solr, but /clusterstates.json shows cores active
[ https://issues.apache.org/jira/browse/SOLR-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377084#comment-14377084 ] Forest Soup commented on SOLR-7292: --- Thank you all! We will consider your suggestions! OutOfMemory happened in Solr, but /clusterstates.json shows cores active -- Key: SOLR-7292 URL: https://issues.apache.org/jira/browse/SOLR-7292 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.7 Environment: Redhat Linux 6.3 64bit Reporter: Forest Soup Labels: performance Attachments: OOM.txt, failure.txt One of our 5 Solr servers hit an OutOfMemoryError, but /clusterstates.json in ZK still shows it as active. The OOM exceptions are in the attached OOM.txt. Updates and commits to the collection that has cores on that Solr server fail; the logs are in the attached failure.txt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
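Until the cluster state reflects reality, clients cannot trust /clusterstates.json alone. One client-side workaround, sketched here under the assumption that the default /admin/ping handler is configured on the cores, is to ping each replica directly and treat a ping failure as down regardless of what ZooKeeper reports. The core URL below is a placeholder.

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class ReplicaHealthCheck {
    // coreUrl like "https://host:8443/solr/collection1_shard1_replica1" (placeholder)
    public static boolean isHealthy(String coreUrl) {
        HttpSolrServer server = new HttpSolrServer(coreUrl);
        try {
            SolrPingResponse ping = server.ping(); // goes through the /admin/ping handler
            return ping.getStatus() == 0;
        } catch (SolrServerException | IOException e) {
            // A node that hit OOM typically fails here even while ZK still lists its cores as active.
            return false;
        } finally {
            server.shutdown();
        }
    }
}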
[jira] [Created] (SOLR-7069) A down core(shard replica) on an active node cannot failover the query to its good peer
Forest Soup created SOLR-7069: - Summary: A down core(shard replica) on an active node cannot failover the query to its good peer Key: SOLR-7069 URL: https://issues.apache.org/jira/browse/SOLR-7069 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Reporter: Forest Soup -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7069) A down core(shard replica) on an active node cannot failover the query to its good peer
[ https://issues.apache.org/jira/browse/SOLR-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7069: -- Description: When querying a collection that has a core in the down state, if we send the request to the server hosting the down core (the server itself is active), the request does not fail over to the good replica of the same shard on another server. A down core(shard replica) on an active node cannot failover the query to its good peer --- Key: SOLR-7069 URL: https://issues.apache.org/jira/browse/SOLR-7069 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Reporter: Forest Soup When querying a collection that has a core in the down state, if we send the request to the server hosting the down core (the server itself is active), the request does not fail over to the good replica of the same shard on another server. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7069) A down core(shard replica) on an active node cannot failover the query to its good peer
[ https://issues.apache.org/jira/browse/SOLR-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7069: -- Description: When querying a collection that has a core in the down state, if we send the request to the server hosting the down core (the server itself is active), the request does not fail over to the good replica of the same shard on another server.

The steps to put a core into the down state on an active server:
1. Delete the contents of the core's data folder.
2. Restart the Solr server on which the core is located.
The core then shows as down while the other cores on the same server remain active. See the attached picture.

When we issue a query to the collection and send the request to the server containing the down core, we receive the errors below:

HTTP Status 500 - {msg=SolrCore 'collection5_shard1_replica2' is not available due to init failure: Error opening new searcher,trace=org.apache.solr.common.SolrException: SolrCore 'collection5_shard1_replica2' is not available due to init failure: Error opening new searcher
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:309)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:804)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:244)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
... 1 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1521)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1633)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:827)
... 11 more
Caused by: java.io.FileNotFoundException: /mnt/solrdata1/solr/home/collection5_shard1_replica2/data/index/_12x.si (No such file or directory)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:252)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:233)
at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:404)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:400)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:741)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
[jira] [Updated] (SOLR-7069) A down core(shard replica) on an active node cannot failover the query to its good peer
[ https://issues.apache.org/jira/browse/SOLR-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7069: -- Attachment: Untitled.png A down core on an active node.
[jira] [Updated] (SOLR-7069) A down core(shard replica) on an active node cannot failover the query to its good peer on another server
[ https://issues.apache.org/jira/browse/SOLR-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-7069: -- Summary: A down core(shard replica) on an active node cannot failover the query to its good peer on another server (was: A down core(shard replica) on an active node cannot failover the query to its good peer)
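A client-side mitigation (not a fix for the server-side routing gap reported here) is to send queries through the ZooKeeper-aware SolrJ client rather than to a fixed node URL, so replica selection starts from the cluster state and replicas marked down are skipped. A minimal sketch against the SolrJ 4.x API, with placeholder ZooKeeper and collection names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ZkAwareQuery {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr"); // placeholder
        server.setDefaultCollection("collection5"); // placeholder

        // CloudSolrServer reads the cluster state and load-balances over replicas it
        // believes are live, so a replica correctly marked down is not queried.
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        server.shutdown();
    }
}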
[jira] [Issue Comment Deleted] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6675: -- Comment: was deleted (was: We agree it's the suggester part. Thanks!) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: 1014.zip, callstack.png

We have a SolrCloud with Solr 4.7 on Tomcat 7, and our Solr cores are big (50-100 GB each). When we start Tomcat, deployment of the Solr webapp is very slow: Tomcat's catalina log shows it takes about 10 minutes every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation over the large index is done. When we removed <jmx/> from solrconfig.xml, loading the Solr webapp took only about 1 minute, so we are sure the MBean calculation over the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! The callstack.png file in the attachments shows the call stack of the long-blocking thread that is doing the statistics calculation.

The catalina log of Tomcat:

INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms
--- Time taken for Solr app deployment is about 10 minutes ---
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager
Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8080]
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 601506 ms

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
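If statistics are still wanted without paying this cost during webapp load, one option, a sketch only and assuming remote JMX has been enabled on the Tomcat JVM (the host, port, and service URL below are placeholders), is to poll the Solr MBeans from an external process after startup instead of computing them inline:

import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrMBeanPoller {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; requires the com.sun.management.jmxremote.* properties on the JVM.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://solr-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // With <jmx/> enabled, Solr registers its statistics MBeans; the domain pattern
            // below is a wildcard guess, so adjust it to whatever the server actually uses.
            Set<ObjectName> names = conn.queryNames(new ObjectName("solr*:*"), null);
            System.out.println("Solr MBeans visible: " + names.size());
        } finally {
            connector.close();
        }
    }
}

Note this only moves the observation point outside the deployment thread; the effective workaround reported here was dropping <jmx/> entirely.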
[jira] [Commented] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288613#comment-14288613 ] Forest Soup commented on SOLR-6675: --- We agree it's the suggester part. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6675) Solr webapp deployment is very slow with jmx/ in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288614#comment-14288614 ] Forest Soup commented on SOLR-6675: --- We agree it's the suggester part. Thanks! Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: 1014.zip, callstack.png We have a SolrCloud on Solr 4.7 with Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: from Tomcat's catalina log, it takes about 10 minutes to get deployed every time. After analyzing a Java core dump, we noticed that the loading process cannot finish until the MBean calculation for the large index is done. So we tried removing <jmx/> from solrconfig.xml, and after that the Solr webapp loads in only about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point me to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let it do the calculation after deployment? Thanks! The callstack.png file in the attachment is the call stack of the long-blocking thread which is doing the statistics calculation. The catalina log of Tomcat: INFO: Starting Servlet Engine: Apache Tomcat/7.0.54 Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms Time taken for Solr app deployment is about 10 minutes --- Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [http-bio-8080] Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [ajp-bio-8009] Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start INFO: Server startup in 601506 ms -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
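For reference, a minimal sketch of the workaround described in this issue, assuming a stock Solr 4.x solrconfig.xml; apart from the <jmx/> element itself, the surrounding structure here is illustrative only:

<config>
  <!-- When <jmx/> is present, Solr registers per-core statistics MBeans at
       core load time; on 50~100G cores that registration is the calculation
       the reporter saw blocking deployment for about 10 minutes. Commenting
       the element out skips JMX registration, at the cost of losing those
       MBeans for monitoring tools. -->
  <!-- <jmx/> -->
</config>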
[jira] [Issue Comment Deleted] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6359: -- Comment: was deleted (was: Thanks. But will there be this case? After a snapshot recovery of core A is done, its tlog is still out of date, without any new records from the recovery, and it is not cleared. If the just-recovered core (core A) takes the leader role and another core (core C) tries to recover from it, then since A's tlog contains only the old entries and none of the newest ones, will core C do a peersync with only the old records, missing the newest ones? And I think the snapshot recovery happens because there is too much difference between the two cores, so the tlog gap is also too large; the out-of-date tlog is therefore no longer useful for peersync. Our testing shows the snapshot recovery does not clean the tlog, with the steps below: 1. Core A and core B are two replicas of a shard. 2. Core A goes down, and core B takes the leader role; it accepts some updates and records them in its tlog. 3. After A comes up, it recovers from B, and since the difference is too large, A does a snapshot pull recovery. During the snapshot pull recovery no other updates come in. After the snapshot pull recovery, the tlog of A is not updated; it still does NOT contain any of the most recent records from B. * And the tlog is still out of date, although the index of A is already updated. * 4. Core A goes down again, while core B remains the leader, accepts some other updates, and records them in its tlog. 5. After A comes up again, it recovers from B, but it finds its tlog is still too old, so it does a snapshot recovery again, which is not necessary. Do you agree? Thanks!) Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263806#comment-14263806 ] Forest Soup commented on SOLR-6683: --- The snapshot recovery does not clear the tlog of the core being recovered. Is that an issue? Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263807#comment-14263807 ] Forest Soup commented on SOLR-6359: --- The snapshot recovery does not clear the tlog of the core being recovered. Is that an issue? Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264276#comment-14264276 ] Forest Soup commented on SOLR-6359: --- Thanks. But will there be this case? After a snapshot recovery of core A is done, its tlog is still out of date, without any new records from the recovery, and it is not cleared. If the just-recovered core (core A) takes the leader role and another core (core C) tries to recover from it, then since A's tlog contains only the old entries and none of the newest ones, will core C do a peersync with only the old records, missing the newest ones? And I think the snapshot recovery happens because there is too much difference between the two cores, so the tlog gap is also too large; the out-of-date tlog is therefore no longer useful for peersync. Our testing shows the snapshot recovery does not clean the tlog, with the steps below: 1. Core A and core B are two replicas of a shard. 2. Core A goes down, and core B takes the leader role; it accepts some updates and records them in its tlog. 3. After A comes up, it recovers from B, and since the difference is too large, A does a snapshot pull recovery. During the snapshot pull recovery no other updates come in. After the snapshot pull recovery, the tlog of A is not updated; it still does NOT contain any of the most recent records from B. * And the tlog is still out of date, although the index of A is already updated. * 4. Core A goes down again, while core B remains the leader, accepts some other updates, and records them in its tlog. 5. After A comes up again, it recovers from B, but it finds its tlog is still too old, so it does a snapshot recovery again, which is not necessary. Do you agree? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264277#comment-14264277 ] Forest Soup edited comment on SOLR-6359 at 1/5/15 7:27 AM: --- Thanks. But will there be this case? After a snapshot recovery of core A is done, its tlog is still out of date, without any new records from the recovery, and it is not cleared. If the just-recovered core (core A) takes the leader role and another core (core C) tries to recover from it, then since A's tlog contains only the old entries and none of the newest ones, will core C do a peersync with only the old records, missing the newest ones? And I think the snapshot recovery happens because there is too much difference between the two cores, so the tlog gap is also too large; the out-of-date tlog is therefore no longer useful for peersync. Our testing shows the snapshot recovery does not clean the tlog, with the steps below: 1. Core A and core B are two replicas of a shard. 2. Core A goes down, and core B takes the leader role; it accepts some updates and records them in its tlog. 3. After A comes up, it recovers from B, and since the difference is too large, A does a snapshot pull recovery. During the snapshot pull recovery no other updates come in. After the snapshot pull recovery, the tlog of A is not updated; it still does NOT contain any of the most recent records from B. And the tlog is still out of date, although the index of A is already updated. 4. Core A goes down again, while core B remains the leader, accepts some other updates, and records them in its tlog. 5. After A comes up again, it recovers from B, but it finds its tlog is still too old, so it does a snapshot recovery again, which is not necessary. Do you agree? Thanks! was (Author: forest_soup): Thanks. But will there be this case? After a snapshot recovery of core A is done, its tlog is still out of date, without any new records from the recovery, and it is not cleared. If the just-recovered core (core A) takes the leader role and another core (core C) tries to recover from it, then since A's tlog contains only the old entries and none of the newest ones, will core C do a peersync with only the old records, missing the newest ones? And I think the snapshot recovery happens because there is too much difference between the two cores, so the tlog gap is also too large; the out-of-date tlog is therefore no longer useful for peersync. Our testing shows the snapshot recovery does not clean the tlog, with the steps below: 1. Core A and core B are two replicas of a shard. 2. Core A goes down, and core B takes the leader role; it accepts some updates and records them in its tlog. 3. After A comes up, it recovers from B, and since the difference is too large, A does a snapshot pull recovery. During the snapshot pull recovery no other updates come in. After the snapshot pull recovery, the tlog of A is not updated; it still does NOT contain any of the most recent records from B. * And the tlog is still out of date, although the index of A is already updated. * 4. Core A goes down again, while core B remains the leader, accepts some other updates, and records them in its tlog. 5. After A comes up again, it recovers from B, but it finds its tlog is still too old, so it does a snapshot recovery again, which is not necessary. Do you agree? Thanks!
Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263763#comment-14263763 ] Forest Soup commented on SOLR-6359: --- It works, but with a pre-condition: the newest 20% of the existing transaction log of the core to be recovered must be newer than the oldest 20% of the existing transaction log of the good core. Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
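A rough sketch of how that 20% overlap pre-condition can be read, in the spirit of the PeerSync.handleVersions snippet quoted later in this thread; the class and method names and the percentile arithmetic here are hypothetical, and only the 20% overlap idea comes from the comment above:

// Hypothetical illustration only — not the actual PeerSync implementation.
class PeerSyncOverlapSketch {
  // Pick the version `fraction` of the way down a newest-first sorted list,
  // clamped to the last entry.
  static long versionAtFraction(long[] newestFirst, double fraction) {
    int idx = Math.min((int) (newestFirst.length * fraction), newestFirst.length - 1);
    return newestFirst[idx];
  }

  // Returns true when the two version windows overlap enough for peersync.
  static boolean windowsOverlap(long[] ourVersions, long[] otherVersions) {
    long ourHighThreshold = versionAtFraction(ourVersions, 0.2);  // 20% below our newest
    long otherLow = versionAtFraction(otherVersions, 0.8);        // 20% above the good core's oldest
    // If even our newest 20% is older than the good core's oldest 20%,
    // peersync is refused and a full snap-pull recovery follows.
    return ourHighThreshold >= otherLow;
  }
}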
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263762#comment-14263762 ] Forest Soup commented on SOLR-6683: --- It works, but with a pre-condition: the newest 20% of the existing transaction log of the core to be recovered must be newer than the oldest 20% of the existing transaction log of the good core. Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263765#comment-14263765 ] Forest Soup commented on SOLR-6683: --- A full snapshot recovery does not clean the tlog of the core being recovered. Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263764#comment-14263764 ] Forest Soup commented on SOLR-6359: --- A full snapshot recovery does not clean the tlog of the core being recovered. Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249827#comment-14249827 ] Forest Soup edited comment on SOLR-6359 at 1/4/15 6:18 AM: --- I applied the patch for SOLR-6359 on 4.7 and did some tests, with the config below:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

was (Author: forest_soup): I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected: with the config below, it still goes into the SnapPuller code even though I only added 800 new docs.

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:

if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}

Could you please comment? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249826#comment-14249826 ] Forest Soup edited comment on SOLR-6683 at 1/4/15 6:18 AM: --- I applied the patch for SOLR-6359 on 4.7 and did some tests, with the config below:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

was (Author: forest_soup): I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected: with the config below, it still goes into the SnapPuller code even though I only added 800 new docs.

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:

if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}

Could you please comment? Thanks! Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249827#comment-14249827 ] Forest Soup edited comment on SOLR-6359 at 1/4/15 6:19 AM: --- I applied the patch for SOLR-6359 on 4.7 and did some tests, with the config below:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

was (Author: forest_soup): I applied the patch for SOLR-6359 on 4.7 and did some tests, with the config below:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 7:59 AM: - The numRecordsToKeep and maxNumLogsToKeep values should go inside the updateLog element, like below. Right?

<!-- Enables a transaction log, used for real-time get, durability, and SolrCloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). "dir" - the target directory for transaction logs, defaults to the solr data directory. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

was (Author: forest_soup): And where should I set the numRecordsToKeep and maxNumLogsToKeep values? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 10:01 AM: -- The numRecordsToKeep and maxNumLogsToKeep values should go inside the updateLog element, like below.

<!-- Enables a transaction log, used for real-time get, durability, and SolrCloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). "dir" - the target directory for transaction logs, defaults to the solr data directory. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

was (Author: forest_soup): The numRecordsToKeep and maxNumLogsToKeep values should go inside the updateLog element, like below. Right?

<!-- Enables a transaction log, used for real-time get, durability, and SolrCloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). "dir" - the target directory for transaction logs, defaults to the solr data directory. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249826#comment-14249826 ] Forest Soup commented on SOLR-6683: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected: with the config below, it still goes into the SnapPuller code even though I only added 800 new docs.

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:

if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}

Could you please comment? Thanks! Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249827#comment-14249827 ] Forest Soup commented on SOLR-6359: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected: with the config below, it still goes into the SnapPuller code even though I only added 800 new docs.

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>

After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:

if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}

Could you please comment? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246497#comment-14246497 ] Forest Soup commented on SOLR-6675: --- It looks like threads searcherExecutor-5-thread-1 and searcherExecutor-6-thread-1 are blocking coreLoadExecutor-4-thread-1 and coreLoadExecutor-4-thread-2, and the searcherExecutor threads appear to be in suggester code. [~hossman] Could you please help confirm? Thanks! Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: 1014.zip, callstack.png We have a SolrCloud on Solr 4.7 with Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: from Tomcat's catalina log, it takes about 10 minutes to get deployed every time. After analyzing a Java core dump, we noticed that the loading process cannot finish until the MBean calculation for the large index is done. So we tried removing <jmx/> from solrconfig.xml, and after that the Solr webapp loads in only about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point me to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let it do the calculation after deployment? Thanks! The callstack.png file in the attachment is the call stack of the long-blocking thread which is doing the statistics calculation. The catalina log of Tomcat: INFO: Starting Servlet Engine: Apache Tomcat/7.0.54 Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms Time taken for Solr app deployment is about 10 minutes --- Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [http-bio-8080] Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [ajp-bio-8009] Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start INFO: Server startup in 601506 ms -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246505#comment-14246505 ] Forest Soup commented on SOLR-6359: --- Is the patch only available for Solr 5.0? Can we apply the patch to Solr 4.7? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246609#comment-14246609 ] Forest Soup commented on SOLR-6359: --- When can we get an official build with that patch in 4.x or 5.0? Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246618#comment-14246618 ] Forest Soup commented on SOLR-6359: --- And where should I set the numRecordsToKeep and maxNumLogsToKeep values? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6675: -- Attachment: 1014.zip The 0001.txt and 0002.txt files are the dumps from before the Solr webapp was deployed; 0003.txt is the dump from after it was deployed. Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: 1014.zip, callstack.png We have a SolrCloud on Solr 4.7 with Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: from Tomcat's catalina log, it takes about 10 minutes to get deployed every time. After analyzing a Java core dump, we noticed that the loading process cannot finish until the MBean calculation for the large index is done. So we tried removing <jmx/> from solrconfig.xml, and after that the Solr webapp loads in only about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point me to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let it do the calculation after deployment? Thanks! The callstack.png file in the attachment is the call stack of the long-blocking thread which is doing the statistics calculation. The catalina log of Tomcat: INFO: Starting Servlet Engine: Apache Tomcat/7.0.54 Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms Time taken for Solr app deployment is about 10 minutes --- Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [http-bio-8080] Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [ajp-bio-8009] Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start INFO: Server startup in 601506 ms -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234034#comment-14234034 ] Forest Soup commented on SOLR-6675: --- This is our JVM, and we have never tried the latest Solr 4.10.x. Any idea on how to resolve or work around it? Thanks! java version "1.7.0" Java(TM) SE Runtime Environment (build pxa6470sr6-20131015_01(SR6)) IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References 20131013_170512 (JIT enabled, AOT enabled) J9VM - R26_Java726_SR6_20131013_1510_B170512 JIT - r11.b05_20131003_47443 GC - R26_Java726_SR6_20131013_1510_B170512_CMPRSS J9CL - 20131013_170512) JCL - 20131011_01 based on Oracle 7u45-b18 Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: callstack.png We have a SolrCloud on Solr 4.7 with Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: from Tomcat's catalina log, it takes about 10 minutes to get deployed every time. After analyzing a Java core dump, we noticed that the loading process cannot finish until the MBean calculation for the large index is done. So we tried removing <jmx/> from solrconfig.xml, and after that the Solr webapp loads in only about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point me to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let it do the calculation after deployment? Thanks! The callstack.png file in the attachment is the call stack of the long-blocking thread which is doing the statistics calculation. The catalina log of Tomcat: INFO: Starting Servlet Engine: Apache Tomcat/7.0.54 Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms Time taken for Solr app deployment is about 10 minutes --- Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [http-bio-8080] Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler [ajp-bio-8009] Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start INFO: Server startup in 601506 ms -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234046#comment-14234046 ] Forest Soup commented on SOLR-4470: --- Does anyone have an idea when this will be released? Thanks! Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: Trunk Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, SOLR-4470_trunk_r1568857.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered by outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to build a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution, but we created this JIRA issue early in order to get input/comments from the community as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234052#comment-14234052 ] Forest Soup commented on SOLR-6683: --- Thanks, Ramkumar. We will try it. Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
Forest Soup created SOLR-6683: - Summary: Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6683: -- Description: If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! was: If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. Thanks! Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a 100-doc gap between the recovering node and the good node, Solr will do a snap-pull recovery instead of a peersync. Can the 100 docs be made configurable? For example, there can be a gap of 1, 1000, or 10 docs between the good node and the node to recover. With the 100-doc limit, a regular restart of a Solr node will trigger a full recovery, which has a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
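With the SOLR-6359 patch applied, the practical knob for this gap appears to be numRecordsToKeep in the updateLog config rather than a brand-new parameter; a sketch of such a configuration, where the 10000 value is purely illustrative:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- keep more recent updates per tlog so a routine restart can peersync
       instead of falling back to a full snap-pull recovery -->
  <int name="numRecordsToKeep">10000</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>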
[jira] [Created] (SOLR-6674) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
Forest Soup created SOLR-6674: - Summary: Solr webapp deployment is very slow with <jmx/> in solrconfig.xml Key: SOLR-6674 URL: https://issues.apache.org/jira/browse/SOLR-6674 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
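For reference, the element under discussion is the JMX hook in solrconfig.xml. A minimal sketch of what we removed (the stock attribute-less form; agentId/serviceUrl variants also exist):

<!-- When present, Solr registers its statistics MBeans with a JMX server.
     On 50~100G cores the statistics are computed while the core loads,
     which is what blocks the webapp deployment; removing the element
     skips JMX registration entirely. -->
<jmx />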
[jira] [Created] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
Forest Soup created SOLR-6675: - Summary: Solr webapp deployment is very slow with <jmx/> in solrconfig.xml Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
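One possible JMX-free direction (an assumption on our side, not a confirmed answer from this thread): the same per-core statistics can be pulled on demand over HTTP through the admin MBeans handler, so an external monitor polls after deployment instead of paying the cost at core load. In the stock Solr 4.x example config the handler is registered as:

<!-- From the stock Solr 4.x example solrconfig.xml; with this in place,
     statistics can be fetched on demand from
     http://host:8080/solr/<core-name>/admin/mbeans?stats=true&wt=json -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />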
[jira] [Commented] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189798#comment-14189798 ] Forest Soup commented on SOLR-6675: --- The catalina log of Tomcat (the Solr app deployment alone takes about 10 minutes, 594,325 ms):

INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager
Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8080]
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 601506 ms

Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189802#comment-14189802 ] Forest Soup edited comment on SOLR-6675 at 10/30/14 8:33 AM: - The callstack.png file in the attachments is the call stack of the long-blocking thread that is doing the statistics calculation. was (Author: forest_soup): The call stack of the long-blocking thread that is doing the statistics calculation. Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: callstack.png We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6675: -- Attachment: callstack.png The call stack of the long-blocking thread that is doing the statistics calculation. Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: callstack.png We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6675: -- Description: We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! The catalina log of Tomcat (the Solr app deployment alone takes about 10 minutes, 594,325 ms):

INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager
Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8080]
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 601506 ms

was: We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! Solr webapp deployment is very slow with <jmx/> in solrconfig.xml - Key: SOLR-6675 URL: https://issues.apache.org/jira/browse/SOLR-6675 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Linux Redhat 64bit Reporter: Forest Soup Priority: Critical Labels: performance Attachments: callstack.png We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each).
[jira] [Updated] (SOLR-6675) Solr webapp deployment is very slow with <jmx/> in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forest Soup updated SOLR-6675: -- Description: We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! The callstack.png file in the attachments is the call stack of the long-blocking thread that is doing the statistics calculation. The catalina log of Tomcat (the Solr app deployment alone takes about 10 minutes, 594,325 ms):

INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/manager
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/manager has finished in 2,035 ms
Oct 13, 2014 2:10:26 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/examples
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/examples has finished in 1,789 ms
Oct 13, 2014 2:10:27 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/docs
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/docs has finished in 1,037 ms
Oct 13, 2014 2:10:28 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/ROOT has finished in 948 ms
Oct 13, 2014 2:10:29 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager
Oct 13, 2014 2:10:30 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /opt/ibm/solrsearch/tomcat/webapps/host-manager has finished in 951 ms
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8080]
Oct 13, 2014 2:10:31 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
Oct 13, 2014 2:10:31 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 601506 ms

was: We have a SolrCloud running Solr 4.7 on Tomcat 7, and our Solr cores are big (50~100G each). When we start Tomcat, the Solr webapp deployment is very slow: Tomcat's catalina log shows it takes about 10 minutes to deploy every time. After analyzing a Java core dump, we noticed the loading process cannot finish until the MBean statistics calculation for the large index is done. When we removed <jmx/> from solrconfig.xml, the Solr webapp loaded in about 1 minute, so we are sure the MBean calculation for the large index is the root cause. Could you please point us to an async way to do statistics monitoring without <jmx/> in solrconfig.xml, or let the calculation run after deployment? Thanks! The catalina log of Tomcat:

INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Oct 13, 2014 2:00:29 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war
Oct 13, 2014 2:10:23 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive /opt/ibm/solrsearch/tomcat/webapps/solr.war has finished in 594,325 ms (about 10 minutes for the Solr app deployment)
[jira] [Commented] (SOLR-6335) org.apache.solr.common.SolrException: no servers hosting shard
[ https://issues.apache.org/jira/browse/SOLR-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092340#comment-14092340 ] Forest Soup commented on SOLR-6335: --- Thanks, Erick. Before opening this JIRA I did search, but found no issue with a similar root cause. I also asked in the link below, but got no response. My ZK connection and network are good. So could you please help? Thanks! http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-no-servers-hosting-shard-td4151637.html org.apache.solr.common.SolrException: no servers hosting shard -- Key: SOLR-6335 URL: https://issues.apache.org/jira/browse/SOLR-6335 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Red Hat Enterprise Linux Server release 6.4 (Santiago) 64bit Reporter: Forest Soup Attachments: solrconfig_perf0804.xml http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-no-servers-hosting-shard-td4151637.html We have 2 Solr nodes (solr1 and solr2) in a SolrCloud. After this issue happens, solr2 goes into recovering state; after it finally finishes the long recovery, the issue happens again and it goes back into recovery. This repeats again and again.

ERROR - 2014-08-04 21:12:27.917; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: no servers hosting shard:
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
at java.lang.Thread.run(Thread.java:804)

These settings in our solrconfig.xml differ from the defaults:

<maxIndexingThreads>24</maxIndexingThreads>
<ramBufferSizeMB>200</ramBufferSizeMB>
<maxBufferedDocs>1</maxBufferedDocs>
<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
<filterCache class="solr.FastLRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<fieldValueCache class="solr.FastLRUCache" size="16384" autowarmCount="1024" showItems="32"/>
<queryResultWindowSize>50</queryResultWindowSize>

The full solrconfig.xml is attached. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
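Not an answer from this thread, but one thing worth checking while debugging the recovery loop (an assumption on my part, not something confirmed here): with openSearcher=true on a 1000-doc autoCommit and autowarmCount=4096 on three caches, every hard commit reopens a searcher and re-warms thousands of cache entries, which can keep a node too busy to serve its shards. A commonly recommended variant keeps hard commits cheap and drives document visibility with soft commits; a sketch with illustrative values:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <!-- hard commits only flush and fsync the index; with openSearcher=false
       there is no searcher reopen and no cache autowarming on every commit -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- visibility via cheaper soft commits; the 10s interval is illustrative -->
  <maxTime>${solr.autoSoftCommit.maxTime:10000}</maxTime>
</autoSoftCommit>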