[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542699#comment-15542699 ]

Shawn Heisey commented on SOLR-7319:

Sounds like a good plan to me.

> Workaround the "Four Month Bug" causing GC pause problems
> ----------------------------------------------------------
>
> Key: SOLR-7319
> URL: https://issues.apache.org/jira/browse/SOLR-7319
> Project: Solr
> Issue Type: Bug
> Components: scripts and tools
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Attachments: SOLR-7319.patch, SOLR-7319.patch, SOLR-7319.patch
>
> A twitter engineer found a bug in the JVM that contributes to GC pause problems:
> http://www.evanjones.ca/jvm-mmap-pause.html
>
> Problem summary (in case the blog post disappears): The JVM calculates statistics on things like garbage collection and writes them to a file in the temp directory using MMAP. If there is a lot of other MMAP write activity, which is precisely how Lucene accomplishes indexing and merging, it can result in a GC pause because the mmap write to the temp file is delayed.
>
> We should implement the workaround in the solr start scripts (disable creation of the mmap statistics tempfile) and document the impact in CHANGES.txt.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542663#comment-15542663 ]

Alexandre Rafalovitch commented on SOLR-7319:

Should we close the issue then? If there is no next action and there is no consensus, there is no work we can do with it. Mark it as "Information provided" so people can find it if need be.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542616#comment-15542616 ]

Shawn Heisey commented on SOLR-7319:

AFAIK, nothing got resolved. I think this is a *potential* problem for Solr, but I don't know that it's been explicitly documented anywhere. The fundamental issue is something that probably needs a design change in Java itself.

I tried to work around the problem in my own installation by putting the temp files on a RAM-based disk partition, but it never quite worked like I wanted it to.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539635#comment-15539635 ]

Alexandre Rafalovitch commented on SOLR-7319:

Was there a practical outcome of this discussion? I see the commits rolled in and rolled back. The last entry mentions a possible utility. Should that be spun out, with more explanation, into its own improvement JIRA and this (bug) issue closed?
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390220#comment-14390220 ]

Ferenczi Jim commented on SOLR-7319:

Thanks [~elyograg]. We are big fans of your pages about the Garbage Collector settings for Solr. We changed a lot of our settings after reading your page and we are now happy with the GC performance in our setup.

I guess that providing good default values for all use cases is almost impossible and that each deployment or use case would need a round of testing to find optimal values (especially for the tenuring threshold and the size of the heap). Anyway, I think that most Solr users would be happy to have default values optimized by a Solr expert. For those who think they can get better performance with other settings, nothing prevents them from changing those defaults ;)

My initial point was that the default options should not break any external tool accessing Solr, especially if it prevents the user from monitoring the GC with jstat.
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391944#comment-14391944 ]

Shawn Heisey commented on SOLR-7319:

> Would it be easily possible to detect the total amount of system memory and set the max heap to a percentage?

We have the JVM and extensive programming skills available to us. IMHO there's no reason we can't leverage a very small commandline Java program (rolled into a jar like jetty uses start.jar) to gather detailed system information, calculate values and GC tuning options, write them to someplace relevant (the solr home maybe), and use them in the start script.
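A minimal sketch of what such a helper could look like, under stated assumptions: the class name `HeapSizer`, the 25 percent figure, and the 512 MB / 31 GB clamps are all hypothetical and appear nowhere in this thread. Reading total physical memory relies on `com.sun.management.OperatingSystemMXBean`, a HotSpot/OpenJDK extension rather than a standard API (its `getTotalPhysicalMemorySize()` is deprecated in newer JDKs), so the sketch falls back to the minimum when that bean is unavailable.

```java
import java.lang.management.ManagementFactory;

public class HeapSizer {
    /**
     * Returns a max-heap size in MB: the given percentage of total RAM,
     * clamped to the range [minMb, maxMb]. All parameters are illustrative.
     */
    public static long heapMb(long totalBytes, int percent, long minMb, long maxMb) {
        long mb = totalBytes / (1024L * 1024L) * percent / 100;
        return Math.max(minMb, Math.min(maxMb, mb));
    }

    public static void main(String[] args) {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        long totalBytes = 0; // 0 means "unknown" and yields the minimum below
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            // Vendor-specific call; not guaranteed on every JVM.
            totalBytes = ((com.sun.management.OperatingSystemMXBean) os)
                    .getTotalPhysicalMemorySize();
        }
        // Hypothetical policy: 25% of RAM, never below 512 MB, never above 31 GB.
        long mb = heapMb(totalBytes, 25, 512, 31L * 1024);
        // Print a JVM option that a start script could capture and reuse.
        System.out.println("-Xmx" + mb + "m");
    }
}
```

A start script could run this once, store the printed option somewhere such as the solr home, and append it to the java command line on later starts.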
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388131#comment-14388131 ]

Shawn Heisey commented on SOLR-7319:

Does the bin/solr script offer a way to send an option directly to the java commandline? Should we have the ability to have a local user config script (similar to /etc/default/solr but contained within the solr download, with both shell and windows versions) to provide additional config?

> Key: SOLR-7319
> URL: https://issues.apache.org/jira/browse/SOLR-7319
> Project: Solr
> Issue Type: Bug
> Components: scripts and tools
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Fix For: 5.1
> Attachments: SOLR-7319.patch, SOLR-7319.patch, SOLR-7319.patch
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388154#comment-14388154 ]

Ferenczi Jim commented on SOLR-7319:

Most of the java options in solr.in.cmd should not be activated by default. The tenuring threshold, the number of threads for the GC, and so on all depend on the type of deployment you have, the size of the heap, and the machine hosting the Solr node.

In my company we are using a custom script full of java options that we added over the years. Most of the options are there because somebody added them with the assertion that performance is better. Most of the time, we don't know what an option is for, but nobody wants to remove it because the urban legend says it's useful.

The solr startup script should be almost empty (at least for the java options), maybe one or two options to set up the garbage collector and that's it.
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389924#comment-14389924 ]

Shawn Heisey commented on SOLR-7319:

Devolving into a general discussion about garbage collection tuning:

[~jim.ferenczi], I've had really good luck with these GC tuning options, although I have now moved on to G1GC:

https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector

I tried really hard to make these options completely generic and not dependent on the number of CPUs, the size of the heap, the amount of system memory, or anything else that's site specific, but users with particularly small or large setups might need to adjust them.

Here are the GC tuning options I ended up with when I updated and compiled branch_5x and started the server with bin/solr:

{noformat}
-XX:NewRatio=3
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:ConcGCThreads=4
-XX:ParallelGCThreads=4
-XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
{noformat}

These are largely the same as what I came up with for my system. Both sets have options that the other set doesn't.

I know from experience and my discussions on the hotspot-gc-use mailing list that ParallelRefProcEnabled is *critical* for good GC performance with Solr. Solr apparently creates a LOT of references, so processing them in parallel is a real help. PretenureSizeThreshold is probably very important, to make sure that objects will not automatically end up in the old generation unless they're REALLY big - similar to the G1HeapRegionSize option for G1 that can control which objects are classified as humongous allocations. The other options are a concerted effort to avoid full GCs.

I don't like the fact that the number of GC threads is hard-coded. For someone who's got 8 or more CPU cores (which I do), these are probably good options, but if you've got a low end system with one or two cores, it's too many threads.

I have to wonder whether the 512MB default heap size is a problem. It would be for me, but for a small-scale proof-of-concept, it is probably plenty. Would it be easily possible to detect the total amount of system memory and set the max heap to a percentage?
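One way to avoid the hard-coded thread counts is to derive them from the core count at startup. The sketch below is only an illustration: the class name `GcThreadFlags` and the rule "use the number of cores, but never more than 4 and never fewer than 1" are assumptions made here, not something proposed in this issue. The cap of 4 simply mirrors the values in the option list above.

```java
public class GcThreadFlags {
    /** Upper bound taken from the option list in the comment above. */
    private static final int MAX_THREADS = 4;

    /** Hypothetical rule: one GC thread per core, bounded to [1, MAX_THREADS]. */
    public static int threadsFor(int cores) {
        return Math.max(1, Math.min(cores, MAX_THREADS));
    }

    /** Builds the two thread-count options for a machine with the given core count. */
    public static String flags(int cores) {
        int n = threadsFor(cores);
        return "-XX:ConcGCThreads=" + n + " -XX:ParallelGCThreads=" + n;
    }

    public static void main(String[] args) {
        // availableProcessors() reports the cores visible to this JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(flags(cores));
    }
}
```

On an 8-core machine this reproduces the two thread options from the list above; on a one- or two-core machine it emits smaller values instead.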
Re: [jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
Yes and yes. The -a option for the bin/solr command passes stuff through, e.g.

bin/solr start -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr -a "-Xmx4G -Xms4G"

and the like.

I would guess it'd be useful to be able to specify a local file of options as well.

Erick

On Mon, Mar 30, 2015 at 11:49 PM, Shawn Heisey (JIRA) j...@apache.org wrote:

> Does the bin/solr script offer a way to send an option directly to the java commandline? Should we have the ability to have a local user config script (similar to /etc/default/solr but contained within the solr download, with both shell and windows versions) to provide additional config?
Re: [jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
We should actually fix the script to pass any args that begin with -X straight through to the JVM so we don't need the -a part, i.e.

bin/solr start -XX:+PerfDisableSharedMem

vs.

bin/solr start -a -XX:+PerfDisableSharedMem

On Tue, Mar 31, 2015 at 6:43 AM, Erick Erickson erickerick...@gmail.com wrote:

> Yes and yes. the -a option for the bin/solr command passes stuff through, e.g. bin/solr start -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr -a -Xmx4G -Xms4G and the like. It'd be useful I would guess to be able to specify a local file of options as well I should think.
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388744#comment-14388744 ]

ASF subversion and git services commented on SOLR-7319:

Commit 1670370 from [~elyograg] in branch 'dev/trunk'
[ https://svn.apache.org/r1670370 ]

SOLR-7319: Revert previous patch, return to discussion.
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388745#comment-14388745 ]

ASF subversion and git services commented on SOLR-7319:

Commit 1670371 from [~elyograg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1670371 ]

SOLR-7319: Revert previous patch, return to discussion. (merge trunk r1670370)
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388747#comment-14388747 ]

ASF subversion and git services commented on SOLR-7319:

Commit 1670373 from [~elyograg] in branch 'dev/branches/lucene_solr_5_1'
[ https://svn.apache.org/r1670373 ]

SOLR-7319: Revert previous patch, return to discussion. (merge trunk r1670370)
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386754#comment-14386754 ]

Ferenczi Jim commented on SOLR-7319:

> If there is a lot of other MMAP write activity, which is precisely how Lucene accomplishes indexing and merging

Are you sure about this statement? MMapDirectory uses MMap for reads and a simple RandomAccessFile for writes. I don't know how RandomAccessFile is implemented, but I doubt it's using MMap at all.
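The distinction being drawn here can be illustrated with plain JDK calls. This is not Lucene's code, and the class and method names are hypothetical; it only shows that a file can be written through ordinary `RandomAccessFile` write calls and then read back through a read-only memory mapping, so mmap reads do not imply mmap writes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapReadDemo {
    /** Writes the bytes with ordinary write calls; no memory mapping is involved. */
    public static void writePlain(Path file, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0); // truncate any previous content
            raf.write(data);
        }
    }

    /** Reads the whole file through a read-only memory mapping. */
    public static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out); // copy the mapped bytes into a heap array
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        tmp.toFile().deleteOnExit(); // best-effort cleanup of the scratch file
        writePlain(tmp, "segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readMapped(tmp), StandardCharsets.UTF_8));
    }
}
```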
[jira] [Commented] (SOLR-7319) Workaround the Four Month Bug causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386834#comment-14386834 ]

Shawn Heisey commented on SOLR-7319:

Good questions, [~jim.ferenczi].

The option does appear to have helped GC pause times for me, although it's hard to quantify. I know that the *average* GC pause time dropped from .10 sec to .06 sec. This isn't a lot, but when there are thousands of collections, even a small difference like that adds up. I wish I had a way to gather median, 75th, 95th, and 99th percentile info on GC pauses.

If you know something about how Lucene writes to disk that says it's not mmap when the directory is mmap, then you know more than I do. I wonder whether heavy mmap *reads* might interfere with writing to the stats file.
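Once pause durations have been extracted (for example from a GC log, which is outside the scope of this sketch), the percentiles mentioned above are straightforward to compute. The following is a minimal sketch using the nearest-rank definition of a percentile; the class name `GcPausePercentiles` and the sample values in `main` are made up for illustration and are not measurements from this issue.

```java
import java.util.Arrays;

public class GcPausePercentiles {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    public static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1]; // ranks are 1-based
    }

    public static void main(String[] args) {
        // Hypothetical pause times in seconds.
        double[] pauses = {0.04, 0.05, 0.06, 0.05, 0.07, 0.31, 0.06, 0.05, 0.04, 0.09};
        for (double p : new double[] {50, 75, 95, 99}) {
            System.out.println("p" + (int) p + " = " + percentile(pauses, p) + " s");
        }
    }
}
```

Unlike the average, the high percentiles make a few long pauses visible even when most collections are short.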
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386786#comment-14386786 ]

Ferenczi Jim commented on SOLR-7319:
------------------------------------

I am saying this because if we are not sure that Lucene is impacted we should not add this in the default options. Not being able to do a jstat on a running node is problematic and will break a lot of monitoring tools built on top of Solr.
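To make the jstat concern concrete: the workaround described in the linked blog post is the HotSpot option -XX:+PerfDisableSharedMem, which stops the JVM from creating its memory-mapped statistics file. Tools such as jstat and jps read that file, which is why they stop working. A monitoring tool could check for the file before trying to use jstat. The sketch below assumes the conventional Linux location /tmp/hsperfdata_<user>/<pid>; the function names are hypothetical and not part of Solr or the JDK.

```python
import getpass
import os


def perf_data_path(pid, user=None, tmp_dir="/tmp"):
    """Build the conventional hsperfdata path for a JVM process.

    Assumption: Linux HotSpot layout, /tmp/hsperfdata_<user>/<pid>.
    """
    user = user or getpass.getuser()
    return os.path.join(tmp_dir, "hsperfdata_" + user, str(pid))


def jstat_can_attach(pid, user=None, tmp_dir="/tmp"):
    """Return True if the statistics file that jstat reads exists for this pid."""
    return os.path.isfile(perf_data_path(pid, user, tmp_dir))
```

A tool that finds no such file for a live Solr process could then fall back to another source of GC data, such as JMX or the GC log, rather than failing.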
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386877#comment-14386877 ]

Timothy Potter commented on SOLR-7319:
--------------------------------------

That's correct - MMap is only for reading the index, so maybe instead of enabling this by default, we document it in bin/solr.in.(sh|cmd) and users can turn it on if they so choose.

I've already been dinged a few times for adding Java flags as defaults in those scripts because they helped my prod env. but weren't deemed generally applicable for all Solr users. So I vote for leaving it out by default, but documenting it as something for operators to enable if they experience this issue.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386961#comment-14386961 ]

Shawn Heisey commented on SOLR-7319:
------------------------------------

[~thelabdude], I am not opposed to a solution based purely on documentation. Let's get a few more opinions, and if that's the general feeling, I can revert my patch.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383912#comment-14383912 ]

Shawn Heisey commented on SOLR-7319:
------------------------------------

The situation where the bug presents itself -- heavy writes via mmap -- is exactly what Solr (Lucene) does when indexing or merging.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383925#comment-14383925 ]

Shawn Heisey commented on SOLR-7319:
------------------------------------

Attached patch implementing the workaround and documenting the impact in CHANGES.txt -- tools like jstat will no longer function.
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384020#comment-14384020 ]

Timothy Potter commented on SOLR-7319:
--------------------------------------

+1 ~ great find Shawn!
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384472#comment-14384472 ]

Otis Gospodnetic commented on SOLR-7319:
----------------------------------------

bq. tools like jstat will no longer function

Sounds problematic, no?
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385108#comment-14385108 ]

ASF subversion and git services commented on SOLR-7319:
-------------------------------------------------------

Commit 1669731 from [~elyograg] in branch 'dev/trunk'
[ https://svn.apache.org/r1669731 ]

SOLR-7319: Workaround for the Four Month Bug GC pause problem
[jira] [Commented] (SOLR-7319) Workaround the "Four Month Bug" causing GC pause problems
[ https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385109#comment-14385109 ]

ASF subversion and git services commented on SOLR-7319:
-------------------------------------------------------

Commit 1669732 from [~elyograg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1669732 ]

SOLR-7319: Workaround for the Four Month Bug GC pause problem (merge trunk r1669731)