Re: migration solr 3.5 to 4.1 - JVM GC problems
Big heap because of a very large number of requests across more than 60 indexes and hundreds of millions of documents (all indexes together). My problem is with solr 4.1. All is perfect with 3.5: I have 0.05 sec GCs every 1 or 2 minutes and 20Gb of the heap is used. With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird (it changed the size capacity to 6Mb at some point) and I have 2 sec GCs every minute. Something must have changed in 4.1 compared to 3.5 to cause this behavior. It's the same requests, the same schemas (except that 4 fields changed from sint to tint) and the same config.

On 04/10/2013 07:38 PM, Shawn Heisey wrote:

This transmission is strictly confidential, possibly legally privileged, and intended solely for the addressee. Any views or opinions expressed within it are those of the author and do not necessarily represent those of 192.com Ltd or any of its subsidiary companies. If you are not the intended recipient then you must not disclose, copy or take any action in reliance of this transmission. If you have received this transmission in error, please notify the sender as soon as possible. No employee or agent is authorised to conclude any binding agreement on behalf of 192.com Ltd with another party by email without express written confirmation by an authorised employee of the company. http://www.192.com (Tel: 08000 192 192). 192.com Ltd is incorporated in England and Wales, company number 07180348, VAT No. GB 103226273.
Re: migration solr 3.5 to 4.1 - JVM GC problems
Hi Marc, could you tell me your index size, and what is your performance measure in queries per second?

2013/4/11 Marc Des Garets marc.desgar...@192.com
Re: migration solr 3.5 to 4.1 - JVM GC problems
I have 45 solr 4.1 indexes. Sizes vary between 600Mb and 20Gb:
- 1 is 20Gb (80 million docs)
- 1 is 5.1Gb (24 million docs)
- 1 is 5.6Gb (26 million docs)
- 1 is 6.5Gb (28 million docs)
- 11 others are about 2.2Gb (6-7 million docs)
- 20 others are about 600Mb (2.5 million docs)

That reminds me of something: the 4.1 indexes are about half the size of the 3.5 indexes. For example, the one which is 20Gb with solr 4.1 is 43Gb with solr 3.5. Maybe there is something there? There are roughly 200 queries per second.

On 04/11/2013 11:07 AM, Furkan KAMACI wrote:
Re: migration solr 3.5 to 4.1 - JVM GC problems
Same config? Do a compare with the new example config and see which settings are different or changed. There may have been some defaults that changed. Read the comments in the new config. If you have just taken or merged the new config, then I would suggest making sure that the update log is not enabled (or make sure you do hard commits relatively frequently rather than only soft commits).

-- Jack Krupansky

-----Original Message----- From: Marc Des Garets Sent: Thursday, April 11, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: Re: migration solr 3.5 to 4.1 - JVM GC problems
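The comparison Jack suggests, diffing the old 3.5 solrconfig.xml against the new 4.x example config to spot changed defaults, can be sketched with Python's difflib. The two XML snippets below are hypothetical stand-ins for the real files, not the actual configs from this thread:

```python
# Sketch: diff two solrconfig.xml fragments to surface changed defaults.
# The snippets are invented examples (a ramBufferSizeMB change and a newly
# enabled updateLog), chosen because both came up in this thread.
import difflib

old_config = """\
<updateHandler class="solr.DirectUpdateHandler2">
  <ramBufferSizeMB>32</ramBufferSizeMB>
</updateHandler>
"""

new_config = """\
<updateHandler class="solr.DirectUpdateHandler2">
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
"""

# unified_diff takes lists of lines; "-"/"+" lines are what changed.
diff = list(difflib.unified_diff(
    old_config.splitlines(keepends=True),
    new_config.splitlines(keepends=True),
    fromfile="solrconfig-3.5.xml",
    tofile="solrconfig-4.x.xml",
))
print("".join(diff))
```

In practice you would read the two real files from disk and eyeball the "+" lines for settings like updateLog that did not exist in 3.5.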
Re: migration solr 3.5 to 4.1 - JVM GC problems
Same config. I compared both; some defaults changed, like ramBufferSizeMB, which I've set like in 3.5 (same with other things). It becomes even more strange to me. Now I have changed the jvm settings to this:

-d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=6 -XX:SurvivorRatio=2 -XX:G1ReservePercent=10 -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=30 -XX:PermSize=728m -XX:MaxPermSize=728m

So the Eden space is just 6Gb, the survivor space is still weird (80Mb) and full 100% of the time, and the old gen is 34Gb. I now get GCs of just 0.07 sec every 30 sec to 1 mn. Very regular, like this:

[GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]

Just 30% of the total heap is used. After a while it's going to do:

[GC pause (young) (initial-mark) 11603M->11391M(40960M), 0.100 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0172380]
[GC concurrent-mark-start]
[GC concurrent-mark-end, 0.4824210 sec]
[GC remark, 0.0248680 secs]
[GC cleanup 11476M->11476M(40960M), 0.0116420 secs]

Which looks pretty good. If I am not mistaken, concurrent-mark isn't stop-the-world. remark is stop-the-world but is just 0.02 sec, and GC cleanup is also stop-the-world but is just 0.01 sec. By the look of it I could have a 20g heap rather than 40... Now I am waiting to see what happens when it clears the old gen, but that will take a while because it is growing slowly. Still mysterious to me, but it looks like it's going to all work out.
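Pause times like the ones Marc pasted can be checked mechanically by parsing the GC log rather than eyeballing it. A minimal sketch, using the log lines from this thread as sample input; real G1 log formats vary by JVM version and flags, so the regex is an assumption, not a general parser:

```python
# Sketch: extract stop-the-world pause durations from G1 log lines of the
# form "[GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]" and
# summarize them. Sample lines are copied from Marc's message.
import re

gc_log = """\
[GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]
[GC pause (young) (initial-mark) 11603M->11391M(40960M), 0.100 secs]
[GC remark, 0.0248680 secs]
[GC cleanup 11476M->11476M(40960M), 0.0116420 secs]
"""

# Matches only the stop-the-world events; concurrent phases are skipped.
PAUSE = re.compile(r"\[GC (?:pause|remark|cleanup).*?(\d+\.\d+) secs\]")

pauses = [float(m.group(1)) for m in PAUSE.finditer(gc_log)]
print(f"pauses: {len(pauses)}, max: {max(pauses):.3f}s, "
      f"total: {sum(pauses):.3f}s")
```

Feeding a few hours of log through something like this makes it easy to compare max and cumulative pause time before and after a flag change.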
Re: migration solr 3.5 to 4.1 - JVM GC problems
Marc, re the smaller index sizes - it's the stored field compression that didn't exist in 3.x. See https://issues.apache.org/jira/browse/SOLR-4375

Otis
--
Solr ElasticSearch Support
http://sematext.com/

On Thu, Apr 11, 2013 at 10:53 AM, Marc Des Garets marc.desgar...@192.com wrote:
migration solr 3.5 to 4.1 - JVM GC problems
Hi,

I run multiple solr indexes in 1 single tomcat (1 webapp per index). All the indexes are solr 3.5 and I have upgraded a few of them to solr 4.1 (about half of them). The JVM behavior is now radically different and doesn't seem to make sense.

I was using ConcMarkSweepGC. I am now trying the G1 collector. The perm gen went from 410Mb to 600Mb. The eden space usage is a lot bigger and the survivor space usage is 100% all the time. I don't really understand what is happening. GC behavior really doesn't seem right.

My jvm settings:
-d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1 -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m

I have tried NewRatio=1 and SurvivorRatio=3 hoping to get the Survivor space to not be 100% full all the time, without success. Here is what jmap is giving me:

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 42949672960 (40960.0MB)
   NewSize          = 1363144 (1.254223632812MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5452592 (5.169482421875MB)
   NewRatio         = 1
   SurvivorRatio    = 3
   PermSize         = 754974720 (720.0MB)
   MaxPermSize      = 763363328 (728.0MB)
   G1HeapRegionSize = 16777216 (16.0MB)

Heap Usage:
G1 Heap:
   regions  = 2560
   capacity = 42949672960 (40960.0MB)
   used     = 23786449912 (22684.526359558105MB)
   free     = 19163223048 (18275.473640441895MB)
   55.382144432514906% used
G1 Young Generation:
Eden Space:
   regions  = 674
   capacity = 20619198464 (19664.0MB)
   used     = 11307843584 (10784.0MB)
   free     = 9311354880 (8880.0MB)
   54.841334418226204% used
Survivor Space:
   regions  = 115
   capacity = 1929379840 (1840.0MB)
   used     = 1929379840 (1840.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 732
   capacity = 20401094656 (19456.0MB)
   used     = 10549226488 (10060.526359558105MB)
   free     = 9851868168 (9395.473640441895MB)
   51.70911985792612% used
Perm Generation:
   capacity = 754974720 (720.0MB)
   used     = 514956504 (491.10079193115234MB)
   free     = 240018216 (228.89920806884766MB)
   68.20844332377116% used

The Survivor space even went up to 3.6Gb but was still 100% used. I have disabled all caches. Obviously I am getting very bad GC performance. Any idea as to what could be wrong and why this could be happening?

Thanks,
Marc
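The utilization figures in the jmap report are just used/capacity; re-deriving them from the raw byte counts makes the report easier to eyeball. A quick sketch, with the (capacity, used) pairs copied from the output above; note that under G1 the static ratios are only hints, since the collector resizes regions dynamically:

```python
# Sketch: recompute the jmap utilization percentages from the raw byte
# counts in the report above. (capacity, used) pairs in bytes.
MB = 1024 * 1024

spaces = {
    "G1 Heap":  (42949672960, 23786449912),
    "Eden":     (20619198464, 11307843584),
    "Survivor": (1929379840, 1929379840),
    "Old Gen":  (20401094656, 10549226488),
    "Perm Gen": (754974720, 514956504),
}

for name, (capacity, used) in spaces.items():
    pct = 100.0 * used / capacity
    print(f"{name:9s} {capacity / MB:9.1f}MB capacity, "
          f"{used / MB:9.1f}MB used ({pct:5.1f}%)")
# The survivor space really is pinned at exactly 100% used, which matches
# what Marc sees in his monitoring.
```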
Re: migration solr 3.5 to 4.1 - JVM GC problems
Hi Marc,

Why such a big heap? Do you really need it? You disabled all caches, so the JVM really shouldn't need much memory. Have you tried with -Xmx20g or even -Xmx8g? Aha, survivor is getting to 100%, so you kept increasing -Xmx? Have you tried just not using any of these: -XX:+UseG1GC -XX:NewRatio=1 -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m ? My hunch is that there is a leak somewhere, because without caches you shouldn't need a 40GB heap.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Solr ElasticSearch Support
http://sematext.com/

On Wed, Apr 10, 2013 at 11:48 AM, Marc Des Garets marc.desgar...@192.com wrote:
Re: migration solr 3.5 to 4.1 - JVM GC problems
As Otis has already asked, why do you have a 40GB heap? The only way I can imagine that you would actually NEED a heap that big is if your index size is measured in hundreds of gigabytes. If you really do need a heap that big, you will probably need to go with a JVM like Zing. I don't know how much Zing costs, but they claim to be able to make any heap size perform well under any load. It is Linux-only.

I was running into extreme problems with GC pauses with my own setup, and that was only with an 8GB heap. I was using the CMS collector and NewRatio=1. Switching to G1 didn't help at all - it might even have made the problem worse. I never did try the Zing JVM.

After a lot of experimentation (which I will admit was not done very methodically) I found JVM options that have reduced the GC pause problem greatly. Below is what I am using now on Solr 4.2.1 with a total per-server index size of about 45GB. This works properly on CentOS 6 with Oracle Java 7u17; UseLargePages may require special kernel tuning on other operating systems:

-Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts

These options could probably use further tuning, but I haven't had time for the kind of testing that will be required. If you decide to pay someone to make the problem go away instead: http://www.azulsystems.com/products/zing/whatisit

Thanks,
Shawn
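For a sense of how Shawn's ratios carve up his 6144M heap: the standard HotSpot formulas put the young generation at heap / (NewRatio + 1), with eden and the two survivor spaces splitting the young gen according to SurvivorRatio. A sketch of that arithmetic; SurvivorRatio=8 is assumed here because it is the HotSpot default and Shawn's flags don't override it, and actual runtime sizes can differ because adaptive sizing adjusts them:

```python
# Sketch of the generational sizing implied by Shawn's CMS flags.
HEAP_MB = 6144          # -Xmx6144M
NEW_RATIO = 3           # -XX:NewRatio=3  (old gen : young gen)
SURVIVOR_RATIO = 8      # HotSpot default; not set in Shawn's flags

# young gen = heap / (NewRatio + 1); the rest is old gen.
young = HEAP_MB / (NEW_RATIO + 1)
old = HEAP_MB - young

# young gen = eden + 2 survivor spaces, with eden/survivor = SurvivorRatio,
# so each survivor space is young / (SurvivorRatio + 2).
survivor = young / (SURVIVOR_RATIO + 2)
eden = young - 2 * survivor

print(f"young {young:.0f}MB (eden {eden:.1f}MB, "
      f"2 x survivor {survivor:.1f}MB), old {old:.0f}MB")
```

With -XX:MaxTenuringThreshold=8 on top of this, an object has to survive 8 young collections in those survivor spaces before being promoted to the old generation, which gives short-lived Solr request objects time to die young instead of filling the old gen.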