Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
> we had an awful performance/throughput experience with 3.x coming from 2.1.
> 3.11 is simply a memory hog, if you are using batch statements on the client
> side. If so, you are likely affected by
> https://issues.apache.org/jira/browse/CASSANDRA-16201

Confirming what Thomas writes, heavy users of batch statements can likely hit memory issues in 3.0 and 3.11.

It is worth testing upgrades for these memory issues and, if they are evident, waiting for CASSANDRA-16201 to land in a release before upgrading to 3.11 (skip 3.0). Further background info on why you want 3.11 over 3.0 is in CASSANDRA-15430, CASSANDRA-13929 and CASSANDRA-9766 (but this is all very much dependent on 16201 landing).

regards,
Mick
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
Leon,

we had an awful performance/throughput experience with 3.x coming from 2.1. 3.11 is simply a memory hog if you are using batch statements on the client side. If so, you are likely affected by https://issues.apache.org/jira/browse/CASSANDRA-16201

Regards,
Thomas

From: Leon Zaruvinsky
Sent: Wednesday, October 28, 2020 5:21 AM
To: user@cassandra.apache.org
Subject: Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

>> Our JVM options are unchanged between 2.2 and 3.11
>
> For the sake of clarity, do you mean:
> (a) you're using the default JVM options in 3.11 and it's different to the
> options you had in 2.2?
> (b) you've copied the same JVM options you had in 2.2 to 3.11?

(b), which are the default options from 2.2 (and I believe the default options in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually a cause here because I would expect them to only impact the upgraded node and not the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

> The distinction is important because at the moment, you need to go through
> a process of elimination to identify the cause.

>> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
>> consistent before and after the upgrade across all nodes.
>
> What I was trying to get at is whether the upgraded node was getting hit
> with more traffic compared to the other nodes since it will indicate that
> the longer GCs are just the symptom, not the cause.

I don't see any distinct change, nor do I see an increase in traffic to the upgraded node that would result in longer GC pauses.

Frankly I don't see any changes or aberrations in client-related metrics at all that correlate to the GC pauses, except for the corresponding timeouts.
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
>> Our JVM options are unchanged between 2.2 and 3.11
>
> For the sake of clarity, do you mean:
> (a) you're using the default JVM options in 3.11 and it's different to the
> options you had in 2.2?
> (b) you've copied the same JVM options you had in 2.2 to 3.11?

(b), which are the default options from 2.2 (and I believe the default options in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually a cause here because I would expect them to only impact the upgraded node and not the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

> The distinction is important because at the moment, you need to go through
> a process of elimination to identify the cause.

>> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
>> consistent before and after the upgrade across all nodes.
>
> What I was trying to get at is whether the upgraded node was getting hit
> with more traffic compared to the other nodes since it will indicate that
> the longer GCs are just the symptom, not the cause.

I don't see any distinct change, nor do I see an increase in traffic to the upgraded node that would result in longer GC pauses.

Frankly I don't see any changes or aberrations in client-related metrics at all that correlate to the GC pauses, except for the corresponding timeouts.
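For readers following the GC settings, here is a rough sketch of how the flags listed above carve up the heap. It assumes the 12G heap and 5G new size given in the original post, and it uses the standard HotSpot meaning of `-XX:SurvivorRatio` (eden is N times the size of one survivor space) and `-XX:CMSInitiatingOccupancyFraction` (percentage of old-generation occupancy that starts a CMS cycle). The function name `cms_heap_layout` is made up for illustration, and the JVM aligns region sizes, so real values will differ slightly.

```python
def cms_heap_layout(heap_gb, new_gb, survivor_ratio, cms_initiating_fraction):
    """Approximate generation sizes (in GB) for a ParNew + CMS heap.

    Illustrative arithmetic only; the JVM rounds region sizes to its
    own alignment, so treat the results as estimates.
    """
    old_gb = heap_gb - new_gb
    # Young gen = eden + two survivor spaces, with eden = ratio * survivor.
    survivor_gb = new_gb / (survivor_ratio + 2)
    eden_gb = survivor_gb * survivor_ratio
    # Old-gen occupancy at which a CMS cycle is initiated.
    cms_trigger_gb = old_gb * cms_initiating_fraction / 100
    return {
        "eden_gb": eden_gb,
        "survivor_gb": survivor_gb,
        "old_gb": old_gb,
        "cms_trigger_gb": cms_trigger_gb,
    }


# Values from this thread: 12G heap, 5G new size, SurvivorRatio=8,
# CMSInitiatingOccupancyFraction=75.
layout = cms_heap_layout(12, 5, 8, 75)
for name, size in layout.items():
    print(f"{name}: {size:.2f}")
```

With `MaxTenuringThreshold=1`, objects that survive one young collection are promoted, so anything that outlives a single ParNew cycle counts against the old generation and its CMS trigger point.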
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
> Our JVM options are unchanged between 2.2 and 3.11

For the sake of clarity, do you mean:

(a) you're using the default JVM options in 3.11 and it's different to the options you had in 2.2?
(b) you've copied the same JVM options you had in 2.2 to 3.11?

The distinction is important because at the moment, you need to go through a process of elimination to identify the cause.

> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
> consistent before and after the upgrade across all nodes.

What I was trying to get at is whether the upgraded node was getting hit with more traffic compared to the other nodes since it will indicate that the longer GCs are just the symptom, not the cause. Again, it's a process of elimination. Cheers!
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
Thanks Erick.

Our JVM options are unchanged between 2.2 and 3.11, and we have disk access mode set to standard. Generally we’ve maintained all configuration between the two versions.

Read throughput (rate, bytes read/range scanned, etc.) seems fairly consistent before and after the upgrade across all nodes.

Leon

On Wed, Oct 28, 2020 at 12:01 AM Erick Ramirez wrote:

> I haven't seen this specific behaviour in the past but things that I would
> look at are:
>
>    - JVM options which differ between 3.11 defaults and what you have
>    configured in 2.2
>    - review your monitoring and check read throughput on the upgraded
>    node as compared to 2.2 nodes
>    - possibly not have disk access mode set to map index files only (not
>    directly related to long GC pauses)
>
> If you're interested, I've written a post about disk access mode here --
> https://community.datastax.com/questions/6947/. Cheers!
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
I haven't seen this specific behaviour in the past but things that I would look at are:

   - JVM options which differ between 3.11 defaults and what you have configured in 2.2
   - review your monitoring and check read throughput on the upgraded node as compared to 2.2 nodes
   - possibly not have disk access mode set to map index files only (not directly related to long GC pauses)

If you're interested, I've written a post about disk access mode here -- https://community.datastax.com/questions/6947/. Cheers!
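As a rough illustration of the first suggestion above (finding JVM options that differ between two configurations), the following sketch compares two option listings and reports the flags unique to each side. It assumes one flag per line with `#` comment lines, as in the settings quoted elsewhere in this thread. The helper names are made up for illustration, and the second listing uses a hypothetical changed value (`MaxTenuringThreshold=2`) purely to show a difference; it is not a claim about the actual 3.11 defaults.

```python
def parse_jvm_options(text):
    """Return the set of flags in an options listing, skipping blanks and # comments."""
    flags = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        flags.add(line)
    return flags


def diff_jvm_options(old_text, new_text):
    """Return (flags only in old, flags only in new), each sorted."""
    old = parse_jvm_options(old_text)
    new = parse_jvm_options(new_text)
    return sorted(old - new), sorted(new - old)


# Flags taken from this thread.
old_opts = """
### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=1
"""

# Hypothetical second configuration, for illustration only.
new_opts = """
### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=2
"""

only_old, only_new = diff_jvm_options(old_opts, new_opts)
print("only in old:", only_old)
print("only in new:", only_new)
```

In practice the two inputs would be read from the options files of the 2.2 and 3.11 installs; a changed value shows up once on each side of the diff.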
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
On Wed, 28 Oct 2020 at 14:41, Rich Hawley wrote:

> unsubscribe

You need to email user-unsubscr...@cassandra.apache.org to unsubscribe from the list. Cheers!
Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
unsubscribe

On Tue, Oct 27, 2020 at 11:40 PM Leon Zaruvinsky wrote:

> Hi,
>
> I'm attempting an upgrade of Cassandra 2.2.18 to 3.11.6, but had to abort
> because of major performance issues associated with GC pauses.
>
> Details:
> 3 node cluster, RF 3, 1 DC
> ~2TB data per node
> Heap Size: 12G / New Size: 5G
>
> I didn't even get very far in the upgrade - I just upgraded a binary of a
> single node to 3.11.6 (did not run upgradesstables) and let it sit. Within
> 10 minutes, I started seeing elevated GC pressure and lots of timeouts in
> the metrics.
>
> All three nodes, not just the upgraded one, are seeing GC problems.
> GC par new time jumped from .38 up to 3%. CMS times up to 30 seconds.
>
> Once I turn off node on 3.11.6, the cluster eventually recovers.
>
> Can anyone point me to ways to debug this? I've taken heap dumps of all
> nodes but nothing in particular stands out, and there are no
> obvious messages in the logs that point to problems.
GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade
Hi,

I'm attempting an upgrade of Cassandra 2.2.18 to 3.11.6, but had to abort because of major performance issues associated with GC pauses.

Details:
3 node cluster, RF 3, 1 DC
~2TB data per node
Heap Size: 12G / New Size: 5G

I didn't even get very far in the upgrade - I just upgraded the binary of a single node to 3.11.6 (did not run upgradesstables) and let it sit. Within 10 minutes, I started seeing elevated GC pressure and lots of timeouts in the metrics.

All three nodes, not just the upgraded one, are seeing GC problems.
GC par new time jumped from .38 up to 3%. CMS times are up to 30 seconds.

Once I turn off the node on 3.11.6, the cluster eventually recovers.

Can anyone point me to ways to debug this? I've taken heap dumps of all nodes but nothing in particular stands out, and there are no obvious messages in the logs that point to problems.