Re: Increased latency after setting row_cache_size_in_mb
@Rahul, I am using the cassandra-stress tool.

On Tue, Feb 6, 2018 at 7:37 PM, Rahul Singh wrote:
> Could be the cause. I would run 2 and then 4 concurrent clients to see how
> they behave. What’s your client written in? How are you managing your
> connection?
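A minimal sketch of Rahul's suggestion, assuming the same workload is launched from two (and then four) separate client machines; reuse whatever command or user profile the original run used, just with a fixed -rate and a per-client log. Node addresses, duration and file names below are placeholders:

# From client host 1 (repeat on client host 2 with its own log file):
cassandra-stress read duration=15m cl=ONE -rate threads=64 \
    -node 10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4 \
    -log file=client1.log
# If each client keeps ~3ms latency while the aggregate throughput rises, the
# single 609-thread client was the bottleneck rather than the cluster.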
Re: Increased latency after setting row_cache_size_in_mb
Could be the cause. I would run 2 and then 4 concurrent clients to see how
they behave. What’s your client written in? How are you managing your
connection?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 6, 2018, 8:50 AM -0500, mohsin k, wrote:
> Thanks, Jeff, will definitely check the trace. Also, one strange thing I
> noticed, with number of threads till '64', the latency is around 3ms but as
> the number of threads increases latency also increases. Eventually, at thread
> count, 609 latency is around 30ms. I am using a single client to loadtest 4
> node cluster. Is this the issue because of client being bottleneck?
Re: Increased latency after setting row_cache_size_in_mb
Thanks, Jeff, I will definitely check the trace. Also, one strange thing I
noticed: with up to 64 threads the latency is around 3 ms, but as the number
of threads increases the latency also increases. Eventually, at a thread count
of 609, latency is around 30 ms. I am using a single client to load test the
4-node cluster. Could the client be the bottleneck?

On Mon, Feb 5, 2018 at 8:05 PM, Jeff Jirsa wrote:
> Also: coordinator handles tracing and read repair. Make sure tracing is
> off for production. Have your data repaired if possible to eliminate that.
>
> Use tracing to see what’s taking the time.
>
> --
> Jeff Jirsa
Re: Increased latency after setting row_cache_size_in_mb
Also: coordinator handles tracing and read repair. Make sure tracing is off
for production. Have your data repaired if possible to eliminate that.

Use tracing to see what’s taking the time.

--
Jeff Jirsa

> On Feb 5, 2018, at 6:32 AM, Jeff Jirsa wrote:
>
> There’s two parts to latency on the Cassandra side:
>
> Local and coordinator
>
> When you read, the node to which you connect coordinates the request to the
> node which has the data (potentially itself). Long tail in coordinator
> latencies tend to be the coordinator itself gc’ing, which will happen from
> time to time. If it’s more consistently high, it may be natural latencies in
> your cluster (ie: your requests are going cross wan and the other dc is
> 10-20ms away).
>
> If the latency is seen in p99 but not p50, you can almost always
> speculatively read from another coordinator (driver level speculation) after
> a millisecond or so.
>
> --
> Jeff Jirsa
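One hedged way to act on the tracing advice above; the user id is a placeholder, and the probability is kept deliberately tiny because tracing itself adds load:

# Trace a few ad-hoc reads interactively:
#   cqlsh> TRACING ON;
#   cqlsh> SELECT segmentid FROM stresstest.user_to_segment WHERE userid = 'some-user';
#   cqlsh> TRACING OFF;
# Or sample a small fraction of live traffic on a node, then switch it back off:
nodetool settraceprobability 0.001
# ... inspect system_traces.sessions / system_traces.events, then:
nodetool settraceprobability 0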
Re: Increased latency after setting row_cache_size_in_mb
There are two parts to latency on the Cassandra side: local and coordinator.

When you read, the node to which you connect coordinates the request to the
node which has the data (potentially itself). Long tail in coordinator
latencies tends to be the coordinator itself gc’ing, which will happen from
time to time. If it’s more consistently high, it may be natural latencies in
your cluster (i.e. your requests are going cross-WAN and the other DC is
10-20ms away).

If the latency is seen in p99 but not p50, you can almost always
speculatively read from another coordinator (driver-level speculation) after
a millisecond or so.

--
Jeff Jirsa

> On Feb 5, 2018, at 5:41 AM, mohsin k wrote:
>
> Thanks for response @Nicolas. I was considering the total read latency from
> the client to server (as shown in the image above) which is around 30ms.
> Which I want to get around 3ms (client and server are both on same network).
> I did not consider read latency provided by the server (which I should have).
> I monitored CPU, memory and JVM lifecycle, which is at a safe level. I think
> the difference (0.03 to 30) might be because of low network bandwidth, correct
> me if I am wrong.
>
> I did reduce chunk_length_in_kb to 4kb, but I couldn't get a considerable
> amount of difference, might be because there is less room for improvement on
> the server side.
>
> Thanks again.
Re: Increased latency after setting row_cache_size_in_mb
What are the tbl Local read latency stats vs. the read request latency stats?

Rahul

On Feb 5, 2018, 8:41 AM -0500, mohsin k, wrote:
> Thanks for response @Nicolas. I was considering the total read latency from
> the client to server (as shown in the image above) which is around 30ms.
> Which I want to get around 3ms (client and server are both on same network).
> I did not consider read latency provided by the server (which I should have).
> I monitored CPU, memory and JVM lifecycle, which is at a safe level. I think
> the difference (0.03 to 30) might be because of low network bandwidth, correct
> me if I am wrong.
>
> I did reduce chunk_length_in_kb to 4kb, but I couldn't get a considerable
> amount of difference, might be because there is less room for improvement on
> the server side.
>
> Thanks again.
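A small sketch of how one might pull the two numbers Rahul is asking about on a node (table name from the thread's schema; on older nodetool versions tablehistograms was called cfhistograms):

# Coordinator-side (read request) latency percentiles:
nodetool proxyhistograms
# Table-local read latency and partition size percentiles:
nodetool tablehistograms stresstest user_to_segment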
Re: Increased latency after setting row_cache_size_in_mb
Thanks for the response, @Nicolas. I was considering the total read latency
from the client to the server (as shown in the image above), which is around
30 ms and which I want to get down to around 3 ms (client and server are on
the same network). I did not consider the read latency reported by the server
(which I should have). I monitored CPU, memory and the JVM heap lifecycle,
which are all at safe levels. I think the difference (0.030 ms vs 30 ms) might
be because of low network bandwidth; correct me if I am wrong.

I did reduce chunk_length_in_kb to 4 KB, but I couldn't see a considerable
difference, probably because there is little room for improvement left on the
server side.

Thanks again.

On Mon, Feb 5, 2018 at 6:52 PM, Nicolas Guyomar wrote:
> Your row hit rate is 0.971 which is already very high, IMHO there is
> "nothing" left to do here if you can afford to store your entire dataset in
> memory
>
> Local read latency: 0.030 ms already seems good to me, what makes you
> think that you can achieve more with the relative "small" box you are using?
>
> You have to keep an eye on other metrics which might be a limiting factor,
> like cpu usage, JVM heap lifecycle and so on
>
> For read heavy workflow it is sometimes advised to reduce chunk_length_in_kb
> from the default 64kb to 4kb, see if it helps !
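A rough way to test the "low network bandwidth" theory before tuning Cassandra further; the host name is a placeholder and iperf3 must already be running as a server (iperf3 -s) on the node:

# Round-trip time from the client box to a node; on the same LAN this should
# be a small fraction of a millisecond, nowhere near 30ms.
ping -c 20 cassandra-node-1
# Available bandwidth between the client and the node:
iperf3 -c cassandra-node-1 -t 10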
Re: Increased latency after setting row_cache_size_in_mb
Your row hit rate is 0.971, which is already very high; IMHO there is
"nothing" left to do here if you can afford to store your entire dataset in
memory.

Local read latency: 0.030 ms already seems good to me. What makes you think
that you can achieve more with the relatively "small" box you are using?

You have to keep an eye on other metrics which might be a limiting factor,
like CPU usage, the JVM heap lifecycle and so on.

For read-heavy workloads it is sometimes advised to reduce chunk_length_in_kb
from the default 64kb to 4kb, see if it helps!

On 5 February 2018 at 13:09, mohsin k wrote:
> Hey Rahul,
>
> Each partition has around 10 cluster keys. Based on nodetool, I can roughly
> estimate partition size to be less than 1KB.
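A minimal sketch of the chunk_length_in_kb change Nicolas describes; note the ALTER only affects newly written SSTables, so existing ones have to be rewritten to pick it up:

cqlsh -e "ALTER TABLE stresstest.user_to_segment WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};"
# Rewrite existing SSTables with the new compression parameters:
nodetool upgradesstables -a stresstest user_to_segment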
Re: Increased latency after setting row_cache_size_in_mb
Hey Rahul,

Each partition has around 10 clustering keys. Based on nodetool, I can
roughly estimate the partition size to be less than 1 KB.

On Mon, Feb 5, 2018 at 5:37 PM, mohsin k wrote:
> Hey Nicolas,
>
> My goal is to reduce latency as much as possible. I did wait for warmup.
> The test ran for more than 15mins, I am not sure why it shows 2mins though.
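For reference, a sketch of where that estimate comes from; tablestats is the newer name for the cfstats command whose output is quoted later in the thread:

# "Compacted partition mean bytes" here gives a rough per-partition size:
nodetool tablestats stresstest.user_to_segment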
Re: Increased latency after setting row_cache_size_in_mb
Hey Nicolas,

My goal is to reduce latency as much as possible. I did wait for warmup. The
test ran for more than 15 minutes; I am not sure why it shows 2 minutes
though.

On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh wrote:
> What is the average size of your partitions / rows. 1GB may not be enough.
>
> Rahul
Re: Increased latency after setting row_cache_size_in_mb
What is the average size of your partitions / rows. 1GB may not be enough. Rahul On Feb 5, 2018, 6:52 AM -0500, mohsin k, wrote: > Hi, > > I have been looking into different configurations for tuning my cassandra > servers. So, initially I loadtested server using cassandra-stress tool, with > default configs and then tuning one by one config to measure impact of > change. First config, I tried was setting "row_cache_size_in_mb" to 1000 (MB) > in yaml, adding caching {'keys': 'ALL', 'rows_per_partition': 'ALL'}. After > changing these configs, I observed that latency has increased rather than > decreasing. It would be really helpful if I get to understand why is this the > case and what steps must be taken to decrease the latency. > > I am running a cluster with 4 nodes. > > Following is my schema: > > CREATE TABLE stresstest.user_to_segment ( > userid text, > segmentid text, > PRIMARY KEY (userid, segmentid) > ) WITH CLUSTERING ORDER BY (segmentid DESC) > AND bloom_filter_fp_chance = 0.1 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} > AND comment = 'A table to hold blog segment user relation' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > > Following are node specs: > RAM: 4GB > CPU: 4 Core > HDD: 250BG > > > Following is the output of 'nodetool info' after setting row_cache_size_in_mb: > > ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32 > Gossip active : true > Thrift active : false > Native Transport active: true > Load : 10.94 MiB > Generation No : 1517571163 > Uptime (seconds) : 9169 > Heap Memory (MB) : 136.01 / 3932.00 > Off Heap Memory (MB) : 0.10 > Data Center : dc1 > Rack : rack1 > Exceptions : 0 > Key Cache : entries 125881, size 9.6 MiB, capacity 100 MiB, 107 > hits, 126004 requests, 0.001 recent hit rate, 14400 save period in seconds > Row Cache : entries 125861, size 31.54 MiB, capacity 1000 MiB, > 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save period in > seconds > Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 > requests, NaN recent hit rate, 7200 save period in seconds > Chunk Cache : entries 273, size 17.06 MiB, capacity 480 MiB, 325 > misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss latency > Percent Repaired : 100.0% > Token : (invoke with -T/--tokens to see all 256 tokens) > > > Following is output of nodetool cfstats: > > Total number of tables: 37 > > Keyspace : stresstest > Read Count: 4398162 > Read Latency: 0.02184742626579012 ms. > Write Count: 0 > Write Latency: NaN ms. 
> Pending Flushes: 0 > Table: user_to_segment > SSTable count: 1 > SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0] > Space used (live): 11076103 > Space used (total): 11076103 > Space used by snapshots (total): 0 > Off heap memory used (total): 107981 > SSTable Compression Ratio: 0.5123353861375962 > Number of partitions (estimate): 125782 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 2 > Local read count: 4398162 > Local read latency: 0.030 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 0.0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 79280 > Bloom filter off heap memory used: 79272 > Index summary off heap memory used: 26757 > Compression metadata off heap memory used: 1952 > Compacted partition minimum bytes: 43 > Compacted partition maximum bytes: 215 > Compacted partition mean bytes: 136 > Average live cells per slice (last five minutes): 5.719932432432432 > Maximum live cells per slice (last five minutes): 10 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Dropped Mutations: 0 > > Following are my results: > The blue graph is before setting row_cache_size_in_mb, > orange is after > > Thanks, > Mohsin > > > - > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Increased latency after setting row_cache_size_in_mb
Hi, Could you explain a bit more what you are trying to achieve please ? Performance tuning is by far the most complex problem we have to deal with, and there are a lot of configuration changes that can be made on a C* cluster When you do performance tuning, do not forget that you need to warmup C* JVM. Judging from the provided graph it seems to me that your test ran for 2min, which is really too short On 5 February 2018 at 08:16, mohsin kwrote: > Hi, > > I have been looking into different configurations for tuning my cassandra > servers. So, initially I loadtested server using cassandra-stress tool, > with default configs and then tuning one by one config to measure impact of > change. First config, I tried was setting "*row_cache_size_in_mb*" to > 1000 (MB) in yaml, adding caching {'keys': 'ALL', *'rows_per_partition': > 'ALL'*}. After changing these configs, I observed that latency has > increased rather than decreasing. It would be really helpful if I get to > understand why is this the case and what steps must be taken to decrease > the latency. > > I am running a cluster with 4 nodes. > > Following is my schema: > > CREATE TABLE stresstest.user_to_segment ( > userid text, > segmentid text, > PRIMARY KEY (userid, segmentid) > ) WITH CLUSTERING ORDER BY (segmentid DESC) > AND bloom_filter_fp_chance = 0.1 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} > AND comment = 'A table to hold blog segment user relation' > AND compaction = {'class': 'org.apache.cassandra.db.compa > ction.LeveledCompactionStrategy'} > AND compression = {'chunk_length_in_kb': '64', 'class': ' > org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > > Following are node specs: > RAM: 4GB > CPU: 4 Core > HDD: 250BG > > > Following is the output of 'nodetool info' after setting > row_cache_size_in_mb: > > ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32 > Gossip active : true > Thrift active : false > Native Transport active: true > Load : 10.94 MiB > Generation No : 1517571163 > Uptime (seconds) : 9169 > Heap Memory (MB) : 136.01 / 3932.00 > Off Heap Memory (MB) : 0.10 > Data Center: dc1 > Rack : rack1 > Exceptions : 0 > Key Cache : entries 125881, size 9.6 MiB, capacity 100 MiB, > 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in > seconds > Row Cache : entries 125861, size 31.54 MiB, capacity 1000 > MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save period > in seconds > Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, > 0 requests, NaN recent hit rate, 7200 save period in seconds > Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB, > 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss > latency > Percent Repaired : 100.0% > Token : (invoke with -T/--tokens to see all 256 tokens) > > > Following is output of nodetool cfstats: > > Total number of tables: 37 > > Keyspace : stresstest > Read Count: 4398162 > Read Latency: 0.02184742626579012 ms. > Write Count: 0 > Write Latency: NaN ms. 
> Pending Flushes: 0 > Table: user_to_segment > SSTable count: 1 > SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0] > Space used (live): 11076103 > Space used (total): 11076103 > Space used by snapshots (total): 0 > Off heap memory used (total): 107981 > SSTable Compression Ratio: 0.5123353861375962 > Number of partitions (estimate): 125782 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 2 > Local read count: 4398162 > Local read latency: 0.030 ms > Local write count: 0 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 0.0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 79280 > Bloom filter off heap memory used: 79272 > Index summary off heap memory used: 26757 > Compression metadata off heap memory used: 1952 > Compacted partition minimum bytes: 43 > Compacted partition maximum bytes: 215 > Compacted partition mean bytes: 136 > Average live cells per slice (last five minutes): 5.719932432432432 > Maximum live cells per slice (last five minutes): 10 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Dropped Mutations: 0 > > Following are my results: > The blue graph is before setting row_cache_size_in_mb, > orange is after > > Thanks, >