RE: Commit logs building up
Nate,

What values for the FlushWriter line would draw concern to you? What is the difference between Blocked and All Time Blocked?

Parag

From: Nate McCall [mailto:n...@thelastpickle.com]
Sent: Thursday, February 27, 2014 4:22 PM
To: Cassandra Users
Subject: Re: Commit logs building up

What was the impetus for turning up commitlog_segment_size_in_mb? Also, in nodetool tpstats, what are the values for the FlushWriter line?

On Wed, Feb 26, 2014 at 12:18 PM, Christopher Wirt <chris.w...@struq.com> wrote:

We're running 2.0.5, recently upgraded from 1.2.14. Sometimes we are seeing CommitLogs starting to build up. Is this a potential bug? Or a symptom of something else we can easily address? We have:

    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 1
    commitlog_segment_size_in_mb: 512

Thanks,
Chris

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
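For reference, the line in question comes from nodetool tpstats output shaped roughly like the sketch below (the numbers are invented for illustration; only the column layout reflects 1.2/2.0-era tpstats). Blocked is the count of flush tasks blocked at this instant, while All time blocked is the cumulative count since the node started; a consistently non-zero Blocked, or a fast-growing All time blocked, usually means flushes cannot keep up with the write rate.

    $ nodetool tpstats | grep -E 'Pool Name|FlushWriter'
    Pool Name      Active   Pending   Completed   Blocked   All time blocked
    FlushWriter         1         2       40821         0                 17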
Commitlog questions
1) Why is the default 4GB? Has anyone changed this? What are some aspects to consider when determining the commitlog size?

2) If the commitlog is in periodic mode, there is a property to set a time interval to flush the incoming mutations to disk. This implies that there is a queue inside Cassandra to hold this data in memory until it is flushed.
   a. Is there a name for this queue?
   b. Is there a limit for this queue?
   c. Are there any tuning parameters for this queue?

Thanks,
Parag
RE: Commitlog questions
Oleg,

Thanks for the response. If the commitlog is in periodic mode and the fsync happens every 10 seconds, Cassandra is storing the stuff that needs to be sync'd somewhere for a period of 10 seconds. I'm talking about before it even hits any disk. This has to be in memory, correct?

Parag

-----Original Message-----
From: Oleg Dulin [mailto:oleg.du...@gmail.com]
Sent: Wednesday, April 09, 2014 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Commitlog questions

Parag:

To answer your questions:

1) The default is just that, a default. I wouldn't advise raising it, though; the bigger it is, the longer it takes to restart the node.

2) I think they just use fsync. There is no queue. All files in Cassandra use java.nio buffers, but they need to be fsynced periodically. Look at the commitlog_sync parameters in the cassandra.yaml file; the comments there explain how it works. I believe the difference between periodic and batch is just that -- if it is periodic, it will fsync every 10 seconds; if it is batch, it will fsync if there were any changes within a time window.

--
Regards,
Oleg Dulin
http://www.olegdulin.com
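For reference, the cassandra.yaml settings this thread revolves around look roughly like the sketch below. The values shown are the stock 1.2/2.0-era defaults, given for illustration only, not taken from the thread:

    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000     # fsync the commitlog every 10 seconds
    commitlog_total_space_in_mb: 4096      # the 4GB default Parag asks about
    commitlog_segment_size_in_mb: 32       # size of each individual segment file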
Cassandra memory consumption
We're using Cassandra 1.2.12. What aspects of the data are stored in off-heap memory vs heap memory?
RE: Cassandra memory consumption
If I'm inserting the following:

Partition key = 8 byte String
Clustering key = 20 byte String
Stored Data = 150 byte byte[]

If the insert is still in the memtable, what portion of the above is in the memtable? All of it, or just the keys? If just the keys, where does the stored data live? (Keep in mind that in this scenario the data has not yet been flushed to the data directory; it's only been added to the commit log.)

Parag

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Thursday, April 10, 2014 3:35 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra memory consumption

Data structures that are stored off heap:

1) Row cache (if JNA enabled, otherwise on heap)
2) Bloom filter
3) Compression offsets
4) Key index samples

On heap:

1) Memtables
2) Partition key cache

Hope that I did not forget anything.

Regards,
Duy Hai DOAN

On Thu, Apr 10, 2014 at 9:13 PM, Parag Patel <ppa...@clearpoolgroup.com> wrote:

> We're using Cassandra 1.2.12. What aspects of the data are stored in off-heap memory vs heap memory?
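In CQL terms, the row shape Parag describes would look something like the sketch below (table and column names are invented for illustration):

    CREATE TABLE my_data (
        pkey    text,    -- partition key, ~8 bytes
        ckey    text,    -- clustering key, ~20 bytes
        payload blob,    -- stored data, ~150 bytes
        PRIMARY KEY (pkey, ckey)
    );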
RE: New application - separate column family or separate cluster?
In your scenario #1, is the total number of nodes staying the same? Meaning, if you launch multiple clusters for #2, you'd have N total nodes -- are we assuming #1 has N or fewer than N? If #1 and #2 both have N, wouldn't the performance be the same, since Cassandra's performance scales linearly?

From: Tupshin Harper [mailto:tups...@tupshin.com]
Sent: Tuesday, July 08, 2014 11:13 PM
To: user@cassandra.apache.org
Subject: Re: New application - separate column family or separate cluster?

I've seen a lot of deployments, and I think you captured the scenarios and reasoning quite well. You can apply other nuances and details to #2 (e.g. segment based on SLA or topology), but I agree with all of your reasoning.

-Tupshin
-Global Field Strategy
-Datastax

On Jul 8, 2014 10:54 AM, "Jeremy Jongsma" <jer...@barchart.com> wrote:

Do you prefer purpose-specific Cassandra clusters that support a single application's data set, or a single Cassandra cluster that contains column families for many applications? I realize there is no ideal answer for every situation, but what have your experiences been in this area for cluster planning?

My reason for asking is that we have one application with high data volume (multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in the first place. Now we have the tools and cluster management infrastructure built up to the point where it is not a major investment to store smaller sets of data for other applications in C* also, and I am debating whether to:

1) Store everything in one large cluster (no isolation, low cost)
2) Use one cluster for the high-volume data, and one for everything else (good isolation, medium cost)
3) Give every major service its own cluster, even if they have small amounts of data (best isolation, highest cost)

I suspect #2 is the way to go as far as balancing hosting costs and application performance isolation. Are there any pros or cons I'm missing?

-j
adding more nodes into the cluster
Hi,

We have a 12-node cluster with a replication factor of 3 in 1 datacenter. We want to add 6 more nodes into the cluster. I'm trying to see what's better: bootstrapping all 6 at the same time, or doing it one node at a time. Does anybody have any thoughts on this?

Thanks,
Parag
RE: adding more nodes into the cluster
Thanks Rob

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Wednesday, July 16, 2014 2:21 PM
To: user@cassandra.apache.org
Subject: Re: adding more nodes into the cluster

On Wed, Jul 16, 2014 at 9:16 AM, Parag Patel <ppa...@clearpoolgroup.com> wrote:

> We have a 12-node cluster with a replication factor of 3 in 1 datacenter. We want to add 6 more nodes into the cluster. I'm trying to see what's better: bootstrapping all 6 at the same time, or doing it one node at a time.

I should really write a blog post on this.

For safety, operators should generally bootstrap one node at a time. There are rare cases in non-vnode operation where one can safely bootstrap more than one node, but in general one should not do so.

In the future in Cassandra, you will hopefully be prohibited from bootstrapping more than one at a time, because it's a natural thing to try and Bad Stuff Can Happen.

https://issues.apache.org/jira/browse/CASSANDRA-7069

=Rob
RE: adding more nodes into the cluster
A couple more questions about bootstrapping:

1) Should we bootstrap all 6 nodes first and then call cleanup once, or should cleanup be called after each node is bootstrapped?
2) Is it safe to kill the cleanup call and expect it to resume the next time it's called?

Thanks,
Parag
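For reference, a hedged sketch of the one-at-a-time flow being discussed, using 1.2-era nodetool commands (whether to clean up once at the end or after each join is exactly the open question above):

    # bootstrap new nodes one at a time; wait for each to finish joining
    nodetool status      # new node moves from UJ (Up/Joining) to UN (Up/Normal)

    # once all new nodes have joined, reclaim no-longer-owned data:
    nodetool cleanup     # run on each pre-existing node, staggered to limit I/O impact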
bootstrapping new nodes on 1.2.12
Hi,

It's taking a while to bootstrap a 13th node into a 12-node cluster. The average node size is about 1.7TB. At the beginning of today we were close to 0.9TB on the new node, and 12 hours later we're at 1.1TB. I figured it would have finished by now, because when I was looking at OpsCenter there were 2 transfers remaining: one at 0% and the other at 2%. I look again now, and those same transfers haven't progressed all day. Instead I see 9 more transfers (some of which are progressing).

1) Would anyone be able to help me interpret this information from OpsCenter?
2) Is there anything I can do to speed this up?

Thanks,
Parag
dropping secondary indexes
Hi,

I've noticed that our data model has many unnecessary secondary indexes. Is there a recommended procedure to drop a secondary index on a very large table? Is there any sort of repair/cleanup that should be done after calling the DROP command?

Thanks,
Parag
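For reference, the DROP command in question is just the CQL statement below (the index name is invented; on 1.2-era cqlsh the index is dropped from the current keyspace):

    USE my_keyspace;
    DROP INDEX my_table_status_idx;   -- hypothetical index name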
RE: bootstrapping new nodes on 1.2.12
Thanks for the detailed response. I checked 'nodetool netstats' and I see there are pending streams, all of which are stuck at 0%. I was expecting to see at least one that was more than 0%. Have you seen this before?

Side question: does a new node stream from other nodes in any particular order? Perhaps this is a coincidence, but if I were to sort my hostnames in alphabetical order, it's currently streaming from the last 2.

From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 4:42 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

Hi Parag,

> 1) Would anyone be able to help me interpret this information from OpsCenter?

At a high level, bootstrapping a new node has two phases: streaming and secondary index builds. I believe OpsCenter will only report active streams; pending streams will be listed as such in OpsCenter as well. In OpsCenter, rather than looking at the Data Size, check the used space on the Storage Capacity pie chart; this will show how much data is on disk but not necessarily live on the node yet.

Personally, I would check 'nodetool netstats' to see what streams are remaining. This will list all active / pending streams and what files are to be streamed. At the moment you might just be streaming some very large files, and once they complete you will see a dramatic increase in data size. If streaming is complete and you use secondary indexes, check 'nodetool compactionstats' for any secondary index builds that may be taking place.

> 2) Is there anything I can do to speed this up?

If you have the capacity, you could increase stream_throughput_outbound_megabits_per_sec in your cassandra.yaml. If you don't have the capacity, you could add more nodes to spread the data so you stream less in future. Finally, you could upgrade to 2.0.x, as it contains a complete refactor of streaming and should make your streaming sessions more robust and transparent: https://issues.apache.org/jira/browse/CASSANDRA-5286

Mark
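For reference, a hedged sketch of the throttle Mark mentions (the 400 is illustrative; the stock 1.2 default was 200 Mbps):

    # cassandra.yaml on the nodes streaming data (restart required):
    stream_throughput_outbound_megabits_per_sec: 400

    # or, if your nodetool supports it, adjust at runtime without a restart:
    nodetool setstreamthroughput 400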
RE: bootstrapping new nodes on 1.2.12
Mark,

I see this output in my log many times over for 2 nodes. We have a cron entry across all clusters that forces a full GC at 2 AM. Node1 is due to the full GC that was scheduled (I can disable this). Node2 was due to a full GC that occurred during our peak operation (these happen occasionally; we've been working to reduce them).

A few questions:

1) Will any node leaving the cluster while streaming force us to bootstrap all over again? If so, is this addressed in future versions?
2) We have too much data to migrate to run on non-production hours. How do we make it such that full GCs don't impact bootstrapping? Should we increase phi_convict_threshold?

Parag

From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 7:58 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

> Thanks for the detailed response. I checked 'nodetool netstats' and I see there are pending streams, all of which are stuck at 0%. I was expecting to see at least one that was more than 0%. Have you seen this before?

This could indicate that the bootstrap process is hung due to a failed streaming session. Can you check your logs for the following line:

AbstractStreamSession.java (line 110) Stream failed because /xxx.xxx.xxx.xxx died or was restarted/removed (streams may still be active in background, but further streams won't be started)

If that is the case, you will need to wipe the node and begin the bootstrapping process again.

Mark
RE: bootstrapping new nodes on 1.2.12
As to why we do it: we need to reevaluate, because the GC optimizations we've made recently probably don't require it anymore. However, prior to our optimizations we observed a benefit at our peak time.

When we force a GC, we don't remove the node from the ring. This seems like a fundamental flaw in our approach; thanks for pointing it out. For the purposes of bootstrapping, we will disable the manual GCs to make sure we don't interrupt the joining process. However, one unpredictable problem always remains -- a full GC happens, causing the node to go offline and the bootstrap to fail. To solve this, we'll try increasing phi_convict_threshold. Our full GCs take about 9 seconds. If we were to increase phi_convict_threshold so that a 9-second unavailability doesn't take a node offline, what negative side effects could there be?

Parag

From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 9:06 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

Hi Parag,

> I see this output in my log many times over for 2 nodes. We have a cron entry across all clusters that forces a full GC at 2 AM. Node1 is due to the full GC that was scheduled (I can disable this). Node2 was due to a full GC that occurred during our peak operation (these happen occasionally; we've been working to reduce them).

Firstly, why are you forcing a GC? Do you have sufficient evidence that Cassandra is not managing the heap in the way your application requires? Also, how are you accomplishing this full GC? Are you removing the node from the ring, forcing a GC and then adding it back in? Or are you forcing a GC while it is in the ring?

> 1) Will any node leaving the cluster while streaming force us to bootstrap all over again? If so, is this addressed in future versions?

If the node that is leaving the ring is streaming data to the bootstrapping node, yes, this will break the streaming session and no further streams will be started from that node. To my knowledge, there is nothing in newer / future versions that will prevent this.

> 2) We have too much data to migrate to run on non-production hours. How do we make it such that full GCs don't impact bootstrapping? Should we increase phi_convict_threshold?

Again, I'll need some more information around these manual GCs. But yes, increasing the phi value would reduce the chance of a node in the ring being marked down during a heavy GC cycle.

Mark
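For reference, a hedged sketch of the change under discussion (the stock default for phi_convict_threshold is 8; 12 is at the top of the commonly recommended range, and the right value for a 9-second pause is a judgment call, not something settled in this thread):

    # cassandra.yaml, on every node in the ring (restart required):
    phi_convict_threshold: 12

The trade-off is the one Parag asks about: a higher phi means genuinely dead nodes are detected more slowly, so coordinators keep routing requests to them for longer.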
RE: bootstrapping new nodes on 1.2.12
My understanding of a 9-second GC seems to be very off, based on the gossip logs. Correct me if I'm wrong, but the "handshaking version" line just means it is attempting to connect to the other node?

Manual FGC:
2:01:02 - Node1 full GC
2:01:25 - Node2 detects Node1 DOWN
2:01:27 - Node2 handshaking version with Node1
2:01:32 - Node2 handshaking version with Node1 because failed previously
2:01:37 - Node2 handshaking version with Node1 because failed previously
2:01:39 - Node2 detects Node1 UP

Production FGC:
9:30:45 - Node1 full GC
9:30:47 - Node2 detects Node1 DOWN
9:30:55 - handshaking version with Node1
9:31:00 - handshaking version with Node1 because failed previously
9:31:05 - handshaking version with Node1 because failed previously
9:31:10 - handshaking version with Node1 because failed previously
9:31:15 - handshaking version with Node1 because failed previously
9:31:20 - handshaking version with Node1 because failed previously
9:31:37 - Node2 detects Node1 UP

From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 9:41 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

> Our full GCs take about 9 seconds. If we were to increase phi_convict_threshold so that a 9-second unavailability doesn't take a node offline, what negative side effects could there be?

When you observe these GCs, do you also see the node being marked down and then back up ~9 seconds later? GCs can often happen and have no effect on gossip marking a node as down, in which case the streaming session will remain intact. The side effect of long GCs is increased latency from that node during that period.

Mark
Manually deleting sstables
After we dropped a table, we noticed that the sstables are still there. After searching through the forum history, I noticed that this is known behavior.

1) Is there any negative impact of deleting the sstables off disk and then restarting Cassandra?
2) Are there any other recommended procedures for this?

Thanks,
Parag
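A hedged sketch of the sequence being asked about, assuming the stock 1.2 directory layout of <data_dir>/<keyspace>/<table> (paths and names are invented; this illustrates the question, not a confirmed-safe procedure):

    nodetool drain       # flush and stop accepting writes on this node
    # stop the cassandra process, then remove the dropped table's directory:
    rm -rf /var/lib/cassandra/data/my_keyspace/my_dropped_table
    # restart cassandra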
RE: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Agreed. We only use secondary indexes for column families that are relatively small (~5k rows). For anything larger, we store the data in a wide row (but this depends on your data model).

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of Jonathan Haddad
Sent: Friday, September 19, 2014 4:01 AM
To: user@cassandra.apache.org
Subject: Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

Keep in mind secondary indexes in Cassandra are not there to improve performance, or even really to be used in a serious user-facing manner. Build and maintain your own view of the data; it'll be much faster.

On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel wrote:

> Hi there,
>
> We are seeing extreme slowdown (500ms to 1s) in queries on a secondary index with vnodes. I'm seeing multiple secondary index scans on a given node in the trace output when vnodes are enabled. Without vnodes, everything is good.
>
> Cluster size: 6 nodes
> Replication factor: 3
> Consistency level: local_quorum. The same behavior happens with a consistency level of ONE.
>
> Snippet from the trace output (please see the attached output1.txt for the full log). Are we hitting any bug? I do not understand why the coordinator sends requests multiple times to the same node (e.g. 192.168.51.22 in the output below) for different token ranges.
>
> Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22
> Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25
> Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22
> Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25
> Executing indexed scan for (max(-8929774302283364912), max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22
> Executing indexed scan for (max(-8854653908608918942), max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25
> Executing indexed scan for (max(-8762620856967633953), max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22
> Executing indexed scan for (max(-8668275030769104047), max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25
> Executing indexed scan for (max(-8659066486210615614), max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22
> Executing indexed scan for (max(-8419137646248370231), max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25
> Executing indexed scan for (max(-8416786876632807845), max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22
> Executing indexed scan for (max(-8315889933848495185), max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25
> Executing indexed scan for (max(-8270922890152952193), max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22
> Executing indexed scan for (max(-8260813759533312175), max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25
> Executing indexed scan for (max(-8234845345932129353), max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22
>
> Thanks,
> Jay

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
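A minimal sketch of the "build your own view" pattern Jonathan and Parag describe, with invented table names: instead of a secondary index, the application writes each row into a second table partitioned by the value you want to query on, which gives you the wide-row lookup directly.

    -- hypothetical base table
    CREATE TABLE users (
        user_id text PRIMARY KEY,
        status  text
    );

    -- hand-maintained view: one partition (wide row) per status value
    CREATE TABLE users_by_status (
        status  text,
        user_id text,
        PRIMARY KEY (status, user_id)
    );

    -- the application writes both on every update:
    INSERT INTO users (user_id, status) VALUES ('u42', 'active');
    INSERT INTO users_by_status (status, user_id) VALUES ('active', 'u42');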
Off heap memory leak?
We have a 12-node Cassandra cluster running 1.2.12. Each node is using 1.1TB out of 2TB. Each node has a min+max heap of 24GB, and the physical server has 48GB. Our nodes do not restart during the week, only on the weekend, and we're observing that the off-heap memory consumed ramps up over the course of the week. There were a few occasions lately where it consumed so much that it started swapping, and the OS eventually killed the process (for good reason). I know we need to upgrade, but I'd like to evaluate whether upgrading will fix the problem. Has anyone experienced this, or can anyone provide guidance?

Parag
Read query slows down when a node goes down
Hi,

We have a six-node cluster running DataStax Community Edition 1.2.9. From our app, we use the Netflix Astyanax library to read and write records into our cluster. We read and write with QUORUM. We're experiencing an issue where our read queries slow down whenever a node goes offline. This is a problem that is very reproducible. Has anybody experienced this before, or do people have suggestions on what I could try?

Thanks,
Parag
RE: Read query slows down when a node goes down
RF=3. Single-DC deployment. No vnodes.

Is there a certain amount of time I need to wait from the time the down node is started to the point where it's ready to be used? If so, what's that time? If it's dynamic, how would I know when it's ready?

Thanks,
Parag

From: sankalp kohli [mailto:kohlisank...@gmail.com]
Sent: Sunday, September 15, 2013 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Read query slows down when a node goes down

What is your replication factor? Do you have a multi-DC deployment? Also, are you using vnodes?

On Sun, Sep 15, 2013 at 7:54 AM, Parag Patel <parag.pa...@fusionts.com> wrote:

> We have a six-node cluster running DataStax Community Edition 1.2.9. ... We're experiencing an issue where our read queries slow down whenever a node goes offline.
RE: Read query slows down when a node goes down
Thanks. I've noticed that a repair takes a long time to finish. My data is quite small: 1.5GB on each node when running nodetool status. Is there any way to speed up repairs? (FYI, I haven't actually seen a repair finish, since it didn't return after 10 mins; I figured I was doing something wrong.)

From: sankalp kohli [mailto:kohlisank...@gmail.com]
Sent: Monday, September 16, 2013 1:10 PM
To: user@cassandra.apache.org
Subject: Re: Read query slows down when a node goes down

For how long do the read latencies go up once a machine is down? It takes a configurable amount of time for machines to detect that another machine is down. This is done through gossip; the algorithm used to detect failures is the Phi accrual failure detector.

Regarding your question: if you are bootstrapping, then the node needs to get the data from other nodes, and during this time it will not serve any reads but will accept writes. Once it has all the data, it will start serving reads; in the logs it will have something like "now serving reads". If you are bringing back a machine which was offline, then it will start accepting reads and writes immediately, but then you should run a repair to get the missing data.
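For reference, a hedged sketch of the repair invocation (the keyspace name is invented; -pr, available in 1.2-era nodetool, repairs only the node's primary range, so a rolling repair across all nodes avoids repairing each range RF times):

    nodetool repair -pr my_keyspace    # run on each node in turn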
Statistics
Hi, I'm looking for a way to view statistics. Mainly, I'd like to see the distribution of writes and reads over the course of a day or a set of days. Is there a way to do this through nodetool or by downloading a utility? Thanks, Parag
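A few built-in starting points, as a hedged sketch: the 1.2-era nodetool commands below report cumulative counters and latency histograms since the node started, not a time series, so seeing the distribution over a day means polling them, or the JMX metrics behind them, on a schedule. OpsCenter, mentioned elsewhere in these threads, charts the same metrics over time.

    nodetool cfstats                          # cumulative read/write counts and latencies per column family
    nodetool cfhistograms my_keyspace my_cf   # latency and size distributions for one column family
    nodetool proxyhistograms                  # coordinator-level read/write latency distributions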
Issue upgrading from 1.2 to 2.0.3
Hi,

We are in the process of upgrading from 1.2 to 2.0.3. We have a four-node cluster and we're upgrading one node at a time. After upgrading two of the nodes, we encountered a problem: if we run nodetool status on the 2.0.3 hosts, they show 2 nodes down and 2 nodes up, but if we run nodetool status on the 1.2 hosts, they show all nodes up. Has anyone encountered this? Perhaps I'm missing a step in my upgrade procedure? Please help, as this will prevent us from pushing into production.

Thanks,
Parag
RE: Issue upgrading from 1.2 to 2.0.3
Thanks for that link. Our 1.2 version is 1.2.12.

Our 2.0.3 nodes were restarted once: before the restart it was the 1.2.12 binary, after it was 2.0.3. Immediately after each node was back in the cluster, we ran nodetool upgradesstables. We haven't restarted since. Is a restart required for each node?

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Thursday, December 19, 2013 4:17 PM
To: user@cassandra.apache.org
Subject: Re: Issue upgrading from 1.2 to 2.0.3

On Thu, Dec 19, 2013 at 1:03 PM, Parag Patel <parag.pa...@fusionts.com> wrote:

> We are in the process of upgrading from 1.2 to 2.0.3. ... Please help, as this will prevent us from pushing into production.

(As general commentary: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ )

Specific feedback on your question: did the 2.0.3 nodes see the 1.2.x (which 1.2.x?) nodes after the first restart?

=Rob
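For reference, a hedged sketch of the per-node rolling-upgrade sequence implied in this thread (the general pattern, not steps confirmed by the participants):

    nodetool drain             # flush memtables and stop accepting writes
    # stop the cassandra process
    # install the 2.0.x binaries and merge your cassandra.yaml changes
    # start cassandra and wait until the node shows UN in nodetool status
    nodetool upgradesstables   # rewrite sstables in the new on-disk format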
Astyanax - multiple key search with pagination
Hi,

I'm using Astyanax and trying to do a search for multiple keys with pagination. I tried .getKeySlice with a list of primary keys, but it doesn't allow pagination. Does anyone know how to tackle this issue with Astyanax?

Parag
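No answer appears in this digest, but one common workaround is sketched below under assumptions (the column family, serializers and page size are invented): since Astyanax's per-row RowQuery supports autoPaginate, you can paginate each key's columns individually rather than paginating the key slice itself.

    import java.util.List;

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.query.RowQuery;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.util.RangeBuilder;

    public class MultiKeyPager {

        // Hypothetical column family: string row keys, string column names.
        private static final ColumnFamily<String, String> CF_DATA =
                new ColumnFamily<String, String>("data",
                        StringSerializer.get(), StringSerializer.get());

        // Reads every column of every requested key, 100 columns per round trip.
        public static void pageThroughKeys(Keyspace keyspace, List<String> keys)
                throws ConnectionException {
            for (String key : keys) {
                RowQuery<String, String> query = keyspace.prepareQuery(CF_DATA)
                        .getKey(key)
                        .autoPaginate(true)   // remembers its position across execute() calls
                        .withColumnRange(new RangeBuilder().setLimit(100).build());

                ColumnList<String> page;
                while (!(page = query.execute().getResult()).isEmpty()) {
                    // process one page of columns for this key
                }
            }
        }
    }

The trade-off in this sketch is one query stream per key rather than a single multi-key call, but each key's read is bounded to a page at a time, which is what .getKeySlice cannot do.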
RE: Issue upgrading from 1.2 to 2.0.3
After restarting all the nodes, they all see each other. I'll try nodetool gossipinfo if it happens again. Thanks.

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Monday, December 23, 2013 10:19 PM
To: Cassandra User
Subject: Re: Issue upgrading from 1.2 to 2.0.3

If this is still a concern, can you post the output from nodetool gossipinfo? It will give the details of what the nodes think of the other ones.

Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/12/2013, at 11:38 am, Parag Patel <parag.pa...@fusionts.com> wrote:

> Thanks for that link. Our 1.2 version is 1.2.12. ... Is a restart required for each node?