about validity of recipe "A node join using external data copy methods"
Hi,

Edward Capriolo described in his Cassandra book a faster way [1] to start new nodes when the cluster size doubles from N to 2*N. The idea is to split each token range in two, so that after the split each range is handled by two nodes: the existing one and a new one. To start a new node, one needs to:

- copy the data from the corresponding existing node (without the system keyspace data)
- start the new node with auto_bootstrap: false

This raises two questions:

A) is this recipe still valid with v1.1 and v1.2?
B) do we still need to start the new node with auto_bootstrap: false? My guess is yes, as the fact that the bootstrap phase has already happened is not recorded in the copied data.

Thanks.
Dominique

[1] see recipe "A node join using external data copy methods", page 165
Re: Astyanax
The wiki? https://github.com/Netflix/astyanax/wiki

On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima peitin.inu...@gmail.com wrote:
Hi, can someone recommend a good tutorial or book for learning Astyanax? Thanks
-- Everton Lima Aleixo, MSc student in Computer Science at UFG, programmer at LUPA
Re: about validity of recipe "A node join using external data copy methods"
Basically this recipe is from the old days when we had anti-compaction. Now streaming is very efficient, rarely fails, and there is no need to do it this way anymore. This recipe will be removed from the second edition. It still likely works, except when using counters.

Edward
Re: help tuning compaction... hours of run to get 0% compaction
One metric to watch is pending compactions (via nodetool compactionstats). This count will give you some idea of whether you are falling behind with compactions. The other measure is how long you are compacting after your inserts have stopped.

If I understand correctly, since you never update the data, that would explain why the compaction logging shows 100% of original. With size-tiered, you are flushing small files, compacting when you get 4 of like size, etc. Since you have no updates, the compaction will not shrink the data.

As Aaron said, use iostat -x (or dstat) to see if you are taxing the disks. If so, then leveled compaction may be your option (for reasons already stated). If not taxing the disks, then you might want to increase your compaction throughput, as you suggested. Depending on what version you are using, another thing to possibly tune is the size of sstables when flushed to disk. In your case of insert only, the smaller the flush size, the more times a row is going to be rewritten during a compaction (hence increased I/O).

jc

From: Edward Capriolo edlinuxg...@gmail.com
Date: Monday, January 7, 2013 2:33 PM
To: user@cassandra.apache.org
Subject: Re: help tuning compaction... hours of run to get 0% compaction

There is some point where you simply need more machines.

On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman mkjell...@barracuda.com wrote:
Right, I guess I'm saying that you should try loading your data with leveled compaction and see how your compaction load is. Your workload sounds like leveled will fit much better than size-tiered.

From: Brian Tarbox tar...@cabotresearch.com
Date: Monday, January 7, 2013 1:58 PM
To: user@cassandra.apache.org
Subject: Re: help tuning compaction... hours of run to get 0% compaction

The problem I see is that it already takes me more than 24 hours just to load my data... during which time the logs say I'm spending tons of time doing compaction. For example, in the last 72 hours I've consumed 20 hours per machine on compaction. Can I conclude from that that I should be (perhaps drastically) increasing my compaction_throughput_mb_per_sec on the theory that I'm getting behind? The fact that it takes me 3 days or more to run a test means it's hard to just play with values and see what works best, so I'm trying to understand the behavior in detail. Thanks.

Brian

On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman mkjell...@barracuda.com wrote:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
"If you perform at least twice as many reads as you do writes, leveled compaction may actually save you disk I/O, despite consuming more I/O for compaction. This is especially true if your reads are fairly random and don't focus on a single, hot dataset."

From: Brian Tarbox tar...@cabotresearch.com
Date: Monday, January 7, 2013 12:56 PM
To: user@cassandra.apache.org
Subject: Re: help tuning compaction... hours of run to get 0% compaction

I have not specified leveled compaction, so I guess I'm defaulting to size-tiered? My data (in the column family causing the trouble) is insert once, read many, update never.

Brian

On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman mkjell...@barracuda.com wrote:
Size-tiered or leveled compaction?

From: Brian Tarbox tar...@cabotresearch.com
Date: Monday, January 7, 2013 12:03 PM
To: user@cassandra.apache.org
Subject: help tuning compaction... hours of run to get 0% compaction

I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a time. At some point my performance falls off a cliff due to time spent doing compactions. I'm seeing row after row of logs saying that after 1 or 2 hours of compacting it reduced to 100% or 99% of the original. I'm trying to understand what direction this data points me to in terms of configuration change:
a) increase my compaction_throughput_mb_per_sec because I'm falling behind (am I falling behind?)
b) enable multi-threaded compaction?
Any help is appreciated.

Brian
Re: Astyanax
Hi,

We are using Astyanax and we found that the GitHub wiki, together with Stack Overflow, is the most comprehensive set of documentation. Do you have any specific questions?

Kind regards,
Radek Gruchalski

On 8 Jan 2013, at 15:46, Everton Lima peitin.inu...@gmail.com wrote:
I was studying from there, but I would like to know if anyone knows other sources.
-- Everton Lima Aleixo, MSc student in Computer Science at UFG, programmer at LUPA
RE: about validity of recipe "A node join using external data copy methods"
"Now streaming is very efficient, rarely fails, and there is no need to do it this way anymore"

I guess that's true for v1.2. Is it also true for v1.1?

Thanks.
Dominique
Re: CQL3 Frame Length
Mostly this is because having the frame length is convenient in practice. Without pretending that there is only one way to write a server, it is common to separate the "read a frame from the network" phase from the "decode the frame" phase, which is often simpler if you can read the frame upfront. Also, if you don't have the frame size, you need to decode the whole frame before being able to decode the next one, so you can't parallelize the decoding.

It is true, however, that on the write side this means you need to either pre-compute the frame body size or serialize it in memory first. That's a trade-off for making it easier on the read side. But if you want my opinion, on the write side too it's probably worth parallelizing the message encoding (which requires you to encode it in memory first), since it's an asynchronous protocol and so there will likely be multiple writers simultaneously.

-- Sylvain

On Tue, Jan 8, 2013 at 12:48 PM, Ben Hood 0x6e6...@gmail.com wrote:

Hi,

I've read the CQL wire specification and, naively, I can't see how the frame length header is used. To me, it looks like on the read side you know which type of structures to expect based on the opcode, and each structure is TLV encoded. On the write side, you need to encode TLV structures as well, but you don't know the overall frame length until you've encoded it. So it would seem that you either need to pre-calculate the cumulative TLV size before you serialize the frame body, or you serialize the frame body to a buffer which you can then get the size of and then write to the socket, after having first written the count out. Is there potentially an implicit assumption that the reader will want to pre-buffer the entire frame before decoding it?

Cheers,
Ben
Re: CQL3 Frame Length
Hey Sylvain,

Thanks for explaining the rationale. When you look at it from the perspective of the use cases you mention, it makes sense to be able to supply the reader with the frame size up front. I've opted to go for serializing the frame into a buffer. Although this could materialize an arbitrarily large amount of memory, ultimately the driving application has control of the degree to which this can occur, so in the grander scheme of things you can still maintain streaming semantics.

Thanks for the heads up.

Cheers,
Ben
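[Editor's note] As an illustration of the "serialize the body to a buffer, then prefix the length" approach Ben describes, here is a minimal Java sketch. It is not the actual driver code; the FrameBody interface and writeBody() helper are invented for illustration, and the header layout follows the v1 native protocol (version, flags, stream id, opcode, 4-byte length, body).

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    public class FrameWriter {

        /** Hypothetical interface for anything that knows how to TLV-encode itself. */
        public interface FrameBody {
            void writeBody(DataOutputStream out) throws IOException;
        }

        /**
         * Serialize the frame body into a temporary buffer so its length is known,
         * then write the header fields, the 4-byte length, and the body to the socket.
         */
        public static void writeFrame(OutputStream socketOut, byte version, byte flags,
                                      byte streamId, byte opcode, FrameBody body) throws IOException {
            ByteArrayOutputStream buffered = new ByteArrayOutputStream();
            body.writeBody(new DataOutputStream(buffered));

            DataOutputStream out = new DataOutputStream(socketOut);
            out.writeByte(version);
            out.writeByte(flags);
            out.writeByte(streamId);
            out.writeByte(opcode);
            out.writeInt(buffered.size()); // frame length, now known
            buffered.writeTo(out);
            out.flush();
        }
    }

The trade-off is the one Ben notes: the whole body is materialized in memory once per frame, in exchange for a reader that can slice complete frames off the wire before decoding them.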
Re: Astyanax
Not sure where you are on the learning curve, but I've put a couple of getting-started projects out on GitHub:

https://github.com/boneill42/astyanax-quickstart

And the latest from the webinar is here:

https://github.com/boneill42/naughty-or-nice
http://brianoneill.blogspot.com/2013/01/creating-your-frist-java-application-w.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive, King of Prussia, PA 19406
M: 215.588.6024
@boneill42 http://www.twitter.com/boneill42
healthmarketscience.com
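[Editor's note] In case a code-level starting point helps alongside those repos, below is a minimal connect-and-read sketch along the lines of the Netflix wiki's getting-started page. It is written from memory against the 1.56.x API, so treat the exact class and method names as assumptions to verify against the wiki; the seed address, keyspace, and column family names are made up.

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.OperationResult;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class AstyanaxQuickstart {
        public static void main(String[] args) throws ConnectionException {
            // Build a context pointing at one seed node; all names here are examples.
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                    .forKeyspace("MyKeyspace")
                    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyPool")
                            .setPort(9160)
                            .setMaxConnsPerHost(1)
                            .setSeeds("127.0.0.1:9160"))
                    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                    .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();

            // Older releases expose this as getEntity() instead of getClient().
            Keyspace keyspace = context.getClient();

            ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
                    "Standard1", StringSerializer.get(), StringSerializer.get());

            // Read all columns of one row.
            OperationResult<ColumnList<String>> result =
                    keyspace.prepareQuery(cf).getKey("someRowKey").execute();
            System.out.println("columns read: " + result.getResult().size());

            context.shutdown();
        }
    }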
Re: about validity of recipe "A node join using external data copy methods"
It has been true since about 0.8. In the old days anti-compaction stunk and many weird errors would cause node joins to have to be retried N times. Now node moves/joins seem to work nearly 100% of the time (in 1.0.7), and they are also very fast and efficient.

If you want to move a node to new hardware you can do it with rsync, but I would not use this technique for growing the cluster. It is error prone and ends up being more work.
Script to load sstables from v1.0.x to v1.1.x
Hi all,

I have recently been trying to restore backups from a v1.0.x cluster we have into a 1.1.7 cluster. This has not been as trivial as I expected, and I've had a lot of help from the IRC channel in tackling this problem. As a way of saying thanks, I'd like to contribute the updated Ruby script I was originally given for accomplishing this task. Here it is:

https://gist.github.com/1c161edab88a4e4aea06

It takes a keyspace directory as the input, then creates symlinks in the output directory with the 1.1 structure pointing to the 1.0 sstables. If you've specified a host, it will then invoke the sstableloader for each of the keyspaces and CFs it discovers in the output directory.

I hope this is helpful to someone else. I'll keep the gist updated as I update the script.

Todd
Date Index?
Hi folks - Question about secondary indexes. How are people doing date indexes? I have a date column in my tables in RDBMS that we use frequently, for queries such as "look at all records recorded in the last month". What is the best practice for being able to do such a query?

It seems like there could be an advantage to adding a couple of columns like this:

{timestamp=2013/01/08 12:32:01 -0500} {month=201301} {day=08}

And then I could do a secondary index on the month and day columns? Would that be the best way to do something like this? Is there any accepted best practice on this yet?

Thanks!
Steve
Re: help tuning compaction... hours of run to get 0% compaction
I'll second Edward's comment: Cassandra is designed to scale horizontally, so if disk I/O is slowing you down then you need to scale out.
Re: Script to load sstables from v1.0.x to v1.1.x
On Tue, Jan 8, 2013 at 8:41 AM, Todd Nine todd.n...@gmail.com wrote:
I have recently been trying to restore backups from a v1.0.x cluster we have into a 1.1.7 cluster. This has not been as trivial as I expected, and I've had a lot of help from the IRC channel in tackling this problem. As a way of saying thanks, I'd like to contribute the updated Ruby script I was originally given for accomplishing this task. Here it is.

While I laud your contribution, I am still not fully understanding why this is not working automagically, as it should:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

"What about upgrading? Do you need to manually move all pre-1.1 data files to the new directory structure before upgrading to 1.1? No. Immediately after Cassandra 1.1 starts, it checks to see whether it has old directory structure and migrates all data files (including backups and snapshots) to the new directory structure if needed. So, just upgrade as you always do (don't forget to read NEWS.txt first), and you will get more control over data files for free."

Is it possible that, for example, the installation of the Debian package results in your 1.1.x node starting up before you intend it to... and then when you start it again with the 1.0 paths, it doesn't try to change the paths?

"* To check if sstables needs migration, we look at the System directory. If it contains a directory for the status cf, we'll attempt a sstable migrating."

This quote from Directories.java (thx driftx!) suggests that any starting of a 1.1 node, which would result in a Status columnfamily being created, would make sstablesNeedsMigration return false. If this is your case due to the use of the Debian package or similar which auto-starts, your input is welcomed at:

https://issues.apache.org/jira/browse/CASSANDRA-2356

=Rob
--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Re: Script to load sstables from v1.0.x to v1.1.x
I thought this was to load between separate clusters, not to upgrade within the same cluster. No?
Re: Script to load sstables from v1.0.x to v1.1.x
Our use case is for testing migrations of our data, as well as stress testing outside our production environment. To do this, we load our backups into a fresh cluster, then perform our testing.

Our current production cluster is still on 1.0.x, so we can either fire up a 1.0.x cluster and then upgrade every node to accomplish this, or just use the script. We also have a different number of nodes in stage vs production, so we'd still need to run a repair if we did a straight sstable copy. The script is a lot faster and easier for us than going through the upgrade process and then running repair to ensure the data is distributed correctly in the ring.

-- Todd Nine
Re: Script to load sstables from v1.0.x to v1.1.x
On Tue, Jan 8, 2013 at 11:56 AM, Todd Nine todd.n...@gmail.com wrote:
Our current production cluster is still on 1.0.x, so we can either fire up a 1.0.x cluster, then upgrade every node to accomplish this, or just use the script.

No 1.0 cluster is required to restore 1.0 directory structure to a 1.1 cluster and have the tables be migrated by Cassandra. The 1.1 node should look at the 1.0 directory structure you just restored and migrate it automagically.

We also have a different number of nodes in stage vs production, so we'd still need to run a repair if we did a straight sstable copy.

This is a compelling reason to bulk load. My commentary merely points out that if you *aren't* changing cluster size/topology, Cassandra 1.1 should be migrating the sstables for you. :)

=Rob
--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Re: inconsistent hadoop/cassandra results
Assuming there were no further writes, running repair or using CL ALL should have fixed it. Can you describe the inconsistency between runs?

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 8/01/2013, at 2:16 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote:

I need some help understanding unexpected behavior I saw in some recent experiments with Cassandra 1.1.5 and Hadoop 1.0.3.

I've written a small map/reduce job that simply counts the number of columns in each row of a static CF (call it Foo) and generates a list of every row and column count. A relatively small fraction of the rows have a large number of columns; the worst case is approximately 36 million. So when I set up the job, I used wide-row support:

ConfigHelper.setInputColumnFamily(job.getConfiguration(), fooKS, Foo, WIDE_ROWS); // where WIDE_ROWS == true

When I ran this job using the default CL (1) I noticed that the results varied from run to run, which I attributed to inconsistent replicas, since Foo was generated with CL == 1 and the RF == 3. So I ran repair for that CF on every node. The Cassandra log on every node contains lines similar to:

INFO [AntiEntropyStage:1] 2013-01-05 20:38:48,605 AntiEntropyService.java (line 778) [repair #e4a1d7f0-579d-11e2--d64e0a75e6df] Foo is fully synced

However, repeated runs were still inconsistent. Then I set CL to ALL, which I presumed would always result in identical output, but repeated runs initially continued to be inconsistent. However, I noticed that the results seemed to be converging, and after several runs (somewhere between 4 and 6) I finally was producing identical results on every run. Then I set CL to QUORUM, and again generated inconsistent results.

Does this behavior make sense?

Brian
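[Editor's note] For reference, a job setup along the lines of the one described in the mail might look roughly like the sketch below, written from memory against the 1.1-era Hadoop integration. The keyspace and CF names come from the mail; ConfigHelper.setReadConsistencyLevel is my assumption of the knob for forcing CL ALL on the input side, and the node address is made up, so check both against the ConfigHelper in your exact version.

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ColumnCountJobSetup {
        public static void configure(Job job) {
            Configuration conf = job.getConfiguration();

            // Wide-row support, as in the original mail (WIDE_ROWS == true).
            ConfigHelper.setInputColumnFamily(conf, "fooKS", "Foo", true);

            // Read all columns of each row (no column-name filter).
            SlicePredicate predicate = new SlicePredicate().setSlice_range(
                    new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER,
                                   ByteBufferUtil.EMPTY_BYTE_BUFFER,
                                   false, Integer.MAX_VALUE));
            ConfigHelper.setInputSlicePredicate(conf, predicate);

            // Assumption: force reads at CL ALL for the job instead of the default ONE.
            ConfigHelper.setReadConsistencyLevel(conf, "ALL");

            ConfigHelper.setInputInitialAddress(conf, "10.0.0.1"); // any node in the cluster
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
        }
    }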
JIRA for native IAuthorizer and IAuthenticator ?
I am very interested in the native IAuthorizer and IAuthenticator implementation. However, I can't find a JIRA entry to follow in the 1.2.1 [1] or 1.2.2 [2] issues pages. Does anybody know about it?

Thanks!

[1] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%221.2.1%22%20AND%20project%20%3D%20CASSANDRA
[2] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%221.2.2%22%20AND%20project%20%3D%20CASSANDRA

On Wed, Jan 2, 2013 at 7:00 AM, Sylvain Lebresne sylv...@datastax.com wrote:

The Cassandra team wishes you a very happy new year 2013, and is very pleased to announce the release of Apache Cassandra version 1.2.0. Cassandra 1.2.0 is a new major release for the Apache Cassandra distributed database. This version adds numerous improvements [1,2] including (but not restricted to):
- Virtual nodes [4]
- The final version of CQL3 (featuring many improvements)
- Atomic batches [5]
- Request tracing [6]
- Numerous performance improvements [7]
- A new binary protocol for CQL3 [8]
- Improved configuration options [9]
- And much more...

Please make sure to carefully read the release notes [2] before upgrading. Both source and binary distributions of Cassandra 1.2.0 can be downloaded at: http://cassandra.apache.org/download/ Or you can use the Debian package available from the project APT repository [3] (you will need to use the 12x series).

The Cassandra Team

[1]: http://goo.gl/JmKp3 (CHANGES.txt)
[2]: http://goo.gl/47bFz (NEWS.txt)
[3]: http://wiki.apache.org/cassandra/DebianPackaging
[4]: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
[5]: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
[6]: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
[7]: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
[8]: http://www.datastax.com/dev/blog/binary-protocol
[9]: http://www.datastax.com/dev/blog/configuration-changes-in-cassandra-1-2

-- Frank Hsueh | frank.hs...@gmail.com
Re: How long does it take for a write to actually happen?
"EC2 m1.large node"
You will have a much happier time if you use an m1.xlarge.

"We set MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=400M"
That's a pretty low new heap size.

"checks for new entries (in Entries CF, with indexed column status=1), processes them, and sets the status to 2, when done"
This is not the best data model. You may be better off having one CF for the unprocessed entries and one for the processed. Or, if you really need a queue, using something like Kafka.

"I will appreciate any advice on how to speed the writes up"
Writes are instantly available for reading. The first thing I would do is see where the delay is. Use nodetool cfstats to see the local write latency, or track the write latency from the client perspective.

If you are looking for near real time / continuous computation style processing, take a look at http://storm-project.net/ and register for this talk from Brian O'Neill, one of my fellow DataStax MVPs: http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 5:48 AM, Vitaly Sourikov vitaly.souri...@gmail.com wrote:

Hi,

We are currently at an early stage of our project and have only one Cassandra 1.1.7 node, hosted on an EC2 m1.large instance, where the data is written to the ephemeral disk and /var/lib/cassandra/data is just a soft link to it. Commit logs and caches are still on /var/lib/cassandra/. We set MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=400M.

On the client side, we use Astyanax 1.56.18 to access the data. We have a processing server that writes to Cassandra, and an online server that reads from it. The former wakes up every 0.5-5 sec., checks for new entries (in the Entries CF, with indexed column status=1), processes them, and sets the status to 2 when done. The online server checks once a second whether an entry that should be processed got the status 2 and sends it to its client side for display. Processing takes 5-10 seconds and updates various columns in the Entries CF a few times on the way. One of these columns may contain ~12KB of textual data; others are just short strings or numbers.

Now, our problem is that it takes 20-40 seconds before the online server actually sees the change - and that is way too long; this process is supposed to be nearly real-time. Moreover, in cqlsh, if I perform a similar update, it is immediately seen in the following select results, but the updates from the back-end server still do not appear for 20-40 seconds. I tried switching the row caches for that table and in the yaml on and off. I tried commitlog_sync: batch with commitlog_sync_batch_window_in_ms: 50. Nothing helped.

I will appreciate any advice on how to speed the writes up, or at least an explanation of why this happens.

thanks,
Vitaly
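[Editor's note] For the "track the write latency from the client perspective" suggestion, a minimal sketch with Astyanax (which the original poster is using) might look like the following. The column family and column names are just illustrations of the "Entries" table described above, and OperationResult.getLatency()/getAttemptsCount() are used assuming the 1.56.x API.

    import java.util.concurrent.TimeUnit;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.OperationResult;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class WriteLatencyCheck {
        // Hypothetical CF matching the "Entries" column family described in the question.
        private static final ColumnFamily<String, String> CF_ENTRIES =
                new ColumnFamily<String, String>("Entries",
                        StringSerializer.get(), StringSerializer.get());

        public static void timeWrite(Keyspace keyspace, String rowKey) throws ConnectionException {
            MutationBatch m = keyspace.prepareMutationBatch();
            m.withRow(CF_ENTRIES, rowKey).putColumn("status", "2", null);

            long start = System.nanoTime();
            OperationResult<Void> result = m.execute();
            long elapsedMicros = (System.nanoTime() - start) / 1000;

            // Client-observed latency vs. what the driver itself measured.
            System.out.printf("wall clock: %d us, driver latency: %d us, attempts: %d%n",
                    elapsedMicros,
                    result.getLatency(TimeUnit.MICROSECONDS),
                    result.getAttemptsCount());
        }
    }

If numbers like these stay in the low milliseconds while the online server still takes 20-40 seconds to notice the change, the delay is almost certainly on the read/polling side rather than in the write path itself.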
Re: Date Index?
There has to be one equality clause in there, and that's the thing Cassandra uses to select off disk. The others are in-memory filters. So if you have one on the year+month, you can have a simple select clause and it limits the amount of data that has to be read. If you have many tens to hundreds of millions of things in the same month, you may want to do some performance testing.

There can still be times when you want to support common read paths by using custom / hand-rolled indexes.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
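[Editor's note] To make the bucketing idea concrete, here is a small plain-JDK sketch (class and variable names invented for illustration) of deriving the month/day bucket columns Steve describes from a timestamp, so that a query can use an equality clause on the indexed month column and filter the rest in memory, as Aaron explains.

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class DateBuckets {
        public static void main(String[] args) {
            Date ts = new Date(); // e.g. 2013/01/08 12:32:01 -0500 in the example

            SimpleDateFormat month = new SimpleDateFormat("yyyyMM");
            SimpleDateFormat day = new SimpleDateFormat("dd");
            month.setTimeZone(TimeZone.getTimeZone("UTC"));
            day.setTimeZone(TimeZone.getTimeZone("UTC"));

            // Store these alongside the full timestamp column; a secondary index
            // on "month" gives the query its equality clause, and "day" (or the
            // raw timestamp) can be filtered afterwards.
            String monthBucket = month.format(ts); // "201301"
            String dayBucket = day.format(ts);     // "08"

            System.out.println(monthBucket + " / " + dayBucket);
        }
    }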