Re: quorum calculation seems to depend on previous selected nodes
On 18 January 2011 07:15, Samuel Benz samuel.b...@switch.ch wrote: On 01/17/2011 09:28 PM, Jonathan Ellis wrote: On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz samuel.b...@switch.ch wrote: Case1: If 'TEST' was previously stored on Node1, Node2, Node3 - The update will succeed. Case2: If 'TEST' was previously stored on Node2, Node3, Node4 - The update will not work. If you have RF=2 then it will be stored on 2 nodes, not 3. I think this is the source of the confusion. I checked the existence of the row on the different servers with sstablekeys after flushing, so I saw three copies of every key in the cluster. If you want to be guaranteed to be able to read with two nodes down and RF=3, you have to read at CL.ONE, since if the two nodes that are down are replicas of the data you are reading (as in the 2nd case here) Cassandra will be unable to achieve quorum (a quorum of 3 is 2 live nodes). Now it seems clear to me. Thanks! I was confused by the fact that: live nodes != live replica nodes. Correct me if I'm wrong, but even in a cluster with 1000 nodes and RF=3, if I shut down the wrong two nodes, I have the same problem as in my mini cluster. Correct -- Sam
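For reference, a minimal sketch of the quorum arithmetic being discussed (an editor's illustration, not from the thread): with RF=3, QUORUM needs 2 live replicas of the specific row, regardless of how many nodes the cluster has overall.

// Hypothetical illustration of the quorum arithmetic described above.
public class QuorumMath {
    // QUORUM consistency requires floor(RF / 2) + 1 live replicas of the row.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 3;
        int liveReplicas = 1; // e.g. two of the three replica nodes for 'TEST' are down
        System.out.println("replicas needed for QUORUM: " + quorum(rf));              // 2
        System.out.println("can satisfy QUORUM: " + (liveReplicas >= quorum(rf)));    // false
    }
}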
RE: Super CF or two CFs?
Some of the fields are indeed written in one shot, but others (such as label and categories) are added later, so I think the question still stands. Hugo. From: dri...@gmail.com Date: Mon, 17 Jan 2011 18:47:28 -0600 Subject: Re: Super CF or two CFs? To: user@cassandra.apache.org On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac ugs...@hotmail.com wrote: I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of day). Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. If you're always going to write these attributes in one shot, then just serialize them and use a simple CF, there's no need for a SCF. -Brandon
Re: Tombstone lifespan after multiple deletions
Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Cassandra/Hadoop only write few columns
Hi, I'm working on ColumnFamilyOutputFormat and for some reason my reduce class does not write all columns to Cassandra. I tried to modify mapreduce.output.columnfamilyoutputformat.batch.threshold with some different values (1, 8, etc.) but nothing changes. What I have in my reduce class is: ArrayList<Mutation> a = new ArrayList<Mutation>(); a.add(getMutation(colNam1, val1)); a.add(getMutation(colNam2, val2)); a.add(getMutation(colNam2, val2)); ...etc context.write(key, a); Only 2 columns are written into Cassandra, and no error log is found in either the Hadoop or Cassandra logs. Any help is appreciated. Thanks, Trung.
Re: Super CF or two CFs?
With regard to overwrites, and assuming you always want to get all the data for a stock ticker: any read on the volatile data will potentially touch many sstables; this IO is unavoidable to read this data, so we may as well read as many cols as possible at this time. Whereas if you split the data into two CFs you would incur all the IO for the volatile data plus IO for the non-volatile, and have to make two calls. (Or use different keys and make a multiget_slice call; the IO argument still stands.) Thanks to compaction, less volatile data, say cols that are written once a day, week or month, will tend to accrete into fewer sstables. To that end it may make sense to schedule compactions to run after weekly bulk operations. Also take a look at the per-CF compaction thresholds. I'd recommend trying one standard CF (with the quotes packed as suggested) to start with, run some tests and let us know how you go. There are some small penalties to using super CFs, see the limitations page on the wiki. Hope that helps. Aaron On 18/01/2011, at 9:29 PM, Steven Mac ugs...@hotmail.com wrote: Some of the fields are indeed written in one shot, but others (such as label and categories) are added later, so I think the question still stands. Hugo. From: dri...@gmail.com Date: Mon, 17 Jan 2011 18:47:28 -0600 Subject: Re: Super CF or two CFs? To: user@cassandra.apache.org On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac ugs...@hotmail.com wrote: I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of the day). Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. If you're always going to write these attributes in one shot, then just serialize them and use a simple CF, there's no need for a SCF. -Brandon
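To illustrate the "pack the quote into one column" suggestion, here is a minimal sketch using the Hector client (an editor's illustration; the CF name "Quotes", the JSON packing and the helper names are assumptions, not from the thread):

// Sketch: serialize the per-timeslot quote fields into one column value,
// keyed by ticker, with the time slot as the column name.
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class QuoteWriter {
    private final Keyspace keyspace;

    public QuoteWriter(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    public void writeSlot(String ticker, String timeSlot, double open, double close, long volume) {
        // Pack the volatile fields into a single serialized value (JSON here, purely as an example).
        String packed = String.format("{\"open\":%f,\"close\":%f,\"volume\":%d}", open, close, volume);
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert(ticker, "Quotes", HFactory.createStringColumn(timeSlot, packed));
    }
}

Whether packing pays off depends on how much of the packed value changes per write; if only one field changes you rewrite the whole blob, which is the trade-off against separate columns.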
Is there a concept of a session
Hi All, Is there a concept of a session? I would like to log in (authenticate) one time to Cassandra, and then subsequently access Cassandra without authenticating again. Thanks, Indika
Re: Is there a concept of a session
Yes, the client should maintain its connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client; which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log in (authenticate) one time to Cassandra, and then subsequently access Cassandra without authenticating again. Thanks, Indika
Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?
I can't comment on phpcassa directly, but we use Cassandra plus PHP in production without any difficulties. We are happy with the performance. Most of the information we needed to get started we found here: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP This includes details on how to compile the native PHP C extension for Thrift. We use a bespoke client which wraps the Thrift interface. You may be better off with a higher-level client, although when we were starting out there was less of a push away from Thrift directly. I found using Thrift useful as you gain an appreciation for what calls Cassandra actually supports. One potential advantage of using a higher-level client is that it may protect you from the frequent Thrift interface changes which currently seem to accompany every major release. Dave On Tuesday, 18 January 2011, Tyler Hobbs ty...@riptano.com wrote: 1.) Is it developed to the level in order to support all the necessary features to take full advantage of Cassandra? Yes. There aren't some of the niceties of pycassa yet, but you can do everything that Cassandra offers with it. 2.) Is it used in production by anyone? Yes, I've talked to a few people at least who are using it in production. It tends to play a limited role instead of a central one, though. 3.) What are its limitations? Being written in PHP. Seriously. The lack of universal 64-bit integer support can be problematic if you don't have a fully 64-bit system. PHP is fairly slow. PHP makes a few other things less easy to do. If you're doing some pretty lightweight interaction with Cassandra through PHP, these might not be a problem for you. - Tyler -- Dave Gardner, Technical Architect, Imagini Europe Limited, 7 Moor Street, London W1D 5NB, dave.gard...@imagini.net, http://www.visualdna.com
Re: Cassandra/Hadoop only write few columns
May just be your example code, but you are repeating colName2. Can you log the mutation list before you write it and confirm you have unique column names? Can you turn up the logging to DEBUG for the Hadoop job and the Cassandra cluster to see what's happening? Aaron On 18/01/2011, at 9:40 PM, Trung Tran tran.hieutr...@gmail.com wrote: Hi, I'm working on ColumnFamilyOutputFormat and for some reason my reduce class does not write all columns to Cassandra. I tried to modify mapreduce.output.columnfamilyoutputformat.batch.threshold with some different values (1, 8, etc.) but nothing changes. What I have in my reduce class is: ArrayList<Mutation> a = new ArrayList<Mutation>(); a.add(getMutation(colNam1, val1)); a.add(getMutation(colNam2, val2)); a.add(getMutation(colNam2, val2)); ...etc context.write(key, a); Only 2 columns are written into Cassandra, and no error log is found in either the Hadoop or Cassandra logs. Any help is appreciated. Thanks, Trung.
Re: Cassandra/Hadoop only write few columns
It was a typo in my example code in this email. I logged the list to make sure that everything was correct before triggering the write. Will try to enable debug on both Cassandra and Hadoop next. Thanks, Trung. On Tue, Jan 18, 2011 at 1:21 AM, Aaron Morton aa...@thelastpickle.com wrote: May just be your example code, but you are repeating colName2. Can you log the mutation list before you write it and confirm you have unique column names? Can you turn up the logging to DEBUG for the Hadoop job and the Cassandra cluster to see what's happening? Aaron On 18/01/2011, at 9:40 PM, Trung Tran tran.hieutr...@gmail.com wrote: Hi, I'm working on ColumnFamilyOutputFormat and for some reason my reduce class does not write all columns to Cassandra. I tried to modify mapreduce.output.columnfamilyoutputformat.batch.threshold with some different values (1, 8, etc.) but nothing changes. What I have in my reduce class is: ArrayList<Mutation> a = new ArrayList<Mutation>(); a.add(getMutation(colNam1, val1)); a.add(getMutation(colNam2, val2)); a.add(getMutation(colNam2, val2)); ...etc context.write(key, a); Only 2 columns are written into Cassandra, and no error log is found in either the Hadoop or Cassandra logs. Any help is appreciated. Thanks, Trung.
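For context, a minimal sketch of how a getMutation helper is commonly built for ColumnFamilyOutputFormat against the 0.7 Thrift types (an editor's illustration, not the poster's code; exact setters may vary slightly by version). Note that if two mutations in the same list carry the same column name, the later one simply overwrites the earlier, which would also show up as "fewer columns than expected".

// Sketch of a getMutation helper for ColumnFamilyOutputFormat (Cassandra 0.7 Thrift types).
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;
import org.apache.cassandra.utils.ByteBufferUtil;

public class MutationHelper {
    public static Mutation getMutation(String name, String value) {
        Column c = new Column();
        c.setName(ByteBufferUtil.bytes(name));
        c.setValue(ByteBufferUtil.bytes(value));
        c.setTimestamp(System.currentTimeMillis() * 1000); // microseconds

        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(c);

        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cosc);
        return m;
    }
}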
Re: Tombstone lifespan after multiple deletions
AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist; neither batch_mutate nor remove is going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstone's life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is available for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Re: Is there a concept of a session
Hi Aaron, Thank you very much. I am going to use the Hector client library. There is a method for creating a connection for a cluster in that library. But, inside the source code, I noticed that each time it calls the 'login' method. Is there a server-side session? Thanks, Indika On Tue, Jan 18, 2011 at 3:07 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes, the client should maintain its connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client; which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log in (authenticate) one time to Cassandra, and then subsequently access Cassandra without authenticating again. Thanks, Indika
Re: Is there a concept of a session
I'm just going to assume Hector is doing the right thing, and you probably can as well :) Have you checked out the documentation here ? http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf (also yes the session is server side, each connection has a thread on the server it connects to) Aaron On 18/01/2011, at 10:40 PM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, Thank you very much. I am going to use the hector client library. There is a method for creating a connection for a cluster in that library. But, inside the source code, I noticed that each time it calls 'login' method. Is there a server-side session? Thanks, Indika On Tue, Jan 18, 2011 at 3:07 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes, the client should maintain it's connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client, which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log-in(authenticate) one time into the Cassandra, and then subsequently access the Cassandra without authenticating again. Thanks, Indika
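As a small aside, the usual Hector pattern is to build the Cluster and Keyspace objects once at startup and reuse them, rather than reconnecting or re-authenticating per request; a minimal sketch (an editor's illustration; the cluster and keyspace names are placeholders, and credential-passing overloads vary by Hector version):

// Sketch: create the connection objects once and share them; pooling and any
// login call are handled inside the client, not by application code.
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class CassandraConnection {
    private static final Cluster CLUSTER =
            HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
    private static final Keyspace KEYSPACE =
            HFactory.createKeyspace("Keyspace1", CLUSTER);

    public static Keyspace keyspace() {
        return KEYSPACE; // share this across requests; it is backed by a connection pool
    }
}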
RE: Super CF or two CFs?
Thanks for the answer. It provides me the insight I'm looking for. However, I'm also a bit confused as your first paragraph seems to indicate that using a SCF is better, whereas the last sentence states just the opposite. Do I interpret correctly that this is because of the compactions that put all non-volatile data together in one sstable, leading to compact sstable if the non-volatile data is put into a separate CF? Can this then be generalised into a rule of thumb to separate non-volatile data from volatile data into separate CFs, or am I going too far then? I will definitely be trying out both suggestions and post my findings. Hugo. Subject: Re: Super CF or two CFs? From: aa...@thelastpickle.com Date: Tue, 18 Jan 2011 21:54:25 +1300 To: user@cassandra.apache.org With regard to overwrites, and assuming you always want to get all the data for a stock ticker. Any read on the volatile data will potentially touch many sstables, this IO is unavoidable to read this data so we may as well read as many cols as possible at this time. Whereas if you split the data into two cf's you would incure all the IO for the volatile data plus IO for the non volatile, and have to make two calls. (Or use different keys and make a multiget_slice call, the IO argument still stands) Thanks to compaction less volatile data, say cols that are written once a day, week or month, will be tend to accrete into fewer sstables. To that end it may make sense to schedule compactions to run after weekly bulk operations. Also take a look at the per CF compaction thresholds. I'd recommend trying one standard CF (with the quotes packed as suggested) to start with, run some tests and let us know how you go. There are some small penalties to using super Cfs, see the limitations page on the wiki. Hope that helps.Aaron On 18/01/2011, at 9:29 PM, Steven Mac ugs...@hotmail.com wrote: Some of the fields are indeed written in one shot, but others (such as label and categories) are added later, so I think the question still stands. Hugo. From: dri...@gmail.com Date: Mon, 17 Jan 2011 18:47:28 -0600 Subject: Re: Super CF or two CFs? To: user@cassandra.apache.org On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac ugs...@hotmail.com wrote: I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of day). Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. If you're always going to write these attributes in one shot, then just serialize them and use a simple CF, there's no need for a SCF. -Brandon
Re: Is there a concept of a session
Thanks Aaron... Hector cannot use strategies such as cookies for maintaining a session, so it has to make the authentication call each time? In the Cassandra server, I see 'ThreadLocalClientState'. It keeps the session information? How long is a session alive? Does the connection mean a TCP connection? Is it a persistent connection - send and receive multiple requests/responses? Thanks, Indika On Tue, Jan 18, 2011 at 3:48 PM, Aaron Morton aa...@thelastpickle.com wrote: I'm just going to assume Hector is doing the right thing, and you probably can as well :) Have you checked out the documentation here ? http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf (also yes the session is server side, each connection has a thread on the server it connects to) Aaron On 18/01/2011, at 10:40 PM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, Thank you very much. I am going to use the Hector client library. There is a method for creating a connection for a cluster in that library. But, inside the source code, I noticed that each time it calls the 'login' method. Is there a server-side session? Thanks, Indika On Tue, Jan 18, 2011 at 3:07 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes, the client should maintain its connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client; which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log in (authenticate) one time to Cassandra, and then subsequently access Cassandra without authenticating again. Thanks, Indika
Java client
What is the most commonly used Java client library? Which is the most mature/feature complete? Noble
Re: Super CF or two CFs?
Sorry, I was not suggesting super CF is better in the first para; I think it applies to any CF. The role of compaction is to (among other things) reduce the number of SSTables for each CF. The logical endpoint of this process would be a single file for each CF, giving the lowest possible IO. The volatility of your data (overwrites and new columns for a row) fights against this process. In reality it will not get to that end state. Even in the best case I think it will only go down to 3 sstables. See http://wiki.apache.org/cassandra/MemtableSSTable If you do have some data that is highly volatile and you have performance problems, then changing compaction thresholds is a recommended approach, I think. See the comments in Cassandra.yaml. My argument is for you to keep data in one CF if you want to read it together. As always, store the data to serve the read requests. Do some tests and see where your bottlenecks may be for your HW and usage. I may be wrong. IMHO in this discussion Super or Standard CF will make little performance difference, other than the super CF limitations mentioned. Aaron On 18/01/2011, at 11:14 PM, Steven Mac ugs...@hotmail.com wrote: Thanks for the answer. It provides me the insight I'm looking for. However, I'm also a bit confused as your first paragraph seems to indicate that using a SCF is better, whereas the last sentence states just the opposite. Do I interpret correctly that this is because of the compactions that put all non-volatile data together in one sstable, leading to a compact sstable if the non-volatile data is put into a separate CF? Can this then be generalised into a rule of thumb to separate non-volatile data from volatile data into separate CFs, or am I going too far then? I will definitely be trying out both suggestions and post my findings. Hugo. Subject: Re: Super CF or two CFs? From: aa...@thelastpickle.com Date: Tue, 18 Jan 2011 21:54:25 +1300 To: user@cassandra.apache.org With regard to overwrites, and assuming you always want to get all the data for a stock ticker: any read on the volatile data will potentially touch many sstables; this IO is unavoidable to read this data, so we may as well read as many cols as possible at this time. Whereas if you split the data into two CFs you would incur all the IO for the volatile data plus IO for the non-volatile, and have to make two calls. (Or use different keys and make a multiget_slice call; the IO argument still stands.) Thanks to compaction, less volatile data, say cols that are written once a day, week or month, will tend to accrete into fewer sstables. To that end it may make sense to schedule compactions to run after weekly bulk operations. Also take a look at the per-CF compaction thresholds. I'd recommend trying one standard CF (with the quotes packed as suggested) to start with, run some tests and let us know how you go. There are some small penalties to using super CFs, see the limitations page on the wiki. Hope that helps. Aaron On 18/01/2011, at 9:29 PM, Steven Mac ugs...@hotmail.com wrote: Some of the fields are indeed written in one shot, but others (such as label and categories) are added later, so I think the question still stands. Hugo. From: dri...@gmail.com Date: Mon, 17 Jan 2011 18:47:28 -0600 Subject: Re: Super CF or two CFs? To: user@cassandra.apache.org On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac ugs...@hotmail.com wrote: I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of the day). 
Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. If you're always going to write these attributes in one shot, then just serialize them and use a simple CF, there's no need for a SCF. -Brandon
cassandra-cli: where a and b (works) vs. where b and a (doesn't)
I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain? [default@tracking] get crawler where user_agent=foo and rc=200; 0 Row Returned. [default@tracking] get crawler where rc=200 and user_agent=foo; --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 1 Row Returned. [default@tracking] get crawler where rc > 199 and user_agent=foo; 0 Row Returned. [default@tracking] get crawler where user_agent=foo; --- RowKey: -??7 = (column=rc, value=207, timestamp=1295347760935000) = (column=url, value=http://www/8, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??8 = (column=rc, value=209, timestamp=1295347760935000) = (column=url, value=http://www/9, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??5 = (column=rc, value=201, timestamp=1295347760937000) = (column=url, value=http://www/2, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??6 = (column=rc, value=205, timestamp=1295347760935000) = (column=url, value=http://www/5, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 5 Rows Returned.
Re: Java client
Hector is excellent. https://github.com/rantav/hector http://www.datastax.com/sites/default/files/hector-v2-client-doc.pdf 2011/1/18 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: What is the most commonly used Java client library? Which is the most mature/feature complete? Noble
Re: Is there a concept of a session
There are no cookies in Thrift. All connection state is managed by the server. It's a TCP connection. Multiple requests are sent over it, and it stays around as long as the client wants it to. Try the Hector mailing list for details on its implementation. Aaron On 18/01/2011, at 11:15 PM, indika kumara indika.k...@gmail.com wrote: Thanks Aaron... Hector cannot use strategies such as cookies for maintaining a session, so it has to make the authentication call each time? In the Cassandra server, I see 'ThreadLocalClientState'. It keeps the session information? How long is a session alive? Does the connection mean a TCP connection? Is it a persistent connection - send and receive multiple requests/responses? Thanks, Indika On Tue, Jan 18, 2011 at 3:48 PM, Aaron Morton aa...@thelastpickle.com wrote: I'm just going to assume Hector is doing the right thing, and you probably can as well :) Have you checked out the documentation here ? http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf (also yes the session is server side, each connection has a thread on the server it connects to) Aaron On 18/01/2011, at 10:40 PM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, Thank you very much. I am going to use the Hector client library. There is a method for creating a connection for a cluster in that library. But, inside the source code, I noticed that each time it calls the 'login' method. Is there a server-side session? Thanks, Indika On Tue, Jan 18, 2011 at 3:07 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes, the client should maintain its connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client; which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log in (authenticate) one time to Cassandra, and then subsequently access Cassandra without authenticating again. Thanks, Indika
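To make the "session is the connection" point concrete, a minimal sketch at the raw Thrift level (an editor's illustration against the 0.7 API; the credentials map keys and keyspace name are assumptions, and framed vs. unframed transport must match the server configuration):

// Sketch: one TCP connection, over which login() and set_keyspace() apply to
// that connection only, followed by any number of requests.
import java.util.HashMap;
import java.util.Map;
import org.apache.cassandra.thrift.AuthenticationRequest;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftSessionSketch {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();                       // the TCP connection; the state below is per-connection

        Map<String, String> credentials = new HashMap<String, String>();
        credentials.put("username", "user");    // hypothetical credentials
        credentials.put("password", "pass");
        client.login(new AuthenticationRequest(credentials));
        client.set_keyspace("Keyspace1");

        // ... issue as many reads/writes as needed on this client ...
        transport.close();
    }
}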
Re: Java client
http://wiki.apache.org/cassandra/ClientOptions Hector On 18/01/2011, at 11:48 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: What is the most commonly used Java client library? Which is the most mature/feature complete? Noble
Re: Is there a concept of a session
Thanks Aaron. I will look into codebase. Thanks, Indika On Tue, Jan 18, 2011 at 4:55 PM, Aaron Morton aa...@thelastpickle.comwrote: There are no cookies in thrift. All connection state is managed by the server. It's a tcp connection. Multiple request are sent over it,it stays around as long as the client wants it to. Try the Hector mailing list for details on it's implementation. Aaron On 18/01/2011, at 11:15 PM, indika kumara indika.k...@gmail.com wrote: Thanks Aaron... Hector cannot uses strategies such as cookies for maintaining session, so it has to make the authentication call each time? In the Cassandra server, I see 'ThreadLocalClientState'. It keeps the session information? How long is a session alive? Does the connection means a TCP connection? is it a persistent connection - send and receive multiple requests/responses? Thanks, Indika On Tue, Jan 18, 2011 at 3:48 PM, Aaron Morton aa...@thelastpickle.com aa...@thelastpickle.com wrote: I'm just going to assume Hector is doing the right thing, and you probably can as well :) Have you checked out the documentation here ? http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf (also yes the session is server side, each connection has a thread on the server it connects to) Aaron On 18/01/2011, at 10:40 PM, indika kumara indika.k...@gmail.com indika.k...@gmail.com wrote: Hi Aaron, Thank you very much. I am going to use the hector client library. There is a method for creating a connection for a cluster in that library. But, inside the source code, I noticed that each time it calls 'login' method. Is there a server-side session? Thanks, Indika On Tue, Jan 18, 2011 at 3:07 PM, Aaron Morton aa...@thelastpickle.comaa...@thelastpickle.com aa...@thelastpickle.com wrote: Yes, the client should maintain it's connection to the cluster. The connection holds the login credentials and the keyspace to use. This is normally managed by the client, which one are you using? Aaron On 18/01/2011, at 9:58 PM, indika kumara indika.k...@gmail.comindika.k...@gmail.com indika.k...@gmail.com wrote: Hi All, Is there a concept of a session? I would like to log-in(authenticate) one time into the Cassandra, and then subsequently access the Cassandra without authenticating again. Thanks, Indika
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
Does wrapping foo in single quotes help? Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes Aaron On 18/01/2011, at 11:54 PM, Timo Nentwig timo.nent...@toptarif.de wrote: I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this bevahiour at all, can somebody explain? [default@tracking] get crawler where user_agent=foo and rc=200; 0 Row Returned. [default@tracking] get crawler where rc=200 and user_agent=foo; --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 1 Row Returned. [default@tracking] get crawler where rc199 and user_agent=foo; 0 Row Returned. [default@tracking] get crawler where user_agent=foo; --- RowKey: -??7 = (column=rc, value=207, timestamp=1295347760935000) = (column=url, value=http://www/8, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??8 = (column=rc, value=209, timestamp=1295347760935000) = (column=url, value=http://www/9, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??5 = (column=rc, value=201, timestamp=1295347760937000) = (column=url, value=http://www/2, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??6 = (column=rc, value=205, timestamp=1295347760935000) = (column=url, value=http://www/5, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 5 Rows Returned.
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
On Jan 18, 2011, at 12:02, Aaron Morton wrote: Does wrapping foo in single quotes help? No. Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes Actually this doesn't even compile because addGtExpression expects a String type (?!). StringSerializer ss = StringSerializer.get(); IndexedSlicesQuery<String, String, String> indexedSlicesQuery = HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss); indexedSlicesQuery.setColumnNames("full_name", "birth_date", "state"); indexedSlicesQuery.addGtExpression("birth_date", 1970L); indexedSlicesQuery.addEqualsExpression("state", "UT"); indexedSlicesQuery.setColumnFamily("users"); indexedSlicesQuery.setStartKey(""); QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute(); Aaron On 18/01/2011, at 11:54 PM, Timo Nentwig timo.nent...@toptarif.de wrote: I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain? [default@tracking] get crawler where user_agent=foo and rc=200; 0 Row Returned. [default@tracking] get crawler where rc=200 and user_agent=foo; --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 1 Row Returned. [default@tracking] get crawler where rc > 199 and user_agent=foo; 0 Row Returned. [default@tracking] get crawler where user_agent=foo; --- RowKey: -??7 = (column=rc, value=207, timestamp=1295347760935000) = (column=url, value=http://www/8, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??8 = (column=rc, value=209, timestamp=1295347760935000) = (column=url, value=http://www/9, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??5 = (column=rc, value=201, timestamp=1295347760937000) = (column=url, value=http://www/2, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??6 = (column=rc, value=205, timestamp=1295347760935000) = (column=url, value=http://www/5, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 5 Rows Returned.
Re: Multi-tenancy, and authentication and authorization
Moving to user list On Tue, Jan 18, 2011 at 4:05 PM, Aaron Morton aa...@thelastpickle.comwrote: Have a read about JVM heap sizing here http://wiki.apache.org/cassandra/MemtableThresholds If you let people create keyspaces with a mouse click you will soon run out of memory. I use Cassandra to provide a self service storage service at my organisation. All virtual databases operate in the same Cassandra keyspace (which does not change), and I use namespaces in the keys to separate things. Take a look at how amazon S3 works, it may give you some ideas. If you want to continue to discussion let's move this to the user list. A On 17/01/2011, at 7:44 PM, indika kumara indika.k...@gmail.com wrote: Hi Stu, In our app, we would like to offer cassandra 'as-is' to tenants. It that case, each tenant should be able to create Keyspaces as needed. Based on the authorization, I expect to implement it. In my view, the implementation options are as follows. 1) The name of a keyspace would be 'the actual keyspace name' + 'tenant ID' 2) The name of a keyspace would not be changed, but the name of a column family would be the 'the actual column family name' + 'tenant ID'. It is needed to keep a separate mapping for keyspace vs tenants. 3) The name of a keypace or a column family would not be changed, but the name of a column would be 'the actual column name' + 'tenant ID'. It is needed to keep separate mappings for keyspace vs tenants and column family vs tenants Could you please give your opinions on the above three options? if there are any issue regarding above approaches and if those issues can be solved, I would love to contribute on that. Thanks, Indika On Fri, Jan 7, 2011 at 11:22 AM, Stu Hood stuh...@gmail.com wrote: (1) has the problem of multiple memtables (a large amount just isn't viable There are some very straightforward solutions to this particular problem: I wouldn't rule out running with a very large number of keyspace/columnfamilies given some minor changes. As Brandon said, some of the folks that were working on multi-tenancy for Cassandra are no longer focused on it. But the code that was generated during our efforts is very much available, and is unlikely to have gone stale. Would love to talk about this with you. Thanks, Stu On Thu, Jan 6, 2011 at 8:08 PM, indika kumara indika.k...@gmail.com wrote: Thank you very much Brandon! On Fri, Jan 7, 2011 at 12:40 AM, Brandon Williams dri...@gmail.com wrote: On Thu, Jan 6, 2011 at 12:33 PM, indika kumara indika.k...@gmail.com wrote: Hi Brandon, I would like you feedback on my two ideas for implementing mufti tenancy with the existing implementation. Would those be possible to implement? Thanks, Indika Two vague ideas: (1) qualified keyspaces (by the tenet domain) (2) multiple Cassandra storage configurations in a single node (one per tenant). For both options, the resource hierarchy would be /cassandra/ cluster_name/tenant name (domain)/keyspaces/ks_name/ (1) has the problem of multiple memtables (a large amount just isn't viable right now.) (2) more or less has the same problem, but in JVM instances. I would suggest a) not trying to offer cassandra itself, and instead build a service that uses cassandra under the hood, and b) splitting up tenants in this layer. -Brandon
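As an aside, a minimal sketch of what "namespaces in the keys" might look like in practice (an editor's illustration; the delimiter and helper names are assumptions, not Aaron's actual scheme):

// Sketch: prefix every row key with the tenant identifier so all tenants share
// one keyspace/CF while their rows stay logically separated.
public final class TenantKeys {
    private static final char DELIMITER = ':'; // assumed; pick something that cannot occur in tenant ids

    public static String rowKey(String tenantId, String applicationKey) {
        return tenantId + DELIMITER + applicationKey;   // e.g. "acme:user/42"
    }

    public static String tenantOf(String rowKey) {
        return rowKey.substring(0, rowKey.indexOf(DELIMITER));
    }
}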
Re: Multi-tenancy, and authentication and authorization
Hi Aaron, I appreciate your help. I am a newbie to Cassandra - just began to study the code-base. Do you suggest the following approach? *1) No changes are in either keyspace names or column family names but the row-key would be ‘the actual row key’ + 'tenant ID'. It is needed to keep separate mappings for keyspace vs tenants and column family vs tenants (can be a form of authorization).* 2) *keep a keyspace per tenant yet expose virtually as many keyspaces.* 3)* A single keyspace for all tenant * What do you mean by 'use namespaces in the keys'? Can a key be an QName? Thanks, Indika On Tue, Jan 18, 2011 at 5:26 PM, indika kumara indika.k...@gmail.comwrote: Moving to user list On Tue, Jan 18, 2011 at 4:05 PM, Aaron Morton aa...@thelastpickle.comwrote: Have a read about JVM heap sizing here http://wiki.apache.org/cassandra/MemtableThresholds If you let people create keyspaces with a mouse click you will soon run out of memory. I use Cassandra to provide a self service storage service at my organization. All virtual databases operate in the same Cassandra keyspace (which does not change), and I use namespaces in the keys to separate things. Take a look at how amazon S3 works, it may give you some ideas. If you want to continue to discussion let's move this to the user list. A On 17/01/2011, at 7:44 PM, indika kumara indika.k...@gmail.com wrote: Hi Stu, In our app, we would like to offer cassandra 'as-is' to tenants. It that case, each tenant should be able to create Keyspaces as needed. Based on the authorization, I expect to implement it. In my view, the implementation options are as follows. 1) The name of a keyspace would be 'the actual keyspace name' + 'tenant ID' 2) The name of a keyspace would not be changed, but the name of a column family would be the 'the actual column family name' + 'tenant ID'. It is needed to keep a separate mapping for keyspace vs tenants. 3) The name of a keypace or a column family would not be changed, but the name of a column would be 'the actual column name' + 'tenant ID'. It is needed to keep separate mappings for keyspace vs tenants and column family vs tenants Could you please give your opinions on the above three options? if there are any issue regarding above approaches and if those issues can be solved, I would love to contribute on that. Thanks, Indika On Fri, Jan 7, 2011 at 11:22 AM, Stu Hood stuh...@gmail.com wrote: (1) has the problem of multiple memtables (a large amount just isn't viable There are some very straightforward solutions to this particular problem: I wouldn't rule out running with a very large number of keyspace/columnfamilies given some minor changes. As Brandon said, some of the folks that were working on multi-tenancy for Cassandra are no longer focused on it. But the code that was generated during our efforts is very much available, and is unlikely to have gone stale. Would love to talk about this with you. Thanks, Stu On Thu, Jan 6, 2011 at 8:08 PM, indika kumara indika.k...@gmail.com wrote: Thank you very much Brandon! On Fri, Jan 7, 2011 at 12:40 AM, Brandon Williams dri...@gmail.com wrote: On Thu, Jan 6, 2011 at 12:33 PM, indika kumara indika.k...@gmail.com wrote: Hi Brandon, I would like you feedback on my two ideas for implementing mufti tenancy with the existing implementation. Would those be possible to implement? Thanks, Indika Two vague ideas: (1) qualified keyspaces (by the tenet domain) (2) multiple Cassandra storage configurations in a single node (one per tenant). 
For both options, the resource hierarchy would be /cassandra/ cluster_name/tenant name (domain)/keyspaces/ks_name/ (1) has the problem of multiple memtables (a large amount just isn't viable right now.) (2) more or less has the same problem, but in JVM instances. I would suggest a) not trying to offer cassandra itself, and instead build a service that uses cassandra under the hood, and b) splitting up tenants in this layer. -Brandon
Re: Java client
We moved over to Hector when we went to Cassandra 0.7; it was a painless and worthwhile experience. What is the most commonly used Java client library? Which is the most mature/feature complete? --Jools
Re: Java client
Definitely Pelops https://github.com/s7/scale7-pelops 2011/1/18 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com What is the most commonly used Java client library? Which is the most mature/feature complete? Noble
Re: Tombstone lifespan after multiple deletions
Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.comwrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Re: Tombstone lifespan after multiple deletions
On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
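A minimal sketch of the check-before-delete pattern being described, using Hector (an editor's illustration; the column family name and the one-column existence probe are assumptions, not the poster's code):

// Sketch: only issue the row deletion if the row actually has live data,
// so the cleanup job does not create tombstones for rows that never existed
// or were already deleted.
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.SliceQuery;

public class CleanupSketch {
    public static void deleteIfPresent(Keyspace keyspace, String cf, String rowKey) {
        StringSerializer ss = StringSerializer.get();

        // Probe for any live column in the row (a slice of at most one column).
        SliceQuery<String, String, String> query = HFactory.createSliceQuery(keyspace, ss, ss, ss);
        query.setColumnFamily(cf);
        query.setKey(rowKey);
        query.setRange("", "", false, 1);
        ColumnSlice<String, String> slice = query.execute().get();

        if (slice != null && !slice.getColumns().isEmpty()) {
            Mutator<String> mutator = HFactory.createMutator(keyspace, ss);
            mutator.delete(rowKey, cf, null, ss); // a null column name deletes the whole row
        }
    }
}

Note that the read itself costs IO, so this only pays off when a meaningful fraction of the candidate rows turn out not to exist.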
Re: Tombstone lifespan after multiple deletions
Thanks. On Tue, Jan 18, 2011 at 3:55 PM, Sylvain Lebresne sylv...@riptano.comwrote: On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?
I think we might need to go with full Java implementation only, in that case, to live up with Hector as we do not find any other better option. @Dave: Thanks for the links but we wouldn't much prefer to go with thrift implementation because of frequently changing api and other complexities there. Also we would not like to lock ourselves with implementation in a language with a client option that has limitations that we can bear now but not necessarily in future. If anybody else has a better solution to this please let me know. Thank you all. Ertio Lew On Tue, Jan 18, 2011 at 2:49 PM, Dave Gardner dave.gard...@imagini.net wrote: I can't comment of phpcassa directly, but we use Cassandra plus PHP in production without any difficulties. We are happy with the performance. Most of the information we needed to get started we found here: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP This includes details on how to compile the native PHP C Extension for Thrift. We use a bespoke client which wraps the Thrift interface. You may be better of with a higher level client, although when we were starting out there was less of a push away from Thrift directly. I found using Thrift useful as you gain an appreciation for what calls Cassandra actually supports. One potential advantage of using a higher level client is that it may protect you from the frequent Thrift interface changes which currently seem to accompany every major release. Dave On Tuesday, 18 January 2011, Tyler Hobbs ty...@riptano.com wrote: 1. ) Is it devloped to the level in order to support all the necessary features to take full advantage of Cassandra? Yes. There aren't some of the niceties of pycassa yet, but you can do everything that Cassandra offers with it. 2. ) Is it used in production by anyone ? Yes, I've talked to a few people at least who are using it in production. It tends to play a limited role instead of a central one, though. 3. ) What are its limitations? Being written in PHP. Seriously. The lack of universal 64bit integer support can be problematic if you don't have a fully 64bit system. PHP is fairly slow. PHP makes a few other things less easy to do. If you're doing some pretty lightweight interaction with Cassandra through PHP, these might not be a problem for you. - Tyler -- *Dave Gardner* Technical Architect [image: imagini_58mmX15mm.png] [image: VisualDNA-Logo-small.png] *Imagini Europe Limited* 7 Moor Street, London W1D 5NB [image: phone_icon.png] +44 20 7734 7033 [image: skype_icon.png] daveg79 [image: emailIcon.png] dave.gard...@imagini.net [image: icon-web.png] http://www.visualdna.com Imagini Europe Limited, Company number 5565112 (England and Wales), Registered address: c/o Bird Bird, 90 Fetter Lane, London, EC4A 1EQ, United Kingdom
Re: Question re: the use of multiple ColumnFamilies
Sorry for the delayed reply, but thanks very much - this pointed me at the exact problem. I found that the queue size here was equal to the number of configured DataFileDirectories, so a good test was to lie to Cassandra and claim that there were more DataFileDirectories than I needed. Interestingly, it still only ever wrote to the first configured DataFileDirectory, but it certainly eliminated the problem, which I think means that for my use case at least, it will be good enough to patch Cassandra to introduce more control of the queue size. On 08/01/11 18:20, Peter Schuller wrote: [multiple active cf:s, often triggering flush at the same time] Can anyone confirm whether or not this behaviour is expected, and suggest anything that I could do about it? This is on 0.6.6, by the way. Patched with time-to-live code, if that makes a difference. I looked at the code (trunk though, not 0.6.6) and was a bit surprised. There seems to be a single shared (static) executor for the sorting and writing stages of memtable flushing (so far so good). But what I didn't expect was that they seem to have a work queue of a size equal to the concurrency. In the case of the writer, the concurrency is the memtable_flush_writers option (not available in 0.6.6). For the sorter, it is the number of CPU cores on the system. This makes sense for the concurrency aspect. If my understanding is correct and I am not missing something else, this means that for multiple column families you do indeed need to expect to have this problem. The more column families, the greater the probability. What I expected to find was that each cf would be guaranteed to have at least one memtable in queue before writes would block for that cf. Assuming the same holds true in your case on 0.6.6 (it looks to be so on the 0.6 branch by quick examination), I would have to assume that one of the following is true: (1) You have more cf:s actively written to than the number of CPU cores on your machine, so that you're waiting on flushSorter. or (2) Your write speed is overall higher than what can be sustained by an sstable writer. If you are willing to patch Cassandra and do the appropriate testing, and are fine with the implications on heap size, you should be able to work around this by adjusting the size of the work queues for the flushSorter and flushWriter in ColumnFamilyStore.java. Note that I did not test this, so proceed with caution if you do. It will definitely mean that you will eat more heap space if you submit writes to the cluster faster than they are processed. So in particular if you're relying on backpressure mechanisms to avoid causing problems when you do non-rate-limited writes to the cluster, results are probably negative. I'll file a bug about this to (1) elicit feedback if I'm wrong, and (2) fix it. -- Andy Burgess, Principal Development Engineer, Application Delivery, WorldPay Ltd.
Re: Question re: the use of multiple ColumnFamilies
Sorry for the delayed reply, but thanks very much - this pointed me at the exact problem. I found that the queue size here was equal to the number of configured DataFileDirectories, so a good test was to lie to Cassandra and claim that there were more DataFileDirectories than I needed. Interestingly, it still only ever wrote to the first configured DataFileDirectory, but it certainly eliminated the problem, which I think means that for my use case at least, it will be good enough to patch Cassandra to introduce more control of the queue size. Based on your use case as you originally stated it (some cf:s that got written at a slow pace and just happened to flush at the same time), that should be enough. (If you have some CF:s being written to faster than they are flushed, there would still be potential for one CF to hog the flush writers unfairly.) -- / Peter Schuller
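To illustrate the trade-off being discussed, a minimal generic sketch of a flush pool whose queue is deliberately deeper than its thread count (an editor's illustration using plain java.util.concurrent, not Cassandra's actual executor classes):

// Editor's sketch: several column families can have a memtable queued for flush
// without blocking writers, at the cost of those memtables staying in heap longer.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FlushPoolSketch {
    public static ThreadPoolExecutor newFlushPool(int writers, int queuedMemtables) {
        // Each queued task roughly pins one memtable in heap until it is written,
        // so a deeper queue trades heap for fewer stalls when flushes coincide.
        return new ThreadPoolExecutor(
                writers, writers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queuedMemtables),
                new ThreadPoolExecutor.CallerRunsPolicy()); // caller absorbs the work when saturated
    }
}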
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
On Jan 18, 2011, at 12:05, Timo Nentwig wrote: On Jan 18, 2011, at 12:02, Aaron Morton wrote: Does wrapping foo in single quotes help? No. Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes Actually this doesn't even compile because addGtExpression expects a String type (?!). This works as expected: .addInsertion(now, cf, createColumn(rc, a, SS, StringSerializer.get())).execute(); while this doesn't: .addInsertion(now, cf, createColumn(rc, 97, SS, IntegerSerializer.get())).execute(); The only difference is that the IntegerSerializer pads the byte array with zeros. Shouldn't matter (?). But it does. I dumped both versions to JSON and reimported them[*]. Same behavior. Then I manually removed the trailing six zeros from the IntegerSerializer version and retried. Same behavior. [*] BTW when reimporting the JSON data the secondary indices are not being recreated. I had to remove the system keyspace and reimport the schema in order to trigger that... StringSerializer ss = StringSerializer.get(); IndexedSlicesQueryString, String, String indexedSlicesQuery = HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss); indexedSlicesQuery.setColumnNames(full_name, birth_date, state); indexedSlicesQuery.addGtExpression(birth_date, 1970L); indexedSlicesQuery.addEqualsExpression(state, UT); indexedSlicesQuery.setColumnFamily(users); indexedSlicesQuery.setStartKey(); QueryResultOrderedRowsString, String, String result = indexedSlicesQuery.execute(); Aaron On 18/01/2011, at 11:54 PM, Timo Nentwig timo.nent...@toptarif.de wrote: I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this bevahiour at all, can somebody explain? [default@tracking] get crawler where user_agent=foo and rc=200; 0 Row Returned. [default@tracking] get crawler where rc=200 and user_agent=foo; --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 1 Row Returned. [default@tracking] get crawler where rc199 and user_agent=foo; 0 Row Returned. [default@tracking] get crawler where user_agent=foo; --- RowKey: -??7 = (column=rc, value=207, timestamp=1295347760935000) = (column=url, value=http://www/8, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??8 = (column=rc, value=209, timestamp=1295347760935000) = (column=url, value=http://www/9, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??5 = (column=rc, value=201, timestamp=1295347760937000) = (column=url, value=http://www/2, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??6 = (column=rc, value=205, timestamp=1295347760935000) = (column=url, value=http://www/5, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 5 Rows Returned.
Re: Multi-tenancy, and authentication and authorization
Hi Aaron, I read some articles about Cassandra, and now understand a little bit about the trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g. replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CFs, the number of rows in a CF, the number of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per Cassandra cluster) or a keyspace for all tenants? I still would love to expose Cassandra ‘as-is’ to a tenant virtually, yet with acceptable memory consumption and performance. Thanks, Indika
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
When doing mixed types on slicing operations, you should use ByteArraySerializer and handle the conversions by hand. We have an issue open for making this more graceful. On Tue, Jan 18, 2011 at 10:07 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On Jan 18, 2011, at 12:05, Timo Nentwig wrote: On Jan 18, 2011, at 12:02, Aaron Morton wrote: Does wrapping foo in single quotes help? No. Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes Actually this doesn't even compile because addGtExpression expects a String type (?!). This works as expected: .addInsertion(now, cf, createColumn("rc", "a", SS, StringSerializer.get())).execute(); while this doesn't: .addInsertion(now, cf, createColumn("rc", 97, SS, IntegerSerializer.get())).execute(); The only difference is that the IntegerSerializer pads the byte array with zeros. Shouldn't matter (?). But it does. I dumped both versions to JSON and reimported them[*]. Same behavior. Then I manually removed the trailing six zeros from the IntegerSerializer version and retried. Same behavior. [*] BTW when reimporting the JSON data the secondary indices are not being recreated. I had to remove the system keyspace and reimport the schema in order to trigger that... StringSerializer ss = StringSerializer.get(); IndexedSlicesQuery<String, String, String> indexedSlicesQuery = HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss); indexedSlicesQuery.setColumnNames("full_name", "birth_date", "state"); indexedSlicesQuery.addGtExpression("birth_date", 1970L); indexedSlicesQuery.addEqualsExpression("state", "UT"); indexedSlicesQuery.setColumnFamily("users"); indexedSlicesQuery.setStartKey(""); QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute(); Aaron On 18/01/2011, at 11:54 PM, Timo Nentwig timo.nent...@toptarif.de wrote: I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain? [default@tracking] get crawler where user_agent=foo and rc=200; 0 Row Returned. [default@tracking] get crawler where rc=200 and user_agent=foo; --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 1 Row Returned. [default@tracking] get crawler where rc > 199 and user_agent=foo; 0 Row Returned. [default@tracking] get crawler where user_agent=foo; --- RowKey: -??7 = (column=rc, value=207, timestamp=1295347760935000) = (column=url, value=http://www/8, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??8 = (column=rc, value=209, timestamp=1295347760935000) = (column=url, value=http://www/9, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??5 = (column=rc, value=201, timestamp=1295347760937000) = (column=url, value=http://www/2, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760916000) --- RowKey: -??6 = (column=rc, value=205, timestamp=1295347760935000) = (column=url, value=http://www/5, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760917000) --- RowKey: -??2 = (column=rc, value=200, timestamp=1295347760933000) = (column=url, value=http://www/0, timestamp=1295347760933000) = (column=user_agent, value=foo, timestamp=1295347760915000) 5 Rows Returned.
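To make Brandon's suggestion concrete, here is a rough sketch of an indexed query that keeps the value side as raw bytes and does the integer conversion by hand. This is a sketch only: it assumes a Hector build that provides BytesArraySerializer (the byte[] serializer Brandon is referring to), package names follow Hector's 0.7-era layout and may differ between versions, and the 4-byte big-endian encoding for rc is an assumption that must match however the column was originally written.

    import java.nio.ByteBuffer;
    import me.prettyprint.cassandra.model.IndexedSlicesQuery;
    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class RcQuery {
        // Query the crawler CF on the indexed rc column, converting the int to bytes by hand.
        public static void queryByRc(Keyspace keyspace) {
            StringSerializer ss = StringSerializer.get();
            BytesArraySerializer bs = BytesArraySerializer.get();
            byte[] rc200 = ByteBuffer.allocate(4).putInt(200).array(); // assumed encoding
            IndexedSlicesQuery<String, String, byte[]> q =
                HFactory.createIndexedSlicesQuery(keyspace, ss, ss, bs);
            q.setColumnFamily("crawler");
            q.setColumnNames("rc", "url", "user_agent");
            q.addEqualsExpression("rc", rc200);
            q.setStartKey("");
            q.execute();
        }
    }

Whether the equality match succeeds still depends on the stored bytes matching rc200 exactly, which is the crux of the original problem.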
RE: please help with multiget
Well, maybe making a batch-get is not any more efficient on the server side, but without it, you can get bottlenecked on client-server connections and client resources. If the number of requests you want to batch is on the order of connections in your pool, then yes, making gets in parallel is as good or maybe better. But what if you want to batch thousands of requests? The server I can scale out; I would want to get my requests there without needing to wait for connections on my client to free up. I just don't really understand the reasoning for designing multiget_slice the way it is. I still think if you're gonna have a batch-get request (multiget_slice), you should be able to add to the batch a reasonable number of ANY corresponding non-batch get requests. And you can't do that... Plus, it's not symmetrical to the batch-mutate. Is there a good reason for that? From: Brandon Williams [dri...@gmail.com] Sent: Monday, January 17, 2011 5:09 PM To: user@cassandra.apache.org Cc: hector-us...@googlegroups.com Subject: Re: please help with multiget On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang szh...@mediosystems.com wrote: Here's the method declaration for quick reference: map<string, list<ColumnOrSuperColumn>> multiget_slice(string keyspace, list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you supposed to do when you need to retrieve different columns for different keys? Issue multiple gets in parallel yourself. Keep in mind that multiget is not an optimization, in fact, it can work against you when one key exceeds the rpc timeout, because you get nothing back. -Brandon
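For reference, "issue multiple gets in parallel yourself" can look roughly like the sketch below. The fetchColumns method is a placeholder standing in for whatever single-key call your client library provides (it is not a real API), and the pool size is an arbitrary assumption that should be matched to your connection pool.

    import java.util.*;
    import java.util.concurrent.*;

    // Fan out one get per key, each with its own column list, then gather the results.
    public final class ParallelGets {
        // Placeholder: wrap your client's single-key read here (Hector, Pelops, raw Thrift, ...).
        static List<String> fetchColumns(String key, List<String> columnNames) {
            throw new UnsupportedOperationException("plug in your client call here");
        }

        public static Map<String, List<String>> getAll(Map<String, List<String>> columnsPerKey)
                throws InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newFixedThreadPool(16); // size to your connection pool
            try {
                Map<String, Future<List<String>>> futures = new HashMap<String, Future<List<String>>>();
                for (final Map.Entry<String, List<String>> e : columnsPerKey.entrySet()) {
                    futures.put(e.getKey(), pool.submit(new Callable<List<String>>() {
                        public List<String> call() {
                            return fetchColumns(e.getKey(), e.getValue());
                        }
                    }));
                }
                Map<String, List<String>> results = new HashMap<String, List<String>>();
                for (Map.Entry<String, Future<List<String>>> f : futures.entrySet()) {
                    results.put(f.getKey(), f.getValue().get());
                }
                return results;
            } finally {
                pool.shutdown();
            }
        }
    }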
Re: Multi-tenancy, and authentication and authorization
Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.comwrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
changing the replication level on the fly
Hi, I've noticed in the new Cassandra 0.7.0 release that if I have a keyspace with a replication level of 2, but only one Cassandra node, I cannot insert anything into the system. Likely this was a bug in the old release I was using (0.6.8 -- is there a JIRA describing this problem?). However, this is a problem for our application, as we don't want to have to predefine the number of nodes, but rather start with one node, and add nodes as needed. Ideally, we could start our system with one node, and be able to insert data just on that one node. Then, when a second node is added, we can start using that node to store replicas for the keyspace. I know that 0.7.0 has a new operation for updating keyspace properties like replication level, but in the documentation there is some mention about having to run manual repair operations after using it. My question is: what happens if we do not run these repair operations? Here's what I'd like to do: 1) Start with a single node with autobootstrap=false and replication level=1. 2) Later, start a second node with autobootstrap=true and join it to the first. 3) The application detects that there are now two nodes, and issues the command to pump up the replication level to 2. 4) If it ever drops back down to one node, it will turn the replication level down again. If we do not do a repair, will all hell break lose, or will it just be the case that data inserted when there was only one node will continue to be unreplicated, but data inserted when there were two nodes will have two replicas? Thanks, Jeremy
Re: changing the replication level on the fly
On Tue, Jan 18, 2011 at 2:14 PM, Jeremy Stribling st...@nicira.com wrote: Hi, I've noticed in the new Cassandra 0.7.0 release that if I have a keyspace with a replication level of 2, but only one Cassandra node, I cannot insert anything into the system. Likely this was a bug in the old release I was using (0.6.8 -- is there a JIRA describing this problem?). However, this is a problem for our application, as we don't want to have to predefine the number of nodes, but rather start with one node, and add nodes as needed. Ideally, we could start our system with one node, and be able to insert data just on that one node. Then, when a second node is added, we can start using that node to store replicas for the keyspace. I know that 0.7.0 has a new operation for updating keyspace properties like replication level, but in the documentation there is some mention about having to run manual repair operations after using it. My question is: what happens if we do not run these repair operations? Here's what I'd like to do: 1) Start with a single node with autobootstrap=false and replication level=1. 2) Later, start a second node with autobootstrap=true and join it to the first. 3) The application detects that there are now two nodes, and issues the command to pump up the replication level to 2. 4) If it ever drops back down to one node, it will turn the replication level down again. If we do not do a repair, will all hell break lose, or will it just be the case that data inserted when there was only one node will continue to be unreplicated, but data inserted when there were two nodes will have two replicas? Thanks, Jeremy If you up your replication factor and do not repair, this is what happens: READ.QUORUM - This is safe. Over time all entries that are read will be fixed through read repair. Reads will return correct data. BUT data never read will never be copied to the new node. READ.ONE - 50% of your reads will return correct data. 50% of your reads will return NO data the first time (based on the server your read hits). Then they will be read repaired. Second read will return the correct data. You can extrapolate the complications caused by this if you add 10 or 15 nodes over time. You are never really sure if the data from the first node got replicated to the second, did the second get replicated to the third? Brain hurting... CAP complicated enough...
Re: changing the replication level on the fly
On 01/18/2011 11:36 AM, Edward Capriolo wrote: On Tue, Jan 18, 2011 at 2:14 PM, Jeremy Striblingst...@nicira.com wrote: Hi, I've noticed in the new Cassandra 0.7.0 release that if I have a keyspace with a replication level of 2, but only one Cassandra node, I cannot insert anything into the system. Likely this was a bug in the old release I was using (0.6.8 -- is there a JIRA describing this problem?). However, this is a problem for our application, as we don't want to have to predefine the number of nodes, but rather start with one node, and add nodes as needed. Ideally, we could start our system with one node, and be able to insert data just on that one node. Then, when a second node is added, we can start using that node to store replicas for the keyspace. I know that 0.7.0 has a new operation for updating keyspace properties like replication level, but in the documentation there is some mention about having to run manual repair operations after using it. My question is: what happens if we do not run these repair operations? Here's what I'd like to do: 1) Start with a single node with autobootstrap=false and replication level=1. 2) Later, start a second node with autobootstrap=true and join it to the first. 3) The application detects that there are now two nodes, and issues the command to pump up the replication level to 2. 4) If it ever drops back down to one node, it will turn the replication level down again. If we do not do a repair, will all hell break lose, or will it just be the case that data inserted when there was only one node will continue to be unreplicated, but data inserted when there were two nodes will have two replicas? Thanks, Jeremy If you up your replication Factor and do not repair this is what happens: READ.QUORUM - This is safe. Over time all entries that are read will be fixed through read repair. Reads will return correct data. BUT data never read will never be copied to the new node. READ.ONE - 50% of your reads will return correct data. 50% of your Reads will return NO data the first time (based on the server your read hits). Then they will be read repaired. Second read will return the correct data. You can extrapolate the complications caused be this if you are add 10 or 15 nodes over time. You are never really sure if the data from the first node got replicated to the second, did the second get replicated to the third ? Brian hurting... CAP complicated enough... Thanks. Are you referring only to data that was written at replication factor 1, or any data?
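For readers following along, the sequence being debated maps onto commands roughly like these. This is 0.7-era syntax; the keyspace name MyKeyspace is a placeholder, and the exact CLI attribute name may vary between minor versions.

    [default@unknown] update keyspace MyKeyspace with replication_factor = 2;

    # then, to actually copy existing data to the new replicas, on each node:
    nodetool -h <host> repair MyKeyspace

Skipping the repair leaves you in the read-repair-only situation Edward describes above: only rows that happen to be read get copied to the new replica.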
Re: Multi-tenancy, and authentication and authorization
Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list. A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional. The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular). Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
Re: Multi-tenancy, and authentication and authorization
Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces were a functional basis for multitenancy in Cassandra. I think the MT issues discussed on the wiki page are the , but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page as maybe the first section. Ed On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote: Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list. A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional. The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular). Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
Re: Tombstone lifespan after multiple deletions
Sylvain, Just to check my knowledge. Is this only the case if the delete is sent without a super column or predicate? What about a delete for a specific column that did not exist? Thanks Aaron On 19/01/2011, at 2:58 AM, David Boxenhorn da...@lookin2.com wrote: Thanks. On Tue, Jan 18, 2011 at 3:55 PM, Sylvain Lebresne sylv...@riptano.com wrote: On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Re: please help with multiget
I think the general approach is to denormalise data to remove the need for complicated semantics when reading. Aaron On 19/01/2011, at 7:57 AM, Shu Zhang szh...@mediosystems.com wrote: Well, maybe making a batch-get is not anymore efficient on the server side but without it, you can get bottlenecked on client-server connections and client resources. If the number of requests you want to batch is on the order of connections in your pool, then yes, making gets in parallel is as good or maybe better. But what if you want to batch thousands of requests? The server I can scale out, I would want to get my requests there without needing to wait for connections on my client to free up. I just don't really understand the reasoning for designing muliget_slice the way it is. I still think if you're gonna have a batch-get request (multiget_slice), you should be able to add to the batch a reasonable number of ANY corresponding non-batch get requests. And you can't do that... Plus, it's not symmetrical to the batch-mutate. Is there a good reason for that? From: Brandon Williams [dri...@gmail.com] Sent: Monday, January 17, 2011 5:09 PM To: user@cassandra.apache.org Cc: hector-us...@googlegroups.com Subject: Re: please help with multiget On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang szh...@mediosystems.commailto:szh...@mediosystems.com wrote: Here's the method declaration for quick reference: mapstring,listColumnOrSuperColumn multiget_slice(string keyspace, liststring keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you suppose to do when you need to retrieve different columns for different keys? Issue multiple gets in parallel yourself. Keep in mind that multiget is not an optimization, in fact, it can work against you when one key exceeds the rpc timeout, because you get nothing back. -Brandon
Re: Multi-tenancy, and authentication and authorization
As everyone says, it's not issues with the Keyspace directly as they are just a container. It's the CF's in the keyspace, but let's just say keyspace cause it's easier. As things stand, if you allow point and click creation for keyspaces you will hand over control of the memory requirements to the users. This will be a bad thing. E.g. Lots of cf's will get created and you will run out of memory, or cf's will get created with huge Memtable settings and you will run out of memory, or caches will get set huge and you get the picture. One badly behaving keyspace or column family can take down a node / cluster. IMHO currently the best way to share a Cassandra cluster is through some sort of application layer that uses as static keyspace. Others have a better understanding of the internals and may have ideas about how this could change in the future. Aaron On 19/01/2011, at 9:07 AM, Ed Anuff e...@anuff.com wrote: Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces were a functional basis for multitenancy in Cassandra. I think the MT issues discussed on the wiki page are the , but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page as maybe the first section. Ed On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list. A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional. The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular). Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. 
However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
Re: Multi-tenancy, and authentication and authorization
I would imagine it to be somewhat easy to implement this via a thrift wrapper so that each tenant is connecting to the proxy thrift server that masks the fact that there are multiple tenants... or is that how people are thinking about this - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen On 18 Jan 2011 21:20, Aaron Morton aa...@thelastpickle.com wrote: As everyone says, it's not issues with the Keyspace directly as they are just a container. It's the CF's in the keyspace, but let's just say keyspace cause it's easier. As things stand, if you allow point and click creation for keyspaces you will hand over control of the memory requirements to the users. This will be a bad thing. E.g. Lots of cf's will get created and you will run out of memory, or cf's will get created with huge Memtable settings and you will run out of memory, or caches will get set huge and you get the picture. One badly behaving keyspace or column family can take down a node / cluster. IMHO currently the best way to share a Cassandra cluster is through some sort of application layer that uses as static keyspace. Others have a better understanding of the internals and may have ideas about how this could change in the future. Aaron On 19/01/2011, at 9:07 AM, Ed Anuff e...@anuff.com wrote: Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces were a functional basis for multitenancy in Cassandra. I think the MT issues discussed on the wiki page are the , but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page as maybe the first section. Ed On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list. A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional. The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular). Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. 
Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
RE: please help with multiget
Well, I don't think what I'm describing is complicated semantics. I think I've described general batch operation design and something that is symmetrical to the batch_mutate method already on the Cassandra API. You are right, I can solve the problem with further denormalization, and the approach of making individual gets in parallel as described by Brandon will work too. I'll be doing one of these for now. But I think neither is as efficient, and I guess I'm still not sure why the multiget is designed the way it is. The problem with denormalization is you gotta make multiple row writes in place of one, adding load to the server, adding required physical space and losing atomicity on write operations. I know writes are cheap in cassandra, and you can catch failed writes and retry so these problems are not major, but it still seems clear that having a batch-get that works appropriately is at least a little better... From: Aaron Morton [aa...@thelastpickle.com] Sent: Tuesday, January 18, 2011 12:55 PM To: user@cassandra.apache.org Subject: Re: please help with multiget I think the general approach is to denormalise data to remove the need for complicated semantics when reading. Aaron On 19/01/2011, at 7:57 AM, Shu Zhang szh...@mediosystems.com wrote: Well, maybe making a batch-get is not any more efficient on the server side, but without it, you can get bottlenecked on client-server connections and client resources. If the number of requests you want to batch is on the order of connections in your pool, then yes, making gets in parallel is as good or maybe better. But what if you want to batch thousands of requests? The server I can scale out; I would want to get my requests there without needing to wait for connections on my client to free up. I just don't really understand the reasoning for designing multiget_slice the way it is. I still think if you're gonna have a batch-get request (multiget_slice), you should be able to add to the batch a reasonable number of ANY corresponding non-batch get requests. And you can't do that... Plus, it's not symmetrical to the batch-mutate. Is there a good reason for that? From: Brandon Williams [dri...@gmail.com] Sent: Monday, January 17, 2011 5:09 PM To: user@cassandra.apache.org Cc: hector-us...@googlegroups.com Subject: Re: please help with multiget On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang szh...@mediosystems.com wrote: Here's the method declaration for quick reference: map<string, list<ColumnOrSuperColumn>> multiget_slice(string keyspace, list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you supposed to do when you need to retrieve different columns for different keys? Issue multiple gets in parallel yourself. Keep in mind that multiget is not an optimization, in fact, it can work against you when one key exceeds the rpc timeout, because you get nothing back. -Brandon
Re: Multi-tenancy, and authentication and authorization
I've used an S3 style data model with a REST interface (varnish nginx tornado cassandra), users do not see anything remotely cassandra like.AaronOn 19 Jan, 2011,at 10:27 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote:I would imagine it to be somewhat easy to implement this via a thrift wrapper so that each tenant is connecting to the proxy thrift server that masks the fact that there are multiple tenants... or is that how people are thinking about this - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen On 18 Jan 2011 21:20, "Aaron Morton" aa...@thelastpickle.com wrote: As everyone says, it's not issues with the Keyspace directly as they are just a container. It's the CF's in the keyspace, but let's just say keyspace cause it's easier. As things stand, if you allow point and click creation for keyspaces you will hand over control of the memory requirements to the users. This will be a bad thing. E.g. Lots of cf's will get created and you will run out of memory, or cf's will get created with huge Memtable settings and you will run out of memory, or caches will get set huge and you get the picture. One badly behaving keyspace or column family can take down a node / cluster. IMHO currently the best way to share a Cassandra cluster is through some sort of application layer that uses as static keyspace. Others have a better understanding of the internals and may have ideas about how this could change in the future. Aaron On 19/01/2011, at 9:07 AM, Ed Anuff e...@anuff.com wrote: Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces were a functional basis for multitenancy in Cassandra. I think the MT issues discussed on the wiki page are the , but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page as maybe the first section. Ed On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list. A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular). Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. 
Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment. Ed On Tue, Jan 18, 2011 at 9:40 AM, indika kumara indika.k...@gmail.com wrote: Hi Aaron, I read some articles about the Cassandra, and now understand a little bit about trade-offs. I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per a family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CF, the number of the rows in a CF, the numbers of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion. What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per a Cassandra cluster) or a keyspace for all tenant. I still would love to expose the Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance. Thanks, Indika
Re: please help with multiget
On Tue, Jan 18, 2011 at 4:29 PM, Shu Zhang szh...@mediosystems.com wrote: Well, I don't think what I'm describing is complicated semantics. I think I've described general batch operation design and something that is symmetrical the batch_mutate method already on the Cassandra API. You are right, I can solve the problem with further denormalization, and the approach of making individual gets in parallel as described by Brandon will work too. I'll be doing one of these for now. But I think neither is as efficient, and I guess I'm still not sure why the multiget is designed the way it is. The problem with denormalization is you gotta make multiple row writes in place of one, adding load to the server, adding required physical space and losing atomicity on write operations. I know writes are cheap in cassandra, and you can catch failed writes and retry so these problems are not major, but it still seems clear that having a batch-get that works appropriately is a least a little better... From: Aaron Morton [aa...@thelastpickle.com] Sent: Tuesday, January 18, 2011 12:55 PM To: user@cassandra.apache.org Subject: Re: please help with multiget I think the general approach is to denormalise data to remove the need for complicated semantics when reading. Aaron On 19/01/2011, at 7:57 AM, Shu Zhang szh...@mediosystems.com wrote: Well, maybe making a batch-get is not anymore efficient on the server side but without it, you can get bottlenecked on client-server connections and client resources. If the number of requests you want to batch is on the order of connections in your pool, then yes, making gets in parallel is as good or maybe better. But what if you want to batch thousands of requests? The server I can scale out, I would want to get my requests there without needing to wait for connections on my client to free up. I just don't really understand the reasoning for designing muliget_slice the way it is. I still think if you're gonna have a batch-get request (multiget_slice), you should be able to add to the batch a reasonable number of ANY corresponding non-batch get requests. And you can't do that... Plus, it's not symmetrical to the batch-mutate. Is there a good reason for that? From: Brandon Williams [dri...@gmail.com] Sent: Monday, January 17, 2011 5:09 PM To: user@cassandra.apache.org Cc: hector-us...@googlegroups.com Subject: Re: please help with multiget On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang szh...@mediosystems.commailto:szh...@mediosystems.com wrote: Here's the method declaration for quick reference: mapstring,listColumnOrSuperColumn multiget_slice(string keyspace, liststring keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you suppose to do when you need to retrieve different columns for different keys? Issue multiple gets in parallel yourself. Keep in mind that multiget is not an optimization, in fact, it can work against you when one key exceeds the rpc timeout, because you get nothing back. -Brandon muliget_slice is very useful I IMHO. In my testing, the roundtrip time for 1000 get requests all being acked individually is much higher then rountrip time for 200 get_slice grouped 5 at a time. For anyone that needs that type of access they are in good shape. I was also theorizing that a CF using RowCache with very, very high read rate would benefit from pooling a bunch of reads together with multiget. 
I do agree that the first time I looked at the multi_get_slice signature I realized I could do many of the things I was expecting from a multi-get.
json2sstable NPE
Hello I have a problem when using json2sstable (in cassandra 0.7). When I invoke: json2sstable -K test -c test D:\apache-cassandra-0.7.0\bin\test-e-1-Data.json F:\cassandra\test\test\test-e-1-Data.db I got an NPE: WARN 01:31:38,750 Schema definitions were defined both locally and in cassandra.yaml. Definitions in cassandra.yaml were ignored. Exception in thread "main" java.lang.NullPointerException at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:68) at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:62) at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:174) at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:251)
Re: json2sstable NPE
That's odd, the line before line 68 has an assertion that should have kicked in. Are you on the release version of 0.7.0? Does the "test" CF exist in the keyspace "test" in your cluster? Aaron On 19 Jan, 2011, at 11:37 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello I have a problem when using json2sstable (in cassandra 0.7). When I invoke: json2sstable -K test -c test D:\apache-cassandra-0.7.0\bin\test-e-1-Data.json F:\cassandra\test\test\test-e-1-Data.db I got an NPE: WARN 01:31:38,750 Schema definitions were defined both locally and in cassandra.yaml. Definitions in cassandra.yaml were ignored. Exception in thread "main" java.lang.NullPointerException at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:68) at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:62) at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:174) at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:251)
Re: json2sstable NPE
That's odd, the line before line 68 has an assertion that should have kicked in. Are you on the release version of 0.7.0? Yes, I use the release downloaded from the official site. Does the "test" CF exist in the keyspace "test" in your cluster? No, it doesn't exist.
Re: json2sstable NPE
AFAIK the CF must exist. Create it and try again. A On 19 Jan, 2011, at 12:03 PM, ruslan usifov ruslan.usi...@gmail.com wrote: That's odd, the line before line 68 has an assertion that should have kicked in. Are you on the release version of 0.7.0? Yes, I use the release downloaded from the official site. Does the "test" CF exist in the keyspace "test" in your cluster? No, it doesn't exist.
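For completeness, creating the missing definitions from the 0.7 cassandra-cli before re-running the import would look roughly like this (a sketch with default replication and comparator settings assumed; names are taken from the command above):

    [default@unknown] create keyspace test;
    [default@unknown] use test;
    [default@test] create column family test;

    json2sstable -K test -c test D:\apache-cassandra-0.7.0\bin\test-e-1-Data.json F:\cassandra\test\test\test-e-1-Data.db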
Re: Java client
Pelops is a nice lib. I found it very easy to use and the developers are very responsive to requests for information and/or bugs, etc. I have not tried hector On Tue, Jan 18, 2011 at 11:11 PM, Alois Bělaška alois.bela...@gmail.com wrote: Definitelly Pelops https://github.com/s7/scale7-pelops 2011/1/18 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com What is the most commonly used java client library? Which is the the most mature/feature complete? Noble
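For anyone comparing the clients mentioned here, a minimal Hector connection and round trip looks roughly like the sketch below. This assumes Hector's 0.7-era HFactory API; the cluster name, host, keyspace, and the Standard1 column family are placeholders that must already exist.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class HectorHello {
        public static void main(String[] args) {
            // Connect to a local node and pick a keyspace.
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);

            // Write a single column, then read it back.
            StringSerializer ss = StringSerializer.get();
            HFactory.createMutator(keyspace, ss)
                    .insert("rowkey1", "Standard1", HFactory.createStringColumn("name", "value"));
            String value = HFactory.createStringColumnQuery(keyspace)
                    .setColumnFamily("Standard1").setKey("rowkey1").setName("name")
                    .execute().get().getValue();
            System.out.println(value);
        }
    }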
Re: Tombstone lifespan after multiple deletions
Maybe it could be taken into account when the compaction is executed, if I only have a consecutive list of uninterrupted tombstones it could only care about the first. It sounds like the-way-it-should-be, maybe as a part of the row-reduce process. Is it feasible? Looking into the CASSANDRA-1074 sounds like it should. //GK http://twitter.com/germanklf http://code.google.com/p/seide/ On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne sylv...@riptano.com wrote: On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15?
Re: Tombstone lifespan after multiple deletions
If you mean that multiple tombstones for the same row or column should be merged into a single one at compaction time, then yes, that is what happens. On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf german.kond...@gmail.com wrote: Maybe it could be taken into account when the compaction is executed, if I only have a consecutive list of uninterrupted tombstones it could only care about the first. It sounds like the-way-it-should-be, maybe as a part of the row-reduce process. Is it feasible? Looking into the CASSANDRA-1074 sounds like it should. //GK http://twitter.com/germanklf http://code.google.com/p/seide/ On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne sylv...@riptano.com wrote: On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
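To make the day-10-versus-day-15 question from earlier in the thread concrete, here is the arithmetic as a tiny sketch, assuming compaction keeps the newer of the merged tombstones as described above:

    public class TombstoneAging {
        public static void main(String[] args) {
            // Two deletes of the same row, GCGraceSeconds = 10 days.
            long day = 24L * 60 * 60;       // seconds in a day
            long gcGraceSeconds = 10 * day;
            long firstDelete = 1 * day;     // deleted on day 1
            long secondDelete = 5 * day;    // deleted again on day 5
            // Compaction merges the two tombstones and keeps the newer one,
            // so it only becomes eligible for removal at:
            long purgeableAt = Math.max(firstDelete, secondDelete) + gcGraceSeconds;
            System.out.println(purgeableAt / day); // prints 15, i.e. day 15
        }
    }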
Re: Tombstone lifespan after multiple deletions
If the tombstone is older than the row or column inserted later, is the tombstone skipped entirely after compaction? best regards, hanzhu On Wed, Jan 19, 2011 at 11:16 AM, Jonathan Ellis jbel...@gmail.com wrote: If you mean that multiple tombstones for the same row or column should be merged into a single one at compaction time, then yes, that is what happens. On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf german.kond...@gmail.com wrote: Maybe it could be taken into account when the compaction is executed, if I only have a consecutive list of uninterrupted tombstones it could only care about the first. It sounds like the-way-it-should-be, maybe as a part of the row-reduce process. Is it feasible? Looking into the CASSANDRA-1074 sounds like it should. //GK http://twitter.com/germanklf http://code.google.com/p/seide/ On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne sylv...@riptano.com wrote: On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place? Yes, in your situation you do. On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK that's not necessary, there is no need to worry about previous deletes. You can delete stuff that does not even exist, neither batch_mutate or remove are going to throw an error. All the columns that were (roughly speaking) present at your first deletion will be available for GC at the end of the first tombstones life. Same for the second. Say you were to write a col between the two deletes with the same name as one present at the start. The first version of the col is avail for GC after tombstone 1, and the second after tombstone 2. Hope that helps Aaron On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote: Thanks. In other words, before I delete something, I should check to see whether it exists as a live row in the first place. On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote: On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the following scenario: GCGraceSeconds = 10 days On day 1 I delete a row On day 5 I delete the row again Will the tombstone be removed on day 10 or day 15? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Tombstone lifespan after multiple deletions
I'm not clear here. Are you worried that the later-inserted tombstone prevents the whole row from being reclaimed, so the storage space cannot be freed? To my knowledge, after major compaction only the row key and tombstone are kept. Is it a big deal? best regards, hanzhu On Tue, Jan 18, 2011 at 9:41 PM, David Boxenhorn da...@lookin2.com wrote: Thanks, Aaron, but I'm not 100% clear. My situation is this: My use case spins off rows (not columns) that I no longer need and want to delete. It is possible that these rows were never created in the first place, or were already deleted. This is a very large cleanup task that normally deletes a lot of rows, and the last thing that I want to do is create tombstones for rows that didn't exist in the first place, or lengthen the life on disk of tombstones of rows that are already deleted. So the question is: before I delete, do I have to retrieve the row to see if it exists in the first place?
Re: Tombstone lifespan after multiple deletions
Yes, that's what I meant, but correct me if I'm wrong: when a deletion comes after another deletion for the same row or column, the gc-before will count against the last one, won't it? Maybe, knowing that all the subsequent versions of a deletion are deletions too, compaction could count the first tombstone's timestamp against gc-grace-seconds when it reduces the row. // Germán Kondolf http://twitter.com/germanklf http://code.google.com/p/seide/ // @i4 On 19/01/2011, at 00:16, Jonathan Ellis jbel...@gmail.com wrote: If you mean that multiple tombstones for the same row or column should be merged into a single one at compaction time, then yes, that is what happens. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Tombstone lifespan after multiple deletions
On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf german.kond...@gmail.com wrote: Yes, that's what I meant, but correct me if I'm wrong: when a deletion comes after another deletion for the same row or column, the gc-before will count against the last one, won't it? IIRC, after compaction, even if the row key is not wiped, all the columns are replaced by the youngest tombstone. I do not understand very clearly the benefit of wiping out the whole row as early as possible. Maybe, knowing that all the subsequent versions of a deletion are deletions too, compaction could count the first tombstone's timestamp against gc-grace-seconds when it reduces the row. // Germán Kondolf http://twitter.com/germanklf http://code.google.com/p/seide/ // @i4 On 19/01/2011, at 00:16, Jonathan Ellis jbel...@gmail.com wrote: If you mean that multiple tombstones for the same row or column should be merged into a single one at compaction time, then yes, that is what happens.
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
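To put numbers on David's original question (removed on day 10 or day 15?): each delete writes its own tombstone, compaction merges tombstones for the same row or column and keeps the youngest, and that survivor only becomes purgeable once gc-grace-seconds have passed since it was written. A minimal sketch of that arithmetic follows; it is illustrative bookkeeping, not Cassandra's internal code.

// Illustrative arithmetic for the GCGraceSeconds example in the quoted thread.
class TombstoneGc {
    static long days(int d) { return d * 24L * 3600L; }  // whole days, in seconds

    public static void main(String[] args) {
        long gcGraceSeconds = days(10);   // GCGraceSeconds = 10 days
        long firstDelete    = days(1);    // tombstone written on day 1
        long secondDelete   = days(5);    // a second tombstone written on day 5
        // Compaction merges the two tombstones into one, keeping the youngest,
        // so the earliest the data can be purged is:
        long purgeableAt = Math.max(firstDelete, secondDelete) + gcGraceSeconds;
        System.out.println("purgeable on day " + purgeableAt / days(1));  // prints 15
    }
}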
about the hector client
Can you tell me the exact steps to create a keyspace with the Hector client? Huawei Technologies Co., Ltd., Bantian, Longgang District, Shenzhen 518129, P.R. China. Phone: 28358610, Mobile: 13425182943, Email: raoyix...@huawei.com, http://www.huawei.com
Re: about the hector client
Try the hector user group for help on how to use the client: http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, "raoyixuan (Shandy)" raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client?
Re: about the hector client
Definitely get involved with that google group, but some examples are found here: https://github.com/zznate/hector-examples/blob/master/src/main/java/com/riptano/cassandra/hector/example/SchemaManipulation.java On Jan 18, 2011, at 10:17 PM, Aaron Morton wrote: Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client?
Re: about the hector client
Most often, you will define schema with the cli. Programmatic schema definition is "advanced" in Cassandra, just as in relational databases. On Tue, Jan 18, 2011 at 10:19 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Definitely get involved with that google group, but some examples are found here: https://github.com/zznate/hector-examples/blob/master/src/main/java/com/riptano/cassandra/hector/example/SchemaManipulation.java On Jan 18, 2011, at 10:17 PM, Aaron Morton wrote: Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
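For anyone who does want the programmatic route despite the advice above, a rough sketch against the 0.7 Thrift API is below. The keyspace name, column family name and replication factor are placeholders, and the client is assumed to be already connected over a framed transport; Hector exposes an equivalent wrapper, so see the SchemaManipulation example linked above for the client-native version.

import java.util.Arrays;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;

// Illustrative only: define one keyspace containing a single column family.
void createKeyspace(Cassandra.Client client) throws Exception {
    CfDef users = new CfDef("MyKeyspace", "Users");           // placeholder names
    users.setComparator_type("UTF8Type");
    KsDef ks = new KsDef("MyKeyspace",
                         "org.apache.cassandra.locator.SimpleStrategy",
                         1,                                     // replication_factor (placeholder)
                         Arrays.asList(users));
    client.system_add_keyspace(ks);  // wait for schema agreement before writing to it
}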
Re: about the hector client
OK if I add a link to https://github.com/zznate/hector-examples to the wiki page for clients (http://wiki.apache.org/cassandra/ClientOptions)? A On 19 Jan, 2011, at 05:22 PM, Jonathan Ellis jbel...@gmail.com wrote: Most often, you will define schema with the cli. Programmatic schema definition is "advanced" in Cassandra, just as in relational databases. On Tue, Jan 18, 2011 at 10:19 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Definitely get involved with that google group, but some examples are found here: https://github.com/zznate/hector-examples/blob/master/src/main/java/com/riptano/cassandra/hector/example/SchemaManipulation.java On Jan 18, 2011, at 10:17 PM, Aaron Morton wrote: Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, "raoyixuan (Shandy)" raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
RE: about the hector client
The URL is unavailable. From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, January 19, 2011 12:17 PM To: user@cassandra.apache.org Subject: Re: about the hector client Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client?
Re: about the hector client
Working fine for me. Can you please try again? thanks ashish On Wed, Jan 19, 2011 at 11:42 AM, raoyixuan (Shandy) raoyix...@huawei.com wrote: The URL is unavailable. From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, January 19, 2011 12:17 PM To: user@cassandra.apache.org Subject: Re: about the hector client Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client?
RE: about the hector client
I will try it again, thank you. -Original Message- From: Ashish [mailto:paliwalash...@gmail.com] Sent: Wednesday, January 19, 2011 2:16 PM To: user@cassandra.apache.org Subject: Re: about the hector client Working fine for me. Can you please try again? thanks ashish On Wed, Jan 19, 2011 at 11:42 AM, raoyixuan (Shandy) raoyix...@huawei.com wrote: The URL is unavailable. From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, January 19, 2011 12:17 PM To: user@cassandra.apache.org Subject: Re: about the hector client Try the hector user group for help on how to use the client http://groups.google.com/group/hector-users You can also create a keyspace in a cassandra cluster via the cassandra-cli command line interface. Take a look at the tool's online help if you're interested. Aaron On 19 Jan, 2011, at 05:00 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Can you tell me the exact steps to create a keyspace with the Hector client?
Re: Java client
Thanks everyone. I guess I should go with Hector. On 18 Jan 2011 17:41, Alois Bělaška alois.bela...@gmail.com wrote: Definitely Pelops https://github.com/s7/scale7-pelops 2011/1/18 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com What is the most commonly used java client library? Which is the most mature/feature complete? Noble
Keys must be written in ascending order
I'm upgrading an 0.6 cluster to 0.7 in a testing environment. In cleaning up one of the nodes I get the exception below. Googling around seems to reveal people having trouble with it caused by too-small heap sizes but that doesn't look to be what's going on here. Am I missing something obvious?
$ time ./cassandra-0.7/bin/nodetool -h cassa7test01 cleanup
Error occured while cleaning up keyspace keyspace
java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
 at java.util.concurrent.FutureTask.get(FutureTask.java:111)
 at org.apache.cassandra.db.CompactionManager.performCleanup(CompactionManager.java:180)
 at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:909)
 at org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1127)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
 at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
 at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
 at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1449)
 at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
 at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1284)
 at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1382)
 at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
 at sun.rmi.transport.Transport$1.run(Transport.java:177)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Keys must be written in ascending order.
 at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:107)
 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:124)
 at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:411)
 at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:54)
 at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:171)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 ... 3 more
real 14m27.895s
user 0m0.670s
sys 0m0.200s
Re: Multi-tenancy, and authentication and authorization
I think tuning of Cassandra is overly complex, and even with a single tenant you can run into problems with too many CFs. Right now there is a one-to-one mapping between memtables and SSTables. Instead of that, would it be possible to have one giant memtable for each Cassandra instance, with partial flushing to SSTs? It seems to me like a single memtable would make it MUCH easier to tune Cassandra, since the decision whether to (partially) flush the memtable to disk could be made on a node-wide basis, based on the resources you really have, instead of the guess-work that we are forced to do today.