Re: Reason for not allowing null values in Column
On Mon, Mar 8, 2010 at 10:14 AM, Jonathan Ellis wrote:
> On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad wrote:
> > So why is it again that the value field in the Column cannot be null if it
> > is not the value field in the map, but just a part of the value field?
>
> Because without a compelling reason to allow nulls, the best policy is
> not to do so.

This for me is about memory usage, so I was just curious whether there was a good reason for using more memory than needed, and I guess "best policy" is a reason for that.

> > All of this makes total sense, I'm wondering about use cases where you want
> > to get an empty row when you don't know if it has been deleted or not.
>
> If you're saying, "I understand that doing X would be Really
> Inefficient, but I want you to do it anyway because of some use case
> that nobody actually needs so far," then I think you have your answer.
>
> If that is not what you are asking then you'll need to give me a
> concrete example because I don't understand the question.

Well, I cannot say that I understand all of this, since I'm not getting it :) But for me, when you do a range query you want to know what data you have to work with in those rows, and you are usually not too interested in the empty ones. The reason for not returning empty ones would be to save IO.

> -Jonathan

--
Regards Erik
Re: Reason for not allowing null values in Column
On Mon, Mar 8, 2010 at 9:30 AM, Jonathan Ellis wrote:
> On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad wrote:
> > I was probably a little bit unclear here. I'm wondering about the two
> > byte[] in Column: one for name and one for value. I was under the
> > impression that the skip list map wraps the Columns, not that the name and
> > the value are themselves inserted into a map?
>
> The column name is the key in one such map, yes.

So why is it again that the value field in the Column cannot be null if it is not the value field in the map, but just a part of the value field?

> >> > is it really that expensive to check if the list is empty before
> >> > returning that row
> >>
> >> Yes, because you have to check the entire row, which may be much
> >> larger than the given predicate.
> >
> > That makes sense, but why would you be interested in the rows present
> > outside your specified predicate?
>
> Because get_range_slice says, "apply this predicate to the range of
> rows given," meaning, if the predicate result is empty, we have to
> include an empty result for that row key. It is perfectly valid to
> perform such a query returning empty column lists for some or all
> keys, even if no deletions have been performed. So to special-case
> leaving out result entries for deletions, we have to check the entire
> rest of the row to make sure there is no undeleted data anywhere else
> either (in which case leaving the key out would be an error).

All of this makes total sense. I'm wondering about use cases where you want to get an empty row when you don't know whether it has been deleted or not.

--
Regards Erik
Re: Reason for not allowing null values in Column
On Mon, Mar 8, 2010 at 9:10 AM, Jonathan Ellis wrote:
> On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad wrote:
> > Why is it that null column values are not allowed?
>
> It's semantically unnecessary and potentially harmful at an
> implementation level. (Many Java Map implementations can't
> distinguish between a null key and a key that is not present.)

I was probably a little bit unclear here. I'm wondering about the two byte[] in Column: one for name and one for value. I was under the impression that the skip list map wraps the Columns, not that the name and the value are themselves inserted into a map?

> > What is the reason for using a ConcurrentSkipListMap for columns_ in
> > ColumnFamily compared to using the set version and using the comparator to
> > sort on the name field in IColumn?
>
> ?
>
> > For the call get_range_slice() you get all the rows returned even though
> > they might have been deleted,
>
> Yes, that is the point.
>
> > is it really that expensive to check if the list is empty before
> > returning that row
>
> Yes, because you have to check the entire row, which may be much
> larger than the given predicate.

That makes sense, but why would you be interested in the rows present outside your specified predicate?

> -Jonathan

--
Regards Erik
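[Editor's note: the null-ambiguity Jonathan mentions can be sketched with a plain Python dict standing in for a column map; this is an illustration of the general Map problem, not Cassandra code.]

```python
# A plain dict standing in for a column map (illustration only).
# Suppose null column values were allowed:
row = {"name": None}

# Both lookups return None, so the caller cannot distinguish
# "column stored with a null value" from "column not present at all".
print(row.get("name"))     # None -> column exists, value is null
print(row.get("missing"))  # None -> column does not exist

# Telling them apart requires an extra membership check:
print("name" in row, "missing" in row)  # True False
```

Disallowing null values removes this ambiguity entirely: a returned value is always real data.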
Reason for not allowing null values in Column
Hey! I've been looking at the src and have a couple of questions:

Why is it that null column values are not allowed?

What is the reason for using a ConcurrentSkipListMap for columns_ in ColumnFamily, compared to using the set version and using the comparator to sort on the name field in IColumn?

For the call get_range_slice() you get all the rows returned even though they might have been deleted. Is it really that expensive to check if the list is empty before returning that row, or are there some other places where this gets complicated?

--
Regards Erik
Re: ColumnFamilies vs composite rows in one table.
Thanks David and Jonathan!

@David
Yes, rows don't have a name; I'm just using the word name for anything, like cluster name, table name, row name etc. That is my bad. Yes, I did change two things, which was probably stupid, but the reason for the second change is space efficiency.

You are totally right that I'm choosing between scalability and performance with the different structures. What I really want to do is to just store indices in rows with a composite key and do range queries. Jonathan has firmly steered me away from this approach for now on performance grounds.

Thanks a lot!
Erik
ColumnFamilies vs composite rows in one table.
What are the benefits of using multiple ColumnFamilies compared to using a composite row name?

Example: you have messages that you want to index on from and to. So you can either have

ColumnFamilyFrom : userTo : {userFrom -> messageId}
ColumnFamilyTo : userFrom : {userTo -> messageId}

or something like

ColumnFamily : user_to : {user1_messageId, user2_messageId}
ColumnFamily : user_from : {user1_messageId, user2_messageId}

One advantage of using families that I can see is if you want to use different types in the families. But are there others? Like storage space, read/write speeds etc.

--
Regards Erik
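[Editor's note: the two designs above can be mimicked with nested dicts. A minimal sketch, with hypothetical user/message names; the separator choice is an assumption, not anything Cassandra mandates.]

```python
# Design A: two "column families", keyed by plain user id.
cf_to   = {"alice": {"bob": "msg1"}}
cf_from = {"bob":   {"alice": "msg1"}}

# Design B: one "column family", with the direction folded into a
# composite row key.
def composite_key(direction, user):
    # The separator must never occur inside user ids,
    # otherwise keys become ambiguous.
    return direction + ":" + user

cf = {
    composite_key("to", "alice"): {"bob": "msg1"},
    composite_key("from", "bob"): {"alice": "msg1"},
}

# Both designs answer the same lookup:
assert cf[composite_key("to", "alice")] == cf_to["alice"]
```

With an order-preserving partitioner, composite keys like these also cluster together for range scans, which is the property the later messages in this thread debate.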
Re: Storage format
Thank you!
Re: Storage format
On Mon, Mar 1, 2010 at 2:51 PM, Jonathan Ellis wrote:
> On Mon, Mar 1, 2010 at 4:49 PM, Erik Holstad wrote:
> > Haha! Thanks. Well, I'm a little bit worried about this, but since the
> > indexes are pretty small I don't think it is going to be too bad. I was
> > mostly thinking about performance and having the index row as a bottleneck
> > for writing, since the partition is per row.
>
> Writing N columns to 1 row is faster than writing 1 column to N rows,
> even when all N are coming from different clients. Our concurrency
> story there is excellent.

That sounds good, and the same thing goes for reading, because that is basically what I'm looking for: faster reads. Not too worried about the writes. Thanks a lot!
Re: Storage format
Haha! Thanks. Well, I'm a little bit worried about this, but since the indexes are pretty small I don't think it is going to be too bad. I was mostly thinking about performance and having the index row as a bottleneck for writing, since the partition is per row.

--
Regards Erik
Re: Is Cassandra a document based DB?
Yes, Cassandra has supercolumns and HBase has versions, and you are probably correct that supercolumns are used more than versions, but I don't really think you can compare them, since versions are not a serialized structure. The reason that I didn't include table and family in the mapping is that, as I've understood it, a SQL table can be compared to the family in Cassandra, and multiple Keyspaces (which would then map to your tables) are not really frequently used, whereas multiple tables in HBase are almost always the rule. Not sure that I would agree that a virtual dimension can compare to a real one, but that is just the way I see it.

--
Regards Erik
Re: Storage format
So that is kind of what I want to do, but I want to go from a row with multiple columns to multiple rows with one column. Maybe I'm not hearing you here, and you are trying to tell me that the columns, not supercolumns, are not stored together in a row structure?

--
Regards Erik
Re: Is Cassandra a document based DB?
On Mon, Mar 1, 2010 at 4:41 AM, Brandon Williams wrote:
> On Mon, Mar 1, 2010 at 5:34 AM, HHB wrote:
> >
> > What are the advantages/disadvantages of Cassandra over HBase?
>
> Ease of setup: all nodes are the same.
>
> No single point of failure: all nodes are the same.
>
> Speed: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
>
> Richer model: supercolumns.

I think there are people that would be of a different opinion here. Cassandra has, as I've understood it, table:key:name:val, and in some cases the val is a serialized data structure. In HBase you have table:row:family:key:val:version, which some people might consider richer.

> Multi-datacenter awareness.
>
> There are likely other things I'm forgetting, but those stand out for me.
>
> -Brandon

--
Regards Erik
Re: Storage format
Sorry about that! Continuing: And in that case, when using rows as indexes instead of columns, we only need to read that specific row, which might be more efficient than reading a big row every time?

--
Regards Erik
Re: Storage format
Thanks Jonathan! So let's see if I got this right. Just as an overview, data is stored like HashMap<rowKey, SortedMap<columnName, columnValue>>, and in the case of a superColumnFamily like HashMap<rowKey, SortedMap<superColumnName, SortedMap<columnName, columnValue>>>? So when asking for a column in a row, the whole row structure first needs to be deserialized and then we can get the columns we are looking for? And in that case, when using rows as indexes instead of columns, we only need to read

On Mon, Mar 1, 2010 at 11:24 AM, Jonathan Ellis wrote:
> On Mon, Mar 1, 2010 at 12:50 PM, Erik Holstad wrote:
> > I've been looking at the source, but could not quite find the things I'm
> > looking for, so I have a few questions.
> > Are columns for a row stored in a serialized data structure on disk, or
> > stored individually and put into a data structure when the call is being
> > made?
>
> The former, but only for top-level columns -- subcolumns are all read
> at once for slices against supercolumns.
> (http://issues.apache.org/jira/browse/CASSANDRA-598)

--
Regards Erik
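[Editor's note: the map-of-maps mental model discussed above can be sketched in a few lines of Python. This is only a logical view; the real on-disk format is a serialized, comparator-sorted row, not a hash map.]

```python
# Toy model of the logical layout discussed in the thread.
from collections import OrderedDict

def sorted_row(columns):
    # Columns within a row are kept sorted by name (the comparator's job).
    return OrderedDict(sorted(columns.items()))

# Standard column family: row key -> (column name -> value)
cf = {"row1": sorted_row({"colB": b"2", "colA": b"1"})}

# Super column family: row key -> (super column -> (column -> value))
scf = {"row1": {"super1": sorted_row({"colA": b"1"})}}

assert list(cf["row1"]) == ["colA", "colB"]  # iteration is in column order
```

The point Jonathan makes then falls out of the layout: top-level columns of a row can be sliced individually, but all subcolumns of a supercolumn are deserialized together (CASSANDRA-598).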
Storage format
I've been looking at the source, but could not quite find the things I'm looking for, so I have a few questions.

Are columns for a row stored in a serialized data structure on disk, or stored individually and put into a data structure when the call is being made?

Because of the slice query, does that mean that all columns have to be read in before any are sent back?

If that is the case, could it be more efficient to use rows instead of columns for storing, for example, indexes where you just want to get a few at a time?

--
Regards Erik
Deleted rows showing up when doing a get_range_slice query
When deleting rows from a table and then doing a get_range_slice query, the keys of the deleted rows show up, with no name/value pairs. What is the reasoning behind this?

I have also seen a weird issue when using an md5-generated byte[] as a column name; it doesn't seem to actually work, and I can't get back the value that was inserted that way. But if I, for example, Base64.encode().getBytes() it, everything seems to be ok. Any ideas?

--
Regards Erik
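[Editor's note: one possible explanation for the md5 issue, offered as an assumption rather than a confirmed diagnosis: a raw md5 digest is 16 arbitrary bytes, which need not survive a round-trip through a string/UTF-8 layer in a client, whereas Base64 output is plain ASCII and always does. A sketch of the difference in Python:]

```python
import base64
import hashlib

digest = hashlib.md5(b"some column name").digest()  # 16 arbitrary bytes
encoded = base64.b64encode(digest)                  # ASCII-safe bytes

try:
    # Arbitrary digest bytes are often not valid UTF-8, so a client that
    # treats column names as text may mangle or reject them.
    digest.decode("utf-8")
    survived = True
except UnicodeDecodeError:
    survived = False

# Base64 text, by construction, always decodes cleanly as ASCII.
assert encoded.decode("ascii")
print("raw digest survived a UTF-8 round-trip:", survived)
```

If this is the cause, the fix is to keep column names as opaque bytes end-to-end, or encode them (Base64, hex) before handing them to any text-typed API.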
Re: Getting the keys in your system?
Haha! Yeah, fortunately we are only in the testing phase so this is not that big of a deal. Thanks a lot! -- Regards Erik
Re: Getting the keys in your system?
Thanks Jonathan! We are thinking about moving over to the OPP to be able to do this, and about using an md5 of the key for the data where order is not really needed, just to get that data written to different nodes. Is there anything we need to think about when making the switch, or any big drawbacks in doing so?

--
Regards Erik
Getting the keys in your system?
If you have a system set up using the RandomPartitioner and a couple of indexes set up for your data, but realize that you need to add another index: how do you get the keys for your data, so that you know where to point your indexes? I guess what I'm really asking is, is there a way to get your keys when using the RP, or how do people out there deal with something like this?

--
Regards Erik
Re: Row with many columns
Hey Ruslan!
Maybe you should do what Ted suggested: look at what Cassandra is good at and then try to change your data structure, for example from one row with 100 columns to maybe 10 rows with 10 columns each. I think the best way to solve a problem is to look at the tools that you have at hand and try to use them for what they are good at. If it is really hard to change your data set and you really need the structure that you have, maybe Cassandra is not the best option for you. Good luck, and please let the mailing list know if you need any help with this.

--
Regards Erik
Re: Using column plus value or only column?
Don't be silly, thanks a lot for helping me out! -- Regards Erik
Re: Using column plus value or only column?
I don't understand what you mean ;) We will see what happens when we are done with this first project, and whether we can get some time to give back.

--
Regards Erik
Re: Using column plus value or only column?
Hey Nate!
What I wanted to do with get_range_slice was to receive the keys in inverted order, so that I could do offset/limit queries on key ranges in reverse order. Like you said, this can be done for both columns and super columns with the help of the SliceRange, but not on keys afaik. But maybe there is a way?

Thanks
Erik

On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall wrote:
> Erik,
> You can do an inverse with 'reversed=true' in SliceRange as part of
> the SlicePredicate for both get_slice or get_range_slice. I have not
> tried reversed=true on SuperColumn results, but I don't think there is
> any difference there - what can't be changed is how things are ordered,
> but direction can go either way (if I am wrong on this, somebody
> please correct me).
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
> radar as I don't have anything reporting-ish like you describe with
> SuperColumns (yet). I will defer to more experienced folks on this.
>
> Regards,
> -Nate
>
> [...]

--
Regards Erik
Re: Using column plus value or only column?
@Nathan
So what I'm planning to do is to store multiple sort orders for the same data, where they all use the same data table and just fetch it in different orders, so to say. I want to be able to read the different sort orders from the front and from the back, to get both regular and reverse sort order.

With your approach using super columns, you would need to replicate all data, right?

And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 correctly, you would need to read the whole thing before you can limit the results handed back to you.

In regards to the two calls get_slice and get_range_slice, the way I understand it is that you hand the second one an optional start and stop key plus a limit, to get a range of keys/rows. I was planning to use this call together with the OPP, but I am thinking about not using it since there is no way to do an inverse scan, right?

Thanks a lot
Erik

On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell wrote:
> infinite is a bit of a bold claim
>
> by my understanding you are bound by the memory of the jvm as all of
> the content of a key/row currently needs to fit in memory for
> compaction, which includes columns and supercolumns for a given key/row.
>
> if you are going to run into those scenarios then some sort of
> sharding on the keys is required, afaict
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconn...@gmail.com
>
> On Tue, Feb 2, 2010 at 16:30, Nathan McCall wrote:
> > Erik,
> > Sure, you could, and depending on the workload that might be quite
> > efficient for small pieces of data. However, this also sounds like
> > something that might be better addressed with the addition of a
> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
> >
> > Sorts : {
> >   sort_row_1 : {
> >     sortKey1 : { col1:val1, col2:val2 },
> >     sortKey2 : { col1:val3, col2:val4 }
> >   }
> > }
> >
> > You can have an infinite number of SuperColumns for a key, but make
> > sure you understand get_slice vs. get_range_slice before you commit to
> > a design. Hopefully I understood your example correctly; if not, do
> > you have anything more concrete?
> >
> > Cheers,
> > -Nate
> >
> > [...]

--
Regards Erik
Re: Using column plus value or only column?
Thanks Nate for the example.

I was thinking more along the lines of something like: if you have a family

Data : {
  row1 : { col1:val1 },
  row2 : { col1:val2 },
  ...
}

using

Sorts : {
  sort_row : {
    sortKey1_datarow1: [],
    sortKey2_datarow2: []
  }
}

instead of

Sorts : {
  sort_row : {
    sortKey1: datarow1,
    sortKey2: datarow2
  }
}

If that makes any sense?

--
Regards Erik
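[Editor's note: the pattern Erik sketches, folding the data-row key into the column name and storing an empty value, can be illustrated like this. The `_` separator and the key names are hypothetical; a real design would pick a separator that cannot occur in either part.]

```python
SEP = "_"  # assumption: SEP never occurs inside sortKey or datarow ids

def make_column_name(sort_key, data_row):
    # Fold the data-row pointer into the column name itself.
    return sort_key + SEP + data_row

def parse_column_name(name):
    # Split the name back apart on the first separator when reading.
    sort_key, data_row = name.split(SEP, 1)
    return sort_key, data_row

# The index row: all information lives in the names; values stay empty.
sorts_row = {make_column_name("sortKey1", "datarow1"): b"",
             make_column_name("sortKey2", "datarow2"): b""}

# Iterating in column order yields the data-row keys in sort-key order.
ordered = [parse_column_name(n)[1] for n in sorted(sorts_row)]
assert ordered == ["datarow1", "datarow2"]
```

The upside over `sortKey -> datarow` columns is that no value byte[] is stored at all; the cost is the encode/decode step and the separator constraint.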
Re: Key/row names?
Thank you!

On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis wrote:
> On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad wrote:
> > Is there a way to use a byte[] as the key instead of a string?
>
> no.
>
> > If not what is the main reason for using strings for the key but
> > the columns and the values can be byte[]?
>
> historical baggage. we might switch to byte[] keys in 0.7.
>
> -Jonathan

--
Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams wrote:
> On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote:
> > Wow, that sounds really good. So you are saying that if I set it to
> > reverse sort order and count 10 for the first round, I get the last 10;
> > for the next call I just set the last column from the first call as start
> > and I will get columns -10 to -20, so to speak?
>
> Actually, since they are reversed and you're trying to move backwards,
> you'll need to pass the last column from the first query (since they will be
> sorted in reverse order) as the start to the next one, with reverse still set
> to true.
>
> -Brandon

Thanks a lot Brandon for clearing that up for me; I think that was what I was trying to say. But that is really good, because now I don't have to store the data twice in different sort orders.

--
Regards Erik
Using column plus value or only column?
Sorry that there are a lot of questions from me this week, just trying to better understand the best way to use Cassandra :) Let us say that you know the length of your key, everything is standardized, are there people out there that just tag the value onto the key so that you don't have to pay the extra overhead of the second byte[]? -- Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams wrote:
> On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote:
> > Thanks guys!
> > So I want to use SliceRange, but I'm thinking about using the count
> > parameter. For example: give me the first x columns; for the next call I
> > would like to call it with a start value and a count.
> >
> > If I were to use the reverse param in SliceRange, I would have to fetch
> > all the columns first, right?
>
> If you pass reverse as true, then instead of getting the first x columns,
> you'll get the last x columns. If you want to head backwards toward the
> beginning, you can pass the first column as the end value.
>
> -Brandon

Wow, that sounds really good. So you are saying that if I set it to reverse sort order and count 10 for the first round, I get the last 10; for the next call I just set the last column from the first call as start and I will get columns -10 to -20, so to speak?

--
Regards Erik
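[Editor's note: the backwards-paging scheme Brandon describes can be simulated with a sorted list in plain Python. This is a toy model of the slice semantics, not the Thrift API; the inclusive-start convention here is an assumption for the sketch.]

```python
def get_slice(columns, start, count, reversed=False):
    # Toy slice: columns are sorted by name; reversed walks from the end.
    names = sorted(columns)
    if reversed:
        names = names[::-1]
    if start is not None:
        names = names[names.index(start):]  # start column is inclusive here
    return names[:count]

cols = {"c%02d" % i: b"" for i in range(1, 31)}  # columns c01 .. c30

# Page 1: reversed, count 10 -> the last 10 columns, c30 down to c21.
page1 = get_slice(cols, None, 10, reversed=True)

# Page 2: pass page1's last column as the new start, still reversed.
# With an inclusive start, fetch count+1 and drop the duplicate.
page2 = get_slice(cols, page1[-1], 11, reversed=True)

assert page1[0] == "c30" and page1[-1] == "c21"
assert page2[0] == "c21" and page2[-1] == "c11"
```

The takeaway matches the thread: no need to fetch everything first, and no need to store the data twice in two sort orders just to page from the end.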
Key/row names?
Is there a way to use a byte[] as the key instead of a string? If not what is the main reason for using strings for the key but the columns and the values can be byte[]? Is it just to be able to use it as the key in a Map etc or are there other reasons? -- Regards Erik
Re: Reverse sort order comparator?
Thanks guys!
So I want to use SliceRange, but I'm thinking about using the count parameter. For example: give me the first x columns; for the next call I would like to call it with a start value and a count.

If I were to use the reverse param in SliceRange, I would have to fetch all the columns first, right?

On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams wrote:
> On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote:
> > Hey!
> > I'm looking for a comparator that sorts columns in reverse order on, for
> > example, bytes. I saw that you can write your own comparator class, but
> > just thought that someone must have done that already.
>
> When you get_slice, just set reverse to true in the SliceRange and it will
> reverse the order.
>
> -Brandon

--
Regards Erik
Reverse sort order comparator?
Hey!
I'm looking for a comparator that sorts columns in reverse order on, for example, bytes. I saw that you can write your own comparator class, but just thought that someone must have done that already.

--
Regards Erik
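[Editor's note: the ordering a reverse comparator would produce can be sketched in plain Python on byte strings. In Cassandra this would be a custom comparator class on the column family, but the ordering logic itself is just inverted lexicographic byte order.]

```python
# Column names as raw bytes.
names = [b"\x01", b"\x10", b"\x02"]

ascending = sorted(names)                 # lexicographic byte order
descending = sorted(names, reverse=True)  # what a "reverse comparator" yields

assert ascending == [b"\x01", b"\x02", b"\x10"]
assert descending == list(reversed(ascending))
```

As the replies in this thread point out, the `reversed` flag on SliceRange usually makes a dedicated reverse comparator unnecessary: the stored order stays ascending and reads can walk it either way.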
Re: Best design in Cassandra
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams wrote:
> On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote:
> > > A supercolumn can still only compare subcolumns in a single way.
> >
> > Yeah, I know that, but you can have a super column per sort order without
> > having to restart the cluster.
>
> You get a CompareWith for the columns, and a CompareSubcolumnsWith for
> subcolumns. If you need more column types to get different sort orders, you
> need another ColumnFamily.

Not sure what column types means here. What I want to do is to have a few things sorted in asc and desc order, like {a,b}, {b,a} and {1,2}, {2,1}.

> -Brandon

--
Regards Erik
Re: How to retrieve keys from Cassandra ?
Hi Sébastien!
I'm totally new to Cassandra, but as far as I know there is no way of getting just the keys that are in the database; they are not stored separately, only with the data itself.

Why do you want a list of keys, what are you going to use them for? Maybe there is another way of solving your problem. What you are describing, getting all the keys/rows for a given column, sounds like you would have to fetch all the data you have and then filter every key on your column. I don't think that get_key_range will even do that for you; it says that it takes a column_family. But like I said, I'm totally new.

Erik

2010/2/2 Sébastien Pierre
> Hi all,
>
> I would like to know how to retrieve the list of available keys available
> for a specific column. There is the get_key_range method, but it is only
> available when using the OrderPreservingPartitioner -- I use a
> RandomPartitioner.
>
> Does this mean that when using a RandomPartitioner, you cannot see which
> keys are available in the database?
>
> -- Sébastien

--
Regards Erik
Re: Best design in Cassandra
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams wrote:
> On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad wrote:
> > Hey!
> > Have a couple of questions about the best way to use Cassandra.
> > Using the random partitioner + the multi_get calls vs order preservation +
> > range_slice calls?
>
> When you use an OPP, the distribution of your keys becomes your problem.
> If you don't have an even distribution, this will be reflected in the load
> on the nodes, while the RP gives you even distribution.

Yeah, that is why it would be nice to hear if anyone has compared the performance between the two, to see if it is worth worrying about your own distribution. I also read that the random partitioner doesn't give that great a distribution.

> > What is the benefit of using multiple families vs super column?
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 is currently why I
> prefer simple CFs instead of supercolumns.

Yeah, this is nasty.

> > For example in the case of sorting in different orders. One good thing
> > that I can see here when using super column is that you don't have to
> > restart your cluster every time you want to add a new sort order.
>
> A supercolumn can still only compare subcolumns in a single way.

Yeah, I know that, but you can have a super column per sort order without having to restart the cluster.

> When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you
> will be able to add CFs without restarting.

Looks interesting, but targeted at 0.7, so it is probably going to be a little while, or?

> -Brandon

--
Regards Erik
Re: Sample applications
Hi Carlos! I'm also really new to Cassandra but here are a couple of links that I found useful: http://wiki.apache.org/cassandra/ClientExamples http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model and one of the presentations like: http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod Erik
Best design in Cassandra
Hey!
Have a couple of questions about the best way to use Cassandra.

Using the random partitioner + the multi_get calls vs order preservation + range_slice calls?

What is the benefit of using multiple families vs super columns? For example in the case of sorting in different orders. One good thing that I can see when using super columns is that you don't have to restart your cluster every time you want to add a new sort order.

--
Regards Erik
Re: Internal structure of api calls
Thanks a lot Brandon!
Internal structure of api calls
Hey guys!
I'm totally new to Cassandra and have a couple of questions about the internal structure of some of the calls.

When using the SliceRange(count) for the get calls, is the actual result truncated on the server or does that happen on the client, i.e. is it more efficient than the regular call?

Is there an internal counter for the get_count call that keeps track of the count, or do you only save on return IO?

--
Regards Erik