Re: How do cassandra clients failover?
On Mon, Feb 1, 2010 at 7:38 PM, Jonathan Ellis wrote: > No. Thrift is just an RPC mechanism. Whether RRDNS, software or > hardware load balancing, or client-based failover like Gary describes > is best is not a one-size-fits-all answer. Everyone who uses Cassandra would need to implement load balancing and failover. Some may do it right and some may do it wrong. Because this solution is going to be Cassandra-specific, you may not find any publicly available libraries to help you out. Ideally, the client would be a Thrift API wrapper which automatically does load balancing and failover. This is definitely not the only solution, but it is one that may not need any external RRDNS. > > 2010/2/1 Noble Paul നോബിള് नोब्ळ् : >> is it worth adding this feature to the standard java client? >> >> On Mon, Feb 1, 2010 at 7:28 PM, Gary Dusbabek wrote: >>> One approach is to discover what other nodes there are before any of >>> them fail. Then when you detect failure, you can connect to a >>> different node that is (hopefully) still responding. >>> >>> There is an API call that allows you to get a list of all the nodes: >>> client.get_string_property("token map"), which returns a JSON list of >>> the node ring. >>> >>> I hope that helps. >>> >>> Gary. >>> >>> 2010/2/1 Noble Paul നോബിള് नोब्ळ् : The Cassandra client (Thrift client) is started up with the host:port of a single Cassandra node. * What happens if that node fails? * Does it mean that all the operations go through the same node? --Noble >>> >> >> >> >> -- >> - >> Noble Paul | Systems Architect| AOL | http://aol.com >> > -- - Noble Paul | Systems Architect| AOL | http://aol.com
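For anyone reaching this thread later, below is a minimal sketch of the discover-then-failover approach Gary describes, using only Thrift calls already mentioned on this list (the 0.5-era org.apache.cassandra.service.Cassandra.Client and get_string_property("token map")). The class name is made up, and the JSON parsing of the token map is left abstract because its exact shape varies between versions; treat this as an illustration, not a drop-in library.

import java.util.List;
import org.apache.cassandra.service.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public abstract class FailoverClientSketch
{
    private List<String> hosts;          // ring members discovered up front
    private TTransport transport;
    protected Cassandra.Client client;

    public FailoverClientSketch(String seedHost, int port) throws Exception
    {
        connect(seedHost, port);
        // Ask the seed node for the rest of the ring *before* anything fails.
        hosts = parseHosts(client.get_string_property("token map"));
    }

    private void connect(String host, int port) throws Exception
    {
        transport = new TSocket(host, port);
        client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
    }

    /** Call this when an operation fails: drop the dead connection, try the other nodes. */
    public void failover(int port) throws Exception
    {
        transport.close();
        for (String host : hosts)
        {
            try
            {
                connect(host, port);
                return;                  // found a node that still answers
            }
            catch (Exception e)
            {
                // this node is down too; keep trying
            }
        }
        throw new Exception("no live Cassandra nodes found");
    }

    /** Left abstract: the exact JSON shape of "token map" depends on the Cassandra version. */
    protected abstract List<String> parseHosts(String tokenMapJson);
}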
Re: Using column plus value or only column?
Don't be silly, thanks a lot for helping me out! -- Regards Erik
Re: Using column plus value or only column?
Ok - I was afraid I was going to miss something with the generic example before - my apologies on that. You cannot impose an order on keys like that as far as I am aware. I think maintaining a Sort CF as you had originally is a decent approach. Cheers, -Nate On Tue, Feb 2, 2010 at 4:06 PM, Erik Holstad wrote: > Hey Nate! > What I wanted to do with the get_range_slice was to receive the keys in the > inverted order, so that I could so offset limit queries on key ranges in > reverse > order. Like you said, this can be done for both columns and super columns > with > help of the SliceRange, but not on keys afaik, but maybe there is a way? > > Thanks Erik > > > On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall > wrote: >> >> Erik, >> You can do an inverse with 'reversed=true' in SliceRange as part of >> the SlicePredicate for both get_slice or get_range_slice. I have not >> tried reverse=true on SuperColumn results, but I dont think there is >> any difference there - what can't be changed is how things are ordered >> but direction can go either way (If I am wrong on this, somebody >> please correct me). >> >> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my >> radar as I dont have anything reporting-ish like you describe with >> SuperColumns (yet). I will defer to more experienced folks with this. >> >> Regards, >> -Nate >> >> >> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad >> wrote: >> > @Nathan >> > So what I'm planning to do is to store multiple sort orders for the same >> > data, where they all use the >> > same data table just fetches it in different orders, so to say. I want >> > to be >> > able to rad the different sort >> > orders from the front and from the back to get both regular and reverse >> > sort >> > order. >> > >> > With your approach using super columns you would need to replicate all >> > data, >> > right? >> > >> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 >> > correctly you would need to >> > read the whole thing before you can limit the results handed back to >> > you. >> > >> > In regards to the two calls get_slice and get_range_slice, the way I >> > understand it is that you hand >> > the second one an optional start and stop key plus a limit, to get a >> > range >> > of keys/rows. I was planning >> > to use this call together with the OPP, but are thinking about not using >> > it >> > since there is no way to do >> > an inverse scan, right? >> > >> > Thanks a lot >> > Erik >> > >> > >> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell >> > >> > wrote: >> >> >> >> infinite is a bit of a bold claim >> >> >> >> by my understanding you are bound by the memory of the jvm as all of >> >> the content of a key/row currently needs to fit in memory for >> >> compaction, which includes columns and supercolumns for given key/row. >> >> >> >> if you are going to run into those scenarios then some sort of >> >> sharding on the keys is required, afaict >> >> >> >> cheers, >> >> jesse >> >> >> >> -- >> >> jesse mcconnell >> >> jesse.mcconn...@gmail.com >> >> >> >> >> >> >> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall >> >> wrote: >> >> > Erik, >> >> > Sure, you could and depending on the workload, that might be quite >> >> > efficient for small pieces of data. 
However, this also sounds like >> >> > something that might be better addressed with the addition of a >> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether: >> >> > >> >> > Sorts : { >> >> > sort_row_1 : { >> >> > sortKey1 : { col1:val1, col2:val2 }, >> >> > sortKey2 : { col1:val3, col2:val4 } >> >> > } >> >> > } >> >> > >> >> > You can have an infinite number of SuperColumns for a key, but make >> >> > sure you understand get_slice vs. get_range_slice before you commit >> >> > to >> >> > a design. Hopefully I understood your example correctly, if not, do >> >> > you have anything more concrete? >> >> > >> >> > Cheers, >> >> > -Nate >> >> > >> >> > >> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad >> >> > wrote: >> >> >> Thanks Nate for the example. >> >> >> >> >> >> I was thinking more a long the lines of something like: >> >> >> >> >> >> If you have a family >> >> >> >> >> >> Data : { >> >> >> row1 : { >> >> >> col1:val1, >> >> >> row2 : { >> >> >> col1:val2, >> >> >> ... >> >> >> } >> >> >> } >> >> >> >> >> >> >> >> >> Using >> >> >> Sorts : { >> >> >> sort_row : { >> >> >> sortKey1_datarow1: [], >> >> >> sortKey2_datarow2: [] >> >> >> } >> >> >> } >> >> >> >> >> >> Instead of >> >> >> Sorts : { >> >> >> sort_row : { >> >> >> sortKey1: datarow1, >> >> >> sortKey2: datarow2 >> >> >> } >> >> >> } >> >> >> >> >> >> If that makes any sense? >> >> >> >> >> >> -- >> >> >> Regards Erik >> >> >> >> >> > >> > >> > >> > >> > -- >> > Regards Erik >> > > > > > -- > Regards Erik >
Re: Using column plus value or only column?
I don't understand what you mean ;) Will see what happens when we are done with this first project, will see if we can get some time to give back. -- Regards Erik
Re: Using column plus value or only column?
Right, we don't currently support scanning rows in reverse order, but that is only because nobody has wanted it badly enough to code it. :) On Tue, Feb 2, 2010 at 6:06 PM, Erik Holstad wrote: > Hey Nate! > What I wanted to do with the get_range_slice was to receive the keys in the > inverted order, so that I could so offset limit queries on key ranges in > reverse > order. Like you said, this can be done for both columns and super columns > with > help of the SliceRange, but not on keys afaik, but maybe there is a way? > > Thanks Erik > > > On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall > wrote: >> >> Erik, >> You can do an inverse with 'reversed=true' in SliceRange as part of >> the SlicePredicate for both get_slice or get_range_slice. I have not >> tried reverse=true on SuperColumn results, but I dont think there is >> any difference there - what can't be changed is how things are ordered >> but direction can go either way (If I am wrong on this, somebody >> please correct me). >> >> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my >> radar as I dont have anything reporting-ish like you describe with >> SuperColumns (yet). I will defer to more experienced folks with this. >> >> Regards, >> -Nate >> >> >> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad >> wrote: >> > @Nathan >> > So what I'm planning to do is to store multiple sort orders for the same >> > data, where they all use the >> > same data table just fetches it in different orders, so to say. I want >> > to be >> > able to rad the different sort >> > orders from the front and from the back to get both regular and reverse >> > sort >> > order. >> > >> > With your approach using super columns you would need to replicate all >> > data, >> > right? >> > >> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 >> > correctly you would need to >> > read the whole thing before you can limit the results handed back to >> > you. >> > >> > In regards to the two calls get_slice and get_range_slice, the way I >> > understand it is that you hand >> > the second one an optional start and stop key plus a limit, to get a >> > range >> > of keys/rows. I was planning >> > to use this call together with the OPP, but are thinking about not using >> > it >> > since there is no way to do >> > an inverse scan, right? >> > >> > Thanks a lot >> > Erik >> > >> > >> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell >> > >> > wrote: >> >> >> >> infinite is a bit of a bold claim >> >> >> >> by my understanding you are bound by the memory of the jvm as all of >> >> the content of a key/row currently needs to fit in memory for >> >> compaction, which includes columns and supercolumns for given key/row. >> >> >> >> if you are going to run into those scenarios then some sort of >> >> sharding on the keys is required, afaict >> >> >> >> cheers, >> >> jesse >> >> >> >> -- >> >> jesse mcconnell >> >> jesse.mcconn...@gmail.com >> >> >> >> >> >> >> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall >> >> wrote: >> >> > Erik, >> >> > Sure, you could and depending on the workload, that might be quite >> >> > efficient for small pieces of data. 
However, this also sounds like >> >> > something that might be better addressed with the addition of a >> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether: >> >> > >> >> > Sorts : { >> >> > sort_row_1 : { >> >> > sortKey1 : { col1:val1, col2:val2 }, >> >> > sortKey2 : { col1:val3, col2:val4 } >> >> > } >> >> > } >> >> > >> >> > You can have an infinite number of SuperColumns for a key, but make >> >> > sure you understand get_slice vs. get_range_slice before you commit >> >> > to >> >> > a design. Hopefully I understood your example correctly, if not, do >> >> > you have anything more concrete? >> >> > >> >> > Cheers, >> >> > -Nate >> >> > >> >> > >> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad >> >> > wrote: >> >> >> Thanks Nate for the example. >> >> >> >> >> >> I was thinking more a long the lines of something like: >> >> >> >> >> >> If you have a family >> >> >> >> >> >> Data : { >> >> >> row1 : { >> >> >> col1:val1, >> >> >> row2 : { >> >> >> col1:val2, >> >> >> ... >> >> >> } >> >> >> } >> >> >> >> >> >> >> >> >> Using >> >> >> Sorts : { >> >> >> sort_row : { >> >> >> sortKey1_datarow1: [], >> >> >> sortKey2_datarow2: [] >> >> >> } >> >> >> } >> >> >> >> >> >> Instead of >> >> >> Sorts : { >> >> >> sort_row : { >> >> >> sortKey1: datarow1, >> >> >> sortKey2: datarow2 >> >> >> } >> >> >> } >> >> >> >> >> >> If that makes any sense? >> >> >> >> >> >> -- >> >> >> Regards Erik >> >> >> >> >> > >> > >> > >> > >> > -- >> > Regards Erik >> > > > > > -- > Regards Erik >
Re: Using column plus value or only column?
Hey Nate! What I wanted to do with the get_range_slice was to receive the keys in the inverted order, so that I could so offset limit queries on key ranges in reverse order. Like you said, this can be done for both columns and super columns with help of the SliceRange, but not on keys afaik, but maybe there is a way? Thanks Erik On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall wrote: > Erik, > You can do an inverse with 'reversed=true' in SliceRange as part of > the SlicePredicate for both get_slice or get_range_slice. I have not > tried reverse=true on SuperColumn results, but I dont think there is > any difference there - what can't be changed is how things are ordered > but direction can go either way (If I am wrong on this, somebody > please correct me). > > http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my > radar as I dont have anything reporting-ish like you describe with > SuperColumns (yet). I will defer to more experienced folks with this. > > Regards, > -Nate > > > On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad > wrote: > > @Nathan > > So what I'm planning to do is to store multiple sort orders for the same > > data, where they all use the > > same data table just fetches it in different orders, so to say. I want to > be > > able to rad the different sort > > orders from the front and from the back to get both regular and reverse > sort > > order. > > > > With your approach using super columns you would need to replicate all > data, > > right? > > > > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 > > correctly you would need to > > read the whole thing before you can limit the results handed back to you. > > > > In regards to the two calls get_slice and get_range_slice, the way I > > understand it is that you hand > > the second one an optional start and stop key plus a limit, to get a > range > > of keys/rows. I was planning > > to use this call together with the OPP, but are thinking about not using > it > > since there is no way to do > > an inverse scan, right? > > > > Thanks a lot > > Erik > > > > > > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell < > jesse.mcconn...@gmail.com> > > wrote: > >> > >> infinite is a bit of a bold claim > >> > >> by my understanding you are bound by the memory of the jvm as all of > >> the content of a key/row currently needs to fit in memory for > >> compaction, which includes columns and supercolumns for given key/row. > >> > >> if you are going to run into those scenarios then some sort of > >> sharding on the keys is required, afaict > >> > >> cheers, > >> jesse > >> > >> -- > >> jesse mcconnell > >> jesse.mcconn...@gmail.com > >> > >> > >> > >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall > >> wrote: > >> > Erik, > >> > Sure, you could and depending on the workload, that might be quite > >> > efficient for small pieces of data. However, this also sounds like > >> > something that might be better addressed with the addition of a > >> > SuperColumn on "Sorts" and getting rid of "Data" altogether: > >> > > >> > Sorts : { > >> > sort_row_1 : { > >> >sortKey1 : { col1:val1, col2:val2 }, > >> >sortKey2 : { col1:val3, col2:val4 } > >> > } > >> > } > >> > > >> > You can have an infinite number of SuperColumns for a key, but make > >> > sure you understand get_slice vs. get_range_slice before you commit to > >> > a design. Hopefully I understood your example correctly, if not, do > >> > you have anything more concrete? 
> >> > > >> > Cheers, > >> > -Nate > >> > > >> > > >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad > >> > wrote: > >> >> Thanks Nate for the example. > >> >> > >> >> I was thinking more a long the lines of something like: > >> >> > >> >> If you have a family > >> >> > >> >> Data : { > >> >> row1 : { > >> >> col1:val1, > >> >> row2 : { > >> >> col1:val2, > >> >> ... > >> >> } > >> >> } > >> >> > >> >> > >> >> Using > >> >> Sorts : { > >> >> sort_row : { > >> >> sortKey1_datarow1: [], > >> >> sortKey2_datarow2: [] > >> >> } > >> >> } > >> >> > >> >> Instead of > >> >> Sorts : { > >> >> sort_row : { > >> >> sortKey1: datarow1, > >> >> sortKey2: datarow2 > >> >> } > >> >> } > >> >> > >> >> If that makes any sense? > >> >> > >> >> -- > >> >> Regards Erik > >> >> > >> > > > > > > > > > -- > > Regards Erik > > > -- Regards Erik
Re: Using column plus value or only column?
Erik, You can do an inverse with 'reversed=true' in SliceRange as part of the SlicePredicate for both get_slice or get_range_slice. I have not tried reverse=true on SuperColumn results, but I dont think there is any difference there - what can't be changed is how things are ordered but direction can go either way (If I am wrong on this, somebody please correct me). http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my radar as I dont have anything reporting-ish like you describe with SuperColumns (yet). I will defer to more experienced folks with this. Regards, -Nate On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad wrote: > @Nathan > So what I'm planning to do is to store multiple sort orders for the same > data, where they all use the > same data table just fetches it in different orders, so to say. I want to be > able to rad the different sort > orders from the front and from the back to get both regular and reverse sort > order. > > With your approach using super columns you would need to replicate all data, > right? > > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 > correctly you would need to > read the whole thing before you can limit the results handed back to you. > > In regards to the two calls get_slice and get_range_slice, the way I > understand it is that you hand > the second one an optional start and stop key plus a limit, to get a range > of keys/rows. I was planning > to use this call together with the OPP, but are thinking about not using it > since there is no way to do > an inverse scan, right? > > Thanks a lot > Erik > > > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell > wrote: >> >> infinite is a bit of a bold claim >> >> by my understanding you are bound by the memory of the jvm as all of >> the content of a key/row currently needs to fit in memory for >> compaction, which includes columns and supercolumns for given key/row. >> >> if you are going to run into those scenarios then some sort of >> sharding on the keys is required, afaict >> >> cheers, >> jesse >> >> -- >> jesse mcconnell >> jesse.mcconn...@gmail.com >> >> >> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall >> wrote: >> > Erik, >> > Sure, you could and depending on the workload, that might be quite >> > efficient for small pieces of data. However, this also sounds like >> > something that might be better addressed with the addition of a >> > SuperColumn on "Sorts" and getting rid of "Data" altogether: >> > >> > Sorts : { >> > sort_row_1 : { >> > sortKey1 : { col1:val1, col2:val2 }, >> > sortKey2 : { col1:val3, col2:val4 } >> > } >> > } >> > >> > You can have an infinite number of SuperColumns for a key, but make >> > sure you understand get_slice vs. get_range_slice before you commit to >> > a design. Hopefully I understood your example correctly, if not, do >> > you have anything more concrete? >> > >> > Cheers, >> > -Nate >> > >> > >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad >> > wrote: >> >> Thanks Nate for the example. >> >> >> >> I was thinking more a long the lines of something like: >> >> >> >> If you have a family >> >> >> >> Data : { >> >> row1 : { >> >> col1:val1, >> >> row2 : { >> >> col1:val2, >> >> ... >> >> } >> >> } >> >> >> >> >> >> Using >> >> Sorts : { >> >> sort_row : { >> >> sortKey1_datarow1: [], >> >> sortKey2_datarow2: [] >> >> } >> >> } >> >> >> >> Instead of >> >> Sorts : { >> >> sort_row : { >> >> sortKey1: datarow1, >> >> sortKey2: datarow2 >> >> } >> >> } >> >> >> >> If that makes any sense? 
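To make that concrete, here is a small, hedged example of a reversed read using the same 0.5-era Thrift classes that appear elsewhere on this list (SliceRange, SlicePredicate, get_slice); the keyspace, row key, and column family are the stock Keyspace1/Standard1 from the default config and are only placeholders.

import java.util.List;
import org.apache.cassandra.service.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ReversedSliceExample
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        SliceRange range = new SliceRange();
        range.start = new byte[0];     // empty start/finish = no bounds
        range.finish = new byte[0];
        range.reversed = true;         // hand back columns in descending comparator order
        range.count = 10;              // ... and only the last 10 of them

        SlicePredicate predicate = new SlicePredicate();
        predicate.slice_range = range;

        List<ColumnOrSuperColumn> lastTen = client.get_slice(
            "Keyspace1", "somekey", new ColumnParent("Standard1", null),
            predicate, ConsistencyLevel.ONE);

        for (ColumnOrSuperColumn cosc : lastTen)
            System.out.println(new String(cosc.column.name));

        transport.close();
    }
}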
>> >> >> >> -- >> >> Regards Erik >> >> >> > > > > > -- > Regards Erik >
Re: order-preserving partitioner per CF?
just remember that you can't mix nodes w/ different partitioner types in the same cluster. On Tue, Feb 2, 2010 at 5:04 PM, Wojciech Kaczmarek wrote: > Yeah excellent. > > I checked that it's doable to convert the data to another Partitioner > using json backup tools - cool. I will probably write own partitioner > so it's good I won't loose my test data (though I assume I need to > pack all my data back to one node, export to json, delete sstables, > change partitioner, import sstables, then rerun node and subsequently > distribute to others). > > On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis wrote: >> yes >> >> On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek >> wrote: >>> On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis wrote: > My biggest question so far is about order-preserving partitioner. I'd > like to have such partitioner for a specific column family, having > random partitioner for others. Is it possible wrt to the current > architecture? No. >>> >>> Ok. Upon reading more details on a wiki I see it doesn't fit now. >>> >>> Now I'm thinking about scenarios of distributing the keys using OPP >>> without knowing the number of nodes a priori. >>> >>> Does this explanation: >>> http://wiki.apache.org/cassandra/Operations#Range_changes >>> >>> applies to any partitioner? >>> >> >
Re: order-preserving partitioner per CF?
Yeah excellent. I checked that it's doable to convert the data to another Partitioner using json backup tools - cool. I will probably write my own partitioner so it's good I won't lose my test data (though I assume I need to pack all my data back to one node, export to json, delete sstables, change partitioner, import sstables, then rerun the node and subsequently distribute to others). On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis wrote: > yes > > On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek > wrote: >> On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis wrote: >>> My biggest question so far is about order-preserving partitioner. I'd like to have such partitioner for a specific column family, having random partitioner for others. Is it possible wrt to the current architecture? >>> >>> No. >> >> Ok. Upon reading more details on a wiki I see it doesn't fit now. >> >> Now I'm thinking about scenarios of distributing the keys using OPP >> without knowing the number of nodes a priori. >> >> Does this explanation: >> http://wiki.apache.org/cassandra/Operations#Range_changes >> >> applies to any partitioner? >> >
Re: Using column plus value or only column?
@Nathan So what I'm planning to do is to store multiple sort orders for the same data, where they all use the same data table but fetch it in different orders, so to speak. I want to be able to read the different sort orders from the front and from the back to get both regular and reverse sort order. With your approach using super columns you would need to replicate all data, right? And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598 correctly you would need to read the whole thing before you can limit the results handed back to you. In regards to the two calls get_slice and get_range_slice, the way I understand it is that you hand the second one an optional start and stop key plus a limit, to get a range of keys/rows. I was planning to use this call together with the OPP, but am thinking about not using it since there is no way to do an inverse scan, right? Thanks a lot Erik On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell wrote: > infinite is a bit of a bold claim > > by my understanding you are bound by the memory of the jvm as all of > the content of a key/row currently needs to fit in memory for > compaction, which includes columns and supercolumns for given key/row. > > if you are going to run into those scenarios then some sort of > sharding on the keys is required, afaict > > cheers, > jesse > > -- > jesse mcconnell > jesse.mcconn...@gmail.com > > > > On Tue, Feb 2, 2010 at 16:30, Nathan McCall > wrote: > > Erik, > > Sure, you could and depending on the workload, that might be quite > > efficient for small pieces of data. However, this also sounds like > > something that might be better addressed with the addition of a > > SuperColumn on "Sorts" and getting rid of "Data" altogether: > > > > Sorts : { > > sort_row_1 : { > >sortKey1 : { col1:val1, col2:val2 }, > >sortKey2 : { col1:val3, col2:val4 } > > } > > } > > > > You can have an infinite number of SuperColumns for a key, but make > > sure you understand get_slice vs. get_range_slice before you commit to > > a design. Hopefully I understood your example correctly, if not, do > > you have anything more concrete? > > > > Cheers, > > -Nate > > > > > > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad > wrote: > >> Thanks Nate for the example. > >> > >> I was thinking more a long the lines of something like: > >> > >> If you have a family > >> > >> Data : { > >> row1 : { > >> col1:val1, > >> row2 : { > >> col1:val2, > >> ... > >> } > >> } > >> > >> > >> Using > >> Sorts : { > >> sort_row : { > >> sortKey1_datarow1: [], > >> sortKey2_datarow2: [] > >> } > >> } > >> > >> Instead of > >> Sorts : { > >> sort_row : { > >> sortKey1: datarow1, > >> sortKey2: datarow2 > >> } > >> } > >> > >> If that makes any sense? > >> > >> -- > >> Regards Erik > >> > > > -- Regards Erik
Re: Using column plus value or only column?
infinite is a bit of a bold claim by my understanding you are bound by the memory of the jvm as all of the content of a key/row currently needs to fit in memory for compaction, which includes columns and supercolumns for given key/row. if you are going to run into those scenarios then some sort of sharding on the keys is required, afaict cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Tue, Feb 2, 2010 at 16:30, Nathan McCall wrote: > Erik, > Sure, you could and depending on the workload, that might be quite > efficient for small pieces of data. However, this also sounds like > something that might be better addressed with the addition of a > SuperColumn on "Sorts" and getting rid of "Data" altogether: > > Sorts : { > sort_row_1 : { > sortKey1 : { col1:val1, col2:val2 }, > sortKey2 : { col1:val3, col2:val4 } > } > } > > You can have an infinite number of SuperColumns for a key, but make > sure you understand get_slice vs. get_range_slice before you commit to > a design. Hopefully I understood your example correctly, if not, do > you have anything more concrete? > > Cheers, > -Nate > > > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad wrote: >> Thanks Nate for the example. >> >> I was thinking more a long the lines of something like: >> >> If you have a family >> >> Data : { >> row1 : { >> col1:val1, >> row2 : { >> col1:val2, >> ... >> } >> } >> >> >> Using >> Sorts : { >> sort_row : { >> sortKey1_datarow1: [], >> sortKey2_datarow2: [] >> } >> } >> >> Instead of >> Sorts : { >> sort_row : { >> sortKey1: datarow1, >> sortKey2: datarow2 >> } >> } >> >> If that makes any sense? >> >> -- >> Regards Erik >> >
Re: Using column plus value or only column?
Erik, Sure, you could and depending on the workload, that might be quite efficient for small pieces of data. However, this also sounds like something that might be better addressed with the addition of a SuperColumn on "Sorts" and getting rid of "Data" altogether: Sorts : { sort_row_1 : { sortKey1 : { col1:val1, col2:val2 }, sortKey2 : { col1:val3, col2:val4 } } } You can have an infinite number of SuperColumns for a key, but make sure you understand get_slice vs. get_range_slice before you commit to a design. Hopefully I understood your example correctly, if not, do you have anything more concrete? Cheers, -Nate On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad wrote: > Thanks Nate for the example. > > I was thinking more a long the lines of something like: > > If you have a family > > Data : { > row1 : { > col1:val1, > row2 : { > col1:val2, > ... > } > } > > > Using > Sorts : { > sort_row : { > sortKey1_datarow1: [], > sortKey2_datarow2: [] > } > } > > Instead of > Sorts : { > sort_row : { > sortKey1: datarow1, > sortKey2: datarow2 > } > } > > If that makes any sense? > > -- > Regards Erik >
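To show what writing into that layout looks like, here is a hedged sketch using the same insert/ColumnPath calls that appear elsewhere on this list; it assumes an already-connected client and a super ColumnFamily named "Sorts" configured in Keyspace1, and the class and method names are made up.

import org.apache.cassandra.service.*;

public class SortsWriter
{
    /** Writes Sorts : { sort_row_1 : { sortKey1 : { col1 : val1 } } }, one subcolumn at a time. */
    public static void writeSubcolumn(Cassandra.Client client) throws Exception
    {
        ColumnPath path = new ColumnPath("Sorts",
                                         "sortKey1".getBytes(),   // super column name
                                         "col1".getBytes());      // subcolumn name
        client.insert("Keyspace1", "sort_row_1", path,
                      "val1".getBytes(), System.currentTimeMillis(),
                      ConsistencyLevel.ONE);
    }
}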
Re: order-preserving partitioner per CF?
yes On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek wrote: > On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis wrote: >> >>> My biggest question so far is about order-preserving partitioner. I'd >>> like to have such partitioner for a specific column family, having >>> random partitioner for others. Is it possible wrt to the current >>> architecture? >> >> No. > > Ok. Upon reading more details on a wiki I see it doesn't fit now. > > Now I'm thinking about scenarios of distributing the keys using OPP > without knowing the number of nodes a priori. > > Does this explanation: > http://wiki.apache.org/cassandra/Operations#Range_changes > > applies to any partitioner? >
Re: order-preserving partitioner per CF?
On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis wrote: > >> My biggest question so far is about order-preserving partitioner. I'd >> like to have such partitioner for a specific column family, having >> random partitioner for others. Is it possible wrt to the current >> architecture? > > No. Ok. Upon reading more details on the wiki, I see it doesn't fit now. Now I'm thinking about scenarios of distributing the keys using OPP without knowing the number of nodes a priori. Does this explanation: http://wiki.apache.org/cassandra/Operations#Range_changes apply to any partitioner?
Re: "easy" interface to Cassandra
On Tue, 19 Jan 2010 08:09:13 -0600 Ted Zlatanov wrote: TZ> My proposal is as follows: TZ> - provide an IPluggableAPI interface; classes that implement it are TZ> essentially standalone Cassandra servers. Maybe this can just TZ> parallel Thread and implement Runnable. TZ> - enable the users to specify which IPluggableAPI they want and provide TZ> instantiation options (port, connection limit, etc.) TZ> - write a simple HTTPPluggableAPI, which provides a web server and TZ> accepts POST requests. The exact path and option spec can be worked TZ> out later. The input and output formats can be specified with a query TZ> parameter; at least JSON and XML should be supported. First very rough proposal is at https://issues.apache.org/jira/browse/CASSANDRA-754 Ted
Re: order-preserving partitioner per CF?
On Tue, Feb 2, 2010 at 2:53 PM, Wojciech Kaczmarek wrote: > Hi, > > I'm evaluating Cassandra since few days and I'd say it has really high > coolness factor! :) > > My biggest question so far is about order-preserving partitioner. I'd > like to have such partitioner for a specific column family, having > random partitioner for others. Is it possible wrt to the current > architecture? No. > If not, is it planned? As attractive as it is on the wish list, I don't see how you could sanely do it with the current architecture. -Jonathan
order-preserving partitioner per CF?
Hi, I've been evaluating Cassandra for a few days and I'd say it has a really high coolness factor! :) My biggest question so far is about the order-preserving partitioner. I'd like to have such a partitioner for a specific column family, while having the random partitioner for others. Is it possible with the current architecture? If not, is it planned? What I see now is that the partitioner is defined in the scope of the Storage tag in storage-conf.xml, not even inside a keyspace definition. It makes me assume that the partitioner setting applies to the whole Cassandra cluster. cheers, Wojtek
Re: Using column plus value or only column?
Thanks Nate for the example. I was thinking more along the lines of something like: If you have a family Data : { row1 : { col1:val1 }, row2 : { col1:val2 }, ... } Using Sorts : { sort_row : { sortKey1_datarow1: [], sortKey2_datarow2: [] } } Instead of Sorts : { sort_row : { sortKey1: datarow1, sortKey2: datarow2 } } If that makes any sense? -- Regards Erik
Re: Using column plus value or only column?
If I understand you correctly, I think I have a decent example. I have a ColumnFamily which models user preferences for a "site" in our system: UserPreferences : { 123_EDD43E57589F12032AF73E23A6AF3F47 : { favorite_color : red, ... } } I structured it this way because we have a lot of "sites" for which a user could create preferences, so the site_id is prepended to the session_id; therefore you always need two pieces of information to enforce that a given record belongs to a specific "site". The site_id is always an integer and the session_id is always a 32-char string, so sticking an underscore between them makes validatable parsing and construction trivial. The bloom filtering behind the key lookups also makes checks for existence extremely fast. Note: I feel compelled to mention this is not a use case that I think would generally warrant anything outside of an RDBMS. However, in our system writes to this preference "table" can burst up to several thousand a second, thus the need for the predictable write performance of Cassandra. Cheers, Nate On Tue, Feb 2, 2010 at 9:50 AM, Erik Holstad wrote: > Sorry that there are a lot of questions from me this week, just trying to > better understand > the best way to use Cassandra :) > > Let us say that you know the length of your key, everything is standardized, > are there people > out there that just tag the value onto the key so that you don't have to pay > the extra overhead > of the second byte[]? > > -- > Regards Erik >
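As a small illustration of the convention Nate describes (integer site_id, underscore, 32-char session_id), here is a sketch of a build/parse helper; the class name and the exact checks are assumptions for illustration, not code from his system.

public final class PreferenceKey
{
    private static final int SESSION_ID_LENGTH = 32;   // 32-char session id, per the description above

    /** Join an integer site id and a fixed-length session id with an underscore. */
    public static String make(int siteId, String sessionId)
    {
        if (sessionId.length() != SESSION_ID_LENGTH)
            throw new IllegalArgumentException("session id must be " + SESSION_ID_LENGTH + " chars");
        return siteId + "_" + sessionId;
    }

    /** Split a key back into { siteId, sessionId }, validating the shape as we go. */
    public static String[] parse(String key)
    {
        int sep = key.indexOf('_');
        if (sep <= 0 || key.length() - sep - 1 != SESSION_ID_LENGTH)
            throw new IllegalArgumentException("malformed preference key: " + key);
        Integer.parseInt(key.substring(0, sep));        // throws if the site id is not an integer
        return new String[] { key.substring(0, sep), key.substring(sep + 1) };
    }
}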
Re: get_slice() slow if more number of columns present in a SCF.
On Tue, Feb 2, 2010 at 9:27 AM, envio user wrote: > All, > > Here are some tests[batch_insert() and get_slice()] I performed on > cassandra. > > I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500 > columns and read 25 columns per key. > > > This test is more worrying for us. We can't even read 1000 reads per > second. Is there any limitation on cassandra, which will not work with > more number of columns ?. Or mm I doing something wrong here?. Please > let me know. > I think you're mostly being limited by http://issues.apache.org/jira/browse/CASSANDRA-598 Can you try with a simple CF? -Brandon
Re: How to retrieve keys from Cassandra ?
On Tue, Feb 2, 2010 at 12:51 PM, Jean-Denis Greze wrote: > Anyway, partially to address the efficiency concern, I've been playing > around with the idea of having 745-like functionality on a per-node basis: a > call to get all of the keys on a particular node as opposed to the entire > cluster. It just seems like with a very large cluster with billions, tens > of billions, or hundreds of billions of keys 745 would just get overwhelmed. That's why 745 is really there for hadoop support (https://issues.apache.org/jira/browse/CASSANDRA-342), not something intended to be used manually. -Jonathan
Re: get_slice() slow if more number of columns present in a SCF.
Thank you for the benchmarks. What version of Cassandra are you using? I had about 80% performance improvement on single node reads after using a trunk build with the results from https://issues.apache.org/jira/browse/CASSANDRA-688 (result caching) and playing around with the configuration. I am not yet running this in production though, so I cannot provide any real numbers. That said, I have no intention of deploying a single node. I keep seing performance concerns from folks on small or single node clusters. My impression so far is that Cassandra may not be the right solution for these types of deployments. My main interest in Cassandra is the linear scalability of reads and writes. From my own tests and some of the discussion on these lists, it seems Cassandra can thrash around a lot when the number of nodes <= the replication factor * 2, particularly if a node goes down. I understand this is a design trade-off of sorts and I am fine with it. Any sort of distributed, fault tolerant system is well served by using lots of commodity hardware. What I found to have been most valuable for my evaluation was to get a good test together with some real data from our system and then add nodes, remove nodes, break nodes, etc. and watch what happens. Once I finish with this, it looks like I will have some solid numbers to do some capacity planning for figuring out exactly how much hardware to purchase and when I will need to add more. Apologies to the original poster if that got a little long winded, but hopefully it will be useful information for folks. Cheers, -Nate On Tue, Feb 2, 2010 at 7:27 AM, envio user wrote: > All, > > Here are some tests[batch_insert() and get_slice()] I performed on cassandra. > > H/W: Single node, Quad Core(8 cores), 8GB RAM: > Two separate physical disks, one for the commit log and another for the data. > > storage-conf.xml > > 0.4 > 256 > 128 > 0.2 > 1440 > 16 > > > Data Model: > > CompareSubcolumnsWith="UTF8Type" Name="Super1" /> > > TEST1A > == > /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o > insert -i 10 > WARNING: multiprocessing not present, threading will be used. > Benchmark may not be accurate! > total,interval_op_rate,avg_latency,elapsed_time > 19039,1903,0.0532085509215,10 > 52052,3301,0.0302550313445,20 > 82274,3022,0.0330235137811,30 > 10,1772,0.0337765234716,40 > > TEST1B > = > /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o read -i > 10 > WARNING: multiprocessing not present, threading will be used. > Benchmark may not be accurate! > total,interval_op_rate,avg_latency,elapsed_time > 16472,1647,0.0615632034523,10 > 39375,2290,0.04384300123,20 > 65259,2588,0.0385473697268,30 > 91613,2635,0.0379411213277,40 > 10,838,0.0331208069702,50 > /home/sun> > > > I deleted all the data(all: commitlog,data..) and restarted cassandra.*** > I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500 > columns and read 25 columns per key. > > TEST2A > == > /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 600 -r -o > insert -i 10 > WARNING: multiprocessing not present, threading will be used. > Benchmark may not be accurate! > total,interval_op_rate,avg_latency,elapsed_time > . > . 
> 84216,144,0.689481827031,570 > 85768,155,0.625061393859,580 > 87307,153,0.648041650953,590 > 88785,147,0.671928719674,600 > 90488,170,0.611753724284,610 > 91983,149,0.677673689896,620 > 93490,150,0.63891824366,630 > 95017,152,0.65472143182,640 > 96612,159,0.64355712789,650 > 98098,148,0.673311280851,660 > 99622,152,0.486848112166,670 > 10,37,0.174115514629,680 > > I understand nobody will write 600 columns at a time. I just need to > populate the data, hence I did this test. > > [r...@fc10mc1 ~]# ls -l /var/lib/cassandra/commitlog/ > total 373880 > -rw-r--r-- 1 root root 268462742 2010-02-03 02:00 CommitLog-1265141714717.log > -rw-r--r-- 1 root root 114003919 2010-02-03 02:00 CommitLog-1265142593543.log > > [r...@fc10mc1 ~]# ls -l /cassandra/lib/cassandra/data/Keyspace1/ > total 3024232 > -rw-r--r-- 1 root root 1508524822 2010-02-03 02:00 Super1-192-Data.db > -rw-r--r-- 1 root root 92725 2010-02-03 02:00 Super1-192-Filter.db > -rw-r--r-- 1 root root 2639957 2010-02-03 02:00 Super1-192-Index.db > -rw-r--r-- 1 root root 100838971 2010-02-03 02:02 Super1-279-Data.db > -rw-r--r-- 1 root root 8725 2010-02-03 02:02 Super1-279-Filter.db > -rw-r--r-- 1 root root 176481 2010-02-03 02:02 Super1-279-Index.db > -rw-r--r-- 1 root root 1478775337 2010-02-03 02:03 Super1-280-Data.db > -rw-r--r-- 1 root root 90805 2010-02-03 02:03 Super1-280-Filter.db > -rw-r--r-- 1 root root 2588072 2010-02-03 02:03 Super1-280-Index.db > [r...@fc10mc1 ~]# > > [r...@fc10mc1 ~]# du -hs /cassandra/lib/cassandra/data/Keyspace1/ > 2.9G /cassandra/lib/cassandra/data/Keyspace1/ > > > TEST2B > == > > /home/sun>python stress.py -n 10 -t 100 -y su
Re: How to retrieve keys from Cassandra ?
Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits "someone using RandomPartitioner to pass start="" and finish="" to get all of the rows in their cluster, although in an extremely inefficient way." We are in a situation like Pierre's, where we need to know what's currently in the DB so to speak -- except that we have hundreds of millions of rows (and increasing) and maintaining an index of the keys in another CF, as Brandon suggests, is becoming difficult (we also don't like the double write on initial key inserts, especially in terms of transactionality). Also, every once in a while, we need to enhance our data as part of some functionality upgrade or refactoring. So far, what we do is enhance on reads (i.e., whenever we read a particular record, we check whether it is at the latest version and enhance it if not), but there are many problems with this approach. We've been considering doing background-process enhancement by running through all of the keys, which is why 745 is pretty exciting. We'd rather go through the inefficient operation once in a while as opposed to doing a check on every read. Anyway, partially to address the efficiency concern, I've been playing around with the idea of having 745-like functionality on a per-node basis: a call to get all of the keys on a particular node as opposed to the entire cluster. It just seems like with a very large cluster with billions, tens of billions, or hundreds of billions of keys, 745 would just get overwhelmed. Just a thought. On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis wrote: > > More or less (but see > https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6). > > Think of it this way: when you have a few billion keys, how useful is > it to list them? > > -Jonathan > > 2010/2/2 Sébastien Pierre : > > Hi all, > > I would like to know how to retrieve the list of available keys available > > for a specific column. There is the get_key_range method, but it is only > > available when using the OrderPreservingPartitioner -- I use a > > RandomPartitioner. > > Does this mean that when using a RandomPartitioner, you cannot see which > > keys are available in the database ? > > -- Sébastien -- jeande...@6coders.com (917) 951-0636
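For reference, here is a rough sketch of the kind of full key walk being discussed, paging get_range_slice by feeding the last key of each batch back in as the start of the next. With the 0.5 API this only behaves sensibly under OrderPreservingPartitioner; passing empty start/finish under RandomPartitioner is the 0.6 / CASSANDRA-745 behaviour referred to above. The keyspace and column family names are placeholders.

import java.util.List;
import org.apache.cassandra.service.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class KeyWalkSketch
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // We only care about the keys, so ask for at most one column per row.
        SliceRange range = new SliceRange();
        range.start = new byte[0];
        range.finish = new byte[0];
        range.count = 1;
        SlicePredicate predicate = new SlicePredicate();
        predicate.slice_range = range;

        ColumnParent parent = new ColumnParent("Standard1", null);
        String start = "";
        while (true)
        {
            List<KeySlice> batch = client.get_range_slice(
                "Keyspace1", parent, predicate, start, "", 1000, ConsistencyLevel.ONE);
            for (KeySlice slice : batch)
                System.out.println(slice.key);            // inspect/enhance the row here
            if (batch.size() < 1000)
                break;                                    // last page reached
            start = batch.get(batch.size() - 1).key;      // next page starts at (and repeats) the last key
        }
        transport.close();
    }
}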
Re: Key/row names?
Thank you! On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis wrote: > On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad > wrote: > > Is there a way to use a byte[] as the key instead of a string? > > no. > > > If not what is the main reason for using strings for the key but > > the columns and the values can be byte[]? > > historical baggage. we might switch to byte[] keys in 0.7. > > -Jonathan > -- Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote: > >> >> Wow that sounds really good. So you are saying if I set it to reverse sort >> order and count 10 for the first round I get the last 10, >> for the next call I just set the last column from the first call to start >> and I will get the columns -10- -20, so to speak? > > > Actually, since they are reversed and you're trying to move backwards, > you'll need to pass the last column from the first query (since they will be > sorted in reverse order) as the start to the next one with reverse still set > to true. > > -Brandon > > Thanks a lot, Brandon, for clearing that up for me; I think that was what I was trying to say. That is really good, because now I don't have to store the data twice in different sort orders. -- Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote: > > Wow that sounds really good. So you are saying if I set it to reverse sort > order and count 10 for the first round I get the last 10, > for the next call I just set the last column from the first call to start > and I will get the columns -10- -20, so to speak? Actually, since they are reversed and you're trying to move backwards, you'll need to pass the last column from the first query (since they will be sorted in reverse order) as the start to the next one with reverse still set to true. -Brandon
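A sketch of the backwards paging Brandon describes, assuming an already-connected Cassandra.Client as in the other examples on this list: each page is fetched with reversed=true and the last column name of one page seeds the start of the next (which will therefore be returned twice). The class and method names are made up.

import java.util.List;
import org.apache.cassandra.service.*;

public class ReversePager
{
    /** Walk a row's columns from the end toward the beginning, pageSize columns at a time. */
    public static void pageBackwards(Cassandra.Client client, String keyspace,
                                     String key, String cf, int pageSize) throws Exception
    {
        byte[] start = new byte[0];                  // empty start + reversed = begin at the last column
        while (true)
        {
            SliceRange range = new SliceRange();
            range.start = start;
            range.finish = new byte[0];
            range.reversed = true;                   // descending comparator order
            range.count = pageSize;

            SlicePredicate predicate = new SlicePredicate();
            predicate.slice_range = range;

            List<ColumnOrSuperColumn> page = client.get_slice(
                keyspace, key, new ColumnParent(cf, null), predicate, ConsistencyLevel.ONE);
            for (ColumnOrSuperColumn cosc : page)
                System.out.println(new String(cosc.column.name));

            if (page.size() < pageSize)
                break;                               // ran past the first column of the row
            // The last column of this page seeds the next call; it will be returned again once.
            start = page.get(page.size() - 1).column.name;
        }
    }
}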
Using column plus value or only column?
Sorry that there are a lot of questions from me this week; I'm just trying to better understand the best way to use Cassandra :) Let us say that you know the length of your key and everything is standardized: are there people out there who just tag the value onto the key so that you don't have to pay the extra overhead of the second byte[]? -- Regards Erik
Re: Key/row names?
On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad wrote: > Is there a way to use a byte[] as the key instead of a string? no. > If not what is the main reason for using strings for the key but > the columns and the values can be byte[]? historical baggage. we might switch to byte[] keys in 0.7. -Jonathan
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote: > >> Thanks guys! >> So I want to use sliceRange but thinking about using the count parameter. >> For example give me >> the first x columns, next call I would like to call it with a start value >> and a count. >> >> If I was to use the reverse param in sliceRange I would have to fetch all >> the columns first, right? > > > If you pass reverse as true, then instead of getting the first x columns, > you'll get the last x columns. If you want to head backwards toward the > beginning, you can pass the first column as the end value. > > -Brandon > Wow that sounds really good. So you are saying if I set it to reverse sort order and count 10 for the first round I get the last 10, for the next call I just set the last column from the first call to start and I will get the columns -10- -20, so to speak? -- Regards Erik
Key/row names?
Is there a way to use a byte[] as the key instead of a string? If not what is the main reason for using strings for the key but the columns and the values can be byte[]? Is it just to be able to use it as the key in a Map etc or are there other reasons? -- Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote: > Thanks guys! > So I want to use sliceRange but thinking about using the count parameter. > For example give me > the first x columns, next call I would like to call it with a start value > and a count. > > If I was to use the reverse param in sliceRange I would have to fetch all > the columns first, right? If you pass reverse as true, then instead of getting the first x columns, you'll get the last x columns. If you want to head backwards toward the beginning, you can pass the first column as the end value. -Brandon
Re: Reverse sort order comparator?
Thanks guys! So I want to use SliceRange, but I'm thinking about using the count parameter: for example, give me the first x columns; on the next call I would pass a start value and a count. If I were to use the reverse param in SliceRange, I would have to fetch all the columns first, right? On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote: > >> Hey! >> I'm looking for a comparator that sort columns in reverse order on for >> example bytes? >> I saw that you can write your own comparator class, but just thought that >> someone must have done that already. > > > When you get_slice, just set reverse to true in the SliceRange and it will > reverse the order. > > -Brandon > -- Regards Erik
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote: > Hey! > I'm looking for a comparator that sort columns in reverse order on for > example bytes? > I saw that you can write your own comparator class, but just thought that > someone must have done that already. When you get_slice, just set reverse to true in the SliceRange and it will reverse the order. -Brandon
Re: Reverse sort order comparator?
you can scan in reversed (reversed=True in slicerange) w/o needing a custom comparator. On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote: > Hey! > I'm looking for a comparator that sort columns in reverse order on for > example bytes? > I saw that you can write your own comparator class, but just thought that > someone must have done that already. > > -- > Regards Erik >
Reverse sort order comparator?
Hey! I'm looking for a comparator that sorts columns in reverse order on, for example, bytes. I saw that you can write your own comparator class, but I just thought that someone must have done that already. -- Regards Erik
Re: How to retrieve keys from Cassandra ?
Yes, that's a good idea ! I've considered doing that at some point, but I'm still learning the basics ;) -- Sébastien 2010/2/2 Brandon Williams > 2010/2/2 Sébastien Pierre > >> Hi Jonathan, >> >> In my case, I'll have much more columns (thousands to millions) than keys >> in logs (campaign x days), so it's not an issue to retrieve all of them. >> > > If that's the case, your dataset is small enough that you could maintain an > index of the keys in another CF. If it needs to scale further, you can > segment the index keys by year, month, etc. > > -Brandon >
Re: How to retrieve keys from Cassandra ?
2010/2/2 Sébastien Pierre > Hi Jonathan, > > In my case, I'll have much more columns (thousands to millions) than keys > in logs (campaign x days), so it's not an issue to retrieve all of them. > If that's the case, your dataset is small enough that you could maintain an index of the keys in another CF. If it needs to scale further, you can segment the index keys by year, month, etc. -Brandon
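A hedged sketch of that index-CF idea: on every data insert, also record the key as a column in a month-bucketed index row. The "KeyIndex" column family and the bucketing scheme are made up for illustration and assume a standard CF configured in Keyspace1, using the same insert call as elsewhere on this list.

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.cassandra.service.*;

public class KeyIndexWriter
{
    /** Alongside every data insert, record the data key as a column in a month-bucketed index row. */
    public static void recordKey(Cassandra.Client client, String dataKey) throws Exception
    {
        String monthRow = new SimpleDateFormat("yyyy-MM").format(new Date());   // e.g. "2010-02"
        ColumnPath path = new ColumnPath("KeyIndex", null, dataKey.getBytes());
        client.insert("Keyspace1", monthRow, path,
                      new byte[0],                     // the column name itself is the payload
                      System.currentTimeMillis(), ConsistencyLevel.ONE);
    }
}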
Re: How to retrieve keys from Cassandra ?
Hi Jonathan, In my case, I'll have many more columns (thousands to millions) than keys in logs (campaign x days), so it's not an issue to retrieve all of them. Also, suppose you can't retrieve values from Cassandra just because you're using the wrong key (say you're using "user/10" instead of "user:10"); without the ability to list the keys, you'd have no way to find the error. I'm glad to see this implemented :) -- Sébastien 2010/2/2 Jonathan Ellis > More or less (but see > https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6). > > Think of it this way: when you have a few billion keys, how useful is > it to list them? > > -Jonathan > > 2010/2/2 Sébastien Pierre : > > Hi all, > > I would like to know how to retrieve the list of available keys available > > for a specific column. There is the get_key_range method, but it is only > > available when using the OrderPreservingPartitioner -- I use a > > RandomPartitioner. > > Does this mean that when using a RandomPartitioner, you cannot see which > > keys are available in the database ? > > -- Sébastien >
Re: How to retrieve keys from Cassandra ?
Hi all, It's basically for "knowing what's inside the db", as I've been toying with Cassandra for some time, I have keys that are no longer useful and should be removed. I'm also storing HTTP logs in cassandra, where keys follow this convention "campaign::". So for instance, if I'd like to know what logs are available I just have to do: client.get_keys("Keyspace1", "Logs", "", "", 100, ConsistencyLevel.ONE) However, I have to use an OrderPreservingPartitioner to do so, which is (from my understanding) bad for load in this case. -- Sébastien 2010/2/2 Erik Holstad > Hi Sebastien! > I'm totally new to Cassandra, but as far as I know there is no way of > getting just the keys that are in the > database, they are not stored separately but only with the data itself. > > Why do you want a list of keys, what are you going to use them for? Maybe > there is another way of solving > your problem. > > What you are describing, getting all the keys/rows for a given column > sounds like you have to fetch all the > data that you have and then filter every key on your column, I don't think > that get_key_range will do that for > you even, says that it takes column_family, but like I said I'm totally new > > Erik > > 2010/2/2 Sébastien Pierre > >> Hi all, >> >> I would like to know how to retrieve the list of available keys available >> for a specific column. There is the get_key_range method, but it is only >> available when using the OrderPreservingPartitioner -- I use a >> RandomPartitioner. >> >> Does this mean that when using a RandomPartitioner, you cannot see which >> keys are available in the database ? >> >> -- Sébastien >> > > > > -- > Regards Erik >
Re: Did CASSANDRA-647 get fixed in 0.5?
Here it is: https://issues.apache.org/jira/browse/CASSANDRA-752 From: Jonathan Ellis To: cassandra-user@incubator.apache.org Sent: Mon, February 1, 2010 5:22:13 PM Subject: Re: Did CASSANDRA-647 get fixed in 0.5? Can you create a ticket for this? Thanks! On Mon, Feb 1, 2010 at 4:11 PM, Omer van der Horst Jansen wrote: > I checked out the 0.5 branch and ran ant release (on my linux box). > Installed the new tar.gz and ran the test on my Windows laptop as before but > got the same result -- the key isn't deleted from the perspective of > get_range_slice. > > Omer > > > From: Jonathan Ellis > To: cassandra-user@incubator.apache.org > Sent: Mon, February 1, 2010 4:52:17 PM > Subject: Re: Did CASSANDRA-647 get fixed in 0.5? > > 647 was committed for 0.5, yes, but CASSANDRA-703 was not. Can you > try the 0.5 branch and see if it is fixed there? > > On Mon, Feb 1, 2010 at 3:26 PM, Omer van der Horst Jansen > wrote: >> I'm running >> into an issue with Cassandra 0.5 (the current release version) that >> sounds exactly like the description of issue CASSANDRA-647. >> >> I'm >> using the Thrift Java API to store a couple of columns in a single row. A >> few seconds after that my application deletes the entire row. A plain >> Cassandra.Client.get() will then throw a NotFoundException for that >> particular key, as expected. However, the key will still show up when >> executing a >> Cassandra.Client.get_range_slice query. >> >> Here is some quick and >> dirty Java code that demonstrates the problem: >> >> import >> java.util.List; >> >> import org.apache.cassandra.service.*; >> import >> org.apache.thrift.protocol.*; >> import org.apache.thrift.transport.*; >> >> public class Cassandra647TestApp >> { >>/** >> * Demonstrates >> CASSANDRA-647 presence in Cassandra 0.5 release. >> * Requires an >> unmodified Cassandra configuration except that an >> * >> OrderPreservingPartitioner should be used. >> */ >>public >> static void main(String[] args) throws Exception >>{ >> >> String keyspace = "Keyspace1"; >>String cf = "Standard1"; >>String key = "testrow1"; >>byte[] columnName = >> "colname".getBytes(); >>byte[] data = "testdata".getBytes(); >> >>TTransport transport = new TSocket("localhost", 9160); >>TProtocol protocol = new TBinaryProtocol(transport); >> >> Cassandra.Client client = new Cassandra.Client(protocol); >> >> transport.open(); >>ColumnPath path = new ColumnPath(cf, null, >> columnName); >> >>client.insert(keyspace, key, path, data, >> System.currentTimeMillis(), >>ConsistencyLevel.ONE); >> >>Thread.sleep(1000); >> >>ColumnPath rowpath = new >> ColumnPath(cf, null, null); >> >>client.remove(keyspace, key, >> rowpath, System.currentTimeMillis(), >> >> ConsistencyLevel.ONE); >>Thread.sleep(1000); >> >>try >>{ >>ColumnOrSuperColumn cosc = client.get(keyspace, >> key, path, >>ConsistencyLevel.ONE); >> >> System.out.println("Whoops! 
NotFoundException not thrown!"); >>} >>catch (NotFoundException e) >>{ >> >> System.out.println("OK, we got a NotFoundException"); >>} >> >>ColumnParent parent = new ColumnParent(cf, null); >> >> SlicePredicate predicate = new SlicePredicate(); >>SliceRange >> range = new SliceRange(); >>range.start = new byte[0]; >>range.finish = new byte[0]; >>predicate.slice_range = range; >> >>List sliceList = client.get_range_slice(keyspace, parent, >>predicate, "", "", 1000, >> ConsistencyLevel.ONE); >> >>for (KeySlice k : sliceList) >>{ >>System.out.println("Found key " + k.key); >>if (key.equals(k.key)) >>{ >> >> System.out.println("but key " + k.key >>+ " >> should have been removed"); >>} >>} >>} >> } >> >> Am I using the API correctly in the code above? >> >> -Omer van der Horst Jansen >> >> >> >> >> > >
Re: Best design in Cassandra
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote: >> >> A supercolumn can still only compare subcolumns in a single way. >>> >> Yeah, I know that, but you can have a super column per sort order without >> having to restart the cluster. >> > > You get a CompareWith for the columns, and a CompareSubcolumnsWith for > subcolumns. If you need more column types to get different sort orders, you > need another ColumnFamily. > Not sure what you mean by column types. What I want to do is to have a few things sorted in both ascending and descending order, like {a,b}, {b,a} and {1,2}, {2,1} > > -Brandon > > -- Regards Erik
Re: Best design in Cassandra
On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote: > > A supercolumn can still only compare subcolumns in a single way. >> > Yeah, I know that, but you can have a super column per sort order without > having to restart the cluster. > You get a CompareWith for the columns, and a CompareSubcolumnsWith for subcolumns. If you need more column types to get different sort orders, you need another ColumnFamily. -Brandon
Re: How to retrieve keys from Cassandra ?
Hi Sebastien! I'm totally new to Cassandra, but as far as I know there is no way of getting just the keys that are in the database; they are not stored separately, only with the data itself. Why do you want a list of keys? What are you going to use them for? Maybe there is another way of solving your problem. What you are describing, getting all the keys/rows for a given column, sounds like you would have to fetch all the data that you have and then filter every key on your column. I don't think that get_key_range will do that for you, even though it says that it takes a column_family, but like I said, I'm totally new. Erik 2010/2/2 Sébastien Pierre > Hi all, > > I would like to know how to retrieve the list of available keys available > for a specific column. There is the get_key_range method, but it is only > available when using the OrderPreservingPartitioner -- I use a > RandomPartitioner. > > Does this mean that when using a RandomPartitioner, you cannot see which > keys are available in the database ? > > -- Sébastien > -- Regards Erik
Re: How to retrieve keys from Cassandra ?
More or less (but see https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6). Think of it this way: when you have a few billion keys, how useful is it to list them? -Jonathan 2010/2/2 Sébastien Pierre : > Hi all, > I would like to know how to retrieve the list of available keys available > for a specific column. There is the get_key_range method, but it is only > available when using the OrderPreservingPartitioner -- I use a > RandomPartitioner. > Does this mean that when using a RandomPartitioner, you cannot see which > keys are available in the database ? > -- Sébastien
Re: Best design in Cassandra
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams wrote: > On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad wrote: > >> Hey! >> Have a couple of questions about the best way to use Cassandra. >> Using the random partitioner + the multi_get calls vs order preservation + >> range_slice calls? >> > > When you use an OPP, the distribution of your keys becomes your problem. > If you don't have an even distribution, this will be reflected in the load > on the nodes, while the RP gives you even distribution. > Yeah, that is why it would be nice to hear if anyone has compared the performance between the two, to see if it is worth worrying about your own distribution. I also read that the random partitioner doesn't give that great distribution. > > What is the benefit of using multiple families vs super column? > > > http://issues.apache.org/jira/browse/CASSANDRA-598 is currrently why I > prefer simple CFs instead of supercolumns. > Yeah, this is nasty. > > >> For example in the case of sorting >> in different orders. One good thing that I can see here when using super >> column is that you don't >> have to restart your cluster every time you want to add something new >> order. >> > > A supercolumn can still only compare subcolumns in a single way. > Yeah, I know that, but you can have a super column per sort order without having to restart the cluster. > > When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you > will be able to add CFs without restarting. > Looks interesting, but targeted at 0.7, so it is probably going to be a little while, or? > > -Brandon > -- Regards Erik
How to retrieve keys from Cassandra ?
Hi all, I would like to know how to retrieve the list of keys available for a specific column family. There is the get_key_range method, but it is only available when using the OrderPreservingPartitioner -- I use a RandomPartitioner. Does this mean that when using a RandomPartitioner, you cannot see which keys are available in the database? -- Sébastien
RE: Sample applications
Thanks Erik From: Erik Holstad [mailto:erikhols...@gmail.com] Sent: Tuesday, February 02, 2010 9:08 AM To: cassandra-user@incubator.apache.org Subject: Re: Sample applications Hi Carlos! I'm also really new to Cassandra but here are a couple of links that I found useful: http://wiki.apache.org/cassandra/ClientExamples http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model and one of the presentations like: http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod Erik
Re: Sample applications
Hi Carlos! I'm also really new to Cassandra but here are a couple of links that I found useful: http://wiki.apache.org/cassandra/ClientExamples http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model and one of the presentations like: http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod Erik
Re: Cassandra error with large connection
Thank you very much, Mr Jonathan. On Mon, Feb 1, 2010 at 11:04 AM, Jonathan Ellis wrote: > On Mon, Feb 1, 2010 at 10:03 AM, Jonathan Ellis wrote: > >> I see a lot of CLOSE_WAIT TCP connection. > > Also, this sounds like you are not properly pooling client connections > to casssandra. You should have one connection per user, not one > connection per operation. > > -Jonathan > -- Best regards, JKnight
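For anyone hitting the same CLOSE_WAIT symptom, here is a minimal sketch of the reuse Jonathan is suggesting: open one transport per worker and run every operation through it, instead of opening and closing a socket per request. Real pooling (sizing, health checks, thread safety) is left out, and the class name is made up.

import org.apache.cassandra.service.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ReusedConnection
{
    private final TTransport transport;
    private final Cassandra.Client client;

    public ReusedConnection(String host, int port) throws Exception
    {
        transport = new TSocket(host, port);
        client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();                              // opened once per worker, not per operation
    }

    /** Every operation from this worker reuses the same open socket. */
    public void insert(String keyspace, String key, ColumnPath path, byte[] value) throws Exception
    {
        client.insert(keyspace, key, path, value,
                      System.currentTimeMillis(), ConsistencyLevel.ONE);
    }

    public void close()
    {
        transport.close();                             // closed once, when the worker shuts down
    }
}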