Re: How do cassandra clients failover?

2010-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Feb 1, 2010 at 7:38 PM, Jonathan Ellis  wrote:
> No.  Thrift is just an RPC mechanism.  Whether RRDNS, software or
> hardware load balancing, or client-based failover like Gary describes
> is best is not a one-size-fits-all answer.
Everyone who uses Cassandra would need to implement load balancing and
failover. Some may do it right and some may do it wrong. Because this
solution is going to be Cassandra-specific, you may not find any
publicly available libraries to help you out.

Ideally, the client would be a Thrift API wrapper which automatically
does load balancing and failover. This is definitely not the only
solution, but it is one that would not need any external RRDNS.
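
A minimal sketch of what such a wrapper could look like, assuming the
0.5-era Thrift Java bindings that appear elsewhere in this digest; the
class name, host handling, and retry policy below are purely illustrative,
not an existing library:

import java.util.List;

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnOrSuperColumn;
import org.apache.cassandra.service.ColumnParent;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.cassandra.service.SlicePredicate;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class FailoverClient
{
    // Seed hosts; a fuller version could refresh this list from
    // client.get_string_property("token map"), which returns a JSON
    // description of the ring (see Gary's note below).
    private final List<String> hosts;
    private final int port;
    private TTransport transport;
    private Cassandra.Client client;

    public FailoverClient(List<String> hosts, int port) throws TException
    {
        this.hosts = hosts;
        this.port = port;
        connectToAnyHost();
    }

    // Walk the host list until one connection succeeds.
    private void connectToAnyHost() throws TException
    {
        TException last = null;
        for (String host : hosts)
        {
            try
            {
                transport = new TSocket(host, port);
                client = new Cassandra.Client(new TBinaryProtocol(transport));
                transport.open();
                return;
            }
            catch (TException e)
            {
                last = e; // try the next node
            }
        }
        throw last != null ? last : new TException("no hosts configured");
    }

    // Example wrapped call: on failure, reconnect to another node and retry once.
    public List<ColumnOrSuperColumn> getSlice(String keyspace, String key,
            ColumnParent parent, SlicePredicate predicate) throws TException
    {
        try
        {
            return client.get_slice(keyspace, key, parent, predicate,
                    ConsistencyLevel.ONE);
        }
        catch (TException e)
        {
            transport.close();
            connectToAnyHost();
            return client.get_slice(keyspace, key, parent, predicate,
                    ConsistencyLevel.ONE);
        }
    }
}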

>
> 2010/2/1 Noble Paul നോബിള്‍  नोब्ळ् :
>> is it worth adding this feature to the standard java client?
>>
>> On Mon, Feb 1, 2010 at 7:28 PM, Gary Dusbabek  wrote:
>>> One approach is to discover what other nodes there are before any of
>>> them fail.  Then when you detect failure, you can connect to a
>>> different node that is (hopefully) still responding.
>>>
>>> There is an API call that allows you to get a list of all the nodes:
>>> client.get_string_property("token map"), which returns a JSON list of
>>> the node ring.
>>>
>>> I hope that helps.
>>>
>>> Gary.
>>>
>>> 2010/2/1 Noble Paul നോബിള്‍  नोब्ळ् :
 The cassandra client (Thrift client) is started up with the host:port
 of a single cassandra node.

 * What happens if that node fails?
 * Does it mean that all the operations go through the same node?

 --Noble

>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Systems Architect| AOL | http://aol.com
>>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Don't be silly, thanks a lot for helping me out!

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Ok - I was afraid I was going to miss something with the generic
example before - my apologies on that. You cannot impose an order on
keys like that as far as I am aware. I think maintaining a Sort CF as
you had originally is a decent approach.

Cheers,
-Nate

On Tue, Feb 2, 2010 at 4:06 PM, Erik Holstad  wrote:
> Hey Nate!
> What I wanted to do with the get_range_slice was to receive the keys in the
> inverted order, so that I could so offset limit queries on key ranges in
> reverse
> order. Like you said, this can be done for both columns and super columns
> with
> help of the SliceRange, but not on keys afaik, but maybe there is a way?
>
> Thanks Erik
>
>
> On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall 
> wrote:
>>
>> Erik,
>> You can do an inverse with 'reversed=true' in SliceRange as part of
>> the SlicePredicate for both get_slice or get_range_slice. I have not
>> tried reverse=true on SuperColumn results, but I dont think there is
>> any difference there - what can't be changed is how things are ordered
>> but direction can go either way (If I am wrong on this, somebody
>> please correct me).
>>
>> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
>> radar as I dont have anything reporting-ish like you describe with
>> SuperColumns (yet). I will defer to more experienced folks with this.
>>
>> Regards,
>> -Nate
>>
>>
>> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad 
>> wrote:
>> > @Nathan
>> > So what I'm planning to do is to store multiple sort orders for the same
>> > data, where they all use the
>> > same data table just fetches it in different orders, so to say. I want
>> > to be
>> > able to rad the different sort
>> > orders from the front and from the back to get both regular and reverse
>> > sort
>> > order.
>> >
>> > With your approach using super columns you would need to replicate all
>> > data,
>> > right?
>> >
>> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
>> > correctly you would need to
>> > read the whole thing before you can limit the results handed back to
>> > you.
>> >
>> > In regards to the two calls get_slice and get_range_slice, the way I
>> > understand it is that you hand
>> > the second one an optional start and stop key plus a limit, to get a
>> > range
>> > of keys/rows. I was planning
>> > to use this call together with the OPP, but are thinking about not using
>> > it
>> > since there is no way to do
>> > an inverse scan, right?
>> >
>> > Thanks a lot
>> > Erik
>> >
>> >
>> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell
>> > 
>> > wrote:
>> >>
>> >> infinite is a bit of a bold claim
>> >>
>> >> by my understanding you are bound by the memory of the jvm as all of
>> >> the content of a key/row currently needs to fit in memory for
>> >> compaction, which includes columns and supercolumns for given key/row.
>> >>
>> >> if you are going to run into those scenarios then some sort of
>> >> sharding on the keys is required, afaict
>> >>
>> >> cheers,
>> >> jesse
>> >>
>> >> --
>> >> jesse mcconnell
>> >> jesse.mcconn...@gmail.com
>> >>
>> >>
>> >>
>> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
>> >> wrote:
>> >> > Erik,
>> >> > Sure, you could and depending on the workload, that might be quite
>> >> > efficient for small pieces of data. However, this also sounds like
>> >> > something that might be better addressed with the addition of a
>> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
>> >> >
>> >> > Sorts : {
>> >> >   sort_row_1 : {
>> >> >        sortKey1 : { col1:val1, col2:val2 },
>> >> >        sortKey2 : { col1:val3, col2:val4 }
>> >> >   }
>> >> > }
>> >> >
>> >> > You can have an infinite number of SuperColumns for a key, but make
>> >> > sure you understand get_slice vs. get_range_slice before you commit
>> >> > to
>> >> > a design. Hopefully I understood your example correctly, if not, do
>> >> > you have anything more concrete?
>> >> >
>> >> > Cheers,
>> >> > -Nate
>> >> >
>> >> >
>> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
>> >> > wrote:
>> >> >> Thanks Nate for the example.
>> >> >>
>> >> >> I was thinking more a long the lines of something like:
>> >> >>
>> >> >> If you have a family
>> >> >>
>> >> >> Data : {
>> >> >>   row1 : {
>> >> >>     col1:val1,
>> >> >>   row2 : {
>> >> >>     col1:val2,
>> >> >>     ...
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >>
>> >> >> Using
>> >> >> Sorts : {
>> >> >>   sort_row : {
>> >> >>     sortKey1_datarow1: [],
>> >> >>     sortKey2_datarow2: []
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> Instead of
>> >> >> Sorts : {
>> >> >>   sort_row : {
>> >> >>     sortKey1: datarow1,
>> >> >>     sortKey2: datarow2
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> If that makes any sense?
>> >> >>
>> >> >> --
>> >> >> Regards Erik
>> >> >>
>> >> >
>> >
>> >
>> >
>> > --
>> > Regards Erik
>> >
>
>
>
> --
> Regards Erik
>


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
I don't understand what you mean ;)
Will see what happens when we are done with this first project, will see
if we can get some time to give back.

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Jonathan Ellis
Right, we don't currently support scanning rows in reverse order, but
that is only because nobody has wanted it badly enough to code it. :)

On Tue, Feb 2, 2010 at 6:06 PM, Erik Holstad  wrote:
> Hey Nate!
> What I wanted to do with the get_range_slice was to receive the keys in the
> inverted order, so that I could so offset limit queries on key ranges in
> reverse
> order. Like you said, this can be done for both columns and super columns
> with
> help of the SliceRange, but not on keys afaik, but maybe there is a way?
>
> Thanks Erik
>
>
> On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall 
> wrote:
>>
>> Erik,
>> You can do an inverse with 'reversed=true' in SliceRange as part of
>> the SlicePredicate for both get_slice or get_range_slice. I have not
>> tried reverse=true on SuperColumn results, but I dont think there is
>> any difference there - what can't be changed is how things are ordered
>> but direction can go either way (If I am wrong on this, somebody
>> please correct me).
>>
>> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
>> radar as I dont have anything reporting-ish like you describe with
>> SuperColumns (yet). I will defer to more experienced folks with this.
>>
>> Regards,
>> -Nate
>>
>>
>> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad 
>> wrote:
>> > @Nathan
>> > So what I'm planning to do is to store multiple sort orders for the same
>> > data, where they all use the
>> > same data table just fetches it in different orders, so to say. I want
>> > to be
>> > able to rad the different sort
>> > orders from the front and from the back to get both regular and reverse
>> > sort
>> > order.
>> >
>> > With your approach using super columns you would need to replicate all
>> > data,
>> > right?
>> >
>> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
>> > correctly you would need to
>> > read the whole thing before you can limit the results handed back to
>> > you.
>> >
>> > In regards to the two calls get_slice and get_range_slice, the way I
>> > understand it is that you hand
>> > the second one an optional start and stop key plus a limit, to get a
>> > range
>> > of keys/rows. I was planning
>> > to use this call together with the OPP, but are thinking about not using
>> > it
>> > since there is no way to do
>> > an inverse scan, right?
>> >
>> > Thanks a lot
>> > Erik
>> >
>> >
>> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell
>> > 
>> > wrote:
>> >>
>> >> infinite is a bit of a bold claim
>> >>
>> >> by my understanding you are bound by the memory of the jvm as all of
>> >> the content of a key/row currently needs to fit in memory for
>> >> compaction, which includes columns and supercolumns for given key/row.
>> >>
>> >> if you are going to run into those scenarios then some sort of
>> >> sharding on the keys is required, afaict
>> >>
>> >> cheers,
>> >> jesse
>> >>
>> >> --
>> >> jesse mcconnell
>> >> jesse.mcconn...@gmail.com
>> >>
>> >>
>> >>
>> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
>> >> wrote:
>> >> > Erik,
>> >> > Sure, you could and depending on the workload, that might be quite
>> >> > efficient for small pieces of data. However, this also sounds like
>> >> > something that might be better addressed with the addition of a
>> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
>> >> >
>> >> > Sorts : {
>> >> >   sort_row_1 : {
>> >> >        sortKey1 : { col1:val1, col2:val2 },
>> >> >        sortKey2 : { col1:val3, col2:val4 }
>> >> >   }
>> >> > }
>> >> >
>> >> > You can have an infinite number of SuperColumns for a key, but make
>> >> > sure you understand get_slice vs. get_range_slice before you commit
>> >> > to
>> >> > a design. Hopefully I understood your example correctly, if not, do
>> >> > you have anything more concrete?
>> >> >
>> >> > Cheers,
>> >> > -Nate
>> >> >
>> >> >
>> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
>> >> > wrote:
>> >> >> Thanks Nate for the example.
>> >> >>
>> >> >> I was thinking more a long the lines of something like:
>> >> >>
>> >> >> If you have a family
>> >> >>
>> >> >> Data : {
>> >> >>   row1 : {
>> >> >>     col1:val1,
>> >> >>   row2 : {
>> >> >>     col1:val2,
>> >> >>     ...
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >>
>> >> >> Using
>> >> >> Sorts : {
>> >> >>   sort_row : {
>> >> >>     sortKey1_datarow1: [],
>> >> >>     sortKey2_datarow2: []
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> Instead of
>> >> >> Sorts : {
>> >> >>   sort_row : {
>> >> >>     sortKey1: datarow1,
>> >> >>     sortKey2: datarow2
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> If that makes any sense?
>> >> >>
>> >> >> --
>> >> >> Regards Erik
>> >> >>
>> >> >
>> >
>> >
>> >
>> > --
>> > Regards Erik
>> >
>
>
>
> --
> Regards Erik
>


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Hey Nate!
What I wanted to do with the get_range_slice was to receive the keys in
inverted order, so that I could do offset/limit queries on key ranges in
reverse order. Like you said, this can be done for both columns and super
columns with the help of the SliceRange, but not on keys afaik, but maybe
there is a way?

Thanks Erik


On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall wrote:

> Erik,
> You can do an inverse with 'reversed=true' in SliceRange as part of
> the SlicePredicate for both get_slice or get_range_slice. I have not
> tried reverse=true on SuperColumn results, but I dont think there is
> any difference there - what can't be changed is how things are ordered
> but direction can go either way (If I am wrong on this, somebody
> please correct me).
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
> radar as I dont have anything reporting-ish like you describe with
> SuperColumns (yet). I will defer to more experienced folks with this.
>
> Regards,
> -Nate
>
>
> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad 
> wrote:
> > @Nathan
> > So what I'm planning to do is to store multiple sort orders for the same
> > data, where they all use the
> > same data table just fetches it in different orders, so to say. I want to
> be
> > able to rad the different sort
> > orders from the front and from the back to get both regular and reverse
> sort
> > order.
> >
> > With your approach using super columns you would need to replicate all
> data,
> > right?
> >
> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
> > correctly you would need to
> > read the whole thing before you can limit the results handed back to you.
> >
> > In regards to the two calls get_slice and get_range_slice, the way I
> > understand it is that you hand
> > the second one an optional start and stop key plus a limit, to get a
> range
> > of keys/rows. I was planning
> > to use this call together with the OPP, but are thinking about not using
> it
> > since there is no way to do
> > an inverse scan, right?
> >
> > Thanks a lot
> > Erik
> >
> >
> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell <
> jesse.mcconn...@gmail.com>
> > wrote:
> >>
> >> infinite is a bit of a bold claim
> >>
> >> by my understanding you are bound by the memory of the jvm as all of
> >> the content of a key/row currently needs to fit in memory for
> >> compaction, which includes columns and supercolumns for given key/row.
> >>
> >> if you are going to run into those scenarios then some sort of
> >> sharding on the keys is required, afaict
> >>
> >> cheers,
> >> jesse
> >>
> >> --
> >> jesse mcconnell
> >> jesse.mcconn...@gmail.com
> >>
> >>
> >>
> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
> >> wrote:
> >> > Erik,
> >> > Sure, you could and depending on the workload, that might be quite
> >> > efficient for small pieces of data. However, this also sounds like
> >> > something that might be better addressed with the addition of a
> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
> >> >
> >> > Sorts : {
> >> >   sort_row_1 : {
> >> >sortKey1 : { col1:val1, col2:val2 },
> >> >sortKey2 : { col1:val3, col2:val4 }
> >> >   }
> >> > }
> >> >
> >> > You can have an infinite number of SuperColumns for a key, but make
> >> > sure you understand get_slice vs. get_range_slice before you commit to
> >> > a design. Hopefully I understood your example correctly, if not, do
> >> > you have anything more concrete?
> >> >
> >> > Cheers,
> >> > -Nate
> >> >
> >> >
> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
> >> > wrote:
> >> >> Thanks Nate for the example.
> >> >>
> >> >> I was thinking more a long the lines of something like:
> >> >>
> >> >> If you have a family
> >> >>
> >> >> Data : {
> >> >>   row1 : {
> >> >> col1:val1,
> >> >>   row2 : {
> >> >> col1:val2,
> >> >> ...
> >> >>   }
> >> >> }
> >> >>
> >> >>
> >> >> Using
> >> >> Sorts : {
> >> >>   sort_row : {
> >> >> sortKey1_datarow1: [],
> >> >> sortKey2_datarow2: []
> >> >>   }
> >> >> }
> >> >>
> >> >> Instead of
> >> >> Sorts : {
> >> >>   sort_row : {
> >> >> sortKey1: datarow1,
> >> >> sortKey2: datarow2
> >> >>   }
> >> >> }
> >> >>
> >> >> If that makes any sense?
> >> >>
> >> >> --
> >> >> Regards Erik
> >> >>
> >> >
> >
> >
> >
> > --
> > Regards Erik
> >
>



-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Erik,
You can do an inverse with 'reversed=true' in SliceRange as part of
the SlicePredicate for both get_slice and get_range_slice. I have not
tried reversed=true on SuperColumn results, but I don't think there is
any difference there - what can't be changed is how things are ordered,
but the direction can go either way (if I am wrong on this, somebody
please correct me).

http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
radar as I don't have anything reporting-ish like you describe with
SuperColumns (yet). I will defer to more experienced folks on this.

Regards,
-Nate


On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad  wrote:
> @Nathan
> So what I'm planning to do is to store multiple sort orders for the same
> data, where they all use the
> same data table just fetches it in different orders, so to say. I want to be
> able to rad the different sort
> orders from the front and from the back to get both regular and reverse sort
> order.
>
> With your approach using super columns you would need to replicate all data,
> right?
>
> And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
> correctly you would need to
> read the whole thing before you can limit the results handed back to you.
>
> In regards to the two calls get_slice and get_range_slice, the way I
> understand it is that you hand
> the second one an optional start and stop key plus a limit, to get a range
> of keys/rows. I was planning
> to use this call together with the OPP, but are thinking about not using it
> since there is no way to do
> an inverse scan, right?
>
> Thanks a lot
> Erik
>
>
> On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell 
> wrote:
>>
>> infinite is a bit of a bold claim
>>
>> by my understanding you are bound by the memory of the jvm as all of
>> the content of a key/row currently needs to fit in memory for
>> compaction, which includes columns and supercolumns for given key/row.
>>
>> if you are going to run into those scenarios then some sort of
>> sharding on the keys is required, afaict
>>
>> cheers,
>> jesse
>>
>> --
>> jesse mcconnell
>> jesse.mcconn...@gmail.com
>>
>>
>>
>> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
>> wrote:
>> > Erik,
>> > Sure, you could and depending on the workload, that might be quite
>> > efficient for small pieces of data. However, this also sounds like
>> > something that might be better addressed with the addition of a
>> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
>> >
>> > Sorts : {
>> >   sort_row_1 : {
>> >        sortKey1 : { col1:val1, col2:val2 },
>> >        sortKey2 : { col1:val3, col2:val4 }
>> >   }
>> > }
>> >
>> > You can have an infinite number of SuperColumns for a key, but make
>> > sure you understand get_slice vs. get_range_slice before you commit to
>> > a design. Hopefully I understood your example correctly, if not, do
>> > you have anything more concrete?
>> >
>> > Cheers,
>> > -Nate
>> >
>> >
>> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
>> > wrote:
>> >> Thanks Nate for the example.
>> >>
>> >> I was thinking more a long the lines of something like:
>> >>
>> >> If you have a family
>> >>
>> >> Data : {
>> >>   row1 : {
>> >>     col1:val1,
>> >>   row2 : {
>> >>     col1:val2,
>> >>     ...
>> >>   }
>> >> }
>> >>
>> >>
>> >> Using
>> >> Sorts : {
>> >>   sort_row : {
>> >>     sortKey1_datarow1: [],
>> >>     sortKey2_datarow2: []
>> >>   }
>> >> }
>> >>
>> >> Instead of
>> >> Sorts : {
>> >>   sort_row : {
>> >>     sortKey1: datarow1,
>> >>     sortKey2: datarow2
>> >>   }
>> >> }
>> >>
>> >> If that makes any sense?
>> >>
>> >> --
>> >> Regards Erik
>> >>
>> >
>
>
>
> --
> Regards Erik
>


Re: order-preserving partitioner per CF?

2010-02-02 Thread Jonathan Ellis
just remember that you can't mix nodes w/ different partitioner types
in the same cluster.

On Tue, Feb 2, 2010 at 5:04 PM, Wojciech Kaczmarek
 wrote:
> Yeah excellent.
>
> I checked that it's doable to convert the data to another Partitioner
> using json backup tools - cool. I will probably write own partitioner
> so it's good I won't loose my test data (though I assume I need to
> pack all my data back to one node, export to json, delete sstables,
> change partitioner, import sstables, then rerun node and subsequently
> distribute to others).
>
> On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis  wrote:
>> yes
>>
>> On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek
>>  wrote:
>>> On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis  wrote:

> My biggest question so far is about order-preserving partitioner. I'd
> like to have such partitioner for a specific column family, having
> random partitioner for others. Is it possible wrt to the current
> architecture?

 No.
>>>
>>> Ok. Upon reading more details on a wiki I see it doesn't fit now.
>>>
>>> Now I'm thinking about scenarios of distributing the keys using OPP
>>> without knowing the number of nodes a priori.
>>>
>>> Does this explanation:
>>> http://wiki.apache.org/cassandra/Operations#Range_changes
>>>
>>> applies to any partitioner?
>>>
>>
>


Re: order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
Yeah excellent.

I checked that it's doable to convert the data to another Partitioner
using the json backup tools - cool. I will probably write my own
partitioner, so it's good I won't lose my test data (though I assume I
need to pack all my data back onto one node, export to json, delete the
sstables, change the partitioner, re-import the sstables, then restart
the node and subsequently distribute to the others).

On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis  wrote:
> yes
>
> On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek
>  wrote:
>> On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis  wrote:
>>>
 My biggest question so far is about order-preserving partitioner. I'd
 like to have such partitioner for a specific column family, having
 random partitioner for others. Is it possible wrt to the current
 architecture?
>>>
>>> No.
>>
>> Ok. Upon reading more details on a wiki I see it doesn't fit now.
>>
>> Now I'm thinking about scenarios of distributing the keys using OPP
>> without knowing the number of nodes a priori.
>>
>> Does this explanation:
>> http://wiki.apache.org/cassandra/Operations#Range_changes
>>
>> applies to any partitioner?
>>
>


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
@Nathan
So what I'm planning to do is to store multiple sort orders for the same
data, where they all use the same data table but fetch it in different
orders, so to say. I want to be able to read the different sort orders from
the front and from the back to get both regular and reverse sort order.

With your approach using super columns you would need to replicate all data,
right?

And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
correctly, you would need to read the whole thing before you can limit the
results handed back to you.

In regards to the two calls get_slice and get_range_slice, the way I
understand it is that you hand the second one an optional start and stop key
plus a limit, to get a range of keys/rows. I was planning to use this call
together with the OPP, but am thinking about not using it since there is no
way to do an inverse scan, right?

Thanks a lot
Erik
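
Purely as an illustration of that usage (not from the original mail): paging
forward over keys with get_range_slice under the OrderPreservingPartitioner,
using the 0.5-era Thrift Java API that appears elsewhere in this digest.
Keyspace and column family names are made up, and as the thread notes there
is no reversed option for keys.

import java.util.List;

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnParent;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.cassandra.service.KeySlice;
import org.apache.cassandra.service.SlicePredicate;
import org.apache.cassandra.service.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class KeyRangePager
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        SliceRange range = new SliceRange();
        range.start = new byte[0];
        range.finish = new byte[0];
        range.count = 10;                       // columns returned per row
        SlicePredicate predicate = new SlicePredicate();
        predicate.slice_range = range;

        ColumnParent parent = new ColumnParent("Standard1", null);
        String startKey = "";                   // empty = start from the first key

        while (true)
        {
            List<KeySlice> rows = client.get_range_slice("Keyspace1", parent,
                    predicate, startKey, "", 100, ConsistencyLevel.ONE);
            if (rows.isEmpty())
                break;

            for (KeySlice row : rows)
                System.out.println(row.key);

            // the last key of this batch starts the next one (it will be
            // returned again, so a real pager would skip it)
            startKey = rows.get(rows.size() - 1).key;
            if (rows.size() < 100)
                break;
        }

        transport.close();
    }
}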


On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell
wrote:

> infinite is a bit of a bold claim
>
> by my understanding you are bound by the memory of the jvm as all of
> the content of a key/row currently needs to fit in memory for
> compaction, which includes columns and supercolumns for given key/row.
>
> if you are going to run into those scenarios then some sort of
> sharding on the keys is required, afaict
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconn...@gmail.com
>
>
>
> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
> wrote:
> > Erik,
> > Sure, you could and depending on the workload, that might be quite
> > efficient for small pieces of data. However, this also sounds like
> > something that might be better addressed with the addition of a
> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
> >
> > Sorts : {
> >   sort_row_1 : {
> >sortKey1 : { col1:val1, col2:val2 },
> >sortKey2 : { col1:val3, col2:val4 }
> >   }
> > }
> >
> > You can have an infinite number of SuperColumns for a key, but make
> > sure you understand get_slice vs. get_range_slice before you commit to
> > a design. Hopefully I understood your example correctly, if not, do
> > you have anything more concrete?
> >
> > Cheers,
> > -Nate
> >
> >
> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
> wrote:
> >> Thanks Nate for the example.
> >>
> >> I was thinking more a long the lines of something like:
> >>
> >> If you have a family
> >>
> >> Data : {
> >>   row1 : {
> >> col1:val1,
> >>   row2 : {
> >> col1:val2,
> >> ...
> >>   }
> >> }
> >>
> >>
> >> Using
> >> Sorts : {
> >>   sort_row : {
> >> sortKey1_datarow1: [],
> >> sortKey2_datarow2: []
> >>   }
> >> }
> >>
> >> Instead of
> >> Sorts : {
> >>   sort_row : {
> >> sortKey1: datarow1,
> >> sortKey2: datarow2
> >>   }
> >> }
> >>
> >> If that makes any sense?
> >>
> >> --
> >> Regards Erik
> >>
> >
>



-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Jesse McConnell
infinite is a bit of a bold claim

by my understanding you are bound by the memory of the jvm as all of
the content of a key/row currently needs to fit in memory for
compaction, which includes columns and supercolumns for a given key/row.

if you are going to run into those scenarios then some sort of
sharding on the keys is required, afaict

cheers,
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com
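
As a rough illustration of that kind of key sharding (not something from the
thread), one option is to split a logical wide row across a fixed number of
bucketed row keys; the bucket count and key format below are arbitrary
choices, and reads then have to fan out over all the buckets:

public final class RowSharding
{
    private static final int BUCKETS = 16;

    // Route a column to one of BUCKETS physical rows so no single row has to
    // hold everything (and fit in memory during compaction).
    // e.g. shardedKey("user123_events", "2010-02-02T17:00") -> "user123_events.7"
    public static String shardedKey(String logicalKey, String columnName)
    {
        int bucket = (columnName.hashCode() & 0x7fffffff) % BUCKETS;
        return logicalKey + "." + bucket;
    }
}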



On Tue, Feb 2, 2010 at 16:30, Nathan McCall  wrote:
> Erik,
> Sure, you could and depending on the workload, that might be quite
> efficient for small pieces of data. However, this also sounds like
> something that might be better addressed with the addition of a
> SuperColumn on "Sorts" and getting rid of "Data" altogether:
>
> Sorts : {
>   sort_row_1 : {
>        sortKey1 : { col1:val1, col2:val2 },
>        sortKey2 : { col1:val3, col2:val4 }
>   }
> }
>
> You can have an infinite number of SuperColumns for a key, but make
> sure you understand get_slice vs. get_range_slice before you commit to
> a design. Hopefully I understood your example correctly, if not, do
> you have anything more concrete?
>
> Cheers,
> -Nate
>
>
> On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad  wrote:
>> Thanks Nate for the example.
>>
>> I was thinking more a long the lines of something like:
>>
>> If you have a family
>>
>> Data : {
>>   row1 : {
>>     col1:val1,
>>   row2 : {
>>     col1:val2,
>>     ...
>>   }
>> }
>>
>>
>> Using
>> Sorts : {
>>   sort_row : {
>>     sortKey1_datarow1: [],
>>     sortKey2_datarow2: []
>>   }
>> }
>>
>> Instead of
>> Sorts : {
>>   sort_row : {
>>     sortKey1: datarow1,
>>     sortKey2: datarow2
>>   }
>> }
>>
>> If that makes any sense?
>>
>> --
>> Regards Erik
>>
>


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Erik,
Sure, you could and depending on the workload, that might be quite
efficient for small pieces of data. However, this also sounds like
something that might be better addressed with the addition of a
SuperColumn on "Sorts" and getting rid of "Data" altogether:

Sorts : {
   sort_row_1 : {
        sortKey1 : { col1:val1, col2:val2 },
        sortKey2 : { col1:val3, col2:val4 }
   }
}

You can have an infinite number of SuperColumns for a key, but make
sure you understand get_slice vs. get_range_slice before you commit to
a design. Hopefully I understood your example correctly, if not, do
you have anything more concrete?

Cheers,
-Nate
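
As a small illustration of writing into that model (not part of the original
mail), using the 0.5-era Thrift Java API seen elsewhere in this digest; the
keyspace and all names are placeholders:

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnPath;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SortsWriter
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // Row key = the sort row, super column = the sort key,
        // subcolumns = the data that used to live in the separate "Data" CF.
        ColumnPath path = new ColumnPath("Sorts",
                "sortKey1".getBytes(),   // super column
                "col1".getBytes());      // subcolumn
        client.insert("Keyspace1", "sort_row_1", path, "val1".getBytes(),
                System.currentTimeMillis(), ConsistencyLevel.ONE);

        transport.close();
    }
}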


On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad  wrote:
> Thanks Nate for the example.
>
> I was thinking more a long the lines of something like:
>
> If you have a family
>
> Data : {
>   row1 : {
>     col1:val1,
>   row2 : {
>     col1:val2,
>     ...
>   }
> }
>
>
> Using
> Sorts : {
>   sort_row : {
>     sortKey1_datarow1: [],
>     sortKey2_datarow2: []
>   }
> }
>
> Instead of
> Sorts : {
>   sort_row : {
>     sortKey1: datarow1,
>     sortKey2: datarow2
>   }
> }
>
> If that makes any sense?
>
> --
> Regards Erik
>


Re: order-preserving partitioner per CF?

2010-02-02 Thread Jonathan Ellis
yes

On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek
 wrote:
> On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis  wrote:
>>
>>> My biggest question so far is about order-preserving partitioner. I'd
>>> like to have such partitioner for a specific column family, having
>>> random partitioner for others. Is it possible wrt to the current
>>> architecture?
>>
>> No.
>
> Ok. Upon reading more details on a wiki I see it doesn't fit now.
>
> Now I'm thinking about scenarios of distributing the keys using OPP
> without knowing the number of nodes a priori.
>
> Does this explanation:
> http://wiki.apache.org/cassandra/Operations#Range_changes
>
> applies to any partitioner?
>


Re: order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis  wrote:
>
>> My biggest question so far is about order-preserving partitioner. I'd
>> like to have such partitioner for a specific column family, having
>> random partitioner for others. Is it possible wrt to the current
>> architecture?
>
> No.

Ok. Upon reading more details on a wiki I see it doesn't fit now.

Now I'm thinking about scenarios of distributing the keys using OPP
without knowing the number of nodes a priori.

Does this explanation:
http://wiki.apache.org/cassandra/Operations#Range_changes

apply to any partitioner?


Re: "easy" interface to Cassandra

2010-02-02 Thread Ted Zlatanov
On Tue, 19 Jan 2010 08:09:13 -0600 Ted Zlatanov  wrote: 

TZ> My proposal is as follows:

TZ> - provide an IPluggableAPI interface; classes that implement it are
TZ>   essentially standalone Cassandra servers.  Maybe this can just
TZ>   parallel Thread and implement Runnable.

TZ> - enable the users to specify which IPluggableAPI they want and provide
TZ>   instantiation options (port, connection limit, etc.)

TZ> - write a simple HTTPPluggableAPI, which provides a web server and
TZ>   accepts POST requests.  The exact path and option spec can be worked
TZ>   out later.  The input and output formats can be specified with a query
TZ>   parameter; at least JSON and XML should be supported.

First very rough proposal is at
https://issues.apache.org/jira/browse/CASSANDRA-754

Ted



Re: order-preserving partitioner per CF?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 2:53 PM, Wojciech Kaczmarek
 wrote:
> Hi,
>
> I'm evaluating Cassandra since few days and I'd say it has really high
> coolness factor! :)
>
> My biggest question so far is about order-preserving partitioner. I'd
> like to have such partitioner for a specific column family, having
> random partitioner for others. Is it possible wrt to the current
> architecture?

No.

> If not, is it planned?

As attractive as it is on the wish list, I don't see how you could
sanely do it with the current architecture.

-Jonathan


order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
Hi,

I've been evaluating Cassandra for a few days and I'd say it has a really
high coolness factor! :)

My biggest question so far is about order-preserving partitioner. I'd
like to have such partitioner for a specific column family, having
random partitioner for others. Is it possible wrt the current
architecture? If not, is it planned?

What I see now is that the partitioner is defined in the scope of the
Storage tag in storage-conf.xml, not even inside a keyspace definition. It
makes me assume that the partitioner setting is per the whole cassandra
cluster.

cheers,

Wojtek


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Thanks Nate for the example.

I was thinking more along the lines of something like:

If you have a family

Data : {
  row1 : {
    col1:val1,
  },
  row2 : {
    col1:val2,
    ...
  }
}


Using
Sorts : {
  sort_row : {
    sortKey1_datarow1: [],
    sortKey2_datarow2: []
  }
}

Instead of
Sorts : {
  sort_row : {
    sortKey1: datarow1,
    sortKey2: datarow2
  }
}

If that makes any sense?

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
If I understand you correctly, I think I have a decent example. I have
a ColumnFamily which models user preferences for a "site" in our
system:

UserPreferences : {
  123_EDD43E57589F12032AF73E23A6AF3F47 : {
favorite_color : red,
...
  }
}

I structured it this way because we have a lot of "sites" for which a
user could create preferences, so the site_id is prepended to the
session_id value; therefore you always need two pieces of information
to enforce that a given record belongs to a specific "site". The
site_id is always an integer and the session_id is always a 32-char
string, so sticking an underscore between them makes validatable
parsing and construction trivial. The bloom filtering behind the key
lookups also makes checks for existence extremely fast.

Note: I feel compelled to mention that this is not a typical use case;
on its own I think it would generally not warrant anything outside of an
RDBMS. However, in our system writes to this preference "table" can burst
up to several thousand a second, hence the need for the predictable write
performance of Cassandra.

Cheers,
Nate
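
A small sketch of the construction and validating parse described above (not
from the original message); the helper class and its field handling are
illustrative only:

public final class PreferenceKey
{
    private static final int SESSION_ID_LENGTH = 32;

    // Build a key like "123_EDD43E57589F12032AF73E23A6AF3F47".
    public static String make(int siteId, String sessionId)
    {
        if (sessionId.length() != SESSION_ID_LENGTH)
            throw new IllegalArgumentException("session id must be 32 chars");
        return siteId + "_" + sessionId;
    }

    // Split a key back into { siteId, sessionId }, validating both parts.
    public static String[] parse(String key)
    {
        int idx = key.indexOf('_');
        if (idx <= 0 || key.length() - idx - 1 != SESSION_ID_LENGTH)
            throw new IllegalArgumentException("malformed preference key: " + key);
        Integer.parseInt(key.substring(0, idx)); // site_id must be an integer
        return new String[] { key.substring(0, idx), key.substring(idx + 1) };
    }
}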



On Tue, Feb 2, 2010 at 9:50 AM, Erik Holstad  wrote:
> Sorry that there are a lot of questions from me this week,  just trying to
> better understand
> the best way to use Cassandra :)
>
> Let us say that you know the length of your key, everything is standardized,
> are there people
> out there that just tag the value onto the key so that you don't have to pay
> the extra overhead
> of the second byte[]?
>
> --
> Regards Erik
>


Re: get_slice() slow if more number of columns present in a SCF.

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 9:27 AM, envio user  wrote:

> All,
>
> Here are some tests[batch_insert() and get_slice()] I performed on
> cassandra.
>



> I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500
> columns and read 25 columns per key.
>
> 


> This test is more worrying for us. We can't even read 1000 reads per
> second. Is there any limitation on cassandra, which will not work with
> more number of columns ?.  Or mm I doing something wrong here?. Please
> let me know.
>

I think you're mostly being limited by
http://issues.apache.org/jira/browse/CASSANDRA-598
Can you try with a simple CF?

-Brandon


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 12:51 PM, Jean-Denis Greze  wrote:
> Anyway, partially to address the efficiency concern, I've been playing
> around with the idea of having 745-like functionality on a per-node basis: a
> call to get all of the keys on a particular node as opposed to the entire
> cluster.  It just seems like with a very large cluster with billions, tens
> of billions, or hundreds of billions of keys 745 would just get overwhelmed.

That's why 745 is really there for hadoop support
(https://issues.apache.org/jira/browse/CASSANDRA-342), not something
intended to be used manually.

-Jonathan


Re: get_slice() slow if more number of columns present in a SCF.

2010-02-02 Thread Nathan McCall
Thank you for the benchmarks. What version of Cassandra are you using?
I had about 80% performance improvement on single node reads after
using a trunk build with the results from
https://issues.apache.org/jira/browse/CASSANDRA-688 (result caching)
and playing around with the configuration. I am not yet running this
in production though, so I cannot provide any real numbers.

That said, I have no intention of deploying a single node. I keep
seeing performance concerns from folks on small or single node
clusters. My impression so far is that Cassandra may not be the right
solution for these types of deployments.

My main interest in Cassandra is the linear scalability of reads and
writes. From my own tests and some of the discussion on these lists,
it seems Cassandra can thrash around a lot when the number of nodes <=
the replication factor * 2, particularly if a node goes down. I
understand this is a design trade-off of sorts and I am fine with it.
Any sort of distributed, fault tolerant system is well served by using
lots of commodity hardware.

What I found to have been most valuable for my evaluation was to get a
good test together with some real data from our system and then add
nodes, remove nodes, break nodes, etc. and watch what happens. Once I
finish with this, it looks like I will have some solid numbers to do
some capacity planning for figuring out exactly how much hardware to
purchase and when I will need to add more.

Apologies to the original poster if that got a little long winded, but
hopefully it will be useful information for folks.

Cheers,
-Nate


On Tue, Feb 2, 2010 at 7:27 AM, envio user  wrote:
> All,
>
> Here are some tests[batch_insert() and get_slice()] I performed on cassandra.
>
> H/W: Single node, Quad Core(8 cores), 8GB RAM:
> Two separate physical disks, one for the commit log and another for the data.
>
> storage-conf.xml
> 
> 0.4
> 256
> 128
> 0.2
> 1440
> 16
>
>
> Data Model:
>
>  CompareSubcolumnsWith="UTF8Type" Name="Super1" />
>
> TEST1A
> ==
> /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o
> insert -i 10
> WARNING: multiprocessing not present, threading will be used.
>        Benchmark may not be accurate!
> total,interval_op_rate,avg_latency,elapsed_time
> 19039,1903,0.0532085509215,10
> 52052,3301,0.0302550313445,20
> 82274,3022,0.0330235137811,30
> 10,1772,0.0337765234716,40
>
> TEST1B
> =
> /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o read -i 
> 10
> WARNING: multiprocessing not present, threading will be used.
>        Benchmark may not be accurate!
> total,interval_op_rate,avg_latency,elapsed_time
> 16472,1647,0.0615632034523,10
> 39375,2290,0.04384300123,20
> 65259,2588,0.0385473697268,30
> 91613,2635,0.0379411213277,40
> 10,838,0.0331208069702,50
> /home/sun>
>
>
>  I deleted all the data(all: commitlog,data..) and restarted cassandra.***
> I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500
> columns and read 25 columns per key.
>
> TEST2A
> ==
> /home/sun>python stress.py -n 10 -t 100 -y super -u 1 -c 600 -r -o
> insert -i 10
> WARNING: multiprocessing not present, threading will be used.
>        Benchmark may not be accurate!
> total,interval_op_rate,avg_latency,elapsed_time
> .
> .
> 84216,144,0.689481827031,570
> 85768,155,0.625061393859,580
> 87307,153,0.648041650953,590
> 88785,147,0.671928719674,600
> 90488,170,0.611753724284,610
> 91983,149,0.677673689896,620
> 93490,150,0.63891824366,630
> 95017,152,0.65472143182,640
> 96612,159,0.64355712789,650
> 98098,148,0.673311280851,660
> 99622,152,0.486848112166,670
> 10,37,0.174115514629,680
>
> I understand nobody will write 600 columns at a time. I just need to
> populate the data, hence I did this test.
>
> [r...@fc10mc1 ~]# ls -l /var/lib/cassandra/commitlog/
> total 373880
> -rw-r--r-- 1 root root 268462742 2010-02-03 02:00 CommitLog-1265141714717.log
> -rw-r--r-- 1 root root 114003919 2010-02-03 02:00 CommitLog-1265142593543.log
>
> [r...@fc10mc1 ~]# ls -l /cassandra/lib/cassandra/data/Keyspace1/
> total 3024232
> -rw-r--r-- 1 root root 1508524822 2010-02-03 02:00 Super1-192-Data.db
> -rw-r--r-- 1 root root      92725 2010-02-03 02:00 Super1-192-Filter.db
> -rw-r--r-- 1 root root    2639957 2010-02-03 02:00 Super1-192-Index.db
> -rw-r--r-- 1 root root  100838971 2010-02-03 02:02 Super1-279-Data.db
> -rw-r--r-- 1 root root       8725 2010-02-03 02:02 Super1-279-Filter.db
> -rw-r--r-- 1 root root     176481 2010-02-03 02:02 Super1-279-Index.db
> -rw-r--r-- 1 root root 1478775337 2010-02-03 02:03 Super1-280-Data.db
> -rw-r--r-- 1 root root      90805 2010-02-03 02:03 Super1-280-Filter.db
> -rw-r--r-- 1 root root    2588072 2010-02-03 02:03 Super1-280-Index.db
> [r...@fc10mc1 ~]#
>
> [r...@fc10mc1 ~]# du -hs /cassandra/lib/cassandra/data/Keyspace1/
> 2.9G    /cassandra/lib/cassandra/data/Keyspace1/
>
>
> TEST2B
> ==
>
> /home/sun>python stress.py -n 10 -t 100 -y su

Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jean-Denis Greze
Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits
"someone using RandomPartitioner to pass start="" and finish="" to get all
of the rows in their cluster, although in an extremely inefficient way."

We are in a situation like Pierre's, where we need to know what's currently
in the DB so to speak -- except that we have hundreds of millions of rows
(and increasing) and that maintaining an index of the keys in another CF, as
Brandon suggests, is becoming difficult (we also don't like the double write
on initial key inserts, in terms of transactionality especially).

Also, every once in a while, we need to enhance our data as part of some
functionality upgrade or refactoring.  So far, what we do is enhance on
reads (i.e., whenever we read a particular record, we check whether it's up
to date with the latest version and, if not, enhance it), but there are many
problems with this approach. We've been considering doing background-process
enhancement by
running through all of the keys, which is why 745 is pretty exciting.  We'd
rather go through the inefficient operation once in a while as opposed to
doing a check on every read.

Anyway, partially to address the efficiency concern, I've been playing
around with the idea of having 745-like functionality on a per-node basis: a
call to get all of the keys on a particular node as opposed to the entire
cluster.  It just seems like with a very large cluster with billions, tens
of billions, or hundreds of billions of keys 745 would just get overwhelmed.
 Just a thought.







On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis  wrote:
>
> More or less (but see
> https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).
>
> Think of it this way: when you have a few billion keys, how useful is
> it to list them?
>
> -Jonathan
>
> 2010/2/2 Sébastien Pierre :
> > Hi all,
> > I would like to know how to retrieve the list of available keys
available
> > for a specific column. There is the get_key_range method, but it is only
> > available when using the OrderPreservingPartitioner -- I use a
> > RandomPartitioner.
> > Does this mean that when using a RandomPartitioner, you cannot see which
> > keys are available in the database ?
> >  -- Sébastien



--
jeande...@6coders.com
(917) 951-0636

This email and any files transmitted with it are confidential and intended
solely for the use of the individual to whom they are addressed. If you have
received this email in error please notify the system manager. This message
contains confidential information and is intended only for the individual
named. If you are not the named addressee you should not disseminate,
distribute or copy this e-mail. Please notify the sender immediately by
e-mail if you have received this e-mail by mistake and delete this e-mail
from your system. If you are not the intended recipient you are notified
that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.


Re: Key/row names?

2010-02-02 Thread Erik Holstad
Thank you!

On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis  wrote:

> On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad 
> wrote:
> > Is there a way to use a byte[] as the key instead of a string?
>
> no.
>
> > If not what is the main reason for using strings for the key but
> > the columns and the values can be byte[]?
>
> historical baggage.  we might switch to byte[] keys in 0.7.
>
> -Jonathan
>



-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote:
>
>>
>> Wow that sounds really good. So you are saying if I set it to reverse sort
>> order and count 10 for the first round I get the last 10,
>> for the next call I just set the last column from the first call to start
>> and I will get the columns -10- -20, so to speak?
>
>
> Actually, since they are reversed and you're trying to move backwards,
> you'll need to pass the last column from the first query (since they will be
> sorted in reverse order) as the start to the next one with reverse still set
> to true.
>
> -Brandon
>
>
Thanks a lot Brandon for clearing that up for me, I think that was what I
was trying to say. But that is really good, because now I don't have to
store the data twice in different sort orders.



-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad  wrote:

>
> Wow that sounds really good. So you are saying if I set it to reverse sort
> order and count 10 for the first round I get the last 10,
> for the next call I just set the last column from the first call to start
> and I will get the columns -10- -20, so to speak?


Actually, since they are reversed and you're trying to move backwards,
you'll need to pass the last column from the first query (since they will be
sorted in reverse order) as the start to the next one with reverse still set
to true.

-Brandon
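
A rough sketch (not from the thread) of the backwards paging described here:
reversed=true plus a count, feeding the last column of one page in as the
start of the next. It assumes the 0.5-era Thrift Java API seen elsewhere in
this digest; keyspace, row, and column family names are illustrative.

import java.util.List;

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnOrSuperColumn;
import org.apache.cassandra.service.ColumnParent;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.cassandra.service.SlicePredicate;
import org.apache.cassandra.service.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ReversePager
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        ColumnParent parent = new ColumnParent("Standard1", null);
        byte[] start = new byte[0];      // empty start = begin from the very end

        while (true)
        {
            SliceRange range = new SliceRange();
            range.start = start;
            range.finish = new byte[0];
            range.reversed = true;       // walk the row from the end
            range.count = 10;            // page size

            SlicePredicate predicate = new SlicePredicate();
            predicate.slice_range = range;

            List<ColumnOrSuperColumn> page = client.get_slice("Keyspace1", "somerow",
                    parent, predicate, ConsistencyLevel.ONE);
            if (page.isEmpty())
                break;

            for (ColumnOrSuperColumn cosc : page)
                System.out.println(new String(cosc.column.name));

            // Next page starts at the last column we saw; it will be returned
            // again, so a real pager would skip the first element of each
            // subsequent page.
            start = page.get(page.size() - 1).column.name;
            if (page.size() < 10)
                break;
        }

        transport.close();
    }
}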


Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Sorry that there are a lot of questions from me this week,  just trying to
better understand
the best way to use Cassandra :)

Let us say that you know the length of your key, everything is standardized,
are there people
out there that just tag the value onto the key so that you don't have to pay
the extra overhead
of the second byte[]?

-- 
Regards Erik


Re: Key/row names?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad  wrote:
> Is there a way to use a byte[] as the key instead of a string?

no.

> If not what is the main reason for using strings for the key but
> the columns and the values can be byte[]?

historical baggage.  we might switch to byte[] keys in 0.7.

-Jonathan


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote:
>
>> Thanks guys!
>> So I want to use sliceRange but thinking about using the count parameter.
>> For example give me
>> the first x columns, next call I would like to call it with a start value
>> and a count.
>>
>> If I was to use the reverse param in sliceRange I would have to fetch all
>> the columns first, right?
>
>
> If you pass reverse as true, then instead of getting the first x columns,
> you'll get the last x columns.  If you want to head backwards toward the
> beginning, you can pass the first column as the end value.
>
> -Brandon
>
Wow, that sounds really good. So you are saying that if I set it to reverse
sort order and count 10 for the first round I get the last 10, and for the
next call I just set the last column from the first call as the start and I
will get columns -10 to -20, so to speak?


-- 
Regards Erik


Key/row names?

2010-02-02 Thread Erik Holstad
Is there a way to use a byte[] as the key instead of a string?
If not, what is the main reason for using strings for the key while the
columns and the values can be byte[]? Is it just to be able to use it as
the key in a Map etc., or are there other reasons?

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad  wrote:

> Thanks guys!
> So I want to use sliceRange but thinking about using the count parameter.
> For example give me
> the first x columns, next call I would like to call it with a start value
> and a count.
>
> If I was to use the reverse param in sliceRange I would have to fetch all
> the columns first, right?


If you pass reverse as true, then instead of getting the first x columns,
you'll get the last x columns.  If you want to head backwards toward the
beginning, you can pass the first column as the end value.

-Brandon


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Thanks guys!
So I want to use SliceRange but am thinking about using the count parameter.
For example: give me the first x columns; on the next call I would like to
call it with a start value and a count.

If I was to use the reverse param in sliceRange I would have to fetch all
the columns first, right?


On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote:
>
>> Hey!
>> I'm looking for a comparator that sort columns in reverse order on for
>> example bytes?
>> I saw that you can write your own comparator class, but just thought that
>> someone must have done that already.
>
>
> When you get_slice, just set reverse to true in the SliceRange and it will
> reverse the order.
>
> -Brandon
>



-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad  wrote:

> Hey!
> I'm looking for a comparator that sort columns in reverse order on for
> example bytes?
> I saw that you can write your own comparator class, but just thought that
> someone must have done that already.


When you get_slice, just set reverse to true in the SliceRange and it will
reverse the order.

-Brandon


Re: Reverse sort order comparator?

2010-02-02 Thread Jonathan Ellis
you can scan in reversed (reversed=True in slicerange) w/o needing a
custom comparator.

On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad  wrote:
> Hey!
> I'm looking for a comparator that sort columns in reverse order on for
> example bytes?
> I saw that you can write your own comparator class, but just thought that
> someone must have done that already.
>
> --
> Regards Erik
>


Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Hey!
I'm looking for a comparator that sorts columns in reverse order on, for
example, bytes.
I saw that you can write your own comparator class, but just thought that
someone must have done that already.

-- 
Regards Erik


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Yes, that's a good idea! I've considered doing that at some point, but I'm
still learning the basics ;)

 -- Sébastien

2010/2/2 Brandon Williams 

> 2010/2/2 Sébastien Pierre 
>
>> Hi Jonathan,
>>
>> In my case, I'll have much more columns (thousands to millions) than keys
>> in logs (campaign x days), so it's not an issue to retrieve all of them.
>>
>
> If that's the case, your dataset is small enough that you could maintain an
> index of the keys in another CF.   If it needs to scale further, you can
> segment the index keys by year, month, etc.
>
> -Brandon
>


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Brandon Williams
2010/2/2 Sébastien Pierre 

> Hi Jonathan,
>
> In my case, I'll have much more columns (thousands to millions) than keys
> in logs (campaign x days), so it's not an issue to retrieve all of them.
>

If that's the case, your dataset is small enough that you could maintain an
index of the keys in another CF.   If it needs to scale further, you can
segment the index keys by year, month, etc.

-Brandon
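
Not from the original mail, but as a minimal sketch of the "index of keys in
another CF" idea, segmented by month: every data write also records the key
as a column in an index row, and listing keys is then a slice over that row.
The keyspace and CF names are illustrative, and the double write is not
atomic.

import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.ColumnPath;
import org.apache.cassandra.service.ConsistencyLevel;

public class KeyIndex
{
    private final Cassandra.Client client;

    public KeyIndex(Cassandra.Client client)
    {
        this.client = client;
    }

    // Write the data column, then record the key in this month's index row.
    public void insertWithIndex(String key, byte[] columnName, byte[] value,
            String month /* e.g. "2010-02" */) throws Exception
    {
        long now = System.currentTimeMillis();

        client.insert("Keyspace1", key,
                new ColumnPath("Logs", null, columnName), value, now,
                ConsistencyLevel.ONE);

        // One index row per month; the column name is the data key,
        // the column value is left empty.
        client.insert("Keyspace1", "keys-" + month,
                new ColumnPath("KeyIndex", null, key.getBytes()), new byte[0], now,
                ConsistencyLevel.ONE);
    }
}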


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi Jonathan,

In my case, I'll have many more columns (thousands to millions) than keys in
logs (campaign x days), so it's not an issue to retrieve all of them.

Also, suppose you can't retrieve values from Cassandra just because you're
using the wrong key (say you're using "user/10" instead of "user:10");
without the ability to list the keys, you'd have no way to find out the
error.

I'm glad to see this implemented :)

 -- Sébastien

2010/2/2 Jonathan Ellis 

> More or less (but see
> https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).
>
> Think of it this way: when you have a few billion keys, how useful is
> it to list them?
>
> -Jonathan
>
> 2010/2/2 Sébastien Pierre :
> > Hi all,
> > I would like to know how to retrieve the list of available keys available
> > for a specific column. There is the get_key_range method, but it is only
> > available when using the OrderPreservingPartitioner -- I use a
> > RandomPartitioner.
> > Does this mean that when using a RandomPartitioner, you cannot see which
> > keys are available in the database ?
> >  -- Sébastien
>


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi all,

It's basically for "knowing what's inside the db", as I've been toying with
Cassandra for some time, I have keys that are no longer useful and should be
removed.

I'm also storing HTTP logs in cassandra, where keys follow this convention
"campaign::". So for instance, if I'd like to know
what logs are available I just have to do:

   client.get_keys("Keyspace1", "Logs", "", "", 100, ConsistencyLevel.ONE)

However, I have to use an OrderPreservingPartitioner to do so, which is
(from my understanding) bad for load in this case.

 -- Sébastien


2010/2/2 Erik Holstad 

> Hi Sebastien!
> I'm totally new to Cassandra, but as far as I know there is no way of
> getting just the keys that are in the
> database, they are not stored separately but only with the data itself.
>
> Why do you want a list of keys, what are you going to use them for? Maybe
> there is another way of solving
> your problem.
>
> What you are describing, getting all the keys/rows for a given column
> sounds like you have to fetch all the
> data that you have and then filter every key on your column, I don't think
> that get_key_range will do that for
> you even, says that it takes column_family, but like I said I'm totally new
>
> Erik
>
> 2010/2/2 Sébastien Pierre 
>
>> Hi all,
>>
>> I would like to know how to retrieve the list of available keys available
>> for a specific column. There is the get_key_range method, but it is only
>> available when using the OrderPreservingPartitioner -- I use a
>> RandomPartitioner.
>>
>> Does this mean that when using a RandomPartitioner, you cannot see which
>> keys are available in the database ?
>>
>>  -- Sébastien
>>
>
>
>
> --
> Regards Erik
>


Re: Did CASSANDRA-647 get fixed in 0.5?

2010-02-02 Thread Omer van der Horst Jansen
Here it is: https://issues.apache.org/jira/browse/CASSANDRA-752




From: Jonathan Ellis 
To: cassandra-user@incubator.apache.org
Sent: Mon, February 1, 2010 5:22:13 PM
Subject: Re: Did CASSANDRA-647 get fixed in 0.5?

Can you create a ticket for this?

Thanks!

On Mon, Feb 1, 2010 at 4:11 PM, Omer van der Horst Jansen
 wrote:
> I checked out the 0.5 branch and ran ant release (on my linux box).
> Installed the new tar.gz and ran the test on my Windows laptop as before but
> got the same result -- the key isn't deleted from the perspective of
> get_range_slice.
>
> Omer
>
> 
> From: Jonathan Ellis 
> To: cassandra-user@incubator.apache.org
> Sent: Mon, February 1, 2010 4:52:17 PM
> Subject: Re: Did CASSANDRA-647 get fixed in 0.5?
>
> 647 was committed for 0.5, yes, but CASSANDRA-703 was not.  Can you
> try the 0.5 branch and see if it is fixed there?
>
> On Mon, Feb 1, 2010 at 3:26 PM, Omer van der Horst Jansen
>  wrote:
>> I'm running
>> into an issue with Cassandra 0.5 (the current release version) that
>> sounds exactly like the description of issue CASSANDRA-647.
>>
>> I'm
>> using the Thrift Java API to store a couple of columns in a single row. A
>> few seconds after that my application deletes the entire row. A plain
>> Cassandra.Client.get() will then throw a NotFoundException for that
>> particular key, as expected. However, the key will still show up when
>> executing a
>> Cassandra.Client.get_range_slice query.
>>
>> Here is some quick and
>> dirty Java code that demonstrates the problem:
>>
>> import java.util.List;
>>
>> import org.apache.cassandra.service.*;
>> import org.apache.thrift.protocol.*;
>> import org.apache.thrift.transport.*;
>>
>> public class Cassandra647TestApp
>> {
>>     /**
>>      * Demonstrates CASSANDRA-647 presence in Cassandra 0.5 release.
>>      * Requires an unmodified Cassandra configuration except that an
>>      * OrderPreservingPartitioner should be used.
>>      */
>>     public static void main(String[] args) throws Exception
>>     {
>>         String keyspace = "Keyspace1";
>>         String cf = "Standard1";
>>         String key = "testrow1";
>>         byte[] columnName = "colname".getBytes();
>>         byte[] data = "testdata".getBytes();
>>
>>         TTransport transport = new TSocket("localhost", 9160);
>>         TProtocol protocol = new TBinaryProtocol(transport);
>>         Cassandra.Client client = new Cassandra.Client(protocol);
>>         transport.open();
>>
>>         ColumnPath path = new ColumnPath(cf, null, columnName);
>>         client.insert(keyspace, key, path, data, System.currentTimeMillis(),
>>                 ConsistencyLevel.ONE);
>>         Thread.sleep(1000);
>>
>>         ColumnPath rowpath = new ColumnPath(cf, null, null);
>>         client.remove(keyspace, key, rowpath, System.currentTimeMillis(),
>>                 ConsistencyLevel.ONE);
>>         Thread.sleep(1000);
>>
>>         try
>>         {
>>             ColumnOrSuperColumn cosc = client.get(keyspace, key, path,
>>                     ConsistencyLevel.ONE);
>>             System.out.println("Whoops! NotFoundException not thrown!");
>>         }
>>         catch (NotFoundException e)
>>         {
>>             System.out.println("OK, we got a NotFoundException");
>>         }
>>
>>         ColumnParent parent = new ColumnParent(cf, null);
>>         SlicePredicate predicate = new SlicePredicate();
>>         SliceRange range = new SliceRange();
>>         range.start = new byte[0];
>>         range.finish = new byte[0];
>>         predicate.slice_range = range;
>>
>>         List<KeySlice> sliceList = client.get_range_slice(keyspace, parent,
>>                 predicate, "", "", 1000, ConsistencyLevel.ONE);
>>
>>         for (KeySlice k : sliceList)
>>         {
>>             System.out.println("Found key " + k.key);
>>             if (key.equals(k.key))
>>             {
>>                 System.out.println("but key " + k.key
>>                         + " should have been removed");
>>             }
>>         }
>>     }
>> }
>>
>> Am I using the API correctly in the code above?
>>
>> -Omer van der Horst Jansen
>>
>>
>>
>>
>>
>
>



  

Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote:
>>
>> A supercolumn can still only compare subcolumns in a single way.
>>>
>> Yeah, I know that, but you can have a super column per sort order without
>> having to restart the cluster.
>>
>
> You get a CompareWith for the columns, and a CompareSubcolumnsWith for
> subcolumns.  If you need more column types to get different sort orders, you
> need another ColumnFamily.
>
Not sure what you mean by column types. What I want to do is to have a few
things sorted in both ascending and descending order, like {a,b} / {b,a} and
{1,2} / {2,1}.

>
> -Brandon
>
>


-- 
Regards Erik


Re: Best design in Cassandra

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad  wrote:
>
> A supercolumn can still only compare subcolumns in a single way.
>>
> Yeah, I know that, but you can have a super column per sort order without
> having to restart the cluster.
>

You get a CompareWith for the columns, and a CompareSubcolumnsWith for
subcolumns.  If you need more column types to get different sort orders, you
need another ColumnFamily.
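
For illustration, a storage-conf.xml fragment along these lines gives one sort
order per ColumnFamily; the family names here are only examples:

<!-- inside the <Keyspace> element of storage-conf.xml; names are illustrative -->
<ColumnFamily Name="ByName"  CompareWith="UTF8Type"/>   <!-- lexical order -->
<ColumnFamily Name="ByScore" CompareWith="LongType"/>   <!-- numeric order -->
<ColumnFamily Name="Grouped" ColumnType="Super"
              CompareWith="UTF8Type"
              CompareSubcolumnsWith="LongType"/>

A client then writes the same data into whichever family matches the order it
wants to read back in, since a single family (or supercolumn level) can only
sort one way.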

-Brandon


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Erik Holstad
Hi Sebastien!
I'm totally new to Cassandra, but as far as I know there is no way of getting
just the keys that are in the database; they are not stored separately, only
together with the data itself.

Why do you want a list of keys, what are you going to use them for? Maybe
there is another way of solving
your problem.

What you are describing, getting all the keys/rows for a given column, sounds
like you would have to fetch all the data you have and then filter every key
on your column. I don't think get_key_range will do that for you, even though
it says it takes a column_family. But like I said, I'm totally new.

Erik

2010/2/2 Sébastien Pierre 

> Hi all,
>
> I would like to know how to retrieve the list of available keys available
> for a specific column. There is the get_key_range method, but it is only
> available when using the OrderPreservingPartitioner -- I use a
> RandomPartitioner.
>
> Does this mean that when using a RandomPartitioner, you cannot see which
> keys are available in the database ?
>
>  -- Sébastien
>



-- 
Regards Erik


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jonathan Ellis
More or less (but see
https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).

Think of it this way: when you have a few billion keys, how useful is
it to list them?
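
For completeness: on 0.5 a meaningful key walk is only possible with the
OrderPreservingPartitioner. A rough sketch of paging over keys with
get_range_slice could look like the code below; the class name, method name
and page size are only placeholders.

import java.util.List;

import org.apache.cassandra.service.*;

public class KeyPager
{
    // Pages through all row keys of one column family using get_range_slice.
    // Assumes the OrderPreservingPartitioner on Cassandra 0.5.
    public static void printAllKeys(Cassandra.Client client, String keyspace,
            String cf) throws Exception
    {
        // Ask for at most one column per row; we only care about the keys.
        SliceRange range = new SliceRange();
        range.start = new byte[0];
        range.finish = new byte[0];
        range.count = 1;
        SlicePredicate predicate = new SlicePredicate();
        predicate.slice_range = range;

        ColumnParent parent = new ColumnParent(cf, null);
        String start = "";
        int pageSize = 100; // illustrative page size

        while (true)
        {
            List<KeySlice> page = client.get_range_slice(keyspace, parent,
                    predicate, start, "", pageSize, ConsistencyLevel.ONE);
            for (KeySlice slice : page)
            {
                if (!slice.key.equals(start)) // skip the resume key
                    System.out.println(slice.key);
            }
            if (page.size() < pageSize)
                break; // last page reached
            start = page.get(page.size() - 1).key; // resume from the last key
        }
    }
}

Each call resumes from the last key of the previous page (which comes back
once more and is skipped), so only one page is ever held in memory.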

-Jonathan

2010/2/2 Sébastien Pierre :
> Hi all,
> I would like to know how to retrieve the list of available keys available
> for a specific column. There is the get_key_range method, but it is only
> available when using the OrderPreservingPartitioner -- I use a
> RandomPartitioner.
> Does this mean that when using a RandomPartitioner, you cannot see which
> keys are available in the database ?
>  -- Sébastien


Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams  wrote:

> On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad wrote:
>
>> Hey!
>> Have a couple of questions about the best way to use Cassandra.
>> Using the random partitioner + the multi_get calls vs order preservation +
>> range_slice calls?
>>
>
> When you use an OPP, the distribution of your keys becomes your problem.
>  If you don't have an even distribution, this will be reflected in the load
> on the nodes, while the RP gives you even distribution.
>

Yeah, that is why it would be nice to hear if anyone has compared the
performance between the two,
to see if it is worth worrying about your own distribution. I also read that
the random partitioner doesn't
give that great distribution.


>
> What is the benefit of using multiple families vs super column?
>
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 is currrently why I
> prefer simple CFs instead of supercolumns.
>
Yeah, this is nasty.

>
>
>> For example in the case of sorting
>> in different orders. One good thing that I can see here when using super
>> column is that you don't
>> have to restart your cluster every time you want to add something new
>> order.
>>
>
> A supercolumn can still only compare subcolumns in a single way.
>
Yeah, I know that, but you can have a super column per sort order without
having to restart the cluster.

>
> When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you
> will be able to add CFs without restarting.
>
Looks interesting, but it is targeted at 0.7, so it will probably be a little
while, right?

>
> -Brandon
>



-- 
Regards Erik


How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi all,

I would like to know how to retrieve the list of available keys available
for a specific column. There is the get_key_range method, but it is only
available when using the OrderPreservingPartitioner -- I use a
RandomPartitioner.

Does this mean that when using a RandomPartitioner, you cannot see which
keys are available in the database ?

 -- Sébastien


RE: Sample applications

2010-02-02 Thread Carlos Sanchez
Thanks Erik

From: Erik Holstad [mailto:erikhols...@gmail.com]
Sent: Tuesday, February 02, 2010 9:08 AM
To: cassandra-user@incubator.apache.org
Subject: Re: Sample applications

Hi Carlos!

I'm also really new to Cassandra but here are a couple of links that I found 
useful:
http://wiki.apache.org/cassandra/ClientExamples
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

and one of the presentations like:
http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod

Erik



Re: Sample applications

2010-02-02 Thread Erik Holstad
Hi Carlos!

I'm also really new to Cassandra but here are a couple of links that I found
useful:
http://wiki.apache.org/cassandra/ClientExamples
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

and one of the presentations like:
http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod

Erik


Re: Cassandra error with large connection

2010-02-02 Thread JKnight JKnight
Thank you very much, Mr Jonathan.

On Mon, Feb 1, 2010 at 11:04 AM, Jonathan Ellis  wrote:

> On Mon, Feb 1, 2010 at 10:03 AM, Jonathan Ellis  wrote:
> >> I see a lot of CLOSE_WAIT TCP connection.
>
> Also, this sounds like you are not properly pooling client connections
> to cassandra.  You should have one connection per user, not one
> connection per operation.
>
> -Jonathan
>
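
To illustrate the "one connection per user" point, a minimal sketch of opening
a Thrift connection once and reusing it for every operation could look like
this (the class name is only a placeholder):

import org.apache.cassandra.service.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.transport.*;

public class ReusedConnection
{
    private final TTransport transport;
    private final Cassandra.Client client;

    // Open one socket per user session instead of one per operation.
    public ReusedConnection(String host, int port) throws Exception
    {
        transport = new TSocket(host, port);
        client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
    }

    // Every insert/get/remove in the session goes through this client.
    public Cassandra.Client client()
    {
        return client;
    }

    // Close once, when the user session ends.
    public void close()
    {
        transport.close();
    }
}

All calls made through client() then share one socket for the life of the
session, instead of opening and half-closing a new connection per operation.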



-- 
Best regards,
JKnight