Re: question about deleting from cassandra

2010-03-12 Thread Sylvain Lebresne
I guess you can also vote for this ticket :
https://issues.apache.org/jira/browse/CASSANDRA-699 :)



--
Sylvain


On Fri, Mar 12, 2010 at 8:28 AM, Mark Robson  wrote:
> On 12 March 2010 03:34, Bill Au  wrote:
>>
>> Let's take Twitter as an example.  All the tweets are timestamped.  I want
>> to keep only a month's worth of tweets for each user.  The number of tweets
>> that fit within this one month window varies from user to user.  What is the
>> best way to accomplish this?
>
> This is the "expiry" problem that has been discussed on this list before. As
> far as I can see, there are no easy ways to do it with 0.5.
>
> If you use the ordered partitioner and make the first part of the keys a
> timestamp (or part of it) then you can get the keys and delete them.
>
> However, these deletes will be quite inefficient: currently each row must be
> deleted individually (there was a patch to range delete kicking around; I
> don't know if it's accepted yet).
>
> But even if range delete is implemented, it's still quite inefficient and
> not really what you want, and it doesn't work with the RandomPartitioner.
>
> If you have some metadata to say who tweeted within a given period (say 10
> days or 30 days) and you store the tweets all in the same key per user per
> period (say with one column per tweet, or use supercolumns), then you can
> just delete one key per user per period.
>
> One of the problems with using a time-based key with ordered partitioner is
> that you're always going to have a data imbalance, so you may want to try
> hashing *part* of the key (The first part) so you can still range scan the
> next part. This may fix load balancing while still enabling you to use range
> scans to do data expiry.
>
> e.g. your key is
>
> Hash of day number + user id + timestamp
>
> Then you can range scan the entire day's tweets to expire them, and range
> scan a given user's tweets for a given day efficiently (and doing this for
> 30 days is just 30 range scans)
>
> Putting a hash in there fixes load balancing with OPP.
>
> Mark
>
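
The key layout Mark describes can be sketched in Python as follows. This is
only an illustration, not code from the thread: the ":" separator, the
MD5-based day prefix truncated to 8 hex characters, the zero-padded timestamp
and all function names are assumptions made for the example, and it presumes
user ids contain no characters that sort below ";".

```python
import hashlib

SECONDS_PER_DAY = 86400


def day_number(ts_seconds: int) -> int:
    """Day number since the Unix epoch for a timestamp in seconds."""
    return ts_seconds // SECONDS_PER_DAY


def day_prefix(day: int) -> str:
    """Hash of the day number, so consecutive days land on different parts
    of the ring under an order-preserving partitioner (assumed: MD5, 8 hex chars)."""
    return hashlib.md5(str(day).encode("ascii")).hexdigest()[:8]


def tweet_key(user_id: str, ts_seconds: int) -> str:
    """Row key: hash of day number + user id + timestamp."""
    return f"{day_prefix(day_number(ts_seconds))}:{user_id}:{ts_seconds:010d}"


def day_range(day: int) -> tuple:
    """Half-open [start, end) key range covering every tweet of one day.
    ';' is the ASCII character right after ':'."""
    prefix = day_prefix(day)
    return prefix + ":", prefix + ";"


def user_day_range(user_id: str, day: int) -> tuple:
    """Half-open [start, end) key range for one user's tweets on one day."""
    prefix = f"{day_prefix(day)}:{user_id}"
    return prefix + ":", prefix + ";"
```

Expiring a day then means range scanning `day_range(day)` and deleting the
returned keys; reading a user's last 30 days is 30 scans, one per
`user_day_range(user, day)`.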


Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-12 Thread Bill Au
Thanks.  With 0.6.0-beta2, using Standard2 does show a human-readable column.

However, the behavior is definitely different between 0.5.1 and
0.6.0-beta2.  I am using the binary distribution of 0.5.1:

cassandra> show version
0.5.1
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
cassandra> get Keyspace1.Standard1['jsmith']
=> (column=last, value=Smith, timestamp=1268408466548)
=> (column=first, value=John, timestamp=1268408464036)
=> (column=age, value=42, timestamp=1268408468895)
Returned 3 results.

With 0.5.1, using Standard1 does show a human-readable column, as documented
in the Wiki.

Not sure which one is the correct behavior here.

Bill

On Thu, Mar 11, 2010 at 1:22 PM, Eric Evans  wrote:

> On Wed, 2010-03-10 at 18:09 -0500, Bill Au wrote:
> > I am checking out 0.6.0-beta2 since I need the batch-mutate function.
> > I am just trying to run the example in the cassandra-cli Wiki:
> >
> > http://wiki.apache.org/cassandra/CassandraCli
> >
> > Here is what I am getting:
> >
> > cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
> > Value inserted.
> > cassandra> get Keyspace1.Standard1['jsmith']
> > => (column=6669727374, value=John, timestamp=1268261785077)
> > Returned 1 results.
> >
> > The column name being returned by get (6669727374) does not match what
> > is set (first).  This is true for all column names.
> >
> > cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
> > Value inserted.
> > cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
> > Value inserted.
> > cassandra> get Keyspace1.Standard1['jsmith']
> > => (column=6c617374, value=Smith, timestamp=1268262480130)
> > => (column=6669727374, value=John, timestamp=1268261785077)
> > => (column=616765, value=42, timestamp=1268262484133)
> > Returned 3 results.
> >
> > Is this a problem in 0.6.0-beta2 or am I doing anything wrong?
>
> No, you're not doing anything wrong. What you're seeing is the hex
> representation of a BytesType, which is the comparator that Standard1 in
> the example config uses. This is the same for 0.5.1 too.
>
> If you haven't made any changes to the default config, try using
> Standard2 as the column family and you'll see a human-readable column
> name as expected (Standard2 uses a UTF8Type comparator).
>
> The wiki page has sample output that is confusing, (it's probably
> cut-and-paste from a time when Standard1 used an ASCII or UTF8
> comparator), we should probably fix that.
>
> --
> Eric Evans
> eev...@rackspace.com
>
>
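
The hex strings in the 0.6.0-beta2 output are simply the bytes of the column
names printed in hexadecimal. A small Python sketch (the helper name is made
up for this example, and it assumes the names were written as ASCII/UTF-8
text) shows how to read them back:

```python
def decode_column_name(hex_name: str) -> str:
    """Turn the hex form printed by the CLI back into readable text.
    Assumes the column name bytes are valid UTF-8."""
    return bytes.fromhex(hex_name).decode("utf-8")


# The three column names from the session quoted above.
for hex_name in ("6c617374", "6669727374", "616765"):
    print(hex_name, "->", decode_column_name(hex_name))
```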


get_range_slice(s) question

2010-03-12 Thread Omer van der Horst Jansen
I've noticed that both 0.5.1 and 0.6b2 return (ReplicationFactor) 
identical copies of the data stored in my keyspace whenever I make a 
call to get_range_slice or get_range_slices using 
ConsistencyLevel.QUORUM.

So with ReplicationFactor set to 2 for my application's KeySpace I get 
double the number of KeySlices that I expect to get. When using 
ConsistencyLevel.ONE I get only one KeySlice for each row. 

The same routine running against the Standard1 keyspace with a 
ReplicationFactor of 1 returns only a single KeySlice for each row. A 
ReplicationFactor of three gives me three identical KeySlices when using 
ConsistencyLevel.QUORUM.

Is this the intended behavior of get_range_slices? I remember reading in 
one of the Dynamo papers that applications (and not Dynamo) are required 
to sort out any discrepancies in the data, but in this case there aren't 
any discrepancies.

Omer
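
Until the duplication is resolved on the server side, a client could collapse
the identical copies itself. The Python sketch below is a stopgap under stated
assumptions, not part of any Cassandra client: it models each KeySlice as a
plain (key, columns) pair and assumes the duplicates really are identical.

```python
from collections import OrderedDict


def dedupe_key_slices(key_slices):
    """Keep the first slice seen for each row key, preserving result order.
    `key_slices` is an iterable of (key, columns) pairs (a simplified model)."""
    unique = OrderedDict()
    for key, columns in key_slices:
        unique.setdefault(key, columns)
    return list(unique.items())
```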


  


Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-12 Thread Eric Evans
On Fri, 2010-03-12 at 11:21 -0500, Bill Au wrote:
> Thanks.  With 0.6.0-beta2 using Standard2 does show a human-readable
> column.
> 
> However, the behavior is definitely different between 0.5.1 and
> 0.6.0-beta2.  I am using the binary distribution of 0.5.1:
> 
> cassandra> show version
> 0.5.1
> cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
> Value inserted.
> cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
> Value inserted.
> cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
> Value inserted.
> cassandra> get Keyspace1.Standard1['jsmith']
> => (column=last, value=Smith, timestamp=1268408466548)
> => (column=first, value=John, timestamp=1268408464036)
> => (column=age, value=42, timestamp=1268408468895)
> Returned 3 results.
> 
> With 0.5.1 using Standard1 does show a human-readable column as
> documented
> in the Wiki.

Right you are, my mistake. This changed in
https://issues.apache.org/jira/browse/CASSANDRA-661 (which occurred
between 0.5 and 0.6).

> Not sure which one is the correct behavior here.

The current behavior is correct. I'll update the examples to avoid
future confusion.

-- 
Eric Evans
eev...@rackspace.com



Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-12 Thread Bill Au
Thanks for clearing this up for me.

Bill

On Fri, Mar 12, 2010 at 11:49 AM, Eric Evans  wrote:

> On Fri, 2010-03-12 at 11:21 -0500, Bill Au wrote:
> > Thanks.  With 0.6.0-beta2 using Standard2 does show a human-readable
> > column.
> >
> > However, the behavior is definitely different between 0.5.1 and
> > 0.6.0-beta2.  I am using the binary distribution of 0.5.1:
> >
> > cassandra> show version
> > 0.5.1
> > cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
> > Value inserted.
> > cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
> > Value inserted.
> > cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
> > Value inserted.
> > cassandra> get Keyspace1.Standard1['jsmith']
> > => (column=last, value=Smith, timestamp=1268408466548)
> > => (column=first, value=John, timestamp=1268408464036)
> > => (column=age, value=42, timestamp=1268408468895)
> > Returned 3 results.
> >
> > With 0.5.1 using Standard1 does show a human-readable column as
> > documented
> > in the Wiki.
>
> Right you are, my mistake. This changed in
> https://issues.apache.org/jira/browse/CASSANDRA-661 (which occurred
> between 0.5 and 0.6).
>
> > Not sure which one is the correct behavior here.
>
> The current behavior is correct. I'll update the examples to avoid
> future confusion.
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Effective allocation of multiple disks

2010-03-12 Thread Eric Rosenberry
Ryan-

Are you going to use software or hardware based RAID 0?

Does anyone on the list have any data to compare the performance of hardware
RAID 0 vs. software LVM RAID 0?

I would think software RAID 0 would be fine since there is no actual
computation being done...

Thanks!

-Eric

On Thu, Mar 11, 2010 at 1:16 PM, Ryan King  wrote:
>
>
> Even without major compaction, you can get significant imbalances in
> how much data is on each disk which will bottleneck your IO
> throughput. We're running JBOD right now, but going to switch to RAID
> 0 soon.
>
> -ryan
>


How to force GC in Cassandra?

2010-03-12 Thread Weijun Li
Suppose I insert a lot of new items but also delete a lot of new items
daily. It would be ideal if I could force GC to happen at midnight (when
traffic is low). Is there any way to manually force GC to be executed? That
way I could add a cron job to trigger GC at midnight. I tried nodetool
and the JMX interface, but they don't seem to have that.

-Weijun


Re: Effective allocation of multiple disks

2010-03-12 Thread Ted Zlatanov
On Thu, 11 Mar 2010 12:01:27 -0600 Eric Evans  wrote: 

EE> On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
>> On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
>>  wrote:
>> > I would almost recommend just keeping things simple and removing
>> > multiple data directories from the config altogether and just
>> > documenting that you should plan on using OS level mechanisms for
>> > growing diskspace and io.
>> 
>> I think that is a pretty sane suggestion actually. 

EE> Or maybe leave the code as is and just document the situation more
EE> clearly? If you're adding more disks to increase storage capacity
EE> and you don't strictly need the extra IO, then multiple data
EE> directories might be preferable to other forms of aggregation (it's
EE> certainly simpler than say a volume manager).

Could Cassandra use a block device as raw storage?  You avoid the
filesystem overhead and it lets the sysadmin determine the best kind of
device (RAID or not underneath) to allocate.

Ted



Cassandra Demo/Tutorial Applications

2010-03-12 Thread Krishna Sankar
I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
for my OSCON tutorial. I have a couple of questions and would appreciate
insights:

A)  CASSANDRA-873 suggests Lucandra as one demo application.
B)  Are there other ideas that would bring out the various aspects of
Cassandra?
C)  What would be the goal of the demo apps? A tutorial to help folks learn
the ins and outs of Cassandra? A showcase of capabilities? I think
CASSANDRA-873 belongs to the latter; Twissandra most probably belongs to the
former.
D)  Hadoop on Cassandra might be a good demo/tutorial.
E)  How would one structure the infrastructure for the demo/tutorials? What
assumptions can we make in creating them? As AMIs to be run in EC2? Also
to be run on 2-3 local machines for folks who can spare some? Or as
multiple processes, all on one machine? What is an optimum configuration
for learning and demo? We need to make it simple (to reflect the domain)
but not simpler.
F)  I am looking for ideas from developers and users, hence the cross
posting. I hope the Apache mailer is smart enough to dedup; I will find out
soon ...

Cheers





Re: get_range_slice(s) question

2010-03-12 Thread Jonathan Ellis
That would be a bug, not intended behavior.  Can you open a ticket?

On Fri, Mar 12, 2010 at 11:48 AM, Omer van der Horst Jansen
 wrote:
> I've noticed that both 0.5.1 and 0.6b2 return (ReplicationFactor)
> identical copies of the data stored in my keyspace whenever I make a
> call to get_range_slice or get_range_slices using
> ConsistencyLevel.QUORUM.
>
> So with ReplicationFactor set to 2 for my application's KeySpace I get
> double the number of KeySlices that I expect to get. When using
> ConsistencyLevel.ONE I get only one KeySlice for each row.
>
> The same routine running against the Standard1 keyspace with a
> ReplicationFactor of 1 returns only a single KeySlice for each row. A
> ReplicationFactor of three gives me three identical KeySlices when using
> ConsistencyLevel.QUORUM.
>
> Is this the intended behavior of get_range_slices? I remember reading in
> one of the Dynamo papers that applications (and not Dynamo) are required
> to sort out any discrepancies in the data, but in this case there aren't
> any discrepancies.
>
> Omer
>
>
>
>


Re: How to force GC in Cassandra?

2010-03-12 Thread Jonathan Ellis
I think you mean compaction?

You can use nodeprobe / nodetool for that.

http://wiki.apache.org/cassandra/NodeProbe

On Fri, Mar 12, 2010 at 12:40 PM, Weijun Li  wrote:
> Suppose I insert a lot of new items but also delete a lot of new items
> daily. It would be ideal if I could force GC to happen at midnight (when
> traffic is low). Is there any way to manually force GC to be executed? That
> way I could add a cron job to trigger GC at midnight. I tried nodetool
> and the JMX interface, but they don't seem to have that.
>
> -Weijun
>


Re: Effective allocation of multiple disks

2010-03-12 Thread Ryan King
We're going to use software RAID.

-ryan

On Fri, Mar 12, 2010 at 9:24 AM, Eric Rosenberry  wrote:
> Ryan-
> Are you going to use software or hardware based RAID 0?
>
> Does anyone on the list have any data to compare the performance of hardware
> RAID 0 vs. software LVM RAID 0?
> I would think software RAID 0 would be fine since there is no actual
> computation being done...
> Thanks!
>
> -Eric
>
> On Thu, Mar 11, 2010 at 1:16 PM, Ryan King  wrote:
>>
>> Even without major compaction, you can get significant imbalances in
>> how much data is on each disk which will bottleneck your IO
>> throughput. We're running JBOD right now, but going to switch to RAID
>> 0 soon.
>>
>> -ryan
>
>


Grails Cassandra plugin

2010-03-12 Thread Ned Wolpert
Folks-

  I put together a quick n' dirty grails plugin for Cassandra, wrapped with
Hector. It's available at http://github.com/wolpert/grails-cassandra in its
initial state. I wouldn't call it 'production-ready' yet. :-)

  We're using Cassandra at work and I wanted an easy way to access Cassandra
from a grails application, but couldn't find anything. I have some plans for
where I want it to go, but I'm open to suggestions. I'll submit the code
to grails plugins once I get a bit further along with it. It's pretty basic
at this point.

-- 
Virtually, Ned Wolpert
"Settle thy studies, Faustus, and begin..."   --Marlowe


Cassandra 0.5.1 get_key_range problem

2010-03-12 Thread Jon Graham
Hello,

When using the get_key_range method with ConsistencyLevel.ONE an entire
block of keys is not returned.
I loop over the get_key_range method, advancing the start key after each
call (requesting 8K keys per call).

When running the program several times, I got the same results with large
key blocks not returned.

When I changed the program to use ConsistencyLevel.ALL, all the keys
were returned as expected.

After I changed the program back to ConsistencyLevel.ONE, all the keys were
returned as well.

Has anyone else seen this issue?

I would have expected ConsistencyLevel.ONE to be able to return all the
keys. My 6-node cluster uses a replication factor of 3.

Thanks for your help,
Jon
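
For reference, the paging loop described above can be sketched in Python
against an in-memory stand-in. Nothing here calls Cassandra:
`fake_get_key_range` is a hypothetical substitute for the real range call, and
the sketch assumes the start key is inclusive, so each page after the first
repeats the last key of the previous page.

```python
import bisect


def fake_get_key_range(sorted_keys, start, count):
    """Stand-in for a range call: up to `count` keys >= `start`, in order."""
    i = bisect.bisect_left(sorted_keys, start)
    return sorted_keys[i:i + count]


def iterate_all_keys(sorted_keys, page_size):
    """Yield every key once by advancing the start key after each call."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never make progress.
        raise ValueError("page_size must be at least 2")
    start = ""
    last_seen = None
    while True:
        page = fake_get_key_range(sorted_keys, start, page_size)
        # Drop the key that was already yielded at the end of the previous page.
        if last_seen is not None and page and page[0] == last_seen:
            page = page[1:]
        if not page:
            return
        yield from page
        last_seen = page[-1]
        start = last_seen
```

A loop of this shape returns every key from a consistent data source, so gaps
like the ones reported would point at the server response rather than the
paging logic.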


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang  wrote:

> I'm wondering about good strategies for picking keys that I want to be
> lexically sorted in a super column family. For example, my data looks like
> this:
>
> [user1_uuid][connections][some_key_for_user2] = ""
> [user1_uuid][connections][some_key_for_user3] = ""
>
> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list. That's fine. But I
> wonder what would happen if, say, a user changes their last name. Happens
> rarely but I imagine people getting married and modifying their name. Now
> the sort is no longer correct. There seems to be some bad consequences to
> creating keys based on data that can change.
>
> So what is the general (elegant, easy to maintain) strategy here? Always
> sort in your server-side code and don't bother trying to have the data
> sorted?
>

Having row keys based on something potentially volatile is something I would
avoid since that determines which machine the row belongs to and moving data
between machines isn't a cheap operation.

What you'll probably want to do is make the key something unique (like a
uuid), store the user's name as a column on the row (thus making it easy to
update) and maintain a secondary index to get the name-based sorting you
want.  If you're expecting a few million users, maintaining the index in a
special row will work fine (e.g., the row name is "NAMEINDEX" and the columns
are the name+uuid similar to what you described).  If you have billions of
users, you'll need to get a bit fancier (partition based on letter of the
last name, for example).

-Brandon
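
The layout suggested above can be modelled with plain Python dictionaries.
This is a sketch under assumptions, not Cassandra client code: the dicts stand
in for rows, `sorted()` imitates the ordering a UTF8Type comparator would
give, and storing the uuid as the index column's value is a choice made for
the example.

```python
users = {}       # row key (uuid) -> columns of that user's row
name_index = {}  # models the single "NAMEINDEX" row: column name -> uuid


def add_user(user_uuid, last, first):
    """Store the user under a stable uuid key and add an index column."""
    users[user_uuid] = {"last": last, "first": first}
    name_index[f"{last}-{first}-{user_uuid}"] = user_uuid


def users_by_name():
    """Return uuids in name order, as a slice of the index row would."""
    return [name_index[column] for column in sorted(name_index)]
```

The row key never changes, so a rename only touches the name columns and the
index entry, never the placement of the user's row.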


Re: Grails Cassandra plugin

2010-03-12 Thread Jonathan Ellis
Great!

You should also link it from
http://wiki.apache.org/cassandra/ClientExamples (click "Login" at the
top to create an account.)

On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert  wrote:
> Folks-
>
>   I put together a quick n' dirty grails plugin for Cassandra, wrapped with
> Hector. It's available at http://github.com/wolpert/grails-cassandra in its
> initial state. I wouldn't call it 'production-ready' yet. :-)
>
>   We're using Cassandra at work and I wanted an easy way to access Cassandra
> from a grails application, but couldn't find anything. I have some plans for
> where I want it to go, but I'm open to suggestions. I'll submit the code
> to grails plugins once I get a bit further along with it. It's pretty basic
> at this point.
>
> --
> Virtually, Ned Wolpert
> "Settle thy studies, Faustus, and begin..."   --Marlowe
>


Re: Cassandra 0.5.1 get_key_range problem

2010-03-12 Thread Jonathan Ellis
get_key_range is deprecated.  You should use get_range_slice.

On Fri, Mar 12, 2010 at 3:59 PM, Jon Graham  wrote:
> Hello,
>
> When using the get_key_range method with ConsistencyLevel.ONE an entire
> block of keys is not returned.
> I loop over the get_key_range method, advancing the start key after each
> call (requesting 8K keys per call).
>
> When running the program several times, I got the same results with large
> key blocks not returned.
>
> Then, I change the program to use ConsistencyLevel.ALL, then all the keys
> are returned as expected.
>
> Change the program back to use ConsistencyLevel.ONE and all the keys are now
> returned.
>
> Has anyone else seen this issue?
>
> I would have expected ConsistencyLevel.ONE to be able to return all the
> keys. My 6 node cluster uses
> a replication factor of 3.
>
> Thanks for your help,
> Jon
>


Re: Grails Cassandra plugin

2010-03-12 Thread Ned Wolpert
Document updated

On Fri, Mar 12, 2010 at 2:50 PM, Jonathan Ellis  wrote:

> Great!
>
> You should also link it from
> http://wiki.apache.org/cassandra/ClientExamples (click "Login" at the
> top to create an account.)
>
> On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert wrote:
> > Folks-
> >
> >   I put together a quick n' dirty grails plugin for Cassandra, wrapped
> > with Hector. It's available at http://github.com/wolpert/grails-cassandra
> > in its initial state. I wouldn't call it 'production-ready' yet. :-)
> >
> >   We're using Cassandra at work and I wanted an easy way to access
> > Cassandra from a grails application, but couldn't find anything. I have
> > some plans for where I want it to go, but I'm open to suggestions. I'll
> > submit the code to grails plugins once I get a bit further along with it.
> > It's pretty basic at this point.
> >
> > --
> > Virtually, Ned Wolpert
> > "Settle thy studies, Faustus, and begin..."   --Marlowe
> >
>



-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe


Re: Cassandra 0.5.1 get_key_range problem

2010-03-12 Thread Jon Graham
Thanks once again Jonathan,

I don't mind switching to an updated API call.

Was there any known issue like I described with the get_key_range method?

Could the use of certain start/end keys, return counts or consistency levels
contribute to the issue I'm seeing?

Best Regards,
Jon
On Fri, Mar 12, 2010 at 1:53 PM, Jonathan Ellis  wrote:

> get_key_range is deprecated.  You should use get_range_slice.
>
> On Fri, Mar 12, 2010 at 3:59 PM, Jon Graham  wrote:
> > Hello,
> >
> > When using the get_key_range method with ConsistencyLevel.ONE an entire
> > block of keys is not returned.
> > I loop over the get_key_range method, advancing the start key after each
> > call (requesting 8K keys per call).
> >
> > When running the program several times, I got the same results with large
> > key blocks not returned.
> >
> > Then, I change the program to use ConsistencyLevel.ALL, then all the keys
> > are returned as expected.
> >
> > Change the program back to use ConsistencyLevel.ONE and all the keys are
> now
> > returned.
> >
> > Has anyone else seen this issue?
> >
> > I would have expected ConsistencyLevel.ONE to be able to return all the
> > keys. My 6 node cluster uses
> > a replication factor of 3.
> >
> > Thanks for your help,
> > Jon
> >
>


Re: Grails Cassandra plugin

2010-03-12 Thread Ran Tavory
Great, I'm happy you found Hector useful :)
Btw, in Hector 0.5.0-8 I added some interesting performance JMX counters, so
it may be worth updating yours from 0.5.0-6 to -8 when you have time.

On Fri, Mar 12, 2010 at 11:55 PM, Ned Wolpert wrote:

> Document updated
>
>
> On Fri, Mar 12, 2010 at 2:50 PM, Jonathan Ellis  wrote:
>
>> Great!
>>
>> You should also link it from
>> http://wiki.apache.org/cassandra/ClientExamples (click "Login" at the
>> top to create an account.)
>>
>> On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert wrote:
>> > Folks-
>> >
>> >   I put together a quick n' dirty grails plugin for Cassandra, wrapped
>> > with Hector. It's available at http://github.com/wolpert/grails-cassandra
>> > in its initial state. I wouldn't call it 'production-ready' yet. :-)
>> >
>> >   We're using Cassandra at work and I wanted an easy way to access
>> > Cassandra from a grails application, but couldn't find anything. I have
>> > some plans for where I want it to go, but I'm open to suggestions. I'll
>> > submit the code to grails plugins once I get a bit further along with it.
>> > It's pretty basic at this point.
>> >
>> > --
>> > Virtually, Ned Wolpert
>> > "Settle thy studies, Faustus, and begin..."   --Marlowe
>> >
>>
>
>
>
> --
> Virtually, Ned Wolpert
>
> "Settle thy studies, Faustus, and begin..."   --Marlowe
>


Re: SuperColumn.getSubColumns() ordering

2010-03-12 Thread Matteo Caprari
Thanks.

On Thu, Mar 11, 2010 at 6:46 PM, Jonathan Ellis  wrote:
> it's ordered by the column name as determined by the subcolumn
> comparator you declared in the definition, yes
>
> On Thu, Mar 11, 2010 at 12:24 PM, Matteo Caprari
>  wrote:
>> Hi.
>>
>> If I iterate over SuperColumn.getSubColumn(), do I get
>> columns sorted by the column name?
>>
>> Thanks.
>> --
>> :Matteo Caprari
>> matteo.capr...@gmail.com
>>
>



-- 
:Matteo Caprari
matteo.capr...@gmail.com


Re: Grails Cassandra plugin

2010-03-12 Thread Ned Wolpert
I added an issue in my github project for the update.

Since I have your ear: in Hector, if the Cassandra server restarts (one
server in the pool), Hector will not try to reconnect to the Cassandra server
even if it's listening. Is that a known issue?

On Fri, Mar 12, 2010 at 3:35 PM, Ran Tavory  wrote:

> great, I'm happy you found Hector useful :)
> btw, in hector 0.5.0-8 I added some interesting performance JMX counters so
> may be worth to update yours from 0.5.0-6 to -8 when you have time.
>
>
> On Fri, Mar 12, 2010 at 11:55 PM, Ned Wolpert 
> wrote:
>
>> Document updated
>>
>>
>> On Fri, Mar 12, 2010 at 2:50 PM, Jonathan Ellis wrote:
>>
>>> Great!
>>>
>>> You should also link it from
>>> http://wiki.apache.org/cassandra/ClientExamples (click "Login" at the
>>> top to create an account.)
>>>
>>> On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert wrote:
>>> > Folks-
>>> >
>>> >   I put together a quick n' dirty grails plugin for Cassandra, wrapped
>>> > with Hector. It's available at http://github.com/wolpert/grails-cassandra
>>> > in its initial state. I wouldn't call it 'production-ready' yet. :-)
>>> >
>>> >   We're using Cassandra at work and I wanted an easy way to access
>>> > Cassandra from a grails application, but couldn't find anything. I have
>>> > some plans for where I want it to go, but I'm open to suggestions. I'll
>>> > submit the code to grails plugins once I get a bit further along with it.
>>> > It's pretty basic at this point.
>>> >
>>> > --
>>> > Virtually, Ned Wolpert
>>> > "Settle thy studies, Faustus, and begin..."   --Marlowe
>>> >
>>>
>>
>>
>>
>> --
>> Virtually, Ned Wolpert
>>
>> "Settle thy studies, Faustus, and begin..."   --Marlowe
>>
>
>


-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
But wouldn't name + UUID be considered volatile? That was the crux of my
questions.

On Fri, Mar 12, 2010 at 1:07 PM, Brandon Williams  wrote:

> On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang  wrote:
>
>> I'm wondering about good strategies for picking keys that I want to be
>> lexically sorted in a super column family. For example, my data looks like
>> this:
>>
>> [user1_uuid][connections][some_key_for_user2] = ""
>> [user1_uuid][connections][some_key_for_user3] = ""
>>
>> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
>> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key
>>
>> [user's lastname + user's firstname + user's uuid]
>>
>> This would result in sorted subcolumn and user list. That's fine. But I
>> wonder what would happen if, say, a user changes their last name. Happens
>> rarely but I imagine people getting married and modifying their name. Now
>> the sort is no longer correct. There seems to be some bad consequences to
>> creating keys based on data that can change.
>>
>> So what is the general (elegant, easy to maintain) strategy here? Always
>> sort in your server-side code and don't bother trying to have the data
>> sorted?
>>
>
> Having row keys based on something potentially volatile is something I
> would avoid since that determines which machine the row belongs to and
> moving data between machines isn't a cheap operation.
>
> What you'll probably want to do is make the key something unique (like a
> uuid), store the user's name as a column on the row (thus making it easy to
> update) and maintain a secondary index to get the named-based sorting you
> want.  If you're expecting a few million users, maintaining the index in a
> special row will work fine (eg, the row name is "NAMEINDEX" and the columns
> are the name+uuid similar to what you described.)  If you have billions of
> users, you'll need to get a bit fancier (partition based on letter of the
> last name, for example.)
>
> -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:

> But wouldn't name + UUID be considered volatile? That was the crux of my
> questions.


It would, but the distinction here is that it is now a column, not a row
key.

-Brandon


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
My original post is probably confusing. I was originally talking about
columns and I don't see what the solution is.

"So I was thinking I set the subcolumn compareWith to UTF8Type or
BytesType and construct a key [for the subcolumn, not a row key]

[user's lastname + user's firstname + user's uuid]

This would result in sorted subcolumn and user list."

Nevertheless, I still don't see/understand the solution. Let's say the
person's name changes. The sort is no longer valid. That column value would
need to be changed in order for the sort to be correct.


On Fri, Mar 12, 2010 at 5:10 PM, Brandon Williams  wrote:

> On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:
>
>> But wouldn't name + UUID be considered volatile? That was the crux of my
>> questions.
>
>
> It would, but the distinction here is that it is now a column, not a row
> key.
>
>  -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
To be more explicit:

['500c9280-2cdd-11df-869b-005056c1'] ['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['500c9280-2cdd-11df-869b-005056c1'] ['connections']
['Jones-Jim-1a6dd756b0-2ca1-11df-b937-005056c1']

But Alyssa gets married and changes her name to Zamboni. The next time I
read these subcolumns the users will not be sorted.




On Fri, Mar 12, 2010 at 5:21 PM, Peter Chang  wrote:

> My original post is probably confusing. I was originally talking about
> columns and I don't see what the solution is.
>
> "So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key [for the subcolumn, not a row key]
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list."
>
> Nevertheless, I still don't see/understand the solution. Let's say the
> person's name changes. The sort is no longer valid. That column value would
> need to be changed in order for the sort to be correct.
>
>
> On Fri, Mar 12, 2010 at 5:10 PM, Brandon Williams wrote:
>
>> On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:
>>
>>> But wouldn't name + UUID be considered volatile? That was the crux of my
>>> questions.
>>
>>
>> It would, but the distinction here is that it is now a column, not a row
>> key.
>>
>>  -Brandon
>>
>
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:21 PM, Peter Chang  wrote:

> My original post is probably confusing. I was originally talking about
> columns and I don't see what the solution is.


Sorry, I misunderstood.

> "So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key [for the subcolumn, not a row key]
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list."
>
> Nevertheless, I still don't see/understand the solution. Let's say the
> person's name changes. The sort is no longer valid. That column value would
> need to be changed in order for the sort to be correct.
>

When their name changes, you delete the existing column and insert a new one
with the correct name, which will then sort correctly.

-Brandon
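
The delete-then-insert step can be sketched in Python as below. It is a
model, not client code: the dict stands in for the subcolumns of one
'connections' supercolumn, `sorted()` imitates the comparator ordering, and
the function and parameter names are made up for the example.

```python
def rename_connection(subcolumns, user_uuid, old_name, new_name):
    """Replace the subcolumn built from the old name with one built from
    the new name. Names are (last, first) tuples."""
    old_column = "-".join(old_name + (user_uuid,))
    new_column = "-".join(new_name + (user_uuid,))
    subcolumns.pop(old_column, None)  # delete the existing column
    subcolumns[new_column] = ""       # insert one with the correct name
    return subcolumns


connections = {"Hacker-Alyssa-u1": "", "Jones-Jim-u2": ""}
rename_connection(connections, "u1", ("Hacker", "Alyssa"), ("Zamboni", "Alyssa"))
print(sorted(connections))
```

Because the column name is rebuilt from the new last name, the next read of
the subcolumns comes back in the corrected order.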


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
Yes, I can update that one entry. But what if that subcolumn key is used
across many different places?

['Jones-Bob']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['Crabtree-Sam']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['Rice-Brown']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
...

I can update every single entry but now I need to keep track of them (which
I guess I'm doing anyway). I was wondering if there was a more elegant
solution but it seems unlikely based on the given constraints.


On Fri, Mar 12, 2010 at 5:26 PM, Brandon Williams  wrote:

> On Fri, Mar 12, 2010 at 7:21 PM, Peter Chang  wrote:
>
>> My original post is probably confusing. I was originally talking about
>> columns and I don't see what the solution is.
>
>
> Sorry, I misunderstood.
>
>> "So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key [for the subcolumn, not a row key]
>>
>> [user's lastname + user's firstname + user's uuid]
>>
>> This would result in sorted subcolumn and user list."
>>
>> Nevertheless, I still don't see/understand the solution. Let's say the
>> person's name changes. The sort is no longer valid. That column value would
>> need to be changed in order for the sort to be correct.
>>
>
> When their name changes, you delete the existing column and insert a new
> one with the correct name, which will then sort correctly.
>
> -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:46 PM, Peter Chang  wrote:

> Yes, I can update that one entry. But what if that subcolumn key is used
> across many different places?
>
> ['Jones-Bob']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ['Crabtree-Sam']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ['Rice-Brown']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ...
>
> I can update every single entry but now I need to keep track of them (which
> I guess I'm doing anyway). I was wondering if there was a more elegant
> solution but it seems unlikely based on the given constraints.
>

You have to update them all and track them, correct.  What you're looking
for sounds like transaction support, which Cassandra does not have.  On the
bright side, writes are cheap.

-Brandon
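
One way to do the tracking is a reverse index from each user id to the rows that hold a copy of that user's subcolumn. The sketch below is a toy in-memory model under that assumption, not Cassandra client code; the class `ConnectionStore` and its methods are hypothetical names.

```python
from collections import defaultdict

class ConnectionStore:
    """Toy model: rows of 'connections' subcolumns plus a reverse index."""

    def __init__(self):
        self.rows = defaultdict(dict)     # row key -> {subcolumn name: user id}
        self.used_in = defaultdict(set)   # user id -> row keys holding that user

    def add(self, row_key, name, user_id):
        """Store the subcolumn and remember where it was written."""
        self.rows[row_key][name] = user_id
        self.used_in[user_id].add(row_key)

    def rename(self, user_id, old_name, new_name):
        """Rewrite every tracked copy; return the number of rows touched."""
        touched = 0
        for row_key in sorted(self.used_in[user_id]):
            columns = self.rows[row_key]
            if old_name in columns:
                del columns[old_name]          # delete the stale name
                columns[new_name] = user_id    # insert the corrected name
                touched += 1
        return touched

store = ConnectionStore()
uid = "1ab54760-2ca8-11df-aabd-005056c1"
old = f"Hacker-Alyssa-{uid}"
for row in ("Jones-Bob", "Crabtree-Sam", "Rice-Brown"):
    store.add(row, old, uid)
print(store.rename(uid, old, f"Bitdiddle-Alyssa-{uid}"))  # prints 3
```

The loop is not atomic, which is the missing transaction support mentioned above: a failure partway through leaves some rows with the old name. In this sketch, rerunning `rename` is harmless, since rows that were already rewritten are skipped.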


Re: Cassandra Demo/Tutorial Applications

2010-03-12 Thread Jonathan Ellis
On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar  wrote:
> I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
> for my OSCON tutorial. Have couple of questions. Would appreciate insights:
>
> A)  Cassandra-873 suggests Lucandra as one demo application
> B)  Are there other ideas that will bring out the various aspects of
> Cassandra ?

multi-user blog (single-user is too easy :)
 - extra credit: with full-text search using lucandra

discussion forum
 - also w/ FTS

> C)  What would be the goal of demo apps ? Tutorial to help folks learn the
> ins and outs of Cassandra ? Show case capabilities ? I think Cassandra-873
> belongs to the latter; Twissandra most probably belongs to the former.

I think you nailed it.

> D)  Hadoop on Cassandra might be a good demo/tutorial

Sure, I'll buy that.

I can't think of any standalone projects for that, but "compute a
twissandra tag cloud" would be pretty cool.  (Might need to write a
twissandra bot to load stuff in to make an interesting cloud. :)
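
As a rough idea of the computation, here is a minimal sketch of the map and reduce steps in plain Python. It assumes that "tags" means hashtags and that the tweets are already available as a list of strings; reading them out of Cassandra and running the job on Hadoop are not shown, and the function names are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def map_tags(tweet):
    """Map step: emit the lowercased hashtags found in one tweet."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]

def tag_cloud(tweets, top=10):
    """Reduce step: sum tag counts and return the most frequent ones."""
    counts = Counter()
    for tweet in tweets:
        counts.update(map_tags(tweet))
    return counts.most_common(top)

print(tag_cloud(["#Cassandra rocks", "learning #cassandra and #hadoop"]))
```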

> E)  How would one structure the infrastructure for the demo/tutorials ? What
> assumptions can we make in creating them ? As AMIs to be run in EC2 ?

I'd probably go with "virtualbox images" as being simpler for people
who don't have an AWS key already.  (VB can read vmware player images,
i think.  But there is no free vmware for OS X, so you'd want to check
that before going w/ vmware format.)

Or just have people d/l cassandra and a configuration xml.  Probably
easier than teaching people to use virtualbox who haven't before.

> Also
> to be run on 2-3 local machines for folks who can spare some ? Or as
> multiple processes - all in one machine ?

You're not going to have time to teach cluster management.  Keep it to 1.


Re: Cassandra Demo/Tutorial Applications

2010-03-12 Thread Ian Holsman
There are several large data sets on the net you could use to build a
demo with.

Search logs, Wikipedia, UK government data.
DBpedia may be interesting, as they have some of the data already extracted.


---
Sent from my phone
Ian Holsman - 703 879-3128

On 13/03/2010, at 4:46 PM, Jonathan Ellis  wrote:

> On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar  wrote:
>> I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
>> for my OSCON tutorial. Have couple of questions. Would appreciate insights:
>>
>> A)  Cassandra-873 suggests Lucandra as one demo application
>> B)  Are there other ideas that will bring out the various aspects of
>> Cassandra ?
>
> multi-user blog (single-user is too easy :)
> - extra credit: with full-text search using lucandra
>
> discussion forum
> - also w/ FTS
>
>> C)  What would be the goal of demo apps ? Tutorial to help folks learn the
>> ins and outs of Cassandra ? Show case capabilities ? I think Cassandra-873
>> belongs to the latter; Twissandra most probably belongs to the former.
>
> I think you nailed it.
>
>> D)  Hadoop on Cassandra might be a good demo/tutorial
>
> Sure, I'll buy that.
>
> I can't think of any standalone projects for that, but "compute a
> twissandra tag cloud" would be pretty cool.  (Might need to write a
> twissandra bot to load stuff in to make an interesting cloud. :)
>
>> E)  How would one structure the infrastructure for the demo/tutorials ? What
>> assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>
> I'd probably go with "virtualbox images" as being simpler for people
> who don't have an AWS key already.  (VB can read vmware player images,
> i think.  But there is no free vmware for OS X, so you'd want to check
> that before going w/ vmware format.)
>
> Or just have people d/l cassandra and a configuration xml.  Probably
> easier than teaching people to use virtualbox who haven't before.
>
>> Also
>> to be run on 2-3 local machines for folks who can spare some ? Or as
>> multiple processes - all in one machine ?
>
> You're not going to have time to teach cluster management.  Keep it to 1.


About the replication strategy of Cassandra

2010-03-12 Thread Kazuki Aranami
Hi all. I am interested in the architecture of Cassandra.

Cassandra offers several replication policies: "Rack Unaware", "Rack
Aware (within a datacenter)", and "Datacenter Aware". The application
has to select one of these policies.

The algorithm used when the "Rack Aware (within a datacenter)" or the
"Datacenter Aware" strategy is selected might be a little difficult.
In Cassandra, ZooKeeper was selected for the election algorithm of the
nodes that the system uses.


1. Please give me some notes on how to select a replication strategy
in Cassandra.


2. About the Zab protocol adopted by ZooKeeper: the weak point of the
Paxos protocol in Chubby is latency. Is the Zab protocol better than
this Paxos protocol?


---
  Kazuki Aranami

 Twitter: http://twitter.com/kimtea
 http://d.hatena.ne.jp/kazuki-aranami/
 ---
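
For reference, the simplest of the three policies can be sketched in a few lines. This is a simplified model of "Rack Unaware" placement, assuming integer tokens and a ring given as (token, node) pairs: the first replica goes to the node whose token is the first one at or after the key's token, and the remaining replicas go to the following nodes around the ring. The function name and the example tokens are hypothetical, and the rack-aware and datacenter-aware policies additionally need topology information that is not modelled here.

```python
from bisect import bisect_left

def rack_unaware_replicas(ring, key_token, replication_factor):
    """ring: list of (token, node) pairs. Returns the nodes holding key_token."""
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ordered)
    count = min(replication_factor, len(ordered))
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]

ring = [(0, "a"), (25, "b"), (50, "c"), (75, "d")]
print(rack_unaware_replicas(ring, 30, 3))  # prints ['c', 'd', 'a']
```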


Re: Incr/Decr Counters in Cassandra

2010-03-12 Thread Vijay
I badly need this for my work. Let me know if I can do something to speed
it up :)

Regards,




On Wed, Nov 4, 2009 at 1:32 PM, Chris Goffinet  wrote:

> Hey,
>
> At Digg we've been thinking about counters in Cassandra. In a lot of our
> use cases we need this type of support from a distributed storage system.
> Anyone else out there who has such needs as well? Zookeeper actually has
> such support and we might use that if we can't get the support in Cassandra.
>
> ---
> Chris Goffinet
> goffi...@digg.com
>
>
>
>
>
>