TTL and Cassandra counters

2012-08-26 Thread Avi-h
Hello

Our current application uses Cassandra to hold the chat items for each user's
conversations and a counter of unread chat messages (per conversation).
We use TTL to delete old chat items, but we fail to see how we can define a
callback which will trigger an update (a decrease) of the counter's value.

Please advise on how we can solve this.
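
Cassandra does not fire a callback when a TTL'd column expires, so the decrement
has to come from the application, typically at the moment messages are read. A
minimal pycassa sketch of that approach (keyspace, CF names, and the 30-day TTL
are hypothetical; UnreadCounts is assumed to be a counter CF):

import time

import pycassa

pool = pycassa.ConnectionPool('Chat', ['localhost:9160'])
chat_items = pycassa.ColumnFamily(pool, 'ChatItems')
unread = pycassa.ColumnFamily(pool, 'UnreadCounts')  # counter CF

def add_message(conversation_id, body):
    # The chat item expires after 30 days; the counter column does not.
    col_name = str(int(time.time() * 1e6))
    chat_items.insert(conversation_id, {col_name: body}, ttl=30 * 86400)
    unread.add(conversation_id, 'unread', 1)

def mark_read(conversation_id, count):
    # Decrement when messages are read, not when they expire.
    unread.add(conversation_id, 'unread', -count)

A message that expires before it is ever read still leaves the counter too
high, so a periodic reconciliation (for example, get_count() on the live
columns, then reset the counter) may also be needed.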






Re: are asynchronous schema updates possible ?

2012-08-26 Thread aaron morton
Concurrent schema changes are coming in 1.2. 

I could not find a single issue that covers it; that may be my bad search fu. 
The issues for 1.2 are here 
https://issues.apache.org/jira/browse/CASSANDRA/fixforversion/12319262

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 7:06 PM, Илья Шипицин chipits...@gmail.com wrote:

 Hello!
  
 we are looking into concurrent schema updates (when multiple instances of an 
 application create CFs at once).
  
 At http://wiki.apache.org/cassandra/MultiTenant, ticket 1391 is listed as 
 still open; however, JIRA says it was fixed in 1.1.0.
  
 Can the schema be updated asynchronously on 1.1.x, or not?
 What happens if multiple servers create the same CF?
  
 Cheers,
 Ilya Shipitsin



Re: two-node cassandra cluster

2012-08-26 Thread aaron morton
 most of the time and then on burst days I want to bring one more
 server up (with more RAM and CPU than the first) to help serve the
 load.
Unless you are using virtual nodes (coming in 1.2) and a higher RF, I would 
recommend using machines that have the same HW spec. Otherwise you need to 
capacity-plan against a hypothetical machine made of the lowest-spec components 
across all machines. If you are scaling for throughput you will need to keep this 
in mind. With 2 nodes and RF 2 both nodes have to do the same amount of work. 

In _general_ the assumption is that cluster membership is reasonably stable. 
Routinely scaling up and down will be fighting things a little. 
 
To bring the node up I would:
* start it with auto_bootstrap off. 
* copy over all the data from the first node
* run repair. 

To decommission I would:
* run repair on the always on node.
* turn off the additional node
* run nodetool removetoken on the always on node to remove the additional node. 
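
If this is scripted, the decommission sequence might look like the following
sketch (host and token values are hypothetical; stopping Cassandra on the
burst node happens out of band):

import subprocess

ALWAYS_ON = '10.0.0.1'  # hypothetical always-on node
# Hypothetical token of the burst node, as reported by nodetool ring.
BURST_TOKEN = '85070591730234615865843651857942052864'

def retire_burst_node():
    # 1. Repair the always-on node so it holds all current data.
    subprocess.check_call(['nodetool', '-h', ALWAYS_ON, 'repair'])
    # 2. Stop Cassandra on the burst node (out of band), then
    # 3. remove its token from the ring.
    subprocess.check_call(['nodetool', '-h', ALWAYS_ON, 'removetoken', BURST_TOKEN])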

If you want to go down this path make sure you do a lot of testing to get the 
process ironed out. I would be thinking about: 

Does the cluster have to be continuously available ? 
How much data are we talking about ? How long will it take to transfer ?
What will the network latency be like between the nodes ? Latency between new 
nodes can be a lottery. 
If you are storing the data on a single node that uses RAID 0, how will you 
handle disk failure ?

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 10:25 PM, Jason Axelson ja...@engagestage.com wrote:

 Hi, I have an application that will be very dormant most of the time
 but will need high-bursting a few days out of the month. Since we are
 deploying on EC2 I would like to keep only one Cassandra server up
 most of the time and then on burst days I want to bring one more
 server up (with more RAM and CPU than the first) to help serve the
 load. What is the best way to do this? Should I take a different
 approach?
 
 Some notes about what I plan to do:
 * Bring the node up and repair it immediately
 * After the burst time is over decommission the powerful node
 * Use the always-on server as the seed node
 * My main question is how to get the nodes to share all the data since
 I want a replication factor of 2 (so both nodes have all the data) but
 that won't work while there is only one server. Should I bring up 2
 extra servers instead of just one?
 
 Thanks,
 Jason



Re: Order of the cyclic group of hashed partitioners

2012-08-26 Thread aaron morton
 AbstractHashedPartitioner 
does not exist in the trunk. 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=a89ef1ffd4cd2ee39a2751f37044dba3015d72f1


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 10:51 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote:

 
 Hi, 
 
 AbstractHashedPartitioner defines a maximum of 2**127, hence an order of 
 (2**127)+1. 
 I'd say that tokens of such partitioners are intended to be distributed in 
 Z/(2**127), hence a maximum of (2**127)-1. 
 Could there be a mix-up between maximum and order? 
 This is a detail, but could someone confirm/invalidate? 
 
 Regards, 
 
 Romain



Re: two-node cassandra cluster

2012-08-26 Thread Franc Carter
On Fri, Aug 24, 2012 at 8:25 PM, Jason Axelson ja...@engagestage.com wrote:

 Hi, I have an application that will be very dormant most of the time
 but will need high-bursting a few days out of the month. Since we are
 deploying on EC2 I would like to keep only one Cassandra server up
 most of the time and then on burst days I want to bring one more
 server up (with more RAM and CPU than the first) to help serve the
 load. What is the best way to do this? Should I take a different
 approach?

 Some notes about what I plan to do:
 * Bring the node up and repair it immediately
 * After the burst time is over decommission the powerful node
 * Use the always-on server as the seed node
 * My main question is how to get the nodes to share all the data since
 I want a replication factor of 2 (so both nodes have all the data) but
 that won't work while there is only one server. Should I bring up 2
 extra servers instead of just one?

 Thanks,
 Jason


Caveat: I haven't tried what I am about to suggest

Could you run the cluster on smaller instances most of the time and then,
when you need more performance, increase the instance size to get more
CPU/memory? If you use EBS with provisioned IOPS you should be able to make
the transition reasonably quickly.

cheers

-- 

*Franc Carter* | Systems architect | Sirca Ltd

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: Cluster temporarily split into segments

2012-08-26 Thread aaron morton
 using CL=ONE (read) and CL=ALL(write).
Using this setting you are saying the application should fail in the case of a 
network partition. You are valuing Consistency and Availability over  Partition 
Tolerance. Mixing the CL levels in response to a partition will make it 
difficult to reason about the consistency of the data. Consider other 
approaches such as CL QUORUM.

While using CL ONE for reads looks good, if you require strong consistency you 
then have to use ALL for writes. QUORUM for reads and writes may be a better choice. 
Using RF 3 and 6 nodes would give you a pretty good availability in the face of 
node failures (for background http://thelastpickle.com/2011/06/13/Down-For-Me/ 
) 

Or relax the consistency and use CL ONE for reads and CL QUORUM for writes. The 
writes are still sent to RF nodes, but we can no longer guarantee reads will 
see them. 

If the high RF is for 
 Suppose that connectivity breaks down (for whatever reason) causing two 
 isolated segments:
 S1 = {A,B,C,D} and S2 = {E,F}.

Do clients still have access to the entire cluster ? Normally we would expect 
clients to try different nodes until they either fail or find a partition with 
enough UP nodes to service the request.

 to be able to write at all, the CL strategy definitely needs to be changed.
 In S1, for instance, change to CL=QUORUM for both reads/writes.
 In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO.
Whatever the choice, you can imagine a partition where the only thing that works 
for writes is CL ONE. e.g. with RF 6, a 3/3 split leaves neither side with the 
4 replicas QUORUM requires. 

 So now to the interesting question, what happens when S1 and S2 reestablish 
 full connectivity again ?
If you are using CL ALL for writes the easiest thing to do is stop writing 
when the cluster partitions, and resume when it comes back. 

If you drop the CL during writes, reads will be inconsistent until either Hinted 
Handoff has finished or you run repair. 


  It is extremely important that reads will continue to operate in both S1 and 
 S2
If it's important that reads continue and are consistent, I would look at RF 3 
with QUORUM / QUORUM. 

If it's important that reads continue and consistency can be relaxed, I would 
look at RF 3 (or 6) and read ONE / write QUORUM.
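
For illustration, wiring these levels into a client with pycassa might look
like the following sketch (keyspace, CF, and host names are hypothetical):

import pycassa
from pycassa import ConsistencyLevel

pool = pycassa.ConnectionPool('MyKeyspace', ['10.0.0.1:9160', '10.0.0.2:9160'])

# RF 3, strong consistency: quorum (2 of 3 replicas) for reads and writes.
orders = pycassa.ColumnFamily(pool, 'Orders',
    read_consistency_level=ConsistencyLevel.QUORUM,
    write_consistency_level=ConsistencyLevel.QUORUM)

# Relaxed reads: writes still reach a quorum, but reads may briefly lag.
orders_relaxed = pycassa.ColumnFamily(pool, 'Orders',
    read_consistency_level=ConsistencyLevel.ONE,
    write_consistency_level=ConsistencyLevel.QUORUM)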

Hope that helps. 
  
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:00 PM, Robert Hellmans robert.hellm...@aastra.com wrote:

 Hi !
  
 I'm preparing the test below. I've found a lot of information about dead-node 
 replacements and adding extra nodes to increase capacity, but didn't find 
 anything about this segmentation issue. Anyone who can share 
 experience/ideas ?
  
  
 Setup:
 Cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and 
 CL=ALL(write).
  
  
 Suppose that connectivity breaks down (for whatever reason) causing two 
 isolated segments:
 S1 = {A,B,C,D} and S2 = {E,F}.
  
 Cluster connectivity anomalies will be detected by all nodes in this setup, 
 so clients in S1 and S2 can be advised
 to change their CL strategy. It is extremely important that reads will 
 continue to operate in both S1 and S2,
 and I don't see any reason why they shouldn't. It is almost as important 
 that writes in each segment can continue, but
 to be able to write at all, the CL strategy definitely needs to be changed.
 In S1, for instance, change to CL=QUORUM for both reads/writes.
 In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO.
  
 During the connectivity breakdown, clients in both S1 and S2 simultaneously 
 change/add/delete data.
  
  
  
 So now to the interesting question, what happens when S1 and S2 reestablish 
 full connectivity again ?
 Again, the re-connectivity event will be detected, so should I trigger some 
 special repair sequence ?
 Or should I have taken some action already when the connectivity broke ?
 What about connectivity dropout time, longer/shorter than max_hint_window ?
  
  
  
  
 Rds /Robert
  
  
  



Re: Data Modelling Suggestions

2012-08-26 Thread aaron morton
 Im finding that only the first component is used… is this understanding 
 correct?
The result is correct: the end of your range,
 (end)component1=timestamp3,component2=123 
sorts before the column
 Timestamp3: 777
so that column is (correctly) excluded. Both components take part in the 
comparison; compare example [9] below.

Example:

CREATE COLUMN FAMILY Foo
WITH key_validation_class = UTF8Type
AND comparator = 'CompositeType(IntegerType, IntegerType)'
AND default_validation_class = UTF8Type;


set Foo['bar']['1:1'] = 'baz1';
set Foo['bar']['2:2'] = 'baz2';
set Foo['bar']['3:3'] = 'baz3';
set Foo['bar']['4:4'] = 'baz4';


aarons-MBP-2011:pycassa aaron$ ./pycassaShell -k dev
In [2]: FOO.get('bar')
Out[2]: OrderedDict([((1, 1), u'baz1'), ((2, 2), u'baz2'), ((3, 3), u'baz3'), 
((4, 4), u'baz4')])

In [6]: FOO.get('bar', column_start=(2,2))
Out[6]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

In [8]: FOO.get('bar', column_start=(2,2), column_finish=(3,3))
Out[8]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

In [9]: FOO.get('bar', column_start=(2,2), column_finish=(3,1))
Out[9]: OrderedDict([((2, 2), u'baz2')])

In [10]: FOO.get('bar', column_start=(2,), column_finish=(3,))
Out[10]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

 We see a lot of examples about Timeseries modelling ...

Sorry I do not understand this question. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:17 PM, Roshni Rajagopal roshni.rajago...@wal-mart.com 
wrote:

 Thank you Aaron & Guillermo,
 
 I find composite columns very confusing :(
 To reconfirm ,
 
 1.  We can only search for a range of columns using the first component of the 
 composite column.
 2.  After specifying a range for the first component, we cannot further 
 filter on the second component.  I found this link 
 http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ 
 which seems to suggest filtering is possible on the second component in addition 
 to the first, and I tried the same example but I couldn't get it to work. Does 
 anyone have an example? Suppose I have data like this in my column names:
 
 Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654  --- get a 
 range of columns from (start) component1=timestamp1, component2=123 to 
 (end) component1=timestamp3, component2=123 -- this should give me only one column.
 Im finding that only the first component is used… is this understanding 
 correct?
 
 
 We see a lot of examples of time-series modelling with TimeUUID as column 
 names. But how does the updating or deletion of columns happen here? How are 
 the columns found, to know which ones to delete or modify? Does one always 
 need a separate column family to handle updating/deletion for time series, or 
 is it usually handled by setting a TTL for data outside the archival period, or 
 does time-series modelling usually not involve any manipulation of past 
 records?
 
 Regards,
 Roshni
 
 
 
 From: aaron morton aa...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 To: user@cassandra.apache.org
 Subject: Re: Data Modelling Suggestions
 Subject: Re: Data Modelling Suggestions
 
 I was trying to find Hector examples where we search on the second component of a 
 composite column, but I couldn't find any good one. I'm not sure if it's 
 possible… if you have any example please share.
 It's not. When slicing columns you can only return one contiguous range.
 
 Anyway I would prefer storing the item-ids as column names in the main column 
 family and having a second CF for the order-by-date query only, with the pair 
 timestamp_itemid. That way you can add other query strategies later without 
 messing with how you store the item information.
 +1
 Have the orders somewhere, and build a time ordered custom index to show them 
 in order.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/08/2012, at 6:28 AM, Guillermo Winkler 
 gwink...@inconcertcc.com wrote:
 
 I think you need another CF as index.
 
 user_itemid - timestamped column_name
 
 Otherwise you can't know which timestamp to use in the column name.
 
 Anyway I would prefer storing the item-ids as column names in the main column 
 family and having a second CF for the order-by-date query only with the pair 
 timestamp_itemid. That way you can add other query strategies later without 
 messing with how you store the item information.
 
 Maybe you can solve it with a secondary index by timestamp too.
 
 Guille
 
 
 On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal 
 roshni.rajago...@wal-mart.com wrote:
 Hi,
 
 Need some help on a data modelling question. We're using Hector & DataStax 
 Enterprise 2.1.
 
 
 I want to associate a list of items for a user. It should be sorted on the 
 time added. 

Decreasing the number of nodes in the ring

2012-08-26 Thread Senthilvel Rangaswamy
We have a cluster of 9 nodes in the ring. We would like SSD-backed boxes,
but we may not need 9
nodes in that case. What is the best way to downscale the cluster to 6 or 3
nodes?

-- 
..Senthil

If there's anything more important than my ego around, I want it
 caught and shot now.
- Douglas Adams.


Re: Decreasing the number of nodes in the ring

2012-08-26 Thread Mohit Anchlia
use nodetool decommission and nodetool removetoken
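
A sketch of how those two differ (host names and the token are hypothetical):
decommission is run against each live node being retired and streams its data
to the remaining replicas; removetoken is run from a surviving node to remove
a node that is already down.

import subprocess

# Retire three live nodes, one at a time (hypothetical hosts).
for host in ['10.0.0.7', '10.0.0.8', '10.0.0.9']:
    subprocess.check_call(['nodetool', '-h', host, 'decommission'])

# For a node that is already down, remove its token from a live node instead:
# subprocess.check_call(['nodetool', '-h', '10.0.0.1', 'removetoken', '<token>'])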

On Sun, Aug 26, 2012 at 5:31 PM, Senthilvel Rangaswamy senthil...@gmail.com
 wrote:

 We have a cluster of 9 nodes in the ring. We would like SSD-backed boxes,
 but we may not need 9
 nodes in that case. What is the best way to downscale the cluster to 6 or
 3 nodes?

 --
 ..Senthil

 If there's anything more important than my ego around, I want it
  caught and shot now.
 - Douglas Adams.




Re: help required to resolve super column family problems

2012-08-26 Thread Amit Handa
Hi,

I basically want to get hands-on with the super column family concept, making
some examples using the Hector API and manually adding the data.
I explored the hector-examples project, but only found a very basic
super column family example.
I am in search of more super column family examples using the Hector API. Can I
get some links and references for this?


Regards,
Amit


On Sat, Aug 25, 2012 at 3:20 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 If you are starting out new, use composite column names/values, or you could
 also use a JSON-style doc as a column value.


 On Fri, Aug 24, 2012 at 2:31 PM, Rob Coli rc...@palominodb.com wrote:

 On Fri, Aug 24, 2012 at 4:33 AM, Amit Handa amithand...@gmail.com
 wrote:
  kindly help in resolving the following problem with respect to super
 column
  family.
  i am using cassandra version 1.1.3

 Well, THERE's your problem... ;D

 But seriously.. as I understand project intent, super columns will
 ultimately be a weird API wrapper around composite keys. Also,  super
 column families have not been well supported for years. You probably
 just want to use composite keys if you are just starting out in 1.1.x.

 https://issues.apache.org/jira/browse/CASSANDRA-3237

 =Rob

 --
 =Robert Coli
AIM&GTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb





Re: help required to resolve super column family problems

2012-08-26 Thread Aaron Turner
On Sun, Aug 26, 2012 at 9:28 PM, Amit Handa amithand...@gmail.com wrote:
 Hi,

 I basically want to get hands-on with the super column family concept, making
 some examples using the Hector API and manually adding the data.
 I explored the hector-examples project, but only found a very basic super
 column family example.
 I am in search of more super column family examples using the Hector API. Can I
 get some links and references for this?

Is there a specific reason you want to use SuperColumns?  Basically,
it's a feature that the developers say you shouldn't use.  Most
people don't use them because of the rather poor performance
characteristics SC's have.
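
For reference, a sketch of the usual alternative: a composite comparator whose
first component plays the role of the super column name. It uses pycassa rather
than Hector, and all names here are hypothetical:

import pycassa
from pycassa.system_manager import SystemManager
from pycassa.types import CompositeType, UTF8Type

# One-off schema setup: a composite comparator instead of a super CF.
sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family('Demo', 'Users',
    comparator_type=CompositeType(UTF8Type(), UTF8Type()))

pool = pycassa.ConnectionPool('Demo', ['localhost:9160'])
users = pycassa.ColumnFamily(pool, 'Users')

# Instead of Users[key][super_col][sub_col] = value, write
# Users[key][(super_col, sub_col)] = value.
users.insert('user1', {('address', 'city'): 'Sydney',
                       ('address', 'zip'): '2000'})

# Read back one "super column": every column whose first component
# is 'address' (the partial-composite slices shown earlier in this digest).
print users.get('user1', column_start=('address',),
                column_finish=('address',))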

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero