TTL and Cassandra counters
Hello, our current application uses Cassandra to hold the chat items for a user's conversations and a counter of unread chat messages (one per conversation). We use TTL to delete old chat items, but we do not see how we can define a callback that would trigger an update (decrease) of the counter's value. Please advise on how we can achieve a solution for this issue.
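(Cassandra fires no callback when a TTL'd column expires, so any counter adjustment has to come from the application itself. Below is a minimal, hypothetical pycassa sketch of the model described above; the keyspace, CF, and column names are assumptions, and the explicit decrement when messages are read is just one possible workaround, not an answer from the list.)

# Hypothetical sketch: chat items with a TTL plus a counter CF per conversation.
# All names ('ChatKeyspace', 'ChatItems', 'UnreadCounters', 'unread') are placeholders.
import time
import pycassa

pool = pycassa.ConnectionPool('ChatKeyspace', ['localhost:9160'])
chat_items = pycassa.ColumnFamily(pool, 'ChatItems')        # TTL'd chat messages
unread = pycassa.ColumnFamily(pool, 'UnreadCounters')       # counter column family

def add_chat_item(conversation_id, message, ttl_seconds=7 * 86400):
    # The chat item expires on its own via TTL; the counter does not.
    chat_items.insert(conversation_id, {str(int(time.time() * 1000)): message}, ttl=ttl_seconds)
    unread.add(conversation_id, 'unread', 1)

def mark_messages_read(conversation_id, how_many):
    # No expiration callback exists, so the application decrements explicitly,
    # e.g. when the user opens the conversation.
    unread.add(conversation_id, 'unread', -how_many)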
Re: are asynchronous schema updates possible ?
Concurrent schema changes are coming in 1.2. I could not find a single issue that covered it; that may be my bad search fu. The issues for 1.2 are here: https://issues.apache.org/jira/browse/CASSANDRA/fixforversion/12319262
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 7:06 PM, Илья Шипицин chipits...@gmail.com wrote:
Hello! We are looking into concurrent schema updates (when multiple instances of an application create CFs at once). At http://wiki.apache.org/cassandra/MultiTenant it says ticket 1391 is still open; however, JIRA says it was fixed in 1.1.0. Can the schema be updated asynchronously on 1.1.x or not, if multiple servers create the same CF? Cheers, Ilya Shipitsin
Re: two-node cassandra cluster
most of the time and then on burst days I want to bring one more server up (with more RAM and CPU than the first) to help serve the load.
Unless you are using virtual nodes (coming in 1.2) and a higher RF, I would recommend using machines that have the same HW spec. Otherwise you need to capacity plan against a combined machine with the lowest-spec components from all machines. If you are scaling for throughput you will need to keep this in mind. With 2 nodes and RF 2, both nodes have to do the same amount of work.
In _general_ the assumption is that cluster membership is reasonably stable. Routinely scaling up and down will be fighting things a little.
To bring the node up I would:
* start it with auto_bootstrap off.
* copy over all the data from the first node
* run repair.
To decommission I would:
* run repair on the always-on node.
* turn off the additional node
* run nodetool removetoken on the always-on node to remove the additional node.
If you want to go down this path make sure you do a lot of testing to get the process ironed out. I would be thinking about:
Does the cluster have to be continuously available?
How much data are we talking about? How long will it take to transfer? What will the network latency be like between the nodes? Latency between new nodes can be a lottery.
If you are storing the data on a single node that uses RAID 0, how will you handle disk failure?
Hope that helps.
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 10:25 PM, Jason Axelson ja...@engagestage.com wrote:
Hi, I have an application that will be very dormant most of the time but will need high bursting a few days out of the month. Since we are deploying on EC2 I would like to keep only one Cassandra server up most of the time and then on burst days I want to bring one more server up (with more RAM and CPU than the first) to help serve the load. What is the best way to do this? Should I take a different approach?
Some notes about what I plan to do:
* Bring the node up and repair it immediately
* After the burst time is over decommission the powerful node
* Use the always-on server as the seed node
* My main question is how to get the nodes to share all the data since I want a replication factor of 2 (so both nodes have all the data) but that won't work while there is only one server. Should I bring up 2 extra servers instead of just one?
Thanks, Jason
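(Purely as an illustration of the sequence above, a Python sketch that shells out to nodetool; host names are placeholders, auto_bootstrap is assumed to be set in cassandra.yaml on the burst node before it starts, and the data-copy step is left as a comment. This is not from the thread.)

# Illustrative only: wraps the bring-up / decommission steps in subprocess calls.
import subprocess

ALWAYS_ON = 'node1.example.com'    # placeholder host names
BURST_NODE = 'node2.example.com'

def nodetool(host, *args):
    subprocess.check_call(['nodetool', '-h', host] + list(args))

def bring_up_burst_node():
    # 1. Start the burst node with auto_bootstrap: false in cassandra.yaml.
    # 2. Copy all data over from the always-on node (e.g. rsync the data dirs).
    # 3. Repair the new node so it is consistent.
    nodetool(BURST_NODE, 'repair')

def decommission_burst_node(burst_token):
    nodetool(ALWAYS_ON, 'repair')             # always-on node gets everything first
    # ... stop Cassandra on the burst node ...
    nodetool(ALWAYS_ON, 'removetoken', burst_token)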
Re: Order of the cyclic group of hashed partitioners
AbstractHashedPartitioner does not exist in the trunk. https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=a89ef1ffd4cd2ee39a2751f37044dba3015d72f1
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 10:51 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote:
Hi, AbstractHashedPartitioner defines a maximum of 2**127, hence an order of (2**127)+1. I'd say that tokens of such partitioners are intended to be distributed in Z/(2**127), hence a maximum of (2**127)-1. Could there be a mix-up between maximum and order? This is a detail but could someone confirm/invalidate? Regards, Romain
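(For what it's worth, the arithmetic behind the two readings, as a quick Python check; this says nothing about what the partitioner code actually does.)

MAX = 2 ** 127

# If the maximum token is 2**127 and tokens range over 0 .. 2**127 inclusive,
# there are 2**127 + 1 possible values.
print(MAX + 1)    # 170141183460469231731687303715884105729

# If tokens are residues in Z/(2**127), they range over 0 .. 2**127 - 1,
# so the order is exactly 2**127 and the maximum is 2**127 - 1.
print(MAX)        # 170141183460469231731687303715884105728
print(MAX - 1)    # 170141183460469231731687303715884105727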
Re: two-node cassandra cluster
On Fri, Aug 24, 2012 at 8:25 PM, Jason Axelson ja...@engagestage.com wrote:
Hi, I have an application that will be very dormant most of the time but will need high bursting a few days out of the month. Since we are deploying on EC2 I would like to keep only one Cassandra server up most of the time and then on burst days I want to bring one more server up (with more RAM and CPU than the first) to help serve the load. What is the best way to do this? Should I take a different approach?
Some notes about what I plan to do:
* Bring the node up and repair it immediately
* After the burst time is over decommission the powerful node
* Use the always-on server as the seed node
* My main question is how to get the nodes to share all the data since I want a replication factor of 2 (so both nodes have all the data) but that won't work while there is only one server. Should I bring up 2 extra servers instead of just one?
Thanks, Jason
Caveat: I haven't tried what I am about to suggest. Could you run the cluster on smaller instances most of the time and then, when you need more performance, increase the instance size to get more CPU/memory? If you use EBS with provisioned IOPS you should be able to make the transition reasonably quickly.
cheers
-- *Franc Carter* | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Cluster temporarily split into segments
using CL=ONE (read) and CL=ALL (write).
Using this setting you are saying the application should fail in the case of a network partition. You are valuing Consistency and Availability over Partition Tolerance. Mixing the CL levels in response to a partition will make it difficult to reason about the consistency of the data.
Consider other approaches such as CL QUORUM. While using CL ONE for reads looks good, if you require strong consistency you then have to use ALL for writes. QUORUM for reads and writes may be a better choice. Using RF 3 and 6 nodes would give you pretty good availability in the face of node failures (for background see http://thelastpickle.com/2011/06/13/Down-For-Me/ ). Or relax the consistency and use CL ONE for reads and CL QUORUM for writes. The writes are still sent to RF nodes, but we can no longer guarantee reads will see them. If the high RF is for
Suppose that connectivity breaks down (for whatever reason) causing two isolated segments: S1 = {A,B,C,D} and S2 = {E,F}. Do clients still have access to the entire cluster?
Normally we would expect clients to try different nodes until they either fail or find a partition with enough UP nodes to service the request.
to be able to write at all, the CL strategy definitely needs to be changed. In S1, for instance change to CL=QUORUM for both reads/writes. In S2, CL(write) change to TWO/ONE/ANY. CL(read) may be changed to TWO.
Whatever the choice, you can imagine a partition where the only thing that works for writes is CL ONE, e.g. if it split 3/3, QUORUM would not work.
So now to the interesting question, what happens when S1 and S2 reestablish full connectivity again?
If you are using CL ALL for writes the easiest thing to do is stop writing when the cluster partitions, and resume when it comes back. If you drop the CL during writes, reads will be inconsistent until either hinted handoff has finished or you run repairs.
It is extremely important that reads will continue to operate in both S1 and S2
If it's important that reads continue and are consistent, I would look at RF 3 with QUORUM / QUORUM. If it's important that reads continue and consistency can be relaxed, I would look at RF 3 (or 6) and read ONE / write QUORUM.
Hope that helps.
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 11:00 PM, Robert Hellmans robert.hellm...@aastra.com wrote:
Hi! I'm preparing the test below. I've found a lot of information about dead-node replacements and adding extra nodes to increase capacity, but didn't find anything about this segmentation issue. Anyone that can share experience/ideas?
Setup: cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and CL=ALL (write). Suppose that connectivity breaks down (for whatever reason) causing two isolated segments: S1 = {A,B,C,D} and S2 = {E,F}. Cluster connectivity anomalies will be detected by all nodes in this setup, so clients in S1 and S2 can be advised to change their CL strategy. It is extremely important that reads will continue to operate in both S1 and S2, and I don't see any reason why they shouldn't. It is almost as important that writes in each segment can continue, but to be able to write at all, the CL strategy definitely needs to be changed. In S1, for instance change to CL=QUORUM for both reads/writes. In S2, CL(write) change to TWO/ONE/ANY. CL(read) may be changed to TWO. During the connectivity breakdown, clients in both S1 and S2 simultaneously change/add/delete data.
So now to the interesting question: what happens when S1 and S2 reestablish full connectivity again? Again, the re-connectivity event will be detected, so should I trigger some special repair sequence? Or should I have taken some action already when the connectivity broke? What about the connectivity dropout time, longer/shorter than max_hint_window? Rds /Robert
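(To make the QUORUM / QUORUM suggestion concrete, a minimal pycassa sketch; the keyspace, CF, and node names are placeholders, and RF=3 is assumed to be set on the keyspace.)

# Hypothetical sketch: per-ColumnFamily consistency levels in pycassa.
import pycassa
from pycassa import ConsistencyLevel

pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160', 'node3:9160'])

# With RF=3, QUORUM reads and writes keep working while 2 of the 3 replicas
# for a row are reachable, and a read is guaranteed to see the latest write.
cf = pycassa.ColumnFamily(pool, 'MyCF',
                          read_consistency_level=ConsistencyLevel.QUORUM,
                          write_consistency_level=ConsistencyLevel.QUORUM)

cf.insert('some_row', {'col': 'value'})   # written at QUORUM
print(cf.get('some_row'))                 # read at QUORUM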
Re: Data Modelling Suggestions
Im finding that only the first component is used ... is this understanding correct?
The result is correct: the end of your slice, component1=timestamp3 / component2=123, sorts before the column Timestamp3: 777 (123 < 777), so that column falls outside the slice.
Example:
CREATE COLUMN FAMILY Foo WITH key_validation_class = UTF8Type AND comparator = 'CompositeType(IntegerType, IntegerType)' AND default_validation_class = UTF8Type;
set Foo['bar']['1:1'] = 'baz1';
set Foo['bar']['2:2'] = 'baz2';
set Foo['bar']['3:3'] = 'baz3';
set Foo['bar']['4:4'] = 'baz4';
aarons-MBP-2011:pycassa aaron$ ./pycassaShell -k dev
In [2]: FOO.get('bar')
Out[2]: OrderedDict([((1, 1), u'baz1'), ((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])
In [6]: FOO.get('bar', column_start=(2,2))
Out[6]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])
In [8]: FOO.get('bar', column_start=(2,2), column_finish=(3,3))
Out[8]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])
In [9]: FOO.get('bar', column_start=(2,2), column_finish=(3,1))
Out[9]: OrderedDict([((2, 2), u'baz2')])
In [10]: FOO.get('bar', column_start=(2,), column_finish=(3,))
Out[10]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])
We see a lot of examples about Timeseries modelling ...
Sorry, I do not understand this question.
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 11:17 PM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote:
Thank you Aaron and Guillermo. I find composite columns very confusing :( To reconfirm:
1. we can only search for a column range with the first component of the composite column.
2. after specifying a range for the first component, we cannot further filter on the second component.
I found this link http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ which seems to suggest filtering is possible by the second component in addition to the first, and I tried the same example but I couldn't get it to work.
Does anyone have an example? Suppose I have data like this in my column names: Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654 -- get a range of columns from (start) component1=timestamp1, component2=123 to (end) component1=timestamp3, component2=123 -- this should give me only one column. I'm finding that only the first component is used ... is this understanding correct?
We see a lot of examples about Timeseries modelling with TimeUUID as column names. But how is the updating or deletion of columns happening here; how are the columns found to know which ones to delete or modify? Does one always need a separate column family to handle updating/deletion for time series, or is it usually handled by setting TTL for data outside the archival period, or does time series modelling usually not involve any manipulation of past records?
Regards, Roshni
From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Data Modelling Suggestions
I was trying to find hector examples where we search for the second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible... if you have any example please share.
It's not. When slicing columns you can only return one contiguous range.
Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item
+1. Have the orders somewhere, and build a time-ordered custom index to show them in order.
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:
I think you need another CF as index: user_itemid -> timestamped column_name. Otherwise you can't guess what's the timestamp to use in the column name. Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only, with the pair timestamp_itemid. That way you can later add other query strategies without messing with how you store the item information. Maybe you can solve it with a secondary index by timestamp too. Guille
On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote:
Hi, need some help on a data modelling question. We're using Hector and DataStax Enterprise 2.1. I want to associate a list of items with a user. It should be sorted on the time added.
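(A small, hypothetical pycassa sketch of the two-CF approach suggested above: item data in one CF keyed by item id, and a second CF per user as the time-ordered index, whose comparator is assumed to be CompositeType(LongType, UTF8Type) so columns sort by timestamp. All keyspace/CF names are placeholders, not from the thread.)

# Hypothetical sketch of the "main CF + time-ordered index CF" idea.
import time
import pycassa

pool = pycassa.ConnectionPool('ShopKeyspace', ['localhost:9160'])
items = pycassa.ColumnFamily(pool, 'Items')                          # item_id -> item data
user_items_by_time = pycassa.ColumnFamily(pool, 'UserItemsByTime')   # user_id -> (ts, item_id)

def add_item_for_user(user_id, item_id, item_data):
    now_ms = int(time.time() * 1000)
    items.insert(item_id, item_data)
    # Composite column name (timestamp, item_id); the column name itself is the index entry.
    user_items_by_time.insert(user_id, {(now_ms, item_id): ''})

def newest_items_for_user(user_id, count=20):
    index = user_items_by_time.get(user_id, column_reversed=True, column_count=count)
    return [items.get(item_id) for (_ts, item_id) in index.keys()]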
Decreasing the number of nodes in the ring
We have a cluster of 9 nodes in the ring. We would like to move to SSD-backed boxes, but we may not need 9 nodes in that case. What is the best way to downscale the cluster to 6 or 3 nodes? -- ..Senthil If there's anything more important than my ego around, I want it caught and shot now. - Douglas Adams.
Re: Decreasing the number of nodes in the ring
Use nodetool decommission and nodetool removetoken.
On Sun, Aug 26, 2012 at 5:31 PM, Senthilvel Rangaswamy senthil...@gmail.com wrote:
We have a cluster of 9 nodes in the ring. We would like to move to SSD-backed boxes, but we may not need 9 nodes in that case. What is the best way to downscale the cluster to 6 or 3 nodes? -- ..Senthil If there's anything more important than my ego around, I want it caught and shot now. - Douglas Adams.
Re: help required to resolve super column family problems
Hi, I basically want to get hands-on with the super column family concept, making some examples using the Hector API and manually adding the data. I explored the hector-example project, but only found a very basic super column family example. I am in search of more super column family examples using the Hector API. Can I get some links and references related to it? Regards, Amit
On Sat, Aug 25, 2012 at 3:20 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
If you are starting out new, use composite column names/values, or you could also use a JSON-style doc as a column value.
On Fri, Aug 24, 2012 at 2:31 PM, Rob Coli rc...@palominodb.com wrote:
On Fri, Aug 24, 2012 at 4:33 AM, Amit Handa amithand...@gmail.com wrote:
kindly help in resolving the following problem with respect to super column family. i am using cassandra version 1.1.3
Well, THERE's your problem... ;D But seriously, as I understand project intent, super columns will ultimately be a weird API wrapper around composite keys. Also, super column families have not been well supported for years. You probably just want to use composite keys if you are just starting out in 1.1.x. https://issues.apache.org/jira/browse/CASSANDRA-3237 =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
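(Since the replies point at composite columns as the replacement for super columns, here is a small hypothetical sketch of the idea, in pycassa rather than Hector; the keyspace, CF, and column names are all placeholders.)

# Hypothetical sketch: a composite-comparator CF standing in for a super column family.
# Where a super CF is row -> super_column -> sub_column -> value, use a composite
# column name (super_part, sub_part) -> value instead.
import pycassa
from pycassa.system_manager import SystemManager
from pycassa.types import CompositeType, UTF8Type

sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family('DemoKeyspace', 'UserActivity',
                             comparator_type=CompositeType(UTF8Type(), UTF8Type()))
sys_mgr.close()

pool = pycassa.ConnectionPool('DemoKeyspace', ['localhost:9160'])
activity = pycassa.ColumnFamily(pool, 'UserActivity')

# "super column" = the day, "sub columns" = event names
activity.insert('user42', {('2012-08-26', 'login'): '1', ('2012-08-26', 'purchase'): '3'})

# Slice one "super column": every column whose first component is '2012-08-26'
print(activity.get('user42', column_start=('2012-08-26',), column_finish=('2012-08-26',)))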
Re: help required to resolve super column family problems
On Sun, Aug 26, 2012 at 9:28 PM, Amit Handa amithand...@gmail.com wrote: Hi, i basically want to do hands-on on Super Column family concept, making some examples using hector api, and manually adding the data. I explored hector-example project, but only got very starting level of super column family example. i am in search of more super column family examples using hector api. can i get some link and references related to it? Is there a specific reason you want to use SuperColumns? Basically, it's a feature that the developers say you shouldn't use. Most people don't use them because of the rather poor performance characteristics SC's have. -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero