Re: new nodetool ring output and unbalanced ring?
out of interest, why -100 and not -1 or +1? any particular reason?

On 06/09/2012 19:17, Tyler Hobbs wrote:

To minimize the impact on the cluster, I would bootstrap a new 1d node at (42535295865117307932921825928971026432 - 100), then decommission the 1c node at 42535295865117307932921825928971026432 and run cleanup on your us-east nodes.

On Thu, Sep 6, 2012 at 1:11 PM, William Oberman <ober...@civicscience.com> wrote:

Didn't notice the racks! Of course. If I change a 1c to a 1d, what would I have to do to make sure data shuffles around correctly? Repair everywhere?

will

On Thu, Sep 6, 2012 at 2:09 PM, Tyler Hobbs <ty...@datastax.com> wrote:

The main issue is that one of your us-east nodes is in rack 1d, while the rest are in rack 1c. With NTS and multiple racks, Cassandra will try to use one node from each rack as a replica for a range until it either meets the RF for the DC, or runs out of racks, in which case it just picks nodes sequentially going clockwise around the ring (starting from the range being considered, not from the last node that was chosen as a replica). To fix this, you'll either need to make the 1d node a 1c node, or make 42535295865117307932921825928971026432 a 1d node so that you're alternating racks within that DC.

On Thu, Sep 6, 2012 at 12:54 PM, William Oberman <ober...@civicscience.com> wrote:

Hi, I recently upgraded from 0.8.x to 1.1.x (through 1.0 briefly) and nodetool ring seems to have changed from "owns" to "effectively owns". Effectively owns seems to account for replication factor (RF). I'm ok with all of this, yet I still can't figure out what's up with my cluster. I have a NetworkTopologyStrategy with two data centers (DCs), with RF and number of nodes per DC as follows:

DC Name    RF   # in DC
analytics  1    2
us-east    3    4

So I'd expect 50% on each analytics node, and 75% for each us-east node. Instead, I have two nodes in us-east with 50/100??? (the other two are 75/75 as expected). Here is the output of nodetool (all nodes report the same thing):

Address  DC         Rack  Status  State   Load       Effective-Ownership  Token
                                                                          127605887595351923798765477786913079296
x.x.x.x  us-east    1c    Up      Normal  94.57 GB   75.00%               0
x.x.x.x  analytics  1c    Up      Normal  60.64 GB   50.00%               1
x.x.x.x  us-east    1c    Up      Normal  131.76 GB  75.00%               42535295865117307932921825928971026432
x.x.x.x  us-east    1c    Up      Normal  43.45 GB   50.00%               85070591730234615865843651857942052864
x.x.x.x  analytics  1d    Up      Normal  60.88 GB   50.00%               85070591730234615865843651857942052865
x.x.x.x  us-east    1d    Up      Normal  98.56 GB   100.00%              127605887595351923798765477786913079296

If I use cassandra-cli to do "show keyspaces;" I get (and again, all nodes report the same thing):

Keyspace: civicscience:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
  Options: [analytics:1, us-east:3]

I removed the output about all of my column families (CFs), hopefully that doesn't matter. Did I compute the tokens wrong? Is there a combination of nodetool commands I can run to migrate the data around to rebalance to 75/75/75/75? I routinely run repair already. And as the release notes required, I ran upgradesstables during the upgrade process. Before the upgrade, I was getting analytics = 0% and us-east = 25% on each node, which I expected for "owns".

will

--
Tyler Hobbs
DataStax http://datastax.com/

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com

--
Tyler Hobbs
DataStax http://datastax.com/
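For reference, the usual token arithmetic for RandomPartitioner spaces each DC's nodes evenly around the 2^127 ring, with a small per-DC offset so two DCs never share an exact token; the offset value itself is arbitrary (it only has to be unique ring-wide), so -100 vs. -1 vs. +1 makes no practical ownership difference. A minimal sketch of that calculation (class and variable names are illustrative, not from the thread):

    import java.math.BigInteger;

    // Sketch: token(i) = i * 2^127 / nodesInDc, plus a small per-DC offset
    // so that tokens in different DCs never collide exactly.
    public class TokenCalc {
        private static final BigInteger RING = BigInteger.valueOf(2).pow(127);

        static BigInteger token(int i, int nodesInDc, long dcOffset) {
            return RING.multiply(BigInteger.valueOf(i))
                       .divide(BigInteger.valueOf(nodesInDc))
                       .add(BigInteger.valueOf(dcOffset));
        }

        public static void main(String[] args) {
            for (int i = 0; i < 4; i++)               // 0, 4253..., 8507..., 1276...
                System.out.println("us-east:   " + token(i, 4, 0));
            for (int i = 0; i < 2; i++)               // 1, 8507...865
                System.out.println("analytics: " + token(i, 2, 1));
        }
    }

Running this reproduces the tokens in the ring output above: the us-east nodes at offset 0 and the analytics nodes at offset +1.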
Re: Data Modeling- another question
i would respectfully disagree - what you have said is true, but it really depends on the use case.

1) do you expect to be doing updates to individual fields of an item, or will you always update all fields at once? if you are doing separate updates then the first is definitely easier to handle.

2) do you expect to do paging of the list? this will be easier with the json approach, as in the first your item may span a page boundary - not an insurmountable problem by any means, but more complicated nonetheless. this is not an issue, obviously, if all your items have the same number of fields.

3) do you expect to read or delete multiple items individually? you may have to do multiple reads/deletes of a row if the items are not adjacent to each other, as you cannot do 'disjoint' slices of columns at the moment. with the json approach you can just specify individual columns and you're done. again this is less of an issue if items have a known set of fields, but your list of columns to read/delete may get quite large fairly quickly.

the first is definitely better if you want to update individual fields; read-then-write is not a good idea in cassandra. but it is more complicated for most usage scenarios, so you have to work out if you really need the extra flexibility.

On 24/08/2012 13:54, samal wrote:

First is the better choice: each field can be updated separately (write only). With the second you have to take care of the json yourself (read first, modify, then write).

On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal <roshni.rajago...@wal-mart.com> wrote:

Hi, suppose I have a column family to associate a user to a dynamic list of items. I want to store 5-10 key pieces of information about each item, and there are no specific sorting requirements. I have two options:

A) use composite columns

UserId1 : {
  itemid1:Name = Betty Crocker,
  itemid1:Descr = Cake,
  itemid1:Qty = 5,
  itemid2:Name = Nutella,
  itemid2:Descr = Choc spread,
  itemid2:Qty = 15
}

B) use a json with the data

UserId1 : {
  itemid1 = {name: Betty Crocker, descr: Cake, Qty: 5},
  itemid2 = {name: Nutella, descr: Choc spread, Qty: 15}
}

Which do you suggest would be better?

Regards, Roshni
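To make option A concrete, here is a minimal Hector sketch of the "write only" field update (it assumes a CF whose comparator is CompositeType(UTF8Type, UTF8Type); the CF name "UserItems" and the method are illustrative). The point is that a single field is overwritten blind, with no read:

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ItemFieldUpdate {
        // Overwrite one field of one item: the column name is (itemId, fieldName).
        static void setField(Keyspace ks, String userId, String itemId,
                             String field, String value) {
            Composite name = new Composite();
            name.addComponent(itemId, StringSerializer.get());
            name.addComponent(field, StringSerializer.get());

            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.addInsertion(userId, "UserItems",   // CF name is an assumption
                HFactory.createColumn(name, value, new CompositeSerializer(),
                                      StringSerializer.get()));
            m.execute();
        }
    }

e.g. setField(ks, "UserId1", "itemid1", "Qty", "5") touches only that one column, whereas the json approach would need a read-modify-write of the whole blob.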
Truncate failing with 1.0 client against 0.7 cluster
i'm doing an upgrade of Cassandra 0.7 to 1.0 at the moment, and as part of the preparation i'm upgrading to the 1.0 client libraries (we use Hector 1.0-5) prior to upgrading the cluster itself. I'm seeing some of our integration tests against the dev 0.7 cluster fail as they get UnavailableExceptions when trying to truncate the test column families. This is new behaviour with the 1.0 client libraries; it doesn't happen with the 0.7 libraries. It seems to fail immediately - it doesn't wait for eg the 10 second RPC timeout, it fails straight away. Anyone have any ideas as to what may be happening? Interestingly I seem to be able to get around it if i only tell Hector about one of the nodes (we have 4). If I give it all four then it throws the UnavailableException.
Re: Truncate failing with 1.0 client against 0.7 cluster
sorry i don't have the exact text right now but it's along the lines of 'not enough replicas available to handle the requested consistency level'. i'm requesting QUORUM, but i've tried with ONE and ANY and it made no difference.

On 16/07/2012 19:30, aaron morton wrote:

UnavailableException is a server side error, what's the full error message?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/07/2012, at 5:31 AM, Guy Incognito wrote:

i'm doing an upgrade of Cassandra 0.7 to 1.0 at the moment, and as part of the preparation i'm upgrading to the 1.0 client libraries (we use Hector 1.0-5) prior to upgrading the cluster itself. I'm seeing some of our integration tests against the dev 0.7 cluster fail as they get UnavailableExceptions when trying to truncate the test column families. This is new behaviour with the 1.0 client libraries; it doesn't happen with the 0.7 libraries. It seems to fail immediately - it doesn't wait for eg the 10 second RPC timeout, it fails straight away. Anyone have any ideas as to what may be happening? Interestingly I seem to be able to get around it if i only tell Hector about one of the nodes (we have 4). If I give it all four then it throws the UnavailableException.
Re: cassandra 1.0.9 error - Read an invalid frame size of 0
i have seen this as well, is it a known issue?

On 18/06/2012 19:38, Gurpreet Singh wrote:

I found a fix for this one, rather a workaround. I changed the rpc_server_type in cassandra.yaml from hsha to sync, and the error went away. I guess there is some issue with the thrift nonblocking server.

Thanks
Gurpreet

On Wed, May 16, 2012 at 7:04 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:

Thanks Aaron. will do!

On Mon, May 14, 2012 at 1:14 PM, aaron morton <aa...@thelastpickle.com> wrote:

Are you using framed transport on the client side? Try the Hector user list for hector specific help: https://groups.google.com/forum/?fromgroups#!searchin/hector-users

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/05/2012, at 5:44 AM, Gurpreet Singh wrote:

This is hampering our testing of cassandra a lot, and our move to cassandra 1.0.9. Has anyone seen this before? Should I be trying a different version of cassandra?

/G

On Thu, May 10, 2012 at 11:29 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:

Hi, i have created a 1 node cluster of cassandra 1.0.9. I am setting this up for testing reads/writes. I am seeing the following error in the server system.log:

ERROR [Selector-Thread-7] 2012-05-10 22:44:02,607 TNonblockingServer.java (line 467) Read an invalid frame size of 0. Are you using TFramedTransport on the client side?

Initially i was using an old hector 0.7.x, but even after switching to hector 1.0-5 and thrift version 0.6.1, i still see this error. I am using 20 threads writing/reading from cassandra. The max write batch size is 10, with a constant payload size per key of 600 bytes. On the client side, i see Hector exceptions happening that coincide with these messages on the server. Any ideas why these errors are happening?

Thanks
Gurpreet
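For anyone hitting this with a raw Thrift client rather than Hector: the server-side message is the classic symptom of a client sending on an unframed transport, since Cassandra's Thrift server expects framed transport. A minimal sketch of the correct client-side setup (host and port are illustrative):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class FramedClient {
        public static void main(String[] args) throws Exception {
            // Wrap the socket in TFramedTransport; writing on the bare TSocket
            // is what produces "Read an invalid frame size" on the server.
            TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            System.out.println(client.describe_version());
            transport.close();
        }
    }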
Re: Schema advice/help
well, no. my assumption is that he knows what the 5 itemTypes (or appropriate corresponding ids) are, so he can do a known 5-rowkey lookup. if he does not know, then agreed, my proposal is not a great fit.

could do (as originally suggested) userId -> itemType:activityId if you want to keep everything in the same row (again assumes that you know what the itemTypes are). but then you can't really do a multiget, you have to do 5 separate slice queries, one for each item type.

can also do some wacky stuff around maintaining a row that explicitly only holds the last 10 items by itemType (meaning you have to delete the oldest one every time you insert a new one), but that prolly requires read-on-write etc and is a lot messier. and you will prolly need to worry about the case where you (transiently) have more than 10 'latest' items for a single itemType.

On 28/03/2012 09:49, Maciej Miklas wrote:

yes - but anyway in your example you need a key range query, and that requires OPP, right?

On Tue, Mar 27, 2012 at 5:13 PM, Guy Incognito <dnd1...@gmail.com> wrote:

multiget does not require OPP.

On 27/03/2012 09:51, Maciej Miklas wrote:

multiget would require Order Preserving Partitioner, and this can lead to an unbalanced ring and hot spots. Maybe you can use a secondary index on itemtype - it must have small cardinality: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/

On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito <dnd1...@gmail.com> wrote:

without the ability to do disjoint column slices, i would probably use 5 different rows: userId:itemType -> activityId. then it's a multiget slice of 10 items from each of your 5 rows.

On 26/03/2012 22:16, Ertio Lew wrote:

I need to store activities by each user, on 5 item types. I always want to read the last 10 activities on each item type, by a user (ie, total activities to read at a time = 50). I am wanting to store these activities in a single row for each user so that they can be retrieved in a single row query, since I want to read all the last 10 activities on each item. I am thinking of creating composite names appending itemtype : activityId (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all itemtypes. Any ideas about a schema to do this in a better way?
Re: Schema advice/help
without the ability to do disjoint column slices, i would probably use 5 different rows: userId:itemType -> activityId. then it's a multiget slice of 10 items from each of your 5 rows.

On 26/03/2012 22:16, Ertio Lew wrote:

I need to store activities by each user, on 5 item types. I always want to read the last 10 activities on each item type, by a user (ie, total activities to read at a time = 50). I am wanting to store these activities in a single row for each user so that they can be retrieved in a single row query, since I want to read all the last 10 activities on each item. I am thinking of creating composite names appending itemtype : activityId (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all itemtypes. Any ideas about a schema to do this in a better way?
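A minimal Hector sketch of that multiget slice (the CF name "Activities" and the key format are illustrative; it assumes activity ids are the column names and the comparator sorts them chronologically):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;

    public class LastActivities {
        static Rows<String, String, String> lastTenPerType(Keyspace ks, String userId) {
            StringSerializer ss = StringSerializer.get();
            MultigetSliceQuery<String, String, String> q =
                HFactory.createMultigetSliceQuery(ks, ss, ss, ss);
            q.setColumnFamily("Activities");               // illustrative CF name
            q.setKeys(userId + ":type1", userId + ":type2", userId + ":type3",
                      userId + ":type4", userId + ":type5");
            // reversed slice over each row, capped at 10 => the newest 10 per key
            q.setRange("", "", true, 10);
            return q.execute().get();
        }
    }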
Re: problem in create column family
why don't you show us the command you're actually trying to run?

On 27/03/2012 08:52, puneet loya wrote:

I m using cassandra 1.0.8.. Please reply

On Tue, Mar 27, 2012 at 12:28 PM, R. Verlangen <ro...@us2.nl> wrote:

Not sure about that, what version of Cassandra are you using? Maybe someone else here knows how to solve this..

2012/3/27 puneet loya <puneetl...@gmail.com>

ya had created with UTF8Type before.. It gave the same error. On executing the help assume command it is giving 'utf8' as a type. so can i use comparator='utf8' or not?? Please reply

On Mon, Mar 26, 2012 at 9:17 PM, R. Verlangen <ro...@us2.nl> wrote:

You should use the full type names, e.g.

create column family MyColumnFamily with comparator=UTF8Type;

2012/3/26 puneet loya <puneetl...@gmail.com>

It is giving errors like:

Unable to find abstract-type class 'org.apache.cassandra.db.marshal.utf8'

and

java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: cannot parse 'catalogueId' as hex bytes

where catalogueId is a column that has utf8 as its data type. they may be just syntactical errors.. Please suggest if u can help me out on this??

--
With kind regards,
Robin Verlangen
www.robinverlangen.nl
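For completeness, the same definition can be made from Hector instead of the CLI - a sketch assuming Hector 1.0 and illustrative keyspace/CF names. The "cannot parse ... as hex bytes" error is typically what you see when a name is still being validated as BytesType, which is why the full class names matter:

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.ddl.ComparatorType;
    import me.prettyprint.hector.api.factory.HFactory;

    public class CreateCf {
        static void create(Cluster cluster) {
            // Comparator = UTF8Type, so column names are compared as strings
            ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition(
                "MyKeyspace", "MyColumnFamily", ComparatorType.UTF8TYPE);
            // Validate values as UTF8 too, rather than the BytesType default
            cfDef.setDefaultValidationClass(ComparatorType.UTF8TYPE.getClassName());
            cluster.addColumnFamily(cfDef);
        }
    }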
Re: Schema advice/help
multiget does not require OPP.

On 27/03/2012 09:51, Maciej Miklas wrote:

multiget would require Order Preserving Partitioner, and this can lead to an unbalanced ring and hot spots. Maybe you can use a secondary index on itemtype - it must have small cardinality: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/

On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito <dnd1...@gmail.com> wrote:

without the ability to do disjoint column slices, i would probably use 5 different rows: userId:itemType -> activityId. then it's a multiget slice of 10 items from each of your 5 rows.

On 26/03/2012 22:16, Ertio Lew wrote:

I need to store activities by each user, on 5 item types. I always want to read the last 10 activities on each item type, by a user (ie, total activities to read at a time = 50). I am wanting to store these activities in a single row for each user so that they can be retrieved in a single row query, since I want to read all the last 10 activities on each item. I am thinking of creating composite names appending itemtype : activityId (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all itemtypes. Any ideas about a schema to do this in a better way?
Re: Exceptions related to thrift transport
are you perhaps trying to send a large batch mutate? i've seen broken pipes etc in cassandra 0.7 (currently in the process of upgrading to 1.0.8) when a large batch mutate is sent.

On 22/03/2012 07:09, Tiwari, Dushyant wrote:

Hector version 1.0-3. What is the reason for the second exception, BTW?

Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, March 21, 2012 10:46 PM
To: user@cassandra.apache.org
Subject: Re: Exceptions related to thrift transport

1. org.apache.thrift.TException: Message length exceeded: 134218240

thrift_max_message_length_in_mb
https://github.com/apache/cassandra/blob/cassandra-1.0/conf/cassandra.yaml#L243
(134218240 is 128MB, which is a lot of data)

2. org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?

What version of hector are you using?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/03/2012, at 12:02 AM, Tiwari, Dushyant wrote:

Hi Cassandra Users,

A couple of questions on the server side exceptions that I see sometimes --

1. org.apache.thrift.TException: Message length exceeded: 134218240 - how to configure the message length?
2. org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client? - how to rectify this exception?

Some related client side exceptions are --

1. org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
2. me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException
   Caused by: org.apache.thrift.transport.TTransportException: null at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

Using Hector as the client. The queries are writes to a CF with indexes. The frequency of these exceptions is very low.

Thanks, Dushyant
Re: Newbie Question: Cassandra consuming 100% CPU on ubuntu server
perhaps entirely unrelated, but somebody was asking about lockups on EC2 yesterday and found: http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs

On 18/02/2012 14:58, Aditya Gupta wrote:

Am I installing it the right way? While installing I didn't verify the signatures using the public key.

On Sat, Feb 18, 2012 at 8:21 PM, Aditya Gupta <ady...@gmail.com> wrote:

No data at all. just a fresh installation

On Sat, Feb 18, 2012 at 6:57 PM, R. Verlangen <ro...@us2.nl> wrote:

You might want to check your Cassandra logs, they contain important information that might lead you to the actual cause of the problems.

2012/2/18 Aditya Gupta <ady...@gmail.com>

Thanks! But what about the 100% cpu consumption that is causing the server to hang?

On Sat, Feb 18, 2012 at 6:19 PM, Watanabe Maki <watanabe.m...@gmail.com> wrote:

I haven't used the packaged kit, but Cassandra uses half of the physical memory on your system by default. You need to edit cassandra-env.sh to decrease the heap size. Update MAX_HEAP_SIZE and HEAP_NEWSIZE and restart.

From iPhone

On 2012/02/18, at 20:40, Aditya Gupta <ady...@gmail.com> wrote:

I just installed Cassandra on my ubuntu server by adding the following to the sources list:

deb http://www.apache.org/dist/cassandra/debian 10x main
deb-src http://www.apache.org/dist/cassandra/debian 10x main

Soon after install I started getting OOM errors, then the server became unresponsive. I added more RAM to the server but found that cassandra was consuming 100% CPU and 1GB RAM as soon as the server was started. Why is this happening and how can I get it back to normal conditions?
Re: read-repair?
sorry to be dense, but which is it? do i get the old version or the new version? or is it indeterminate?

On 02/02/2012 01:42, Peter Schuller wrote:

> i have RF=3, my row/column lives on 3 nodes right? if (for some reason, eg a timed-out write at quorum) node 1 has a 'new' version of the row/column (eg clock = 10), but node 2 and 3 have 'old' versions (clock = 5), when i try to read my row/column at quorum, what do i get back?

You either get back the new version or the old version, depending on whether node 1 participated in the read. In your scenario, the previous write at quorum failed (since it only made it to one node), so this is not a violation of the contract. Once node 2 and/or 3 return their response, read repair (if it is active) will cause a re-read and reconciliation, followed by a row mutation being sent to the nodes to correct the column.

> do i get the clock 5 version because that is what the quorum agrees on, and

No; a quorum of nodes is waited for, and the newest column wins. This accomplishes the reads-see-writes invariant.
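To make "it depends on whether node 1 participated" concrete, here is a toy model of coordinator-side resolution (this is not Cassandra's actual code, just an illustration of the rule: wait for any quorum, then the highest clock among the replies wins):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class QuorumReadToy {
        static class Reply {
            final String value; final long clock;
            Reply(String value, long clock) { this.value = value; this.clock = clock; }
        }

        // Among whichever quorum of replicas answered, the newest column wins,
        // regardless of how many of them agree on it.
        static Reply resolve(List<Reply> quorum) {
            return quorum.stream().max(Comparator.comparingLong(r -> r.clock)).get();
        }

        public static void main(String[] args) {
            Reply n1 = new Reply("new", 10), n2 = new Reply("old", 5), n3 = new Reply("old", 5);
            System.out.println(resolve(Arrays.asList(n1, n2)).clock); // node 1 answered -> 10
            System.out.println(resolve(Arrays.asList(n2, n3)).clock); // node 1 didn't   -> 5
        }
    }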
read-repair?
how does read repair work in the following scenario? i have RF=3, my row/column lives on 3 nodes right? if (for some reason, eg a timed-out write at quorum) node 1 has a 'new' version of the row/column (eg clock = 10), but node 2 and 3 have 'old' versions (clock = 5), when i try to read my row/column at quorum, what do i get back? do i get the clock 5 version because that is what the quorum agrees on, and then read-repair kicks in and nodes 2 and 3 are updated to clock 10 so a subsequent read returns clock 10? or are nodes 2 and 3 updated to clock 10 first, and i get the clock 10 version on the initial read?
atomicity of a row write
hi all, having read http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic i would like some clarification: is a write to a single row key in a single column family atomic in the sense that i can do a batch mutate where i

1) write col 'A' to key 'B'
2) write col 'C' to key 'B'

and either both column writes will succeed, or both will fail? i won't get the situation where eg col 'A' is written and col 'C' fails, my client receives an error, but col 'A' is actually persisted and becomes visible to other clients? does this hold if i write key 'B' across two different column families? (i assume not, but the faq doesn't seem to explicitly exclude this).

PS i'm not worried about isolation per se, i'm interested in what the 'eventually consistent' state is.
Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name
i know it's a throwaway example, but i would probably structure your column the other way around in that case, ie steve.4, steve.5, steve.6, greg.4, greg.6, greg.9, and then do two slice queries: steve.4-steve.10 and greg.4-greg.10.

On 04/01/2012 15:41, Jeremiah Jordan wrote:

You can't use a slice range. But you can query for the specific columns: 4.steve, 5.steve, 6.steve ... 4.greg, 5.greg, 6.greg. Just have to ask for all of the possible columns you want.

On 01/03/2012 04:31 PM, Stephen Pope wrote:

The bonus you're talking about here, how do I apply that? For example, my columns are in the form of number.id, such as 4.steve, 4.greg, 5.steve, 5.george. Is there a way to query a slice of numbers with a list of ids? As in, I want all the columns with numbers between 4 and 10 which have ids steve or greg.

Cheers, Steve

-----Original Message-----
From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com]
Sent: Tuesday, January 03, 2012 3:12 PM
To: user@cassandra.apache.org
Cc: Asil Klin
Subject: Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

The main issue with replacing super columns with composite columns right now is that if you don't know all your sub-column names, you can't select multiple super columns' worth of data in the same query without getting extra stuff. You have to use a slice to get all subcolumns of a given super column, and you can't have disjoint slices, so if you want two super columns full, you have to get all the other stuff that is in between them, or make two queries. If you know what all of the sub-column names are, you can ask for all of the super/sub column pairs for all of the super columns you want and not get extra data. If you don't need to pull multiple super columns at a time with slices like that, then there isn't really an issue.

A bonus of using composite keys like this is that if there is a specific sub column you want from multiple super columns, you can pull all those out with a single multiget, and you don't have to pull the rest of the columns... So there are pros and cons...

-Jeremiah

On 01/03/2012 01:58 PM, Asil Klin wrote:

I have a super column family which I always use to retrieve a list of supercolumns (with all subcolumns) by name. I am looking forward to replacing all SuperColumns in my schema with composite columns. How could I design the schema so that I could do the equivalent of retrieving a list of supercolumns by name, in the case of composite columns? (As of now I thought of using the supercolumn name as the first component of the composite name and the subcolumn name as the 2nd component of the composite name.)
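A sketch of those two contiguous slices with Hector composites (it assumes the comparator is CompositeType(UTF8Type, IntegerType); the CF name and method are illustrative):

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.IntegerSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class IdRangeSlice {
        // One contiguous slice per id, e.g. (steve,4)..(steve,10).
        static ColumnSlice<Composite, String> slice(Keyspace ks, String rowKey,
                                                    String id, int from, int to) {
            Composite start = new Composite();
            start.addComponent(id, StringSerializer.get());
            start.addComponent(from, IntegerSerializer.get());
            Composite finish = new Composite();
            finish.addComponent(id, StringSerializer.get());
            finish.addComponent(to, IntegerSerializer.get());

            SliceQuery<String, Composite, String> q = HFactory.createSliceQuery(
                ks, StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
            q.setColumnFamily("Items");   // illustrative CF name
            q.setKey(rowKey);
            q.setRange(start, finish, false, Integer.MAX_VALUE);
            return q.execute().get();
        }
    }

slice(ks, row, "steve", 4, 10) plus slice(ks, row, "greg", 4, 10) replaces one disjoint query with two contiguous ones.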
Re: Doubts related to composite type column names/values
afaik composite lets you do sorting in a way that would be difficult/impossible with string concatenation, eg String, Integer with the string ascending and the integer descending. if i had composites available (which i don't b/c we are on 0.7), i would use them over string concatenation. string concatenation is a pain.

On 20/12/2011 20:33, Maxim Potekhin wrote:

Thank you Aaron! As long as I have plain strings, would you say that I would do almost as well with catenation? Of course I realize that mixed types are a very different case, where the composite is very useful.

Thanks
Maxim

On 12/20/2011 2:44 PM, aaron morton wrote:

Component values are compared in a type aware fashion: an Integer is an Integer, not a 10 character zero padded string. You can also slice on the components, just like with string concat, but nicer. e.g. if your app is storing comments for a thing, and the column names have the form comment_id, field or Integer, String, you can slice for all properties of a comment, or all properties for comments between two comment_ids. Finally, the client library knows what's going on. Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote:

With regards to static, what are the major benefits as compared with string catenation (with some convenient separator inserted)?

Thanks
Maxim

On 12/20/2011 1:39 PM, Richard Low wrote:

On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew <ertio...@gmail.com> wrote:

> With regard to the composite columns stuff in Cassandra, I have the following doubts:
> 1. What is the storage overhead of the composite type column names/values?

The values are the same. For each dimension, there is 3 bytes of overhead.

> 2. What exactly is the difference between the DynamicComposite and Static Composite?

Static composite type has the types of each dimension specified in the column family definition, so all names within that column family have the same type. Dynamic composite type lets you specify the type for each column, so they can be different. There is extra storage overhead for this, and care must be taken to ensure all column names remain comparable.
Re: memory estimate for each key in the key cache
to be blunt, this doesn't sound right to me, unless it's doing something rather more clever to manage the memory. i mocked up a simple class containing a byte[], ByteBuffer and long, and the shallow size alone is 32 bytes. deep size with a byte[16], 1-byte bytebuffer and long is 132. this is on a 64-bit jvm on win x64, but is consistent(ish) with what i've seen in the past on linux jvms. the actual code has rather more objects than this (it's a map, it has a pair, decoratedKey) so would be quite a bit bigger per key.

On 17/12/2011 03:42, Brandon Williams wrote:

On Fri, Dec 16, 2011 at 9:31 PM, Dave Brosius <dbros...@mebigfatguy.com> wrote:

> Wow, Java is a lot better than I thought if it can perform that kind of magic. I'm guessing the wiki information is just old and out of date. It's probably more like 60 + sizeof(key)

With jamm and MAT it's fairly easy to test. The number is accurate last I checked.

-Brandon
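The jamm check Brandon mentions is easy to reproduce - a sketch assuming the classic jamm 0.2.x API (run with -javaagent:jamm.jar; the Entry class below is a toy stand-in, not Cassandra's real key-cache entry, and exact numbers vary by JVM and pointer size):

    import java.nio.ByteBuffer;
    import org.github.jamm.MemoryMeter;

    public class KeyCacheSize {
        // Toy stand-in for a key cache entry: a key, a buffer and a position.
        static class Entry {
            final byte[] key = new byte[16];
            final ByteBuffer buf = ByteBuffer.allocate(1);
            final long position = 0L;
        }

        public static void main(String[] args) {
            MemoryMeter meter = new MemoryMeter();
            Entry e = new Entry();
            System.out.println("shallow: " + meter.measure(e));     // object header + fields only
            System.out.println("deep:    " + meter.measureDeep(e)); // follows references
        }
    }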
Re: best practices for simulating transactions in Cassandra
you could try writing with the clock of the initial replay entry?

On 06/12/2011 20:26, John Laban wrote:

Ah, neat. It is similar to what was proposed in (4) above with adding transactions to Cages, but instead of snapshotting the data to be rolled back (the "before" data), you snapshot the data to be replayed (the "after" data). And then later, if you find that the transaction didn't complete, you just keep replaying the transaction until it takes.

The part I don't understand with this approach though: how do you ensure that someone else didn't change the data between your initial failed transaction and the later replaying of the transaction? You could get lost writes in that situation. Dominic (in the Cages blog post) explained a workaround for that in his rollback proposal: all subsequent readers or writers of that data would have to check for abandoned transactions and roll them back themselves before they could read the data. I don't think this is possible with the XACT_LOG replay approach in these slides though, based on how the data is indexed (cassandra node token + timeUUID).

PS: How are you liking Cages?

2011/12/6 Jérémy SEVELLEC <jsevel...@gmail.com>

Hi John, I had exactly the same reflexions. I'm using zookeeper and cage to lock and isolate. but how to rollback? It's impossible, so try replay! the idea is explained in this presentation http://www.slideshare.net/mattdennis/cassandra-data-modeling (starting from slide 24):

- insert your whole data into one column
- do the job
- remove (or expire) your column.

if there is a problem while doing the job, you keep the possibility to replay, and replay, and replay (synchronously or in a batch).

Regards, Jérémy

2011/12/5 John Laban <j...@pagerduty.com>

Hello, I'm building a system using Cassandra as a datastore and I have a few places where I am in need of transactions. I'm using ZooKeeper to provide locking when I'm in need of some concurrency control or isolation, so that solves that half of the puzzle. What I need now is to sometimes be able to get atomicity across multiple writes by simulating the begin/rollback/commit abilities of a relational DB. In other words, there are places where I need to perform multiple updates/inserts, and if I fail partway through, I would ideally be able to roll back the partially-applied updates.

Now, I *know* this isn't possible with Cassandra. What I'm looking for are all the best practices, or at least tips and tricks, so that I can get around this limitation in Cassandra and still maintain a consistent datastore. (I am using quorum reads/writes so that eventual consistency doesn't kick my ass here as well.)

Below are some ideas I've been able to dig up. Please let me know if any of them don't make sense, or if there are better approaches:

1) Updates to a row in a column family are atomic. So try to model your data so that you would only ever need to update a single row in a single CF at once. Essentially, you model your data around transactions. This is tricky but can certainly be done in some situations.

2) If you are only dealing with multiple row *inserts* (and not updates), have one of the rows act as a 'commit' by essentially validating the presence of the other rows. For example, say you were performing an operation where you wanted to create an Account row and 5 User rows all at once (this is an unlikely example, but bear with me). You could insert 5 rows into the Users CF, and then the 1 row into the Accounts CF, which acts as the commit. If something went wrong before the Account could be created, any Users that had been created so far would be orphaned and unusable, as your business logic can ensure that they can't exist without an Account. You could also have an offline cleanup process that swept away orphans.

3) Try to model your updates as idempotent column inserts instead. How do you model updates as inserts? Instead of munging the value directly, you could insert a column containing the operation you want to perform (like "+5"). It would work kind of like the Consistent Vote Counting implementation (https://gist.github.com/41). How do you make the inserts idempotent? Make sure the column names correspond to a request ID or some other identifier that would be identical across re-drives of a given (perhaps originally failed) request. This could leave your datastore in a temporarily inconsistent state, but would eventually become consistent.
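A sketch of point (3)'s idempotent "update as insert" in Hector (the CF name "CounterOps" and the +N string encoding are illustrative): because the request id is the column name, re-driving the same request overwrites the same column instead of applying the delta twice.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class IdempotentDelta {
        static void recordDelta(Keyspace ks, String rowKey, String requestId, long delta) {
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            // Column name = request id: replaying the request is a no-op overwrite.
            m.addInsertion(rowKey, "CounterOps",
                HFactory.createStringColumn(requestId, (delta >= 0 ? "+" : "") + delta));
            m.execute();
        }
    }

A reader then reconstructs the current value by slicing the row and folding the deltas together.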
Re: UUIDType
no particular reason, just wanting clarification b/c i saw a post (from ed anuff i think) about java.util.UUID being inconsistent with RFC4122, and this coming to light when looking at Cassandra's TimeUUIDType and LexicalUUIDType. so i wondered if cassandra's types were consistent with RFC4122, and it seems like they are not either. On 21/11/2011 18:34, Jonathan Ellis wrote: I think that's correct, but why would you want to do that? On Sun, Nov 20, 2011 at 2:55 AM, Guy Incognitodnd1...@gmail.com wrote: am i correct that neither of Cassandra's UUIDTypes (at least in 0.7) compare UUIDs according to RFC4122 (ie as two unsigned longs)?
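For the record, RFC 4122 ordering treats each 64-bit half as unsigned, while java.util.UUID#compareTo compares the halves as signed longs, so UUIDs with the high bit set sort before ones without it. A sketch of a comparator that matches the RFC (the sign-bit flip turns unsigned ordering into signed ordering):

    import java.util.Comparator;
    import java.util.UUID;

    public class Rfc4122Order {
        // Flipping the sign bit converts unsigned comparison into signed comparison.
        static int cmpUnsigned(long a, long b) {
            long x = a ^ Long.MIN_VALUE, y = b ^ Long.MIN_VALUE;
            return x < y ? -1 : (x == y ? 0 : 1);
        }

        static final Comparator<UUID> RFC4122 = (u1, u2) -> {
            int c = cmpUnsigned(u1.getMostSignificantBits(), u2.getMostSignificantBits());
            return c != 0 ? c
                          : cmpUnsigned(u1.getLeastSignificantBits(), u2.getLeastSignificantBits());
        };
    }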
UUIDType
am i correct that neither of Cassandra's UUIDTypes (at least in 0.7) compare UUIDs according to RFC4122 (ie as two unsigned longs)?
Re: Mass deletion -- slowing down
i think what he means is... do you know what the 'oldest' day is? eg if you have a rolling window of say 2 weeks, structure your query so that your slice range only goes back 2 weeks, rather than to the beginning of time. this would avoid iterating over all the tombstones from prior to the 2 week window. this wouldn't work if you are deleting arbitrary days in the middle of your date range.

On 14/11/2011 02:02, Maxim Potekhin wrote:

Thanks Peter, I'm not sure I entirely follow. By the oldest data, do you mean the primary key corresponding to the limit of the time horizon? Unfortunately, unique IDs and the timestamps do not correlate, in the sense that chronologically newer entries might have a smaller sequential ID. That's because the timestamp corresponds to the last update, which is stochastic in the sense that jobs can take from seconds to days to complete. As I said, I'm not sure I understood you correctly.

Also, I note that queries on different dates (i.e. not contaminated with lots of tombstones) work just fine, which is consistent with the picture that has emerged so far. Theoretically -- would compaction or cleanup help?

Thanks
Maxim

On 11/13/2011 8:39 PM, Peter Schuller wrote:

> I do limit the number of rows I'm asking for in Pycassa. Queries on primary keys still work fine,

Is it feasible in your situation to keep track of the oldest possible data (for example, if there is a single sequential writer that rotates old entries away, it could keep a record of what the oldest might be) so that you can bound your index lookup >= that value (and avoid the tombstones)?
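A sketch of that bounded slice in Hector (the CF/row names and the long-timestamp column convention are illustrative): starting the range at the window's edge means the scan never walks tombstones older than the rolling window.

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class WindowedScan {
        static ColumnSlice<Long, String> recent(Keyspace ks, String rowKey, int windowDays) {
            // Start the slice at the retention horizon, not the beginning of time.
            long horizon = System.currentTimeMillis() - windowDays * 86_400_000L;
            SliceQuery<String, Long, String> q = HFactory.createSliceQuery(
                ks, StringSerializer.get(), LongSerializer.get(), StringSerializer.get());
            q.setColumnFamily("JobsByUpdateTime");   // illustrative CF name
            q.setKey(rowKey);
            q.setRange(horizon, Long.MAX_VALUE, false, 1000);
            return q.execute().get();
        }
    }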
Re: indexes from CassandraSF
ok great, thanks ed, that's really helpful. just wanted to make sure i wasn't missing something fundamental.

On 13/11/2011 23:57, Ed Anuff wrote:

Yes, correct, it's not going to clean itself. Using your example with a little more detail:

1) A(T1) reads previous location (T0,L0) from index_entries for user U0
2) B(T2) reads previous location (T0,L0) from index_entries for user U0
3) A(T1) deletes previous location (T0,L0) from index_entries for user U0
4) B(T2) deletes previous location (T0,L0) from index_entries for user U0
5) A(T1) deletes previous location (L0,T0,U0) for user U0 from index
6) B(T2) deletes previous location (L0,T0,U0) for user U0 from index
7) A(T1) inserts new location (T1,L1) into index_entries for user U0
8) B(T2) inserts new location (T2,L2) into index_entries for user U0
9) index_entries for user U0 now contains (T1,L1),(T2,L2)
10) A(T1) inserts new location (L1,T1,U0) for user U0 into index
11) B(T2) inserts new location (L2,T2,U0) for user U0 into index
12) A(T1) sets new location (L1) on user U0
13) B(T2) sets new location (L2) on user U0
14) C(T3) queries for users where location equals L1, gets back user U0 where current location is actually L2

So, you want to either verify on read by making sure the queried field is correct before returning it in your result set to the rest of your app, or you want to use locking (ex. lock on (U0,location) during updates). The key thing here is that although the index is not in the desired state at (14), the information is in the system to get to that state (the previous values in index_entries). This lets the cleanup happen on the next update of location for user U0:

15) D(T4) reads previous locations (T1,L1),(T2,L2) from index_entries for user U0
16) D(T4) deletes previous locations (T1,L1),(T2,L2) from index_entries for user U0
17) D(T4) deletes previous locations (L1,T1,U0),(L2,T2,U0) for user U0 from index
18) D(T4) inserts new location (T4,L3) into index_entries for user U0
19) D(T4) inserts new location (L3,T4,U0) for user U0 into index
20) D(T4) sets new location (L3) on user U0

BTW, just to reiterate since this sometimes comes up: the timestamps being stored in these tuples are not longs, they're time UUIDs, so T1 and T2 are never equal.

Ed

On Sun, Nov 13, 2011 at 6:52 AM, Guy Incognito <dnd1...@gmail.com> wrote:

[1] i'm not particularly worried about transient conditions so that's ok. i think there's still the possibility of a non-transient false positive... if 2 writes were to happen at exactly the same time (highly unlikely), eg

1) A reads previous location (L1) from index entries
2) B reads previous location (L1) from index entries
3) A deletes previous location (L1) from index entries
4) B deletes previous location (L1) from index entries
5) A deletes previous location (L1) from index
6) B deletes previous location (L1) from index
7) A enters new location (L2) into index entries
8) B enters new location (L3) into index entries
9) A enters new location (L2) into index
10) B enters new location (L3) into index
11) A sets new location (L2) on users
12) B sets new location (L3) on users

after this, don't i end up with an incorrect L2 location in index entries and in the index, that won't be resolved until the next write of location for that user?

[2] ah i see... so the client would continuously retry until the update works. that's fine provided the client doesn't bomb out with some other error; if that were to happen then i have potentially deleted the index entry columns without deleting the corresponding index columns.

i can handle both of the above for my use case, i just want to clarify whether they are possible (however unlikely) scenarios.

On 13/11/2011 02:41, Ed Anuff wrote:

1) The index updates should be eventually consistent. This does mean that you can get a transient false-positive on your search results. If this doesn't work for you, then you either need to use ZK or some other locking solution, or do read repair by making sure that the row you retrieve contains the value you're searching for before passing it on to the rest of your application.

2) You should be able to reapply the batch updates til they succeed. The update is idempotent. One thing that's important that the slides don't make clear is that this requires using time-based uuids as your timestamp components. Take a look at the sample code.

Hope this helps, Ed

On Sat, Nov 12, 2011 at 3:59 PM, Guy Incognito <dnd1...@gmail.com> wrote:

help?

On 10/11/2011 19:34, Guy Incognito wrote:

hi, i've been looking at the model below from Ed Anuff's presentation at Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra). Couple of questions:

1) Isn't there still the chance that two concurrent updates may end up with the index containing two entries for the given user, only one of which would match the actual value in the Users cf?

2) What happens if your batch fails partway through the update? If i understand correctly there are no guarantees about ordering when a batch is executed, so isn't it possible that eg the previous value entries in Users_Index_Entries may have been deleted, and then the batch fails before the entries in Indexes are deleted, ie the mechanism has 'lost' those values? ...
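A sketch of the verify-on-read option Ed describes at step (14) above (all types and accessors here are assumptions standing in for your own client code, not an API from the thread):

    import java.util.ArrayList;
    import java.util.List;

    public abstract class VerifiedIndexSearch {
        static class User {
            final String key; final String location;
            User(String key, String location) { this.key = key; this.location = location; }
        }

        // Stand-ins for whatever index/row reads your client performs.
        abstract List<String> indexLookup(String indexRow, String value);
        abstract User fetchUser(String userKey);

        // Drop index hits whose primary row no longer matches the queried value,
        // e.g. U0 at step (14), whose current location is L2 rather than L1.
        List<User> usersAt(String location) {
            List<User> results = new ArrayList<>();
            for (String userKey : indexLookup("Users_By_Location", location)) {
                User u = fetchUser(userKey);
                if (location.equals(u.location)) {
                    results.add(u);
                }
            }
            return results;
        }
    }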
Re: indexes from CassandraSF
[1] i'm not particularly worried about transient conditions so that's ok. i think there's still the possibility of a non-transient false positive... if 2 writes were to happen at exactly the same time (highly unlikely), eg

1) A reads previous location (L1) from index entries
2) B reads previous location (L1) from index entries
3) A deletes previous location (L1) from index entries
4) B deletes previous location (L1) from index entries
5) A deletes previous location (L1) from index
6) B deletes previous location (L1) from index
7) A enters new location (L2) into index entries
8) B enters new location (L3) into index entries
9) A enters new location (L2) into index
10) B enters new location (L3) into index
11) A sets new location (L2) on users
12) B sets new location (L3) on users

after this, don't i end up with an incorrect L2 location in index entries and in the index, that won't be resolved until the next write of location for that user?

[2] ah i see... so the client would continuously retry until the update works. that's fine provided the client doesn't bomb out with some other error; if that were to happen then i have potentially deleted the index entry columns without deleting the corresponding index columns.

i can handle both of the above for my use case, i just want to clarify whether they are possible (however unlikely) scenarios.

On 13/11/2011 02:41, Ed Anuff wrote:

1) The index updates should be eventually consistent. This does mean that you can get a transient false-positive on your search results. If this doesn't work for you, then you either need to use ZK or some other locking solution, or do read repair by making sure that the row you retrieve contains the value you're searching for before passing it on to the rest of your application.

2) You should be able to reapply the batch updates til they succeed. The update is idempotent. One thing that's important that the slides don't make clear is that this requires using time-based uuids as your timestamp components. Take a look at the sample code.

Hope this helps, Ed

On Sat, Nov 12, 2011 at 3:59 PM, Guy Incognito <dnd1...@gmail.com> wrote:

help?

On 10/11/2011 19:34, Guy Incognito wrote:

hi, i've been looking at the model below from Ed Anuff's presentation at Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra). Couple of questions:

1) Isn't there still the chance that two concurrent updates may end up with the index containing two entries for the given user, only one of which would match the actual value in the Users cf?

2) What happens if your batch fails partway through the update? If i understand correctly there are no guarantees about ordering when a batch is executed, so isn't it possible that eg the previous value entries in Users_Index_Entries may have been deleted, and then the batch fails before the entries in Indexes are deleted, ie the mechanism has 'lost' those values? I assume this can be addressed by not deleting the old entries until the batch has succeeded (ie put the previous entry deletion into a separate, subsequent batch). this at least lets you retry at a later time.

perhaps i'm missing something?

SELECT {location}..{location, *} FROM Users_Index_Entries WHERE KEY = user_key;

BEGIN BATCH
DELETE {location, ts1}, {location, ts2}, ... FROM Users_Index_Entries WHERE KEY = user_key;
DELETE {value1, user_key, ts1}, {value2, user_key, ts2}, ... FROM Indexes WHERE KEY = Users_By_Location;
UPDATE Users_Index_Entries SET {location, ts3} = value3 WHERE KEY = user_key;
UPDATE Indexes SET {value3, user_key, ts3} = null WHERE KEY = Users_By_Location;
UPDATE Users SET location = value3 WHERE KEY = user_key;
APPLY BATCH
Re: indexes from CassandraSF
help?

On 10/11/2011 19:34, Guy Incognito wrote:

hi, i've been looking at the model below from Ed Anuff's presentation at Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra). Couple of questions:

1) Isn't there still the chance that two concurrent updates may end up with the index containing two entries for the given user, only one of which would match the actual value in the Users cf?

2) What happens if your batch fails partway through the update? If i understand correctly there are no guarantees about ordering when a batch is executed, so isn't it possible that eg the previous value entries in Users_Index_Entries may have been deleted, and then the batch fails before the entries in Indexes are deleted, ie the mechanism has 'lost' those values? I assume this can be addressed by not deleting the old entries until the batch has succeeded (ie put the previous entry deletion into a separate, subsequent batch). this at least lets you retry at a later time.

perhaps i'm missing something?

SELECT {location}..{location, *} FROM Users_Index_Entries WHERE KEY = user_key;

BEGIN BATCH
DELETE {location, ts1}, {location, ts2}, ... FROM Users_Index_Entries WHERE KEY = user_key;
DELETE {value1, user_key, ts1}, {value2, user_key, ts2}, ... FROM Indexes WHERE KEY = Users_By_Location;
UPDATE Users_Index_Entries SET {location, ts3} = value3 WHERE KEY = user_key;
UPDATE Indexes SET {value3, user_key, ts3} = null WHERE KEY = Users_By_Location;
UPDATE Users SET location = value3 WHERE KEY = user_key;
APPLY BATCH
indexes from CassandraSF
hi, i've been looking at the model below from Ed Anuff's presentation at Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra). Couple of questions:

1) Isn't there still the chance that two concurrent updates may end up with the index containing two entries for the given user, only one of which would match the actual value in the Users cf?

2) What happens if your batch fails partway through the update? If i understand correctly there are no guarantees about ordering when a batch is executed, so isn't it possible that eg the previous value entries in Users_Index_Entries may have been deleted, and then the batch fails before the entries in Indexes are deleted, ie the mechanism has 'lost' those values? I assume this can be addressed by not deleting the old entries until the batch has succeeded (ie put the previous entry deletion into a separate, subsequent batch). this at least lets you retry at a later time.

perhaps i'm missing something?

SELECT {location}..{location, *} FROM Users_Index_Entries WHERE KEY = user_key;

BEGIN BATCH
DELETE {location, ts1}, {location, ts2}, ... FROM Users_Index_Entries WHERE KEY = user_key;
DELETE {value1, user_key, ts1}, {value2, user_key, ts2}, ... FROM Indexes WHERE KEY = Users_By_Location;
UPDATE Users_Index_Entries SET {location, ts3} = value3 WHERE KEY = user_key;
UPDATE Indexes SET {value3, user_key, ts3} = null WHERE KEY = Users_By_Location;
UPDATE Users SET location = value3 WHERE KEY = user_key;
APPLY BATCH
Re: security
ok, thx for the input!

On 09/11/2011 15:19, Mohit Anchlia wrote:

We lock down ssh to root from any network. We also provide individual logins, including sysadmin, and they go through LDAP authentication. Anyone who does sudo su as root gets logged and alerted via trapsend. We use firewalls and also have a separate vlan for datastore servers. We then open only specific ports from our application servers to the datastore servers. You should also look at Cassandra authentication as an additional means of securing your data.

On Wed, Nov 9, 2011 at 6:39 AM, Sasha Dolgy <sdo...@gmail.com> wrote:

Firewall with appropriate rules.

On Tue, Nov 8, 2011 at 6:30 PM, Guy Incognito <dnd1...@gmail.com> wrote:

hi, is there a standard approach to securing cassandra eg within a corporate network? at the moment in our dev environment, anybody with network connectivity to the cluster can connect to it and mess with it. this would not be acceptable in prod. do people generally write custom authenticators etc, or just put the cluster behind a firewall with the appropriate rules to limit access?
security
hi, is there a standard approach to securing cassandra eg within a corporate network? at the moment in our dev environment, anybody with network connectivity to the cluster can connect to it and mess with it. this would not be acceptable in prod. do people generally write custom authenticators etc, or just put the cluster behind a firewall with the appropriate rules to limit access?
CompositeType for use with 0.7
Is this a lib I can just drop into a 0.7 instance of cassandra and use? I'm not sure what to make of the README about not using it with versions earlier than 0.8.0-rc1. https://github.com/riptano/hector-composite

the goal is to start using CompositeTypes in 0.7 (since I can't upgrade to 0.8 at the moment), with a seamless transition to 0.8 when I do upgrade. will using this with hector 0.8.x allow this?
super sub slice query?
is there such a thing? a query that runs against a SC family and returns a subset of subcolumns from a set of super-columns? is there a way to have eg a slice query (or super slice query) only return the column names, rather than the values as well?