Re: [jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
On Sun, 2011-06-12 at 18:53, Mick Semb Wever wrote: "This issue could stand to be summarized (I still wish we used a mailing list for monsters like this)." This is actually something I really appreciate about the Cassandra community. As a newbie here, it has allowed me to understand individual issues, their history, and the development discussion around them, without having to go all in and subscribe to the development list. The latter can be quite a daunting task to begin and keep up with in some communities, if that is where all development discussion happens. That's 2 cents anyway from someone still finding their way into the code. ~mck -- We are born naked, wet and hungry. Then things get worse. | http://semb.wever.org | http://sesat.no | http://tech.finn.no | Java XSS Filter
Re: SSL Streaming
Performance-wise, I think it would be better to just let the client encrypt sensitive data before storing it, versus encrypting all traffic all the time. If individual values are encrypted, they don't have to be encrypted/decrypted in transit between nodes, whether during the initial writes, while commissioning a new node, or at other times. A drawback, however, is that you now have to manage one or more keys for the lifetime of the data, and it will complicate your data view interfaces. However, if Cassandra had data encryption built in somehow, that would solve this problem... just thinking out loud. Can anyone think of other pros/cons of both strategies? On 3/22/2011 2:21 AM, Sasha Dolgy wrote: Hi, Is there documentation available anywhere that describes how one can use org.apache.cassandra.security.streaming.* ? After the EC2 posts yesterday, one question I was asked was about the security of data being shifted between nodes. Is it done in clear text, or encrypted? I haven't seen anything to suggest that it's encrypted, but I see in the source that security.streaming does leverage SSL... Thanks in advance for some pointers to documentation. Also, for anyone who is using SSL: how much of a performance impact have you noticed? Is it minimal or significant?
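To make the client-side option above concrete, here is a rough sketch only (a phpcassa-style client is assumed, and the key literal, row key, and 'ssn' column name are all made up for illustration; this is not tested code):

<?php
// Encrypt a sensitive value before writing it, so it is opaque both
// on the wire and in the SSTables. Key management is the hard part
// and is deliberately left out here.
$value = '123-45-6789';        // the sensitive value
$key = 'sixteen byte key';     // in practice, fetch this from your key store
$iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC);
$iv = mcrypt_create_iv($iv_size, MCRYPT_DEV_URANDOM);
$cipher = mcrypt_encrypt(MCRYPT_RIJNDAEL_128, $key, $value, MCRYPT_MODE_CBC, $iv);
// Prepend the IV so each stored value is self-contained.
$column_family->insert('user1', array('ssn' => base64_encode($iv . $cipher)));

// Reading it back reverses the steps (mcrypt zero-pads, hence the rtrim).
$cols = $column_family->get('user1', array('ssn'));
$raw = base64_decode($cols['ssn']);
$plain = rtrim(mcrypt_decrypt(MCRYPT_RIJNDAEL_128, $key,
        substr($raw, $iv_size), MCRYPT_MODE_CBC,
        substr($raw, 0, $iv_size)), "\0");

The internode SSL option, by contrast, needs no client changes at all, which is the trade-off in a nutshell.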
Re: Troubleshooting IO performance ?
To reduce the number of SSTables, increase the memtable_threshold for the CF. The IO numbers may be because of compaction kicking in. The CompactionManager provides information via JMX on its progress, or you can check the logs. You could increase the min_compaction_threshold for the CF, or disable compaction during the bulk load if you want to. In your case that's probably a bad idea, as every write requires a read and compaction will help the read performance. These numbers show that about 37GB of data was written to disk, but compaction has shrunk this down to about 7GB. If you are not doing deletes, then you are doing a lot of overwrites: Space used (live): 8283980539 Space used (total): 39972827945 I'm getting a bit confused about what the problem is. Is it increasing latency for read requests, or the unexplained IO? - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12 Jun 2011, at 08:55, Philippe wrote: More info below. I just loaded 4.8GB of similar data in another keyspace and ran the same process as in my previous tests but on that data. I started with three threads hitting Cassandra: no I/O, hardly any CPU (15% on a 4-core server). After an hour or so, I raised it to 6 threads in parallel, then to 9 threads in parallel. I never got any IO; in fact iostat showed there weren't any disk reads. I hardly saw the CPU elevate except at the end. The only difference between the two datasets is that the size of the other one is 8.4GB, so the second one doesn't fit completely in memory. So my woes are related to how well Cassandra is fetching the data in the SSTables, right? So what are my options? My rows are very small at the moment (well under 4 KB). Should I reduce the read buffer? Should I reduce the number of SSTables? I'm reloading the data from scratch using the incremental update I will be using in production. I'm hitting the cluster with 6 threads. Yes, I should have decreased it, but I was too optimistic and I don't want to stop it now. The data I'm loading is used for computing running averages (sum and total), and so every update requires a read. As soon as the data no longer fits in memory, I'm seeing huge amounts of IO (almost 380MB/s reading) that I'd like to understand. My hypothesis is that because I am updating the same key so many times (dozens if not hundreds of times in some cases), the row is split across the SSTables and every read needs to go through all the SSTables. Unfortunately, at some point Cassandra compacted the keys from 5 tables to 3 and the throughput did not increase after that, so I'm not even sure this makes sense. 1) Is there another explanation? Can I do something about this? 2) Why is the read latency displayed twice in cfstats, and why does it differ?
Thanks.

vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff   cache   si   so     bi    bo    in    cs us sy id wa
 0  7  47724  94604  22556 7373252    0    0 391646  2152 10498 10297  6  6 26 62
 0  6  47724  93736  22556 7372400    0    0 396774     0 10881 11177  5  6 29 60
 2  5  47724  92496  22556 7374824    0    0 372406    15 11212 11149  8  7 25 59
 0  5  47724  89520  22568 7378484    0    0 399730   526 10964 11975  6  7 24 63
 0  7  47724  87908  22568 7379444    0    0 396216     0 10405 10880  5  7 22 66

iostat -dmx 2:
Device: rrqm/s wrqm/s     r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb     168.50   0.00 3152.00 0.50 185.49  0.25   120.66    54.86 17.39   17.39    4.00  0.31 96.80
sda     178.50   0.50 3090.50 0.50 184.47  0.19   122.35    61.16 19.71   19.71    4.00  0.31 97.20
md1       0.00   0.00    0.00 0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
md5       0.00   0.00 6612.50 1.50 372.82  0.44   115.58     0.00  0.00    0.00    0.00  0.00  0.00
dm-0      0.00   0.00 6612.50 0.00 372.82  0.00   115.47   123.15 18.58   18.58    0.00  0.15 97.20

cfstats:
Read Count: 88215069
Read Latency: 1.821254258759351 ms.
Write Count: 88215059
Write Latency: 0.013311765885686253 ms.
Pending Tasks: 0
Column Family: PUBLIC_MONTHLY
SSTable count: 3
Space used (live): 8283980539
Space used (total): 39972827945
Memtable Columns Count: 449201
Memtable Data Size: 21788245
Memtable Switch Count: 72
Read Count: 88215069
Read Latency: 7.433 ms.
Write Count: 88215069
Write Latency: 0.016 ms.
Pending Tasks: 0
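For reference, the per-CF compaction knobs Aaron mentions can be flipped at runtime through nodetool. Syntax from memory, so double-check with nodetool help on your version; the keyspace name here is made up:

$ ./bin/nodetool -h localhost setcompactionthreshold MyKeyspace PUBLIC_MONTHLY 0 0
(min = max = 0 disables minor compaction for the duration of the bulk load)

$ ./bin/nodetool -h localhost setcompactionthreshold MyKeyspace PUBLIC_MONTHLY 4 32
(restores the default thresholds afterwards)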
problem in using get_range() function
Hi, I am trying to retrieve the row keys in a column family with the following code:

$rows = $column_family->get_range($key_start='R17889000', $key_finish='R17893999', $row_count=1000);
$count = 0;
foreach ($rows as $row) {
    echo $count . '<br/>';
    $count += 1;
    print_r($row);
    echo '<br/>';
}

There are 5000 records in the database, but only 526 are getting retrieved. Am I missing something here? Can anyone help?
Re: problem in using get_range() function
Are you using the order-preserving partitioner or the random partitioner for this CF? In order to get the results you expect, you'll need to use the OPP. More info: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ On Mon, Jun 13, 2011 at 8:47 AM, Amrita Jayakumar amritajayakuma...@gmail.com wrote: Hi, I am trying to retrieve the row keys in a column family with the following code: $rows = $column_family->get_range($key_start='R17889000', $key_finish='R17893999', $row_count=1000); $count = 0; foreach ($rows as $row) { echo $count . '<br/>'; $count += 1; print_r($row); echo '<br/>'; } There are 5000 records in the database, but only 526 are getting retrieved. Am I missing something here? Can anyone help?
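If it is the full set of row keys you are after rather than a meaningful key slice, that still works under the RandomPartitioner, because the client can page through the whole token range. A rough sketch, assuming a phpcassa-style iterator (check your client's get_range() signature):

<?php
// get_range() with no start/finish walks the entire CF, fetching a
// buffer of rows at a time; rows come back in token order, not key order.
$keys = array();
foreach ($column_family->get_range() as $key => $columns) {
    $keys[] = $key;
}
echo count($keys) . " row keys retrieved\n";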
Re: insufficient space to compact even the two smallest files, aborting
Hi All. I found a way to be able to compact: I have to call scrub on the column family. Then scrub gets stuck forever. I restart the node, and voila! I can compact again without any message about not having enough space. This looks like a bug to me. What info would be needed to file a report? This is on 0.8, upgraded from 0.7.5.
Docs: Why do deleted keys show up during range scans?
http://wiki.apache.org/cassandra/FAQ#range_ghosts "So to special case leaving out result entries for deletions, we would have to check the entire rest of the row to make sure there is no undeleted data anywhere else either (in which case leaving the key out would be an error)." The above doesn't read well and I don't get it. Can anyone rephrase it or elaborate? Thanks!
Re: odd logs after repair
You can double-check with nodetool, e.g. $ ./bin/nodetool -h localhost version ReleaseVersion: 0.8.0-SNAPSHOT This error is about the internode wire protocol one node thinks another is using. Not sure how it could get confused; does it go away if you restart the node that logged the error? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 06:19, Sasha Dolgy wrote: Hi Everyone, Last week I upgraded all 4 nodes to apache-cassandra-0.8.0... no issues. Trolling the logs today, I find messages like this on all four nodes: INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13 02:16:45,978 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. Maybe it would be nice to have the version of all nodes printed in nodetool ring? I don't think I'm crazy though... I have manually checked that all are on 0.8.0 -- Sasha Dolgy sasha.do...@gmail.com
Re: Docs: Why do deleted keys show up during range scans?
It returns the set of columns for the set of rows... how do you determine the difference between a completely empty row and a row that just does not have any of the matching columns? Well, the answer is that Cassandra does not go and check whether there are any columns outside of the range you are querying, so it will just return the row, empty for the column range you specified. Your code needs to be robust enough to understand that an empty list of columns does not imply that there are no columns at all for that row key: either the row is deleted and awaiting tombstone expiry/GC, or there is a column outside the range you queried. On 13 June 2011 13:59, AJ a...@dude.podzone.net wrote: http://wiki.apache.org/cassandra/FAQ#range_ghosts "So to special case leaving out result entries for deletions, we would have to check the entire rest of the row to make sure there is no undeleted data anywhere else either (in which case leaving the key out would be an error)." The above doesn't read well and I don't get it. Can anyone rephrase it or elaborate? Thanks!
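In client code that robustness boils down to a one-line check. A minimal sketch (phpcassa-style; process_row() is a hypothetical handler of your own):

<?php
foreach ($column_family->get_range() as $key => $columns) {
    if (count($columns) == 0) {
        // Range ghost: the row is deleted and awaiting GC, or simply
        // has no columns in the slice we asked for. Skip it either way.
        continue;
    }
    process_row($key, $columns);
}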
Re: problem in using get_range() function
Can you tell me how to retrieve all the row keys in a column family? On Mon, Jun 13, 2011 at 6:25 PM, Dan Kuebrich dan.kuebr...@gmail.com wrote: Are you using the order-preserving partitioner or the random partitioner for this CF? In order to get the results you expect, you'll need to use the OPP. More info: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ On Mon, Jun 13, 2011 at 8:47 AM, Amrita Jayakumar amritajayakuma...@gmail.com wrote: Hi, I am trying to retrieve the row keys in a column family with the following code: $rows = $column_family->get_range($key_start='R17889000', $key_finish='R17893999', $row_count=1000); $count = 0; foreach ($rows as $row) { echo $count . '<br/>'; $count += 1; print_r($row); echo '<br/>'; } There are 5000 records in the database, but only 526 are getting retrieved. Am I missing something here? Can anyone help?
Re: odd logs after repair
I recall there being a discussion about a default port changing from 0.7.x to 0.8.x... this was JMX, correct? Or were there others? On Mon, Jun 13, 2011 at 3:34 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi Aaron, The error is being reported on all 4 nodes. I have confirmed (for my own sanity) that each node is running: ReleaseVersion: 0.8.0 I can reproduce the error on any node by tailing cassandra/logs/system.log and running nodetool repair:

INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime]
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$KeyIterator.next(HashMap.java:828)
        at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)

When I run nodetool ring, the ring looks balanced and nothing is out of sorts. I also have this set up with RF=3 on 4 nodes... but repair was working fine prior to the 0.8.0 upgrade. Are there any special commands I need to run? I've tried scrub, cleanup, and flush too... still, repair gives the same issues. -- I have stopped one of the nodes and started it; the issue still persists. I stopped another node that is reported in the logs (like .18 above) and started it... ran repair again... the issue still shows up in the log file. -sd On Mon, Jun 13, 2011 at 3:02 PM, aaron morton aa...@thelastpickle.com wrote: You can double-check with nodetool, e.g. $ ./bin/nodetool -h localhost version ReleaseVersion: 0.8.0-SNAPSHOT This error is about the internode wire protocol one node thinks another is using. Not sure how it could get confused; does it go away if you restart the node that logged the error? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 06:19, Sasha Dolgy wrote: Hi Everyone, Last week I upgraded all 4 nodes to apache-cassandra-0.8.0... no issues. Trolling the logs today, I find messages like this on all four nodes: INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13 02:16:45,978 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. Maybe it would be nice to have the version of all nodes printed in nodetool ring? I don't think I'm crazy though... I have manually checked that all are on 0.8.0 -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
Re: insufficient space to compact even the two smallest files, aborting
That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com Hi All. I found a way to be able to compact: I have to call scrub on the column family. Then scrub gets stuck forever. I restart the node, and voila! I can compact again without any message about not having enough space. This looks like a bug to me. What info would be needed to file a report? This is on 0.8, upgraded from 0.7.5.
Re: odd logs after repair
On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote: I recall there being a discussion about a default port changing from 0.7.x to 0.8.x... this was JMX, correct? Or were there others? Yes, the default JMX port changed from 8080 to 7199. I don't think there were any others. -- Tyler Hobbs Software Engineer, DataStax http://datastax.com/ Maintainer of pycassa (http://github.com/pycassa/pycassa), the Cassandra Python client library
Re: insufficient space to compact even the two smallest files, aborting
I was already way over the minimum; there were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to the size of one of the sstables, and it stayed stuck there for a couple of hours. On Mon, 13-06-2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report.
Re: Docs: Why do deleted keys show up during range scans?
On 6/13/2011 7:03 AM, Stephen Connolly wrote: It returns the set of columns for the set of rows... how do you determine the difference between a completely empty row and a row that just does not have any of the matching columns? I would expect it to not return anything (no row at all) in both of those cases. Are you saying that an empty row is returned for rows that do not match the predicate? So, if I perform a range slice where the range is every row of the CF and the slice equates to no matches, and I have 1 million rows in the CF, then I will get a result set of 1 million empty rows?
Re: Docs: Why do deleted keys show up during range scans?
On 13 June 2011 16:14, AJ a...@dude.podzone.net wrote: On 6/13/2011 7:03 AM, Stephen Connolly wrote: It returns the set of columns for the set of rows... how do you determine the difference between a completely empty row and a row that just does not have any of the matching columns? I would expect it to not return anything (no row at all) in both of those cases. Are you saying that an empty row is returned for rows that do not match the predicate? So, if I perform a range slice where the range is every row of the CF and the slice equates to no matches, and I have 1 million rows in the CF, then I will get a result set of 1 million empty rows? No, I am saying that for each row that matches, you will get a result, even if the columns that you request happen to be empty for that specific row. Likewise, any deleted rows in the same row range will show as empty, because C* would have a ton of work to do to figure out the difference between being deleted and being empty.
Re: odd logs after repair
Hi All, I am a newbie to Cassandra. I have a simple question but can't find any clear answer by searching Google: what is the meaning of the count column in Cassandra? Thanks.
Counter Column in Cassandra
Hi All, I am a newbie to Cassandra. I have a simple question but can't find any clear answer by searching Google: what is the meaning of the counter column in Cassandra? Best
Re: insufficient space to compact even the two smallest files, aborting
As Terje already said in this thread, the threshold is per bucket (a group of similarly sized sstables), not per CF. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com: I was already way over the minimum; there were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to the size of one of the sstables, and it stayed stuck there for a couple of hours. On Mon, 13-06-2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: count column in Cassandra
It's probably helpful if you change the subject when posting about a different topic. Is your question about counters or the count function? Counters are cool. Count allows you to determine how many columns exist in a row. -sd On Mon, Jun 13, 2011 at 5:27 PM, Sijie YANG iyan...@gmail.com wrote: Hi All, I am a newbie to Cassandra. I have a simple question but can't find any clear answer by searching Google: what is the meaning of the count column in Cassandra? Thanks.
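If it is the count function: most clients just wrap Thrift's get_count call. A phpcassa-style sketch (method name as I recall it; check your client):

<?php
// Returns the number of columns in one row -- not the number of rows.
$n = $column_family->get_count('rowkey1');
echo "row has $n columns\n";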
one way to make counter delete work better
As https://issues.apache.org/jira/browse/CASSANDRA-2101 indicates, the problem with counter delete is in scenarios like the following:

add 1, clock 100
delete, clock 200
add 2, clock 300

If the 1st and 3rd operations are merged in SSTable compaction, then we have:

delete, clock 200
add 3, clock 300

which shows the wrong result. I think a relatively simple extension can be used to completely fix this issue: similar to ZooKeeper, we can prefix an epoch number to the clock, so that 1) a delete operation increases the future epoch number by 1; 2) merging of delta adds happens only between deltas of the same epoch; deltas of an older epoch are simply ignored during merging, and the merged result keeps the newest epoch number seen. Other operations remain the same as current. Note that the above 2 rules are only concerned with merging within the deltas on the leader, and are not related to the replicated count, which is a simple final state and observes the rule of "larger clock trumps". Naturally the ordering rule is: epoch1.clock1 > epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 && clock1 > clock2). Intuitively an epoch can be seen as the serial number on a new incarnation of a counter. The code change should be mostly localized to CounterColumn.reconcile(), although, if an update does not find an existing entry in the memtable, we need to go to the sstable to fetch any possible epoch number; so compared to the current write path, in the no-replicate-on-write case we need to add a read to the sstable. But in the replicate-on-write case we already read that, so there is no extra time cost. No-replicate-on-write is not a very useful setup in reality anyway. Does this sound feasible? If this works, expiring counters should also naturally work. Thanks Yang
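To spell the proposed ordering rule out as plain code (illustrative only, not a patch; PHP just to have something executable):

<?php
// A clock is array(epoch, clock). A higher epoch always wins; within
// the same epoch, the higher clock wins -- the ordering rule above.
function epoch_clock_greater($a, $b) {
    return $a[0] > $b[0] || ($a[0] == $b[0] && $a[1] > $b[1]);
}
var_dump(epoch_clock_greater(array(1, 400), array(0, 300))); // true: newer epoch
var_dump(epoch_clock_greater(array(0, 400), array(1, 300))); // false: older epoch loses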
Re: one way to make counter delete work better
I don't think that's bulletproof either. For instance, what if the two adds go to replica 1 but the delete to replica 2? Bottom line (and this was discussed on the original delete-for-counters ticket, https://issues.apache.org/jira/browse/CASSANDRA-2101): counter deletes are not fully commutative, which makes them fragile. On Mon, Jun 13, 2011 at 10:54 AM, Yang tedd...@gmail.com wrote: As https://issues.apache.org/jira/browse/CASSANDRA-2101 indicates, the problem with counter delete is in scenarios like the following: add 1, clock 100; delete, clock 200; add 2, clock 300. If the 1st and 3rd operations are merged in SSTable compaction, then we have: delete, clock 200; add 3, clock 300, which shows the wrong result. I think a relatively simple extension can be used to completely fix this issue: similar to ZooKeeper, we can prefix an epoch number to the clock, so that 1) a delete operation increases the future epoch number by 1; 2) merging of delta adds happens only between deltas of the same epoch; deltas of an older epoch are simply ignored during merging, and the merged result keeps the newest epoch number seen. Other operations remain the same as current. Note that the above 2 rules are only concerned with merging within the deltas on the leader, and are not related to the replicated count, which is a simple final state and observes the rule of "larger clock trumps". Naturally the ordering rule is: epoch1.clock1 > epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 && clock1 > clock2). Intuitively an epoch can be seen as the serial number on a new incarnation of a counter. The code change should be mostly localized to CounterColumn.reconcile(), although, if an update does not find an existing entry in the memtable, we need to go to the sstable to fetch any possible epoch number; so compared to the current write path, in the no-replicate-on-write case we need to add a read to the sstable. But in the replicate-on-write case we already read that, so there is no extra time cost. No-replicate-on-write is not a very useful setup in reality anyway. Does this sound feasible? If this works, expiring counters should also naturally work. Thanks Yang -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Docs: Why do deleted keys show up during range scans?
On 6/13/2011 9:25 AM, Stephen Connolly wrote: On 13 June 2011 16:14, AJ a...@dude.podzone.net wrote: On 6/13/2011 7:03 AM, Stephen Connolly wrote: It returns the set of columns for the set of rows... how do you determine the difference between a completely empty row and a row that just does not have any of the matching columns? I would expect it to not return anything (no row at all) in both of those cases. Are you saying that an empty row is returned for rows that do not match the predicate? So, if I perform a range slice where the range is every row of the CF and the slice equates to no matches, and I have 1 million rows in the CF, then I will get a result set of 1 million empty rows? No, I am saying that for each row that matches, you will get a result, even if the columns that you request happen to be empty for that specific row. Ok, this I understand, I guess. If I query a range of rows and want only a certain column, and a row does not have that column, I would like to know that. Likewise, any deleted rows in the same row range will show as empty, because C* would have a ton of work to do to figure out the difference between being deleted and being empty. But, if a row does indeed have the column, but that row was deleted, why would I get an empty row? You say because of a ton of work. So, the tombstone for the row is not stored close by for quick access... or something like that? At any rate, how do I figure out if the empty row is empty because it was deleted? Sorry if I'm being dense.
Re: Docs: Why do deleted keys show up during range scans?
On 13 June 2011 17:09, AJ a...@dude.podzone.net wrote: On 6/13/2011 9:25 AM, Stephen Connolly wrote: On 13 June 2011 16:14, AJ a...@dude.podzone.net wrote: On 6/13/2011 7:03 AM, Stephen Connolly wrote: It returns the set of columns for the set of rows... how do you determine the difference between a completely empty row and a row that just does not have any of the matching columns? I would expect it to not return anything (no row at all) in both of those cases. Are you saying that an empty row is returned for rows that do not match the predicate? So, if I perform a range slice where the range is every row of the CF and the slice equates to no matches, and I have 1 million rows in the CF, then I will get a result set of 1 million empty rows? No, I am saying that for each row that matches, you will get a result, even if the columns that you request happen to be empty for that specific row. Ok, this I understand, I guess. If I query a range of rows and want only a certain column, and a row does not have that column, I would like to know that. Deleted rows don't have the column either, which is the point. Likewise, any deleted rows in the same row range will show as empty, because C* would have a ton of work to do to figure out the difference between being deleted and being empty. But, if a row does indeed have the column, but that row was deleted, why would I get an empty row? You say because of a ton of work. So, the tombstone for the row is not stored close by for quick access... or something like that? At any rate, how do I figure out if the empty row is empty because it was deleted? Sorry if I'm being dense. Store the query inverted; that way empty => deleted. The tombstones are stored for each column that had data, IIRC... but at this point my grok of C* is lacking.
Re: Docs: Why do deleted keys show up during range scans?
On 6/13/2011 10:14 AM, Stephen Connolly wrote: "Store the query inverted; that way empty => deleted." I don't know what that means... get the other columns? Can you elaborate? Are there docs for this, or is it a hack/workaround? "The tombstones are stored for each column that had data, IIRC... but at this point my grok of C* is lacking." I suspected this, but wasn't sure. It sounds like when a row is deleted, a tombstone is not attached to the row, but to each column??? So, if all columns are deleted then the row is considered deleted? Hmm, that doesn't sound right, but that doesn't mean it isn't! ;o)
Re: one way to make counter delete work better
I think this approach also works for your scenario. I thought that the issue was only concerned with merging within the same leader, but you pointed out that a similar merging happens between leaders too; now I see that the same rules on epoch numbers also apply to inter-leader data merging, specifically in your case. Everyone starts with an epoch of 0 (they should be the same; if not, it also works, we just consider them to be representing different time snapshots of the same counter state).

node A: add 1, clock 0.100 (epoch = 0, clock number = 100)
node A: delete, clock 0.200
node B: add 2, clock 0.300

Node A gets B's state (add 2, clock 0.300) but rejects it, because A has already produced a delete with epoch 0, so A considers epoch 0 already ended; it won't accept any replicated state with epoch < 1. Node B gets A's delete 0.200; it zeroes its own count of 2 and updates its future expected epoch to 1. At this time, the state of the system is:

node A: expected epoch = 1, [A:nil] [B:nil]
node B: same

Let's say we have the following further writes:

node B: add 3, clock 1.400
node A: add 4, clock 1.500

Node B receives A's add 4 and updates its copy of A; node A receives B's add 3 and updates its copy of B. Then the state is:

node A: expected epoch == 1, [A:4 clock=1.500] [B:3 clock=1.400]
node B: same

Generally I think it should be complete if we add the following rule for inter-leader replication. Each leader keeps a var in memory (and also persists it to the sstable when flushing): expected_epoch, initially set to 0. Node P does:

on receiving updates from node Q:
    if Q.expected_epoch > P.expected_epoch:
        /** an epoch bump inherently means a previous delete, which we
            probably missed, so we need to apply the delete; a delete is
            global to all leaders, so apply it to all my replicas **/
        for all leaders in my vector: count = nil
        P.expected_epoch = Q.expected_epoch
    if Q.expected_epoch == P.expected_epoch:
        update P's copy of Q according to standard rules
    /** if Q.expected_epoch < P.expected_epoch, that means Q is less
        up to date than us; just ignore **/

replicate_on_write(to Q):
    if P.operation == delete:
        P.expected_epoch++
        set all my copies of all leaders to nil
    send to Q (P.total, P.expected_epoch)

Overall I don't think delete being non-commutative is a fundamental blocker: regular columns are also not commutative, yet we achieve a stable result no matter what order they are applied in, because of the ordering rule used in reconciliation; here we just need to find a similar ordering rule. The epoch thing could be a step in this direction. Thanks Yang On Mon, Jun 13, 2011 at 9:04 AM, Jonathan Ellis jbel...@gmail.com wrote: I don't think that's bulletproof either. For instance, what if the two adds go to replica 1 but the delete to replica 2? Bottom line (and this was discussed on the original delete-for-counters ticket, https://issues.apache.org/jira/browse/CASSANDRA-2101): counter deletes are not fully commutative, which makes them fragile. On Mon, Jun 13, 2011 at 10:54 AM, Yang tedd...@gmail.com wrote: As https://issues.apache.org/jira/browse/CASSANDRA-2101 indicates, the problem with counter delete is in scenarios like the following: add 1, clock 100; delete, clock 200; add 2, clock 300. If the 1st and 3rd operations are merged in SSTable compaction, then we have: delete, clock 200; add 3, clock 300, which shows the wrong result.
I think a relatively simple extension can be used to completely fix this issue: similar to ZooKeeper, we can prefix an epoch number to the clock, so that 1) a delete operation increases the future epoch number by 1; 2) merging of delta adds happens only between deltas of the same epoch; deltas of an older epoch are simply ignored during merging, and the merged result keeps the newest epoch number seen. Other operations remain the same as current. Note that the above 2 rules are only concerned with merging within the deltas on the leader, and are not related to the replicated count, which is a simple final state and observes the rule of "larger clock trumps". Naturally the ordering rule is: epoch1.clock1 > epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 && clock1 > clock2). Intuitively an epoch can be seen as the serial number on a new incarnation of a counter. The code change should be mostly localized to CounterColumn.reconcile(), although, if an update does not find an existing entry in the memtable, we need to go to the sstable to fetch any possible epoch number; so compared to the current write path, in the no-replicate-on-write case we need to add a read to the sstable. But in the replicate-on-write case we already read that, so there is no extra time cost. No-replicate-on-write is not a very useful setup in reality anyway. Does this sound feasible?
Re: minor vs major compaction and purging data
How about cleanups? What would be the difference between cleanup and compactions? On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Yes. On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby jonathan.co...@gmail.com wrote: I've been reading inconsistent descriptions of what major and minor compactions do. So my question for clarification: are tombstones purged (i.e., space reclaimed) by minor AND major compactions? Thanks. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Counter Column in Cassandra
It's a column whose value represents a distributed counter. http://wiki.apache.org/cassandra/Counters On Mon, Jun 13, 2011 at 8:29 AM, Sijie YANG iyan...@gmail.com wrote: Hi All, I am a newbie to Cassandra. I have a simple question but can't find any clear answer by searching Google: what is the meaning of the counter column in Cassandra? Best
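For example, with a client that exposes the 0.8 Thrift add() call (newer phpcassa does, though the method name and signature here are from memory, so check your client):

<?php
// Increment a counter column; a later get() returns the summed value.
$counter_cf->add('page1', 'views', 1);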
Re: Re: minor vs major compaction and purging data
Cleanup removes any data the node is no longer responsible for, according to the node's token range. A node can have data it is no longer responsible for if you do certain maintenance operations like move or loadbalance. Sebastien Coutu sco...@openplaces.org wrote: How about cleanups? What would be the difference between cleanup and compactions? On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Yes. On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby jonathan.co...@gmail.com wrote: I've been reading inconsistent descriptions of what major and minor compactions do. So my question for clarification: are tombstones purged (i.e., space reclaimed) by minor AND major compactions? Thanks. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
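For example, after a move you would run it on the affected node:

$ ./bin/nodetool -h localhost cleanup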
Re: Re: minor vs major compaction and purging data
Thanks! This clarifies a few things :) On Mon, Jun 13, 2011 at 4:09 PM, jonathan.co...@gmail.com wrote: Cleanup removes any data the node is no longer responsible for, according to the node's token range. A node can have data it is no longer responsible for if you do certain maintenance operations like move or loadbalance. Sebastien Coutu sco...@openplaces.org wrote: How about cleanups? What would be the difference between cleanup and compactions? On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Yes. On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby jonathan.co...@gmail.com wrote: I've been reading inconsistent descriptions of what major and minor compactions do. So my question for clarification: are tombstones purged (i.e., space reclaimed) by minor AND major compactions? Thanks. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: insufficient space to compact even the two smallest files, aborting
You may also have been running into https://issues.apache.org/jira/browse/CASSANDRA-2765. We'll have a fix for this in 0.8.1. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com: I was already way over the minimum; there were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to the size of one of the sstables, and it stayed stuck there for a couple of hours. On Mon, 13-06-2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: get_indexed_slices ~ simple map-reduce
From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()... Candidate rows are first read by applying the most selective equality predicate. From those candidate rows: 1) If the SlicePredicate has a SliceRange, the query execution will read all columns for the candidate row if the byte size of the largest tracked row is less than the column_index_size_in_kb config setting (defaults to 64K). Meaning: if no more than 1 column-index page of columns is (probably) going to be read, they will all be read. 2) Otherwise the query will read only the columns specified by the SliceRange. 3) If the SlicePredicate uses a list of column names, those columns and the ones referenced in the IndexExpressions (except the one selected as the primary pivot above) are read from disk. If additional columns are needed (in case 2 above), they are read in separate reads from the candidate row. Then, when applying the SlicePredicate to produce the final projection into the result set, all the columns required to satisfy the filter will be in memory. So yes, it reads just the columns you ask for from disk, unless it thinks it will take no more work to read more. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 08:34, Michal Augustýn wrote: Hi, as I wrote, I don't want to install Hadoop etc. - I want just to use the Thrift API. The core of my question is how the get_indexed_slices function works. I know that it must get all keys using the equality expression first - but what about additional expressions? Does Cassandra fetch whole filtered rows, or just the columns used in the additional filtering expressions? Thanks! Augi 2011/6/12 aaron morton aa...@thelastpickle.com: Not exactly sure what you mean here; all data access is through the Thrift API unless you code Java and embed Cassandra in your app. As well as Pig support there is also Hive support in Brisk (which will also have Pig support soon) http://www.datastax.com/products/brisk Can you provide some more info on the use case? Personally, if you have a read query you know you need to support, I would consider supporting it in the data model without secondary indexes. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 Jun 2011, at 19:23, Michal Augustýn wrote: Hi all, I'm thinking of the get_indexed_slices function as a simple map-reduce job (that just maps) - am I right? Well, I would like to be able to run simple queries on values, but I don't want to install Hadoop, write map-reduce jobs in Java (the whole application is in C# and I don't want to introduce a new development stack - maybe Pig would help), and have a second interface to Cassandra (in addition to Thrift). So secondary indexes seem to be the rescue for me. I would have just one indexed column that will have a day-timestamp value (~100k items per day), and the equality expression for this column would be in each query (and I would add more ad-hoc expressions). Will this scenario work, or is there some issue I could run into? Thanks! Augi
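As a concrete illustration of the query shape described above, here is a phpcassa-style sketch (helper names from memory, so double-check your client; the 'day' column is from Michal's scenario, and the 'status' expression is invented):

<?php
// One equality expression on the indexed day column (the primary
// pivot), plus an extra expression applied to the candidate rows.
$exprs = array(
    CassandraUtil::create_index_expression('day', '20110613'),
    CassandraUtil::create_index_expression('status', 'ok'),
);
$clause = CassandraUtil::create_index_clause($exprs, '', 100);
foreach ($column_family->get_indexed_slices($clause) as $key => $columns) {
    // only rows matching every expression come back
}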
Re: SSL Streaming
Sasha, does https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362 help? A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 23:26, AJ wrote: Performance-wise, I think it would be better to just let the client encrypt sensitive data before storing it, versus encrypting all traffic all the time. If individual values are encrypted, they don't have to be encrypted/decrypted in transit between nodes, whether during the initial writes, while commissioning a new node, or at other times. A drawback, however, is that you now have to manage one or more keys for the lifetime of the data, and it will complicate your data view interfaces. However, if Cassandra had data encryption built in somehow, that would solve this problem... just thinking out loud. Can anyone think of other pros/cons of both strategies? On 3/22/2011 2:21 AM, Sasha Dolgy wrote: Hi, Is there documentation available anywhere that describes how one can use org.apache.cassandra.security.streaming.* ? After the EC2 posts yesterday, one question I was asked was about the security of data being shifted between nodes. Is it done in clear text, or encrypted? I haven't seen anything to suggest that it's encrypted, but I see in the source that security.streaming does leverage SSL... Thanks in advance for some pointers to documentation. Also, for anyone who is using SSL: how much of a performance impact have you noticed? Is it minimal or significant?
Re: odd logs after repair
Count of the columns in a row; not an exact count, as that would require stopping clients from writing to the row, and we do not do that. Have a poke around http://www.datastax.com/docs/0.8/data_model/index and http://wiki.apache.org/cassandra/DataModel Or are you asking about counter columns? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 14 Jun 2011, at 03:27, Sijie YANG wrote: Hi All, I am a newbie to Cassandra. I have a simple question but can't find any clear answer by searching Google: what is the meaning of the count column in Cassandra? Thanks.
Re: SSL Streaming
AJ was responding to an email I sent in March... although I do appreciate the quick response from the community ;) I moved on to our implementation of VPN... On Jun 14, 2011 1:35 AM, aaron morton aa...@thelastpickle.com wrote: Sasha, does https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362 help? A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 23:26, AJ wrote: Performance-wise, I think it would be better to just let the client encrypt sensitive data before storing it, versus encrypting all traffic all the time. If individual values are encrypted, they don't have to be encrypted/decrypted in transit between nodes, whether during the initial writes, while commissioning a new node, or at other times. A drawback, however, is that you now have to manage one or more keys for the lifetime of the data, and it will complicate your data view interfaces. However, if Cassandra had data encryption built in somehow, that would solve this problem... just thinking out loud. Can anyone think of other pros/cons of both strategies? On 3/22/2011 2:21 AM, Sasha Dolgy wrote: Hi, Is there documentation available anywhere that describes how one can use org.apache.cassandra.security.streaming.* ? After the EC2 posts yesterday, one question I was asked was about the security of data being shifted between nodes. Is it done in clear text, or encrypted? I haven't seen anything to suggest that it's encrypted, but I see in the source that security.streaming does leverage SSL... Thanks in advance for some pointers to documentation. Also, for anyone who is using SSL: how much of a performance impact have you noticed? Is it minimal or significant?