Re: [jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

2011-06-13 Thread Mick Semb Wever
On Sun, 2011-06-12 at 18:53 +, Mick Semb Wever wrote:
  This issue could stand to be summarized (I still wish we used a
  mailing list for monsters like this).
 
 
This I actually really appreciate about the cassandra community. 

To put it another way: as a newbie here it has allowed me to understand
individual issues, their history and the development discussion around them,
without having to go all in and subscribe to the development list. In some
communities the latter can be quite a daunting thing to start and keep up
with if that is where all the development discussion happens.
That's 2 cents anyway from someone still finding their way into the code.

~mck

-- 
We are born naked, wet and hungry. Then things get worse. 
| http://semb.wever.org | http://sesat.no
| http://tech.finn.no   | Java XSS Filter





Re: SSL Streaming

2011-06-13 Thread AJ
Performance-wise, I think it would be better to just let the client 
encrypt sensitive data before storing it, versus encrypting all traffic 
all the time.  If individual values are encrypted, then they don't have 
to be encrypted/decrypted during transit between nodes during the 
initial updates as well as during the commissioning of a new node or 
other times.


A drawback, however, is now you have to manage one or more keys for the 
lifetime of the data.  It will also complicate your data view 
interfaces.  However, if Cassandra had data encryption built-in somehow, 
that would solve this problem... just thinking out loud.
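
For what it's worth, here is a rough sketch of what I mean by client-side value
encryption, using plain javax.crypto; the key handling and the actual Cassandra
insert are left out, and the class/method names are made up:

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ValueCrypto {
    // Encrypt one column value before it is handed to the Cassandra client;
    // the IV is stored in front of the ciphertext so the value can be decrypted later.
    static byte[] encryptValue(byte[] plainValue, byte[] aesKey) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(iv));
        byte[] cipherText = cipher.doFinal(plainValue);
        byte[] stored = new byte[iv.length + cipherText.length];
        System.arraycopy(iv, 0, stored, 0, iv.length);
        System.arraycopy(cipherText, 0, stored, iv.length, cipherText.length);
        return stored;
    }
}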


Can anyone think of other pro/cons of both strategies?

On 3/22/2011 2:21 AM, Sasha Dolgy wrote:

Hi,

Is there documentation available anywhere that describes how one can
use org.apache.cassandra.security.streaming.* ?   After the EC2 posts
yesterday, one question I was asked was about the security of data
being shifted between nodes.  Is it done in clear text, or
encrypted..?  I haven't seen anything to suggest that it's encrypted,
but see in the source that security.streaming does leverage SSL ...

Thanks in advance for some pointers to documentation.

Also, for anyone who is using SSL .. how much of a performance impact
have you noticed?  Is it minimal or significant?





Re: Troubleshooting IO performance ?

2011-06-13 Thread aaron morton
To reduce the number of SSTables, increase the memtable_threshold for the CF. 

The IO numbers may be because of compaction kicking in. The CompactionManager 
provides information via JMX on its progress, or you can check the logs. You 
could increase the min_compaction_threshold for the CF, or disable compaction 
during the bulk load if you want to. In your case that's probably a bad idea, 
though, as every write requires a read and compaction will help the read 
performance. 
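
From memory the thresholds can also be changed on a live node with nodetool,
something along the lines of the following (keyspace/CF names are just examples;
setting the thresholds to 0 disables minor compaction):

$ ./bin/nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 0 0
$ ./bin/nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 4 32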

These numbers show that about 37GB of data was written to disk, but compaction 
has shrunk this down to about 7GB. If you are not doing deletes then you are 
doing a lot of overwrites:

 Space used (live): 8283980539
 Space used (total): 39972827945


I'm getting a bit confused about what the problem is. Is it increasing latency 
for read requests or the unexplained IO ? 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12 Jun 2011, at 08:55, Philippe wrote:

 More info below
  
 I just loaded 4.8GB of similar data in another keyspace and ran the same 
 process as in my previous tests but on that data.
 I started with three threads hitting cassandra. No I/O, hardly any CPU (15% 
 on a 4 core server)
 After an hour or so, I raised it to 6 threads in parallel. Then to 9 threads 
 in parallel.
 
 I never got any IO; in fact iostat showed me there weren't any disk reads. I 
 hardly saw the CPU elevate except at the end.
 
 The only difference between the two datasets is that the size of the other 
 one is 8.4GB. So the second one doesn't fit completely in memory. So my woes 
 are related to how well cassandra is fetching the data in the SSTables, right ?
 
 
 So what are my options ? My rows are very small at the moment (well under 4 
 kBytes). Should I reduce the read buffer ? Should I reduce the number of 
 SSTables ?
 I'm reloading the data from scratch using the incremental update I will be 
 using in production. I'm hitting the cluster with 6 threads. Yes, I should 
 have decreased it but I was too optimistic and I don't want to stop it now. 
 The data I'm loading is used for computing running averages (sum and total) and 
 so every update requires a read. 
 
 As soon as the data no longer fits in memory, I'm seeing huge amounts of io 
 (almost 380MB/s reading) that I'd like to understand.
 My hypothesis is that because I am updating the same key so many times 
 (dozens if not hundreds of times in some cases), the row is split across the 
 SSTables and every read needs to go through all the SSTables.
 Unfortunately, at some point, cassandra compacted the keys from 5 tables to 3 
 and the throughput did not increase after that, so I'm not even sure this 
 makes sense.
 1) Is there another explanation ? Can I do something about this ?
 2) Why is the read latency displayed twice in cfstats and why does it differ 
 ?
 
 Thanks
 
 vmstat
 procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
  0  7  47724  94604  22556 737325200 391646  2152 10498 10297  6  6 
 26 62
  0  6  47724  93736  22556 737240000 396774 0 10881 11177  5  6 
 29 60
  2  5  47724  92496  22556 737482400 37240615 11212 11149  8  7 
 25 59
  0  5  47724  89520  22568 737848400 399730   526 10964 11975  6  7 
 24 63
  0  7  47724  87908  22568 737944400 396216 0 10405 10880  5  7 
 22 66
 
 iostat -dmx 2
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
 avgqu-sz   await r_await w_await  svctm  %util
 sdb 168.50 0.00 3152.000.50   185.49 0.25   120.66
 54.86   17.39   17.394.00   0.31  96.80
 sda 178.50 0.50 3090.500.50   184.47 0.19   122.35
 61.16   19.71   19.714.00   0.31  97.20
 md1   0.00 0.000.000.00 0.00 0.00 0.00
  0.000.000.000.00   0.00   0.00
 md5   0.00 0.00 6612.501.50   372.82 0.44   115.58
  0.000.000.000.00   0.00   0.00
 dm-0  0.00 0.00 6612.500.00   372.82 0.00   115.47   
 123.15   18.58   18.580.00   0.15  97.20
 
 cfstats
 Read Count: 88215069
 Read Latency: 1.821254258759351 ms.
 Write Count: 88215059
 Write Latency: 0.013311765885686253 ms.
 Pending Tasks: 0
 Column Family: PUBLIC_MONTHLY
 SSTable count: 3
 Space used (live): 8283980539
 Space used (total): 39972827945
 Memtable Columns Count: 449201
 Memtable Data Size: 21788245
 Memtable Switch Count: 72
 Read Count: 88215069
 Read Latency: 7.433 ms.
 Write Count: 88215069
 Write Latency: 0.016 ms.
 Pending Tasks: 0
   

problem in using get_range() function

2011-06-13 Thread Amrita Jayakumar
Hi,
I am trying to retrieve the row_keys in a column_family with the following
code.

$rows = $column_family->get_range($key_start='R17889000',
$key_finish='R17893999', $row_count=1000);
$count = 0;
foreach($rows as $row) {
echo $count.'<br/>';
$count += 1;

print_r($row);
echo '<br/>';
}


there are 5000 records in the database. But only 526 are getting retrieved.
Am I missing something here???
can anyone help


Re: problem in using get_range() function

2011-06-13 Thread Dan Kuebrich
Are you using the order preserving partitioner or the random partitioner for
this CF?  In order to get the results you expect, you'll need to use the
OPP.

More info:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
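
With the random partitioner you can still walk every row, you just can't ask for a
meaningful key range; you page through the whole CF and feed the last key you saw
back in as the next start key. Very roughly, against the raw Thrift API (an untested
sketch with made-up names; your client library probably wraps this for you):

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class KeyDump {
    // Walk every row key of a column family by paging get_range_slices.
    static void dumpAllKeys(Cassandra.Client client) throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        // ask for at most one column per row, we only care about the keys
        predicate.setSlice_range(new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 1));
        KeyRange range = new KeyRange();
        range.setCount(1000);
        range.setStart_key(ByteBuffer.wrap(new byte[0]));
        range.setEnd_key(ByteBuffer.wrap(new byte[0]));
        ByteBuffer lastKey = null;
        while (true) {
            List<KeySlice> page = client.get_range_slices(
                    new ColumnParent("MyCF"), predicate, range, ConsistencyLevel.ONE);
            for (KeySlice row : page) {
                lastKey = row.key;           // the row key
            }
            if (page.size() < 1000) break;   // ran out of rows
            range.setStart_key(lastKey);     // next page starts at (and repeats) the last key seen
        }
    }
}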

On Mon, Jun 13, 2011 at 8:47 AM, Amrita Jayakumar 
amritajayakuma...@gmail.com wrote:

 Hi,
 I am trying to retrieve the row_keys in a column_family with the following
 code.

 $rows = $column_family->get_range($key_start='R17889000',
 $key_finish='R17893999', $row_count=1000);
 $count = 0;
 foreach($rows as $row) {
 echo $count.'<br/>';
 $count += 1;

 print_r($row);
 echo '<br/>';
 }


 there are 5000 records in the database. But only 526 are getting
 retrieved. Am I missing something here???
 can anyone help



Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Héctor Izquierdo Seliva
Hi All.  I found a way to be able to compact. I have to call scrub on
the column family. Then scrub gets stuck forever. I restart the node,
and voila! I can compact again without any message about not having
enough space. This looks like a bug to me. What info would be needed to
file a report? This is on 0.8, upgraded from 0.7.5




Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread AJ

http://wiki.apache.org/cassandra/FAQ#range_ghosts

So to special case leaving out result entries for deletions, we would 
have to check the entire rest of the row to make sure there is no 
undeleted data anywhere else either (in which case leaving the key out 
would be an error).


The above doesn't read well and I don't get it.  Can anyone rephrase it 
or elaborate?


Thanks!


Re: odd logs after repair

2011-06-13 Thread aaron morton
You can double check with node tool e.g.  

$ ./bin/nodetool -h localhost version
ReleaseVersion: 0.8.0-SNAPSHOT

This error is about the internode wire protocol version one node thinks another is 
using. Not sure how it could get confused; does it go away if you restart the 
node that logged the error ?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13 Jun 2011, at 06:19, Sasha Dolgy wrote:

 Hi Everyone,
 
 Last week, upgraded all 4 nodes to apache-cassandra-0.8.0 .. no
 issues.  Trolling the logs today, I find messages like this on all
 four nodes:
 
 INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13
 02:16:45,978 AntiEntropyService.java (line 177) Excluding
 /10.128.34.18 from repair because it is on version 0.7 or sooner. You
 should consider updating this node before running repair again.
 
 Maybe it would be nice to have the version of all nodes printed in
 nodetool ring ?  I don't think I'm crazy though ... have manually
 checked all are on 0.8.0
 
 
 -- 
 Sasha Dolgy
 sasha.do...@gmail.com



Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread Stephen Connolly
It returns the set of columns for the set of rows... how do you
determine the difference between a completely empty row and a row that
just does not have any of the matching columns?

Well, the answer is that Cassandra does not go and check whether there
are any columns outside of the range you are querying, so it will just
return the (empty, for the column range you specified) row. Your
code needs to be robust enough to understand that an empty
list of columns does not tell you whether there are no columns at all for
that row key (i.e. it is deleted and waiting for tombstone expiry & gc) or
whether there is a column outside the range you queried.
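
In other words the client side ends up with something like this (just a sketch;
parent/predicate/range are your normal get_range_slices arguments and process()
is a stand-in for whatever your app does with a live row):

for (KeySlice row : client.get_range_slices(parent, predicate, range, ConsistencyLevel.ONE)) {
    if (row.columns.isEmpty()) {
        // either a deleted row waiting for tombstone expiry & gc, or a live row
        // with no columns in the requested range -- you cannot tell which from here
        continue;
    }
    process(row);
}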

On 13 June 2011 13:59, AJ a...@dude.podzone.net wrote:
 http://wiki.apache.org/cassandra/FAQ#range_ghosts

 So to special case leaving out result entries for deletions, we would have
 to check the entire rest of the row to make sure there is no undeleted data
 anywhere else either (in which case leaving the key out would be an error).

 The above doesn't read well and I don't get it.  Can anyone rephrase it or
 elaborate?

 Thanks!



Re: problem in using get_range() function

2011-06-13 Thread Amrita Jayakumar
Can you tell me how to retrieve all the row keys in a column family???

On Mon, Jun 13, 2011 at 6:25 PM, Dan Kuebrich dan.kuebr...@gmail.comwrote:

 Are you using the order preserving partitioner or the random partitioner
 for this CF?  In order to get the results you expect, you'll need to use the
 OPP.

 More info:
 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 On Mon, Jun 13, 2011 at 8:47 AM, Amrita Jayakumar 
 amritajayakuma...@gmail.com wrote:

 Hi,
 I am trying to retrieve the row_keys in a column_family with the following
 code.

 $rows = $column_family->get_range($key_start='R17889000',
 $key_finish='R17893999', $row_count=1000);
 $count = 0;
 foreach($rows as $row) {
 echo $count.'<br/>';
 $count += 1;

 print_r($row);
 echo '<br/>';
 }


 there are 5000 records in the database. But only 526 are getting
 retrieved. Am I missing something here???
 can anyone help





Re: odd logs after repair

2011-06-13 Thread Sasha Dolgy
I recall there being a discussion about a default port changing from
0.7.x to 0.8.x ... this was JMX, correct?  Or were there others?

On Mon, Jun 13, 2011 at 3:34 PM, Sasha Dolgy sdo...@gmail.com wrote:
 Hi Aaron,

 The error is being reported on all 4 nodes. I have confirmed (for my
 own sanity) that each node is running:  ReleaseVersion: 0.8.0

 I can reproduce the error on any node by trailing
 cassandra/logs/system.log and running nodetool repair

  INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13
 21:28:39,877 AntiEntropyService.java (line 177) Excluding
 /10.128.34.18 from repair because it is on version 0.7 or sooner. You
 should consider updating this node before running repair again.
 ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13
 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception
 in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI
 Runtime]
 java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$KeyIterator.next(HashMap.java:828)
        at 
 org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
        at 
 org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)

 When I run nodetool ring, the ring looks balanced and nothing out of sorts.

 I also have this set up with RF=3 on 4 nodes ... but repair was
 working fine prior to the 0.8.0 upgrade.

 Are there any special commands I need to run?  I've tried scrub,
 cleanup, flush too ... still, repair gives the same issues.

 -- I have stopped one of the nodes and started it.  Issue still
 persists.  I stop another node that is reported in the logs (like .18
 above) and start it ... run repair again ... issue is persisted to the
 log file still.

 -sd



 On Mon, Jun 13, 2011 at 3:02 PM, aaron morton aa...@thelastpickle.com wrote:
 You can double check with node tool e.g.

 $ ./bin/nodetool -h localhost version
 ReleaseVersion: 0.8.0-SNAPSHOT

 This error is about the internode wire protocol one node thinks another is 
 using. Not sure how it could get confused, does it go away if you restart 
 the node that logged the error ?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 13 Jun 2011, at 06:19, Sasha Dolgy wrote:

 Hi Everyone,

 Last week, upgraded all 4 nodes to apache-cassandra-0.8.0 .. no
 issues.  Trolling the logs today, I find messages like this on all
 four nodes:

 INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13
 02:16:45,978 AntiEntropyService.java (line 177) Excluding
 /10.128.34.18 from repair because it is on version 0.7 or sooner. You
 should consider updating this node before running repair again.

 Maybe it would be nice to have the version of all nodes print in
 nodetool ring ?  I don't think I'm crazy though ... have manually
 checked all are on 0.8.0


 --
 Sasha Dolgy
 sasha.do...@gmail.com





 --
 Sasha Dolgy
 sasha.do...@gmail.com




-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Terje Marthinussen
That most likely happened just because after scrub you had new files and got
over the 4 file minimum limit.

https://issues.apache.org/jira/browse/CASSANDRA-2697

Is the bug report.

2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com

 Hi All.  I found a way to be able to compact. I have to call scrub on
 the column family. Then scrub gets stuck forever. I restart the node,
 and voila! I can compact again without any message about not having
 enough space. This looks like a bug to me. What info would be needed to
 file a report? This is on 0.8, upgraded from 0.7.5





Re: odd logs after repair

2011-06-13 Thread Tyler Hobbs
On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote:

 I recall there being a discussion about a default port changing from
 0.7.x to 0.8.x ...this was JMX, correct?  Or were there others.


Yes, the default JMX port changed from 8080 to 7199.  I don't think there
were any others.

-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Héctor Izquierdo Seliva
I was already way over the minimum. There were 12 sstables. Also, is
there any reason why scrub got stuck? I did not see anything in the
logs. Via jmx I saw that the scrubbed bytes were equal to one of the
 sstables' sizes, and it stuck there for a couple of hours.

El lun, 13-06-2011 a las 22:55 +0900, Terje Marthinussen escribió:
 That most likely happened just because after scrub you had new files
 and got over the 4 file minimum limit.
 
 https://issues.apache.org/jira/browse/CASSANDRA-2697
 
 Is the bug report.
 





Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread AJ

On 6/13/2011 7:03 AM, Stephen Connolly wrote:

It returns the set of columns for the set of rows... how do you
determine the difference between a completely empty row and a row that
just does not have any of the matching columns?


I would expect it to not return anything (no row at all) for both of 
those cases.  Are you saying that an empty row is returned for rows that 
do not match the predicate?  So, if I perform a range slice where the 
range is every row of the CF and the slice equates to no matches and I 
have 1 million rows in the CF, then I will get a result set of 1 million 
empty rows?


Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread Stephen Connolly
On 13 June 2011 16:14, AJ a...@dude.podzone.net wrote:
 On 6/13/2011 7:03 AM, Stephen Connolly wrote:

 It returns the set of columns for the set of rows... how do you
 determine the difference between a completely empty row and a row that
 just does not have any of the matching columns?

 I would expect it to not return anything (no row at all) for both of those
 cases.  Are you saying that an empty row is returned for rows that do not
 match the predicate?  So, if I perform a range slice where the range is
 every row of the CF and the slice equates to no matches and I have 1 million
 rows in the CF, then I will get a result set of 1 million empty rows?

No, I am saying that for each row that matches, you will get a result,
even if the columns that you request happen to be empty for that
specific row.

Likewise, any deleted rows in the same row range will show as empty,
because C* would have a ton of work to figure out the difference
between being deleted and being empty.


Re: odd logs after repair

2011-06-13 Thread Sijie YANG
Hi, All

I am newbie to cassandra. I have a simple question but don't find any clear
answer by searching google:
What's the meaning of count column in Cassandra? Thanks.


Counter Column in Cassandra

2011-06-13 Thread Sijie YANG
Hi, All

I am newbie to cassandra. I have a simple question but don't find any clear
answer by searching google:
What's the meaning of counter column in Cassandra?

Best


Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Jonathan Ellis
As Terje already said in this thread, the threshold is per bucket
(group of similarly sized sstables) not per CF.

2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com:
 I was already way over the minimum. There were 12 sstables. Also, is
 there any reason why scrub got stuck? I did not see anything in the
 logs. Via jmx I saw that the scrubbed bytes were equal to one of the
 sstables size, and it stuck there for a couple hours .

 El lun, 13-06-2011 a las 22:55 +0900, Terje Marthinussen escribió:
 That most likely happened just because after scrub you had new files
 and got over the 4 file minimum limit.

 https://issues.apache.org/jira/browse/CASSANDRA-2697

 Is the bug report.








-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: count column in Cassandra

2011-06-13 Thread Sasha Dolgy
Probably helpful if you change the subject when posting about a
different topic.

Is your question about counters or the count function?

Counters are cool.
Count allows you to determine how many columns exist in a row.
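
e.g. over Thrift, counting the columns of a single row looks roughly like this
(CF and key names are made up, and an open Cassandra.Client called "client" is assumed):

SliceRange all = new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, Integer.MAX_VALUE);
SlicePredicate everything = new SlicePredicate();
everything.setSlice_range(all);
// number of columns currently in the row "some-row-key" of CF "MyCF"
int columns = client.get_count(ByteBuffer.wrap("some-row-key".getBytes()),
                               new ColumnParent("MyCF"), everything, ConsistencyLevel.ONE);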

-sd

On Mon, Jun 13, 2011 at 5:27 PM, Sijie YANG iyan...@gmail.com wrote:
 Hi, All
 I am newbie to cassandra. I have a simple question but don't find any clear
 answer by searching google:
 What's the meaning of count column in Cassandra? Thanks.


one way to make counter delete work better

2011-06-13 Thread Yang
as https://issues.apache.org/jira/browse/CASSANDRA-2101
indicates, the problem with counter delete is in scenarios like the
following:

add 1, clock 100
delete, clock 200
add 2, clock 300

if the 1st and 3rd operations are merged in SSTable compaction, then we
have
delete, clock 200
add 3, clock 300

which shows the wrong result.


I think a relatively simple extension can be used to completely fix this
issue: similar to ZooKeeper, we can prefix an epoch number to the clock,
so that
   1) a delete operation increases the future epoch number by 1
   2) merging of delta adds can only happen between deltas of the same epoch;
deltas of an older epoch are simply ignored during merging. The merged result keeps
the epoch number of the newest seen.

other operations remain the same as current. Note that the above 2 rules are
only concerned with merging within the deltas on the leader, and not related
to the replicated count, which is a simple final state and observes the
rule of 'larger clock trumps'. Naturally the ordering rule is: epoch1.clock1 >
epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 && clock1 > clock2)

intuitively the epoch can be seen as the serial number of a new incarnation
of a counter.
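
In code the ordering rule is just a lexicographic compare on the (epoch, clock)
pair, something like this (sketch only, names made up):

// higher epoch always wins; the clock only breaks ties within the same epoch
static int compareEpochClock(long epoch1, long clock1, long epoch2, long clock2) {
    if (epoch1 != epoch2) {
        return epoch1 > epoch2 ? 1 : -1;
    }
    if (clock1 != clock2) {
        return clock1 > clock2 ? 1 : -1;
    }
    return 0;
}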


The code change should be mostly localized to CounterColumn.reconcile(),
although, if an update does not find an existing entry in the memtable, we need to
go to the sstable to fetch any possible epoch number. So,
compared to the current write path, in the no-replicate-on-write case we need
to add a read to the sstable; but in the replicate-on-write case we already
read that, so there is no extra time cost. No-replicate-on-write is not a
very useful setup in reality anyway.


Does this sound like a feasible way? If this works, expiring counters should
also naturally work.


Thanks
Yang


Re: one way to make counter delete work better

2011-06-13 Thread Jonathan Ellis
I don't think that's bulletproof either.  For instance, what if the
two adds go to replica 1 but the delete to replica 2?

Bottom line (and this was discussed on the original
delete-for-counters ticket,
https://issues.apache.org/jira/browse/CASSANDRA-2101), counter deletes
are not fully commutative which makes them fragile.

On Mon, Jun 13, 2011 at 10:54 AM, Yang tedd...@gmail.com wrote:
 as https://issues.apache.org/jira/browse/CASSANDRA-2101
 indicates, the problem with counter delete is  in scenarios like the
 following:
 add 1, clock 100
 delete , clock 200
 add  2 , clock 300
 if the 1st and 3rd operations are merged in SStable compaction, then we
 have
 delete  clock 200
 add 3,  clock 300
 which shows wrong result.

 I think a relatively simple extension can be used to complete fix this
 issue: similar to ZooKeeper, we can prefix an Epoch number to the clock,
 so that
    1) a delete operation increases future epoch number by 1
    2) merging of delta adds can be between only deltas of the same epoch,
 deltas of older epoch are simply ignored during merging. merged result keeps
 the epoch number of the newest seen.
 other operations remain the same as current. note that the above 2 rules are
 only concerned with merging within the deltas on the leader, and not related
 to the replicated count, which is a simple final state, and observes the
 rule of larger clock trumps. Naturally the ordering rule is: epoch1.clock1 >
 epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 && clock1 > clock2)
 intuitively epoch can be seen as the serial number on a new incarnation
 of a counter.

 code change should be mostly localized to CounterColumn.reconcile(),
  although, if an update does not find existing entry in memtable, we need to
 go to sstable to fetch any possible epoch number, so
 compared to current write path, in the no replicate-on-write case, we need
 to add a read to sstable. but in the replicate-on-write case, we already
 read that, so it's no extra time cost.  no replicate-on-write is not a
 very useful setup in reality anyway.

 does this sound a feasible way?   if this works, expiring counter should
 also naturally work.

 Thanks
 Yang



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread AJ

On 6/13/2011 9:25 AM, Stephen Connolly wrote:

On 13 June 2011 16:14, AJa...@dude.podzone.net  wrote:

On 6/13/2011 7:03 AM, Stephen Connolly wrote:

It returns the set of columns for the set of rows... how do you
determine the difference between a completely empty row and a row that
just does not have any of the matching columns?

I would expect it to not return anything (no row at all) for both of those
cases.  Are you saying that an empty row is returned for rows that do not
match the predicate?  So, if I perform a range slice where the range is
every row of the CF and the slice equates to no matches and I have 1 million
rows in the CF, then I will get a result set of 1 million empty rows?


No I am saying that for each row that matches, you will get a result,
even if the columns that you request happen to be empty for that
specific row.



Ok, this I understand I guess.  If I query a range of rows and want only 
a certain column and a row does not have that column, I would like to 
know that.



Likewise, any deleted rows in the same row range will show as empty
 because C* would have a ton of work to figure out the difference
between being deleted and being empty.



But, if a row does indeed have the column, but that row was deleted, why 
would I get an empty row?  You say because of a ton of work.  So, the 
tombstone for the row is not stored close-by for quick access... or 
something like that?  At any rate, how do I figure out if the empty row 
is empty because it was deleted?  Sorry if I'm being dense.





Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread Stephen Connolly
On 13 June 2011 17:09, AJ a...@dude.podzone.net wrote:
 On 6/13/2011 9:25 AM, Stephen Connolly wrote:

 On 13 June 2011 16:14, AJa...@dude.podzone.net  wrote:

 On 6/13/2011 7:03 AM, Stephen Connolly wrote:

 It returns the set of columns for the set of rows... how do you
 determine the difference between a completely empty row and a row that
 just does not have any of the matching columns?

 I would expect it to not return anything (no row at all) for both of
 those
 cases.  Are you saying that an empty row is returned for rows that do not
 match the predicate?  So, if I perform a range slice where the range is
 every row of the CF and the slice equates to no matches and I have 1
 million
 rows in the CF, then I will get a result set of 1 million empty rows?

 No I am saying that for each row that matches, you will get a result,
 even if the columns that you request happen to be empty for that
 specific row.


 Ok, this I understand I guess.  If I query a range of rows and want only a
 certain column and a row does not have that column, I would like to know
 that.

Deleted rows don't have the column either, which is the point.


 Likewise, any deleted rows in the same row range will show as empty
 because C* would have a ton of work to figure out the difference
 between being deleted and being empty.


 But, if a row does indeed have the column, but that row was deleted, why
 would I get an empty row?  You say because of a ton of work.  So, the
 tombstone for the row is not stored close-by for quick access... or
 something like that?  At any rate, how do I figure out if the empty row is
 empty because it was deleted?  Sorry if I'm being dense.


Store the query inverted.

That way empty -> deleted.

The tombstones are stored for each column that had data IIRC... but at
this point my grok of C* is lacking.





Re: Docs: Why do deleted keys show up during range scans?

2011-06-13 Thread AJ

On 6/13/2011 10:14 AM, Stephen Connolly wrote:


store the query inverted.

that way empty -> deleted

I don't know what that means... get the other columns?  Can you 
elaborate?  Are there docs for this or is this a hack/workaround?



the tombstones are stored for each column that had data IIRC... but at
this point my grok of C* is lacking

I suspected this, but wasn't sure.  It sounds like when a row is 
deleted, a tombstone is not attached to the row, but to each 
column???  So, if all columns are deleted then the row is considered 
deleted?  Hmmm, that doesn't sound right, but that doesn't mean it isn't 
! ;o)


Re: one way to make counter delete work better

2011-06-13 Thread Yang
I think this approach also works for your scenario:

I thought that the issue is only concerned with merging within the same
leader; but you pointed out
that a similar merging happens between leaders too, now I see that the same
rules on epoch number
also applies to inter-leader data merging, specifically in your case:


everyone starts with epoch of 0, ( they should be same, if not, it also
works, we just consider them to be representing diffferent time snapshots of
the same counter state)

node A  add 1clock:  0.100  (epoch = 0, clock number = 100)

node A  deleteclock:  0.200

node B add 2 clock:  0.300

node A gets B's state: add 2 clock 0.300, but rejects it because A has
already produced a delete with epoch 0, so A considers epoch 0 already
ended; it won't accept any replicated state with an epoch < 1.

node B gets A's delete 0.200; it zeros its own count of 2, and
updates its future expected epoch to 1.

at this time, the state of system is:
node A expected epoch =1  [A:nil] [B:nil]
same for node B



let's say we have following further writes:

node B adds 3  clock 1.400

node A adds 4  clock 1.500

node B receives A's add 4, and updates its copy of A
node A receives B's add 3, and updates its copy of B


then the state is:
node A, expected epoch == 1   [A:4 clock=500] [B:3 clock=400]
node B same



generally I think it should be complete if we add the following rule for
inter-leader replication:

each leader keeps a var in memory (and also persists it to sstable when
flushing), expected_epoch, initially set to 0

node P does:
on receiving updates from node Q
    if Q.expected_epoch > P.expected_epoch
        /** an epoch bump inherently means a previous delete, which we
            probably missed, so we need to apply the delete;
            a delete is global to all leaders, so apply it on all my
            replicas **/
        for all leaders in my vector
            count = nil
        P.expected_epoch = Q.expected_epoch
    if Q.expected_epoch == P.expected_epoch
        update P's copy of Q according to standard rules
    /** if Q.expected_epoch < P.expected_epoch, that means Q is less
        up to date than us, just ignore **/


replicate_on_write(to Q):
    if P.operation == delete
        P.expected_epoch ++
        set all my copies of all leaders to nil
    send to Q ( P.total , P.expected_epoch )




Overall I don't think deletes being non-commutative is a fundamental blocker:
regular columns are also not commutative, yet we achieve a stable result no
matter what order they are applied in, because of the ordering rule used in
reconciliation; here we just need to find a similar ordering rule. The epoch
thing could be a step in this direction.


Thanks
Yang




On Mon, Jun 13, 2011 at 9:04 AM, Jonathan Ellis jbel...@gmail.com wrote:

 I don't think that's bulletproof either.  For instance, what if the
 two adds go to replica 1 but the delete to replica 2?

 Bottom line (and this was discussed on the original
 delete-for-counters ticket,
 https://issues.apache.org/jira/browse/CASSANDRA-2101), counter deletes
 are not fully commutative which makes them fragile.

 On Mon, Jun 13, 2011 at 10:54 AM, Yang tedd...@gmail.com wrote:
  as https://issues.apache.org/jira/browse/CASSANDRA-2101
  indicates, the problem with counter delete is  in scenarios like the
  following:
  add 1, clock 100
  delete , clock 200
  add  2 , clock 300
  if the 1st and 3rd operations are merged in SStable compaction, then we
  have
  delete  clock 200
  add 3,  clock 300
  which shows wrong result.
 
  I think a relatively simple extension can be used to complete fix this
  issue: similar to ZooKeeper, we can prefix an Epoch number to the
 clock,
  so that
 1) a delete operation increases future epoch number by 1
 2) merging of delta adds can be between only deltas of the same epoch,
  deltas of older epoch are simply ignored during merging. merged result
 keeps
  the epoch number of the newest seen.
  other operations remain the same as current. note that the above 2 rules
 are
  only concerned with merging within the deltas on the leader, and not
 related
  to the replicated count, which is a simple final state, and observes the
  rule of larger clock trumps. Naturally the ordering rule is:
  epoch1.clock1 > epoch2.clock2 iff epoch1 > epoch2 || (epoch1 == epoch2 &&
  clock1 > clock2)
  intuitively epoch can be seen as the serial number on a new
 incarnation
  of a counter.
 
  code change should be mostly localized to CounterColumn.reconcile(),
   although, if an update does not find existing entry in memtable, we need
 to
  go to sstable to fetch any possible epoch number, so
  compared to current write path, in the no replicate-on-write case, we
 need
  to add a read to sstable. but in the replicate-on-write case, we
 already
  read that, so it's no extra time cost.  no replicate-on-write is not a
  very useful setup in reality anyway.
 
  does 

Re: minor vs major compaction and purging data

2011-06-13 Thread Sebastien Coutu
How about cleanups? What would be the difference between cleanup and
compactions?

On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Yes.

 On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby
 jonathan.co...@gmail.com wrote:
  I've been reading inconsistent descriptions of what major and minor
 compactions do. So my question for clarification:
 
  Are tombstones purged (ie, space reclaimed) for minor AND major
 compactions?
 
  Thanks.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Counter Column in Cassandra

2011-06-13 Thread Patricio Echagüe
It's a column whose content represents a distributed counter.

http://wiki.apache.org/cassandra/Counters
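
Over Thrift in 0.8 you increment one with the add() call, roughly like this
(CF/column/key names below are only examples, and the CF has to be created with
default_validation_class=CounterColumnType; an open Cassandra.Client "client"
with the keyspace already set is assumed):

CounterColumn hits = new CounterColumn();
hits.setName(ByteBuffer.wrap("hits".getBytes()));
hits.setValue(5L);  // increment by 5; a negative value decrements
client.add(ByteBuffer.wrap("page-1".getBytes()),
           new ColumnParent("PageViews"),
           hits,
           ConsistencyLevel.ONE);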

On Mon, Jun 13, 2011 at 8:29 AM, Sijie YANG iyan...@gmail.com wrote:

 Hi, All

 I am newbie to cassandra. I have a simple question but don't find any clear
 answer by searching google:
 What's the meaning of counter column in Cassandra?

 Best





Re: Re: minor vs major compaction and purging data

2011-06-13 Thread jonathan . colby
Cleanup removes any data that the node is no longer responsible for, according 
to the node's token range. A node can have data it is no longer responsible 
for if you do certain maintenance operations like move or loadbalance.
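
It is run per node with nodetool, e.g.:

$ ./bin/nodetool -h localhost cleanup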


On , Sebastien Coutu sco...@openplaces.org wrote:
 How about cleanups? What would be the difference between cleanup and
 compactions?

 On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Yes.

 On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby
 jonathan.co...@gmail.com wrote:

  I've been reading inconsistent descriptions of what major and minor
  compactions do. So my question for clarification:

  Are tombstones purged (ie, space reclaimed) for minor AND major
  compactions?

  Thanks.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com

Re: Re: minor vs major compaction and purging data

2011-06-13 Thread Sebastien Coutu
Thanks! This clarifies a few things :)

On Mon, Jun 13, 2011 at 4:09 PM, jonathan.co...@gmail.com wrote:

 Cleanup removes any data that node is no longer responsible for, according
 to the node's token range. A node can have data it is no longer responsible
 for if you do certain maintenance operations like move or loadbalance.


 On , Sebastien Coutu sco...@openplaces.org wrote:
  How about cleanups? What would be the difference between cleanup and
 compactions?
 
  On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  Yes.
 
 
 
 
  On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby
 
  jonathan.co...@gmail.com wrote:
 
   I've been reading inconsistent descriptions of what major and minor
 compactions do. So my question for clarification:
 
  
 
   Are tombstones purged (ie, space reclaimed) for minor AND major
 compactions?
 
  
 
   Thanks.
 
 
 
 
 
 
 
 
 
  --
 
  Jonathan Ellis
 
  Project Chair, Apache Cassandra
 
  co-founder of DataStax, the source for professional Cassandra support
 
  http://www.datastax.com
 
 
 
 



Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Jonathan Ellis
You may also have been running into
https://issues.apache.org/jira/browse/CASSANDRA-2765. We'll have a fix
for this in 0.8.1.

2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com:
 I was already way over the minimum. There were 12 sstables. Also, is
 there any reason why scrub got stuck? I did not see anything in the
 logs. Via jmx I saw that the scrubbed bytes were equal to one of the
 sstables size, and it stuck there for a couple hours .

 El lun, 13-06-2011 a las 22:55 +0900, Terje Marthinussen escribió:
 That most likely happened just because after scrub you had new files
 and got over the 4 file minimum limit.

 https://issues.apache.org/jira/browse/CASSANDRA-2697

 Is the bug report.








-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: get_indexed_slices ~ simple map-reduce

2011-06-13 Thread aaron morton
From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()...

Candidate rows are first read by applying the most selective equality predicate. 

From those candidate rows...

1) If the SlicePredicate has a SliceRange the query execution will read all 
columns for the candidate row  if the byte size of the largest tracked row is 
less than column_index_size_in_kb config setting (defaults to 64K). Meaning if 
no more than 1 column index page of columns is (probably) going to be read, 
they will all be read. 

2) Otherwise the query will read the columns specified by the SliceRange. 

3) If the SlicePredicate uses a list of column names, those columns and the 
ones referenced in the IndexExpressions (except the one selected as the primary 
pivot above) are read from disk. 

If additional columns are needed (in case 2 above) they are read in separate 
reads from the candidate row. 

Then when applying the SlicePredicate to produce the final projection into the 
result set, all the columns required to satisfy the filter will be in memory.  


So, yes, it reads from disk just the columns you ask for, unless it thinks 
it will take no more work to read more. 
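
For reference, the Thrift call shape is roughly the following (names are only
illustrative and the values have to be encoded to match the column validators);
the EQ expression is the one used to select candidate rows, the rest are applied
as filters:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class IndexedQuery {
    static List<KeySlice> eventsForDay(Cassandra.Client client) throws Exception {
        // the indexed equality expression; an index must exist on this column
        IndexExpression byDay = new IndexExpression(
                ByteBuffer.wrap("day".getBytes()), IndexOperator.EQ,
                ByteBuffer.wrap("20110613".getBytes()));
        // additional ad-hoc filter applied to the candidate rows
        IndexExpression byType = new IndexExpression(
                ByteBuffer.wrap("type".getBytes()), IndexOperator.GT,
                ByteBuffer.wrap("a".getBytes()));
        IndexClause clause = new IndexClause(
                Arrays.asList(byDay, byType),       // expressions
                ByteBuffer.wrap(new byte[0]),       // start_key: empty = from the beginning
                100);                               // max rows per call
        SlicePredicate columns = new SlicePredicate();
        columns.setColumn_names(Arrays.asList(ByteBuffer.wrap("type".getBytes())));
        return client.get_indexed_slices(new ColumnParent("Events"), clause, columns, ConsistencyLevel.ONE);
    }
}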

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13 Jun 2011, at 08:34, Michal Augustýn wrote:

 Hi,
 
 as I wrote, I don't want to install Hadoop etc. - I want just to use
 the Thrift API. The core of my question is how does get_indexed_slices
 function work.
 
 I know that it must get all keys using equality expression firstly -
 but what about additional expressions? Does Cassandra fetch whole
 filtered rows, or just columns used in additional filtering
 expression?
 
 Thanks!
 
 Augi
 
 2011/6/12 aaron morton aa...@thelastpickle.com:
 Not exactly sure what you mean here, all data access is through the thrift
 API unless you code java and embed cassandra in your app.
 As well as Pig support there is also Hive support in brisk (which will also
 have Pig support soon) http://www.datastax.com/products/brisk
 Can you provide some more info on the use case ? Personally if you have a
 read query you know you need to support, I would consider supporting it in
 the data model without secondary indexes.
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 11 Jun 2011, at 19:23, Michal Augustýn wrote:
 
 Hi all,
 
 I'm thinking of get_indexed_slices function as a simple map-reduce job
 (that just maps) - am I right?
 
 Well, I would like to be able to run simple queries on values but I
 don't want to install Hadoop, write map-reduce jobs in Java (the whole
 application is in C# and I don't want to introduce new development
 stack - maybe Pig would help) and have some second interface to
 Cassandra (in addition to Thrift). So secondary indexes seem to be
 rescue for me. I would have just one indexed column that will have
 day-timestamp value (~100k items per day) and the equality expression
 for this column would be in each query (and I would add more ad-hoc
 expressions).
 Will this scenario work or is there some issue I could run in?
 
 Thanks!
 
 Augi
 
 



Re: SSL Streaming

2011-06-13 Thread aaron morton
Sasha does 
https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362
 help ?
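
From memory the relevant block in the 0.8 cassandra.yaml looks roughly like this
(check the yaml that ships with your version for the exact field names):

encryption_options:
    internode_encryption: none   # set to 'all' to encrypt inter-node traffic, including streaming
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra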

A


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13 Jun 2011, at 23:26, AJ wrote:

 Performance-wise, I think it would be better to just let the client encrypt 
 sensitive data before storing it, versus encrypting all traffic all the time. 
  If individual values are encrypted, then they don't have to be 
 encrypted/decrypted during transit between nodes during the initial updates 
 as well as during the commissioning of a new node or other times.
 
 A drawback, however, is now you have to manage one or more keys for the 
 lifetime of the data.  It will also complicate your data view interfaces.  
 However, if Cassandra had data encryption built-in somehow, that would solve 
 this problem... just thinking out loud.
 
 Can anyone think of other pro/cons of both strategies?
 
 On 3/22/2011 2:21 AM, Sasha Dolgy wrote:
 Hi,
 
 Is there documentation available anywhere that describes how one can
 use org.apache.cassandra.security.streaming.* ?   After the EC2 posts
 yesterday, one question I was asked was about the security of data
 being shifted between nodes.  Is it done in clear text, or
 encrypted..?  I haven't seen anything to suggest that it's encrypted,
 but see in the source that security.streaming does leverage SSL ...
 
 Thanks in advance for some pointers to documentation.
 
 Also, for anyone who is using SSL .. how much of a performance impact
 have you noticed?  Is it minimal or significant?
 
 



Re: odd logs after repair

2011-06-13 Thread aaron morton
Count of the columns in a row; not an exact count, as that would require 
stopping clients from writing to the row and we do not do that. 

Have a poke around 
http://www.datastax.com/docs/0.8/data_model/index
and
http://wiki.apache.org/cassandra/DataModel

Or are you asking about counter columns ?

Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 14 Jun 2011, at 03:27, Sijie YANG wrote:

 Hi, All
 
 I am newbie to cassandra. I have a simple question but don't find any clear 
 answer by searching google: 
 What's the meaning of count column in Cassandra? Thanks.



Re: SSL Streaming

2011-06-13 Thread Sasha Dolgy
AJ was responding to an email I sent in March, although I do appreciate
the quick response from the community ;)  I moved on to our implementation of
VPN...
On Jun 14, 2011 1:35 AM, aaron morton aa...@thelastpickle.com wrote:
 Sasha does
https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362
help ?

 A


 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 13 Jun 2011, at 23:26, AJ wrote:

 Performance-wise, I think it would be better to just let the client
encrypt sensitive data before storing it, versus encrypting all traffic all
the time. If individual values are encrypted, then they don't have to be
encrypted/decrypted during transit between nodes during the initial updates
as well as during the commissioning of a new node or other times.

 A drawback, however, is now you have to manage one or more keys for the
lifetime of the data. It will also complicate your data view interfaces.
However, if Cassandra had data encryption built-in somehow, that would solve
this problem... just thinking out loud.

 Can anyone think of other pro/cons of both strategies?

 On 3/22/2011 2:21 AM, Sasha Dolgy wrote:
 Hi,

 Is there documentation available anywhere that describes how one can
 use org.apache.cassandra.security.streaming.* ? After the EC2 posts
 yesterday, one question I was asked was about the security of data
 being shifted between nodes. Is it done in clear text, or
 encrypted..? I haven't seen anything to suggest that it's encrypted,
 but see in the source that security.streaming does leverage SSL ...

 Thanks in advance for some pointers to documentation.

 Also, for anyone who is using SSL .. how much of a performance impact
 have you noticed? Is it minimal or significant?