[jira] [Created] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)
Enormous counter 
-

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen


I have a two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
"APP status information."
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on Hector to add a counter column 
(testCounter[sc][column]) 1000 times. In the middle of the adding process, I 
intentionally shut down the node 172.17.19.152. In addition, the test program 
is smart enough to switch the consistency level from Quorum to One, so that 
the subsequent add operations do not fail. 

After all the add operations are done, I start Cassandra on 172.17.19.152 and 
use cassandra-cli to check whether the counter is correct on both nodes. I get 
a result of 1001, which seems reasonable because Hector retries once. However, 
I then shut down 172.17.19.151 and, after 172.17.19.152 becomes aware that 
172.17.19.151 is down, start Cassandra on 172.17.19.151 again. When I check 
the counter this time, I get a result of 481387, which is far off.

I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and 
earlier. 
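For what it's worth, the 1001 is consistent with counter adds not being idempotent: a client retry after a timeout re-applies an increment that may already have been accepted. A self-contained Java sketch of that effect (a toy model, not Hector or Cassandra code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model: a replica that applies every increment it receives, and a
// client that retries on timeout. Counter adds are not idempotent, so a
// retry of an increment that actually reached the replica counts twice.
public class CounterRetryDemo {
    static class Replica {
        final AtomicLong counter = new AtomicLong();

        // Applies the increment; 'timeout' simulates the ack being lost.
        boolean add(long delta, boolean timeout) {
            counter.addAndGet(delta);   // the write is applied regardless
            return !timeout;            // false = client saw a timeout
        }
    }

    public static long runClient(Replica replica, int adds, int timeouts) {
        int remainingTimeouts = timeouts;
        for (int i = 0; i < adds; i++) {
            boolean ack = replica.add(1, remainingTimeouts > 0);
            if (!ack) {                 // client retries blindly
                remainingTimeouts--;
                replica.add(1, false);  // duplicate increment lands
            }
        }
        return replica.counter.get();
    }

    public static void main(String[] args) {
        // 1000 adds with one timeout+retry ends at 1001, matching the
        // "reasonable" value seen before the second restart.
        System.out.println(runClient(new Replica(), 1000, 1));
    }
}
```

The much larger 481387 cannot come from a single retry; the replica-restart sequence is presumably multiplying increments server-side. The sketch only shows why retried adds are unsafe to begin with.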

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081545#comment-13081545
 ] 

Boris Yen commented on CASSANDRA-3006:
--

I forgot to mention that the counter is also out of sync between the two 
nodes: one shows 481387 and the other shows 20706.

> Enormous counter 
> -
>
> Key: CASSANDRA-3006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.3
> Environment: ubuntu 10.04
>Reporter: Boris Yen
>
> I have a two-node cluster with the following keyspace and column family 
> settings.
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions: 
>   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
> Options: [datacenter1:2]
>   Column Families:
> ColumnFamily: testCounter (Super)
> "APP status information."
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.CounterColumnType
>   Columns sorted by: 
> org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 20.0/14400
>   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Built indexes: []
> Then, I use a test program based on Hector to add a counter column 
> (testCounter[sc][column]) 1000 times. In the middle of the adding process, I 
> intentionally shut down the node 172.17.19.152. In addition, the test program 
> is smart enough to switch the consistency level from Quorum to One, so that 
> the subsequent add operations do not fail. 
> After all the add operations are done, I start Cassandra on 172.17.19.152 and 
> use cassandra-cli to check whether the counter is correct on both nodes. I 
> get a result of 1001, which seems reasonable because Hector retries once. 
> However, I then shut down 172.17.19.151 and, after 172.17.19.152 becomes 
> aware that 172.17.19.151 is down, start Cassandra on 172.17.19.151 again. 
> When I check the counter this time, I get a result of 481387, which is far 
> off.
> I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and 
> earlier. 





[jira] [Updated] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Yen updated CASSANDRA-3006:
-

Description: 
I have a two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
"APP status information."
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on Hector to add a counter column 
(testCounter[sc][column]) 1000 times. In the middle of the adding process, I 
intentionally shut down the node 172.17.19.152. In addition, the test program 
is smart enough to switch the consistency level from Quorum to One, so that 
the subsequent add operations do not fail. 

After all the add operations are done, I start Cassandra on 172.17.19.152 and 
use cassandra-cli to check whether the counter is correct on both nodes. I get 
a result of 1001, which seems reasonable because Hector retries once. However, 
I then shut down 172.17.19.151 and, after 172.17.19.152 becomes aware that 
172.17.19.151 is down, start Cassandra on 172.17.19.151 again. When I check 
the counter this time, I get a result of 481387, which is far off.

I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and 
earlier. 

  was:
I have two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
"APP status information."
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on hector to add a counter column 
(testCounter[sc][column]) 1000 times. In the middle the adding process, I 
intentional shut down the node 172.17.19.152. In addition to that, the test 
program is smart enough to switch the consistency level from Quorum to One, so 
that the following adding actions would not fail. 

After all the adding actions are done, I start the cassandra on 172.17.19.152, 
and I use cassandra-cli to check if the counter is correct on both nodes, and I 
got a result 1001 which should be reasonable because hector will retry once. 
However, when I shut down 172.17.19.151 and after 172.17.19.152 is aware of 
172.17.19.151 is down, I try to start the cassandra on 172.17.19.151 again. 
Then, I check the counter again, this time I got a result 481387 which is so 
wrong.

I use 0.8.3 the reproduce this bug, but I think this also happens on 0.8.2 or 
before also. 


> Enormous counter 
> -
>
> Key: CASSANDRA-3006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.3
> Environment: ubuntu 10.04
>Reporter: Boris Yen
>
> I have two-node cluster with the following keyspace and column family 
> settings.
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions: 
>   63fda700-c243-11e0--2d

[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2843:


Attachment: 2843_h.patch

bq. the IColumnMap name when it does not implement Map interface, and some 
things it has in common with Map (iteration) it changes semantics of (iterating 
values instead of keys). not sure what to use instead though, since we already 
have an IColumnContainer. Maybe ISortedColumns?

Yeah, I'm not sure I have a better name either; maybe ISortedColumnHolder, but 
I'm not sure that's better than ISortedColumns, so the attached rebased patch 
simply renames ColumnMap -> SortedColumns.

bq. TSCM and ALCM extending instead of wrapping CSLM/AL, respectively

The idea was to save one object creation. I admit this is probably not a huge 
deal, but it felt like extending instead of wrapping was no big deal in this 
case either, so it seemed worth "optimizing". I still stand by that choice, but 
I have no good argument against the criticism that it is possibly premature.

bq. unrelated reformatting

If we're talking about the ones in SuperColumn.java, sorry, I mistakenly forced 
re-indentation on the file, which rewrote the tabs to spaces. The new patch 
keeps the old formatting. I'd also mention that there are a few places where 
I've rewritten cf.getSortedColumns().iterator() as cf.iterator(), which is 
arguably a bit gratuitous for this patch, but I figured it avoids creating a 
new Collection in the case of CSLM and there aren't many occurrences.


> better performance on long row read
> ---
>
> Key: CASSANDRA-2843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Yang Yang
> Fix For: 1.0
>
> Attachments: 2843.patch, 2843_d.patch, 2843_g.patch, 2843_h.patch, 
> fix.diff, microBenchmark.patch, patch_timing, std_timing
>
>
> currently if a row contains > 1000 columns, the run time becomes considerably 
> slow: my test of a row with 3000 columns (standard, regular), each with 8 
> bytes in name and 40 bytes in value, takes about 16ms.
> this is all running in memory, no disk read is involved.
> through debugging we can find
> most of this time is spent on 
> [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
> int, ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
>  Iterator, int)
> [Wall Time]  
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
>  Iterator, int)
> [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
> ColumnFamily.addColumn() is slow because it inserts into an internal 
> concurrentSkipListMap() that maps column names to values.
> this structure is slow for two reasons: it needs to do synchronization; it 
> needs to maintain a more complex structure of map.
> but if we look at the whole read path, thrift already defines the read output 
> to be List so it does not make sense to use a luxury map 
> data structure in the interim and finally convert it to a list. on the 
> synchronization side, since the returned CF is never going to be 
> shared/modified by other threads, we know the access is always 
> single-threaded, so no synchronization is needed.
> but these 2 features are indeed needed for ColumnFamily in other cases, 
> particularly write. so we can provide a different ColumnFamily to 
> CFS.getTopLevelColumnFamily(), so that getTopLevelColumnFamily no longer 
> always creates the standard ColumnFamily but takes a provided returnCF, 
> whose cost is much cheaper.
> the provided patch is for demonstration now; will work further once we agree 
> on the general direction. 
> CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is 
> provided. the main work is to let the FastColumnFamily use an array for 
> internal storage. at first I used binary search to insert new columns in 
> addColumn(), but later I found that even this is not necessary, since all 
> calling scenarios of ColumnFamily.addColumn() have an invariant that the 
> inserted columns come in sorted order (I still have an issue to resolve, 
> descending or ascending, but ascending works). so the current logic is 
> simply to compare the new column against the last column in the array: if 
> the names are not equal, append; if equal, reconcile.
> slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
> flavors of the method, one accepting a returnCF. but we could definitely
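The append-or-reconcile idea described above can be sketched independently of the actual patch (class and method names below are illustrative, not the FastColumnFamily API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the ticket's idea: when addColumn() is only
// ever called with names in ascending order, a plain array list can
// replace the ConcurrentSkipListMap. The new column is compared only
// against the last element: equal names reconcile (keep the higher
// timestamp), a greater name appends, a smaller name breaks the invariant.
public class SortedColumnsSketch {
    public static final class Column {
        public final String name;
        public final long timestamp;

        public Column(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    private final List<Column> columns = new ArrayList<>();

    public void addColumn(Column c) {
        if (!columns.isEmpty()) {
            Column last = columns.get(columns.size() - 1);
            int cmp = last.name.compareTo(c.name);
            if (cmp == 0) {             // same name: reconcile
                if (c.timestamp > last.timestamp)
                    columns.set(columns.size() - 1, c);
                return;
            }
            if (cmp > 0)                // caller violated sorted order
                throw new IllegalStateException("columns must arrive sorted");
        }
        columns.add(c);                 // new name: O(1) append at the end
    }

    public int size() { return columns.size(); }
    public Column get(int i) { return columns.get(i); }
}
```

This is why the patch can skip both synchronization and binary search: the returnCF is thread-confined, and sorted arrival reduces insertion to a single comparison against the tail.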

[jira] [Created] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Viliam Holub (JIRA)
NullPointerException in MessagingService.java:420
-

 Key: CASSANDRA-3007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
 Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
05:15:26 UTC 2010 x86_64 GNU/Linux
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
Reporter: Viliam Holub
Priority: Minor


I'm getting a large quantity of exceptions during streaming, always at 
MessagingService.java:420. The streaming appears to be blocked.

 INFO 10:11:14,734 Streaming to /10.235.77.27
ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
java.lang.NullPointerException
at 
org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
at 
org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
at 
org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
at 
org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)






[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1717:
---

Attachment: CASSANDRA-1717-v2.patch

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to 
modify that system-wide.
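The int-vs-long point is easy to check with java.util.zip.CRC32 directly: getValue() always returns a value that fits in 32 bits, so narrowing to int for storage and widening back with a mask is lossless (a generic illustration, not the patch's code):

```java
import java.util.zip.CRC32;

// CRC32 is a 32-bit checksum; getValue() returns long only because the
// Checksum interface requires it. Narrowing to int and widening back
// with an unsigned mask round-trips exactly.
public class CrcWidthDemo {
    public static boolean roundTrips(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        long full = crc.getValue();             // always in [0, 2^32)
        int narrowed = (int) full;              // what would be written to disk
        long restored = narrowed & 0xFFFFFFFFL; // what would be read back
        return restored == full;
    }
}
```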

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data gives a bit more 
safety; I don't know what our exact gain is quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have others' opinions more than anything else.

  The checksum is now computed over the original (uncompressed) data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fairly simple to 
me. It would be ok to use java native CRC32 and leave the improvement to 
another ticket, but quite frankly if it is that simple and since the hadoop 
guys have done all the hard work for us, I say we start with the efficient 
version directly.

  As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569
 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:25 AM:
-

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fairly simple to 
me. It would be ok to use java native CRC32 and leave the improvement to 
another ticket, but quite frankly if it is that simple and since the hadoop 
guys have done all the hard work for us, I say we start with the efficient 
version directly.

  As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

  was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the 
unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of 
java CRC32. In particular, it seems they have been able to close to double the 
speed of the CRC32, with a solution that seems fairly simple to me. It would be 
ok to use java native CRC32 and leave the improvement to another ticket, but 
quite 

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569
 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:29 AM:
-

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  It checksums the original (uncompressed) data and stores the checksum at the 
end of the compressed chunk; the reader verifies the checksum after 
decompression.
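For illustration, that layout can be sketched with plain java.util.zip (a schematic of the idea, not the CompressedSequentialWriter/CompressedRandomAccessReader code; the chunk format here is invented for the demo):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Schematic of "checksum the uncompressed data, store it after the
// compressed chunk": the writer compresses, then appends the CRC32 of
// the ORIGINAL bytes; the reader decompresses first and only then
// verifies, so bitrot surfaces at read time as either a decompression
// error or a checksum mismatch.
public class ChunkChecksumSketch {

    // Chunk layout (demo only): [deflate-compressed bytes][8-byte CRC32].
    public static byte[] writeChunk(byte[] original) {
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        CRC32 crc = new CRC32();
        crc.update(original);                       // checksum the UNCOMPRESSED data
        byte[] trailer = ByteBuffer.allocate(8).putLong(crc.getValue()).array();
        out.write(trailer, 0, trailer.length);
        return out.toByteArray();
    }

    public static byte[] readChunk(byte[] chunk, int originalLength) throws Exception {
        int split = chunk.length - 8;               // trailer starts here
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, 0, split);
        byte[] original = new byte[originalLength];
        int read = 0;
        while (read < originalLength && !inflater.finished())
            read += inflater.inflate(original, read, originalLength - read);
        CRC32 crc = new CRC32();
        crc.update(original);                       // verify AFTER decompression
        long stored = ByteBuffer.wrap(chunk, split, 8).getLong();
        if (crc.getValue() != stored)
            throw new IOException("chunk failed checksum");
        return original;
    }
}
```

Flipping any bit of a stored chunk then shows up at read time as a decompression error or a checksum mismatch, which is the property the bit-flip unit test mentioned earlier is checking.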
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fairly simple to 
me. It would be ok to use java native CRC32 and leave the improvement to 
another ticket, but quite frankly if it is that simple and since the hadoop 
guys have done all the hard work for us, I say we start with the efficient 
version directly.

  As decided previously, this will be a matter of a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

  was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the 
unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fa

[jira] [Commented] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081601#comment-13081601
 ] 

Jonathan Ellis commented on CASSANDRA-3007:
---

What kind of streaming are you attempting?  

> NullPointerException in MessagingService.java:420
> -
>
> Key: CASSANDRA-3007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.8.3
> Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
> 05:15:26 UTC 2010 x86_64 GNU/Linux
> java version "1.6.0_18"
> OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>Reporter: Viliam Holub
>Priority: Minor
>  Labels: nullpointerexception, streaming
>
> I'm getting a large quantity of exceptions during streaming, always at 
> MessagingService.java:420. The streaming appears to be blocked.
>  INFO 10:11:14,734 Streaming to /10.235.77.27
> ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
> java.lang.NullPointerException
> at 
> org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
> at 
> org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
> at 
> org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
> at 
> org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3007:
--

Attachment: 3007.txt

Never mind, not relevant.  Looks like you upgraded from 0.7 without updating 
your configuration file?

Fix for missing encryption_options attached.

> NullPointerException in MessagingService.java:420
> -
>
> Key: CASSANDRA-3007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.8.3
> Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
> 05:15:26 UTC 2010 x86_64 GNU/Linux
> java version "1.6.0_18"
> OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>Reporter: Viliam Holub
>Priority: Minor
>  Labels: nullpointerexception, streaming
> Fix For: 0.8.4
>
> Attachments: 3007.txt
>
>
> I'm getting large quantity of exceptions during streaming. It is always in 
> MessagingService.java:420. The streaming appears to be blocked.
>  INFO 10:11:14,734 Streaming to /10.235.77.27
> ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
> java.lang.NullPointerException
> at 
> org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
> at 
> org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
> at 
> org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
> at 
> org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)





[jira] [Assigned] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-3006:
-

Assignee: Sylvain Lebresne

> Enormous counter 
> -
>
> Key: CASSANDRA-3006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.3
> Environment: ubuntu 10.04
>Reporter: Boris Yen
>Assignee: Sylvain Lebresne
>
> I have two-node cluster with the following keyspace and column family 
> settings.
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions: 
>   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
> Options: [datacenter1:2]
>   Column Families:
> ColumnFamily: testCounter (Super)
> "APP status information."
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.CounterColumnType
>   Columns sorted by: 
> org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 20.0/14400
>   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Built indexes: []
> Then, I use a test program based on hector to add a counter column 
> (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
> intentional shut down the node 172.17.19.152. In addition to that, the test 
> program is smart enough to switch the consistency level from Quorum to One, 
> so that the following adding actions would not fail. 
> After all the adding actions are done, I start the cassandra on 
> 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
> both nodes, and I got a result 1001 which should be reasonable because hector 
> will retry once. However, when I shut down 172.17.19.151 and after 
> 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
> on 172.17.19.151 again. Then, I check the counter again, this time I got a 
> result 481387 which is so wrong.
> I use 0.8.3 to reproduce this bug, but I think this also happens on 0.8.2 or 
> before also. 





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081603#comment-13081603
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

{quote}
bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

Let's leave that to the ticket for CRC optimization, which will allow us to 
modify that system-wide
{quote}
Let's not:
* this is completely orthogonal to switching to a drop-in, faster, CRC 
implementation.
* it is unclear whether we want to make that system-wide. Imho, it is not worth 
breaking commit log compatibility for that, but it is stupid to commit new code 
that perpetuates the mistake, only to change it later.
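The point that CRC32 is a 32-bit value internally can be demonstrated with a small sketch. The helper names here are hypothetical; getValue() returns a long only because the Checksum interface requires it, so casting to int and zero-extending back is lossless:

```java
import java.util.zip.CRC32;

public class CrcIntDemo {
    // Compute CRC32 and keep only the 4 bytes that actually carry information.
    public static int crcAsInt(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue(); // the low 32 bits hold the entire checksum
    }

    // Recover the canonical long form, e.g. to compare against getValue().
    public static long intToCrcLong(int stored) {
        return stored & 0xFFFFFFFFL; // zero-extend back to the unsigned value
    }

    public static void main(String[] args) {
        byte[] data = "any block of data".getBytes();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // Round trip through the 4-byte form loses nothing.
        assert intToCrcLong(crcAsInt(data)) == crc.getValue();
    }
}
```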

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081605#comment-13081605
 ] 

Jonathan Ellis commented on CASSANDRA-1717:
---

Saving 4 bytes out of 64K doesn't seem like enough benefit to make life harder 
for ourselves if we want to use a long checksum later.

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081609#comment-13081609
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


+1 with Jonathan; it is also better if we satisfy the interface instead of 
relying on internal implementation details, which could also be helpful if we 
decide to change the checksum algorithm.

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081629#comment-13081629
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

What are the chances we'll switch from CRC32 any time soon? And even if we do, 
why would that justify writing 4 bytes of 0's right now? We will still have to 
bump the file format version and keep the code compatible with the old CRC32 
format if we do so. It's not like the only difference between checksum 
algorithms is the size of the checksum.

So yes, 4 bytes out of 64K is not a lot of data, but knowingly writing 4 bytes 
of 0's every 64K, every time, for the vague remote chance that it may save us 1 
or 2 lines of code someday (again, that even remains to be proven) feels 
ridiculous to me. But if I'm the only one who feels that way, fine, it's not a 
big deal.

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081637#comment-13081637
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


I still think that such a change is a matter for a separate ticket, as we will 
want to change CRC handling globally: we can make our own Checksum class which 
will return an int value, apply the performance improvements mentioned in 
HADOOP-6148 to it, and use it system-wide.

Is there anything else that keeps this from being committed?
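The "own Checksum class" proposal above could be sketched as follows. The class name PureJavaChecksum and the int accessor are hypothetical; a real version would swap the CRC32 delegate for a faster HADOOP-6148-style implementation:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// A system-wide checksum wrapper: delegates to CRC32 today, but gives one
// place to swap in a faster implementation later without touching callers.
public final class PureJavaChecksum implements Checksum {
    private final CRC32 delegate = new CRC32();

    public void update(int b)                        { delegate.update(b); }
    public void update(byte[] b, int off, int len)   { delegate.update(b, off, len); }
    public long getValue()                           { return delegate.getValue(); }
    public void reset()                              { delegate.reset(); }

    // Convenience accessor returning the 32-bit value directly,
    // so callers can serialize 4 bytes instead of 8.
    public int getIntValue() { return (int) delegate.getValue(); }
}
```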

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





svn commit: r1155374 - /cassandra/branches/cassandra-0.8/debian/control

2011-08-09 Thread eevans
Author: eevans
Date: Tue Aug  9 14:05:55 2011
New Revision: 1155374

URL: http://svn.apache.org/viewvc?rev=1155374&view=rev
Log:
build requires subversion (line 235 of build.xml)

Patch by Sven Wilhelm; reviewed by eevans

Modified:
cassandra/branches/cassandra-0.8/debian/control

Modified: cassandra/branches/cassandra-0.8/debian/control
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/debian/control?rev=1155374&r1=1155373&r2=1155374&view=diff
==
--- cassandra/branches/cassandra-0.8/debian/control (original)
+++ cassandra/branches/cassandra-0.8/debian/control Tue Aug  9 14:05:55 2011
@@ -2,7 +2,7 @@ Source: cassandra
 Section: misc
 Priority: extra
 Maintainer: Eric Evans 
-Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7)
+Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7), subversion
 Homepage: http://cassandra.apache.org
 Vcs-Svn: https://svn.apache.org/repos/asf/cassandra/trunk
 Vcs-Browser: http://svn.apache.org/viewvc/cassandra/trunk




[jira] [Created] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Luis Eduardo Villares Matta (JIRA)
Error getting range slices
--

 Key: CASSANDRA-3008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
 Environment: Ubuntu, using the 08x repository
Reporter: Luis Eduardo Villares Matta
Priority: Critical


I can't get a range slice on one of my column families.

ERROR 14:16:26,672 Internal error processing get_range_slices
java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:66)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:86)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
at 
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
at 
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at 
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
at 
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
at 
org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
at 
org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
at 
org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:57)
... 24 more





[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081660#comment-13081660
 ] 

Sylvain Lebresne commented on CASSANDRA-3006:
-

I haven't had any luck reproducing this so far. I've tried to stick with the 
description above but did not use hector (not saying it is hector's fault, 
though; maybe it is the way it does retries that I don't emulate well). If you 
are able to share a minimal hector script with which you reproduce this easily, 
that would be very helpful.

> Enormous counter 
> -
>
> Key: CASSANDRA-3006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.3
> Environment: ubuntu 10.04
>Reporter: Boris Yen
>Assignee: Sylvain Lebresne
>
> I have two-node cluster with the following keyspace and column family 
> settings.
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions: 
>   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
> Options: [datacenter1:2]
>   Column Families:
> ColumnFamily: testCounter (Super)
> "APP status information."
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.CounterColumnType
>   Columns sorted by: 
> org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 20.0/14400
>   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Built indexes: []
> Then, I use a test program based on hector to add a counter column 
> (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
> intentional shut down the node 172.17.19.152. In addition to that, the test 
> program is smart enough to switch the consistency level from Quorum to One, 
> so that the following adding actions would not fail. 
> After all the adding actions are done, I start the cassandra on 
> 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
> both nodes, and I got a result 1001 which should be reasonable because hector 
> will retry once. However, when I shut down 172.17.19.151 and after 
> 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
> on 172.17.19.151 again. Then, I check the counter again, this time I got a 
> result 481387 which is so wrong.
> I use 0.8.3 to reproduce this bug, but I think this also happens on 0.8.2 or 
> before also. 





[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

2011-08-09 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081665#comment-13081665
 ] 

T Jake Luciani commented on CASSANDRA-2474:
---

I don't (yet) know how to add hint types to hive, but once a transposed hint 
operator is added we should be able to hook it into the hive driver.

> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.





[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081669#comment-13081669
 ] 

Jonathan Ellis commented on CASSANDRA-2474:
---

Isn't changing query semantics kind of the opposite of what hints are supposed 
to be for?

> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.





[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081673#comment-13081673
 ] 

Jonathan Ellis commented on CASSANDRA-3008:
---

did you try "nodetool scrub"?

> Error getting range slices
> --
>
> Key: CASSANDRA-3008
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: Ubuntu, using the 08x repository
>Reporter: Luis Eduardo Villares Matta
>Priority: Critical
>
> I can't get a range slice on one of my column families.
> ERROR 14:16:26,672 Internal error processing get_range_slices
> java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:66)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:86)
> at 
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
> at 
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
> at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
> at 
> org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
> at 
> org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:57)
> ... 24 more





[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-08-09 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081679#comment-13081679
 ] 

Chris Burroughs commented on CASSANDRA-2749:


It would also be cool (but this is obviously speculative) to have the ability 
to keep Index files on an SSD, and the larger data files on rotating disks.

> fine-grained control over data directories
> --
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>Priority: Minor
> Fix For: 1.0
>
>
> Currently Cassandra supports multiple data directories but no way to control 
> what sstables are placed where. Particularly for systems with mixed SSDs and 
> rotational disks, it would be nice to pin frequently accessed columnfamilies 
> to the SSDs.
> Postgresql does this with tablespaces 
> (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
> should probably avoid using that name because of confusing similarity to 
> "keyspaces."





[jira] [Commented] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Viliam Holub (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081680#comment-13081680
 ] 

Viliam Holub commented on CASSANDRA-3007:
-

It's the removetoken command.

Yes, I updated the node and forgot to specify encryption_options - thanks!

> NullPointerException in MessagingService.java:420
> -
>
> Key: CASSANDRA-3007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.8.3
> Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
> 05:15:26 UTC 2010 x86_64 GNU/Linux
> java version "1.6.0_18"
> OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>Reporter: Viliam Holub
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: nullpointerexception, streaming
> Fix For: 0.8.4
>
> Attachments: 3007.txt
>
>
> I'm getting large quantity of exceptions during streaming. It is always in 
> MessagingService.java:420. The streaming appears to be blocked.
>  INFO 10:11:14,734 Streaming to /10.235.77.27
> ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
> java.lang.NullPointerException
> at 
> org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
> at 
> org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
> at 
> org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
> at 
> org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081685#comment-13081685
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

As previously said, I disagree both with using 8 bytes when we need 4 and with 
the idea that using 4 is a matter for another ticket, but since this is probably 
me being too anal as usual, +1 on the rest of the patch, modulo a small optional 
nitpick: the toLong() function is a bit hard to read imho. It's hard to see 
where the parentheses are and whether it does the right thing. It seems ok 
though; I just think a simple for loop over the bytes would be more readable. We 
also historically keep ByteBufferUtil for ByteBuffer manipulations and use 
FBUtilities for byte[] manipulation.
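The for-loop version suggested here might look like this. It is a sketch only; the class name is hypothetical and the actual signature in FBUtilities may differ:

```java
public final class ByteOps {
    // Readable big-endian byte[8] -> long: fold in one byte at a time
    // instead of a single expression with eight shifts and masks.
    public static long toLong(byte[] bytes) {
        assert bytes.length == 8 : "expected exactly 8 bytes";
        long value = 0;
        for (byte b : bytes)
            value = (value << 8) | (b & 0xFFL); // mask avoids sign extension
        return value;
    }
}
```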


> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081689#comment-13081689
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


Ok, I will move toLong(byte[] bytes) to FBUtilities and commit, thanks!

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081690#comment-13081690
 ] 

Jonathan Ellis commented on CASSANDRA-1717:
---

You're right, if we change the checksum implementation we need to bump the 
sstable revision anyway.  +1 on casting to int here.  (But as you said above, 
-1 on changing this in CommitLog.)
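As background on why casting to int is safe here: java.util.zip.CRC32 reports 
its checksum through getValue(), which returns a long whose upper 32 bits are 
always zero, so narrowing to int loses nothing. A minimal sketch (illustrative 
only, not the patch code):

```java
import java.util.zip.CRC32;

// Illustrative only: CRC32.getValue() returns a 32-bit checksum widened into
// a long, so (int) narrowing is lossless; masking with 0xFFFFFFFFL on the
// way back recovers the original unsigned value.
public final class ChecksumWidth {
    static int checksumAsInt(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue();  // only the low 32 bits are ever set
    }

    public static void main(String[] args) {
        byte[] data = "some column data".getBytes();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        long asLong = crc.getValue();
        int asInt = checksumAsInt(data);
        // Round-trip check: int form widens back to the original long value.
        if ((asInt & 0xFFFFFFFFL) != asLong) throw new AssertionError();
    }
}
```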

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
> checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Luis Eduardo Villares Matta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081701#comment-13081701
 ] 

Luis Eduardo Villares Matta commented on CASSANDRA-3008:


No, I did not; it seems to have fixed my issues.
Thank you very much.
(I am inclined to close this issue, but I do not know if I should. I am also 
testing everything over the next few hours.)

> Error getting range slices
> --
>
> Key: CASSANDRA-3008
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: Ubuntu, using the 08x repository
>Reporter: Luis Eduardo Villares Matta
>Priority: Critical
>
> I cannot get a range slice on one of my column families.
> ERROR 14:16:26,672 Internal error processing get_range_slices
> java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:66)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:86)
> at 
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
> at 
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
> at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
> at 
> org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
> at 
> org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:57)
> ... 24 more





[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1717:
---

Attachment: CASSANDRA-1717-v3.patch

v3 removes BBU.toLong, adds FBU.byteArrayToInt, and uses int instead of long 
for the checksum.

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
> CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081712#comment-13081712
 ] 

Jonathan Ellis commented on CASSANDRA-3008:
---

Check (scrub) your other nodes -- data corruption can happen (usually from bad 
memory), but if there's a pattern of all the nodes being affected at the same 
time, there could be a Cassandra bug.

> Error getting range slices
> --
>
> Key: CASSANDRA-3008
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: Ubuntu, using the 08x repository
>Reporter: Luis Eduardo Villares Matta
>Priority: Critical
>
> I cannot get a range slice on one of my column families.
> ERROR 14:16:26,672 Internal error processing get_range_slices
> java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:66)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
> at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:86)
> at 
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
> at 
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
> at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
> at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
> at 
> org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
> at 
> org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
> at 
> org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
> at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.(SimpleSliceReader.java:57)
> ... 24 more





[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081718#comment-13081718
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

lgtm, +1

> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
> CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.





[jira] [Created] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)
404 on apt-get install from http://www.apache.org/dist/cassandra/debian
---

 Key: CASSANDRA-3009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
Affects Versions: 0.8.3
 Environment: ubuntu maverick 64-bit
Reporter: Chris Lohfink
Priority: Minor


First bug report on here, so sorry if I am doing something incorrectly.  I 
followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
receiving a 404 error during the install.  Looks like the following:
{code}
clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
[sudo] password for clohfink: 
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcommons-pool-java authbind libmcrypt4 libtomcat6-java libcommons-dbcp-java 
tomcat6-common
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
  cassandra
0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
Need to get 8,415kB of archives.
After this operation, 9,540kB of additional disk space will be used.
Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
0.8.0
  404  Not Found
Failed to fetch 
http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
  404  Not Found
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?
{code}
for debugging info:
{code}
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
N: Can't select versions from package 'cassandra' as it purely virtual
N: No packages found
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository "deb 
http://www.apache.org/dist/cassandra/debian unstable main"
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
...
Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en   
   
Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
Translation-en_US
...
Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
Fetched 6,989B in 1s (5,974B/s)
Reading package lists... Done
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
Package: cassandra
Version: 0.8.0
Architecture: all
Maintainer: Eric Evans 
Installed-Size: 9316
Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
libcommons-daemon-java (>= 1.0), adduser
Recommends: libjna-java
Homepage: http://cassandra.apache.org
Priority: extra
Section: misc
Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
Size: 8415180
SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
Description: distributed storage system for structured data
 Cassandra is a distributed (peer-to-peer) system for the management
 and storage of structured data.
{code}

Included a Fabric script; if you have Fabric installed, you can run:
{code}
fab -H localhost install_cassandra
{code}





[jira] [Assigned] (CASSANDRA-1974) PFEPS-like snitch that uses gossip instead of a property file

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-1974:
-

Assignee: (was: Brandon Williams)

I think the biggest win is when you can automatically determine rack/dc from 
the environment somehow (e.g.: ec2snitch).  Otherwise the advantage of editing 
a file, vs edit + rsync, is small.  Small enough that it's probably not worth 
the education headache.

> PFEPS-like snitch that uses gossip instead of a property file
> -
>
> Key: CASSANDRA-1974
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1974
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Brandon Williams
>Priority: Minor
>
> Now that we have an ec2 snitch that propagates its rack/dc info via gossip 
> from CASSANDRA-1654, it doesn't make a lot of sense to use PFEPS where you 
> have to rsync the property file across all the machines when you add a node.  
> Instead, we could have a snitch where you specify its rack/dc in a property 
> file, and propagate this via gossip like the ec2 snitch.  In order to not 
> break PFEPS, this should probably be a new snitch.





[jira] [Updated] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-3009:
-

Attachment: fabfile.py

> 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
> ---
>
> Key: CASSANDRA-3009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation & website
>Affects Versions: 0.8.3
> Environment: ubuntu maverick 64-bit
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: fabfile.py
>
>
> First bug report on here, so sorry if I am doing something incorrectly.  I 
> followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
> receiving a 404 error during the install.  Looks like the following:
> {code}
> clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
> [sudo] password for clohfink: 
> Reading package lists... Done
> Building dependency tree   
> Reading state information... Done
> The following packages were automatically installed and are no longer 
> required:
>   libcommons-pool-java authbind libmcrypt4 libtomcat6-java 
> libcommons-dbcp-java tomcat6-common
> Use 'apt-get autoremove' to remove them.
> The following NEW packages will be installed:
>   cassandra
> 0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
> Need to get 8,415kB of archives.
> After this operation, 9,540kB of additional disk space will be used.
> Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
> 0.8.0
>   404  Not Found
> Failed to fetch 
> http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
>   404  Not Found
> E: Unable to fetch some archives, maybe run apt-get update or try with 
> --fix-missing?
> {code}
> for debugging info:
> {code}
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
> N: Can't select versions from package 'cassandra' as it purely virtual
> N: No packages found
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository "deb 
> http://www.apache.org/dist/cassandra/debian unstable main"
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
> ...
> Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en 
>  
> Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
> Translation-en_US
> ...
> Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
> Fetched 6,989B in 1s (5,974B/s)
> Reading package lists... Done
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
> Package: cassandra
> Version: 0.8.0
> Architecture: all
> Maintainer: Eric Evans 
> Installed-Size: 9316
> Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
> libcommons-daemon-java (>= 1.0), adduser
> Recommends: libjna-java
> Homepage: http://cassandra.apache.org
> Priority: extra
> Section: misc
> Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
> Size: 8415180
> SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
> SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
> MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
> Description: distributed storage system for structured data
>  Cassandra is a distributed (peer-to-peer) system for the management
>  and storage of structured data.
> {code}
> Included a Fabric script; if you have Fabric installed, you can run:
> {code}
> fab -H localhost install_cassandra
> {code}





[jira] [Updated] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2892:


Attachment: 2892.patch

That's a super easy one and it removes some nasty boolean flag from 
SP.sendToHintedEndpoints so let's do it.

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate to.





[jira] [Commented] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081728#comment-13081728
 ] 

Jonathan Ellis commented on CASSANDRA-2892:
---

can you spell out what's going on with this part?

{code}
-if (cm.shouldReplicateOnWrite())
+hintedEndpoints.removeAll(FBUtilities.getLocalAddress());
+
+if (cm.shouldReplicateOnWrite() && !hintedEndpoints.isEmpty())
{code}

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate to.





[Cassandra Wiki] Trivial Update of "DebianPackaging" by SylvainLebresne

2011-08-09 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "DebianPackaging" page has been changed by SylvainLebresne:
http://wiki.apache.org/cassandra/DebianPackaging?action=diff&rev1=22&rev2=23

  To install on Debian or Debian derivatives, use the following sources:
  
  {{{
- deb http://www.apache.org/dist/cassandra/debian unstable main
+ deb http://www.apache.org/dist/cassandra/debian 08x main
- deb-src http://www.apache.org/dist/cassandra/debian unstable main
+ deb-src http://www.apache.org/dist/cassandra/debian 08x main
  }}}
  
- ''Note: the unstable suite points to the most current branch of development 
(for historical reasons).  Production systems should use a version-specific 
suite/codename, (for example, `06x` for the 0.6.x series, `07x` for the 0.7.x 
series, etc).''
+ You will want to replace `08x` by the series you want to use: `06x` for the 
0.6.x series, 07x for the 0.7.x series, etc... It does mean that you will not 
get major version update unless you change the series, but that is ''a 
feature''.
+ 
  
  If you run ''apt-get update'' now, you will see an error similar to this:
  {{{


[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081729#comment-13081729
 ] 

Jonathan Ellis commented on CASSANDRA-2843:
---

+1

> better performance on long row read
> ---
>
> Key: CASSANDRA-2843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Yang Yang
> Fix For: 1.0
>
> Attachments: 2843.patch, 2843_d.patch, 2843_g.patch, 2843_h.patch, 
> fix.diff, microBenchmark.patch, patch_timing, std_timing
>
>
> currently if a row contains > 1000 columns, reads become considerably slow: 
> my test of a row with 3000 columns (standard, regular), each with 8 bytes in 
> name and 40 bytes in value, takes about 16ms.
> this is all running in memory, no disk read is involved.
> through debugging we can find
> most of this time is spent on 
> [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
> int, ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
>  Iterator, int)
> [Wall Time]  
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
>  Iterator, int)
> [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
> ColumnFamily.addColumn() is slow because it inserts into an internal 
> concurrentSkipListMap() that maps column names to values.
> this structure is slow for two reasons: it needs to do synchronization; it 
> needs to maintain a more complex structure of map.
> but if we look at the whole read path, thrift already defines the read output 
> to be List, so it does not make sense to use a luxury map data structure in 
> the interim and finally convert it to a list. On the synchronization side, 
> since the returned CF is never going to be shared/modified by other threads, 
> we know the access is always single-threaded, so no synchronization is needed.
> But these 2 features are indeed needed for ColumnFamily in other cases, 
> particularly writes. So we can provide a different ColumnFamily to 
> CFS.getTopLevelColumnFamily(): getTopLevelColumnFamily no longer always 
> creates the standard ColumnFamily, but takes a provided returnCF, whose cost 
> is much cheaper.
> the provided patch is for demonstration now, will work further once we agree 
> on the general direction. 
> CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is 
> provided. The main work is to let FastColumnFamily use an array for internal 
> storage. At first I used binary search to insert new columns in addColumn(), 
> but later I found that even this is not necessary, since all calling 
> scenarios of ColumnFamily.addColumn() share an invariant: the inserted 
> columns come in sorted order (I still have to resolve whether that is 
> descending or ascending, but ascending works). So the current logic simply 
> compares the new column against the last column in the array; if the names 
> are not equal, append, and if they are equal, reconcile.
> slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
> flavors of the method, one accepting a returnCF. but we could definitely 
> think about what is the better way to provide this returnCF.
> this patch compiles fine, no tests are provided yet. but I tested it in my 
> application, and the performance improvement is dramatic: it offers about 50% 
> reduction in read time in the 3000-column case.
> thanks
> Yang
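The append-or-reconcile idea described above can be sketched as follows. This 
is a hypothetical illustration, not the attached patch: the class and field 
names are invented, reconciliation is simplified to "keep the higher 
timestamp", and an ArrayList stands in for the raw array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of array-backed addColumn under the sorted-input
// invariant: compare the new column only against the tail -- append if the
// names differ, reconcile (here: keep the higher timestamp) if they match.
public final class FastColumnList {
    static final class Column {
        final String name; final long timestamp; final String value;
        Column(String name, long timestamp, String value) {
            this.name = name; this.timestamp = timestamp; this.value = value;
        }
    }

    private final List<Column> columns = new ArrayList<>();

    void addColumn(Column c) {
        int last = columns.size() - 1;
        if (last >= 0 && columns.get(last).name.equals(c.name)) {
            // Same name as the tail: reconcile instead of appending.
            if (c.timestamp > columns.get(last).timestamp) columns.set(last, c);
        } else {
            columns.add(c);  // sorted input guarantees c sorts after the tail
        }
    }

    int size() { return columns.size(); }
    Column get(int i) { return columns.get(i); }

    public static void main(String[] args) {
        FastColumnList cf = new FastColumnList();
        cf.addColumn(new Column("a", 1, "x"));
        cf.addColumn(new Column("b", 1, "y"));
        cf.addColumn(new Column("b", 2, "z"));  // reconciles with prior "b"
        if (cf.size() != 2) throw new AssertionError();
        if (!cf.get(1).value.equals("z")) throw new AssertionError();
    }
}
```

This avoids both the per-insert synchronization and the tree bookkeeping of a 
ConcurrentSkipListMap, which is the source of the speedup claimed above.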





[jira] [Resolved] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-3009.
-

Resolution: Not A Problem

Sorry, this is because I don't update the 'unstable' series anymore. You should 
use 08x instead (or 07x if you feel so inclined).

Having an 'unstable' series that would silently perform major version upgrades 
felt too easily harmful, so we've switched to numbered series instead. I've 
updated the wiki accordingly. Sorry for the inconvenience.

> 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
> ---
>
> Key: CASSANDRA-3009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation & website
>Affects Versions: 0.8.3
> Environment: ubuntu maverick 64-bit
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: fabfile.py
>
>
> First bug report on here so sorry if I am doing something incorrectly.  I 
> followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
> receiving a 404 error during the install.  Looks like the 
> {code}
> clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
> [sudo] password for clohfink: 
> Reading package lists... Done
> Building dependency tree   
> Reading state information... Done
> The following packages were automatically installed and are no longer 
> required:
>   libcommons-pool-java authbind libmcrypt4 libtomcat6-java 
> libcommons-dbcp-java tomcat6-common
> Use 'apt-get autoremove' to remove them.
> The following NEW packages will be installed:
>   cassandra
> 0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
> Need to get 8,415kB of archives.
> After this operation, 9,540kB of additional disk space will be used.
> Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
> 0.8.0
>   404  Not Found
> Failed to fetch 
> http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
>   404  Not Found
> E: Unable to fetch some archives, maybe run apt-get update or try with 
> --fix-missing?
> {code}
> for debugging info:
> {code}
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
> N: Can't select versions from package 'cassandra' as it purely virtual
> N: No packages found
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository "deb 
> http://www.apache.org/dist/cassandra/debian unstable main"
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
> ...
> Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en 
>  
> Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
> Translation-en_US
> ...
> Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
> Fetched 6,989B in 1s (5,974B/s)
> Reading package lists... Done
> clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
> Package: cassandra
> Version: 0.8.0
> Architecture: all
> Maintainer: Eric Evans 
> Installed-Size: 9316
> Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
> libcommons-daemon-java (>= 1.0), adduser
> Recommends: libjna-java
> Homepage: http://cassandra.apache.org
> Priority: extra
> Section: misc
> Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
> Size: 8415180
> SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
> SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
> MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
> Description: distributed storage system for structured data
>  Cassandra is a distributed (peer-to-peer) system for the management
>  and storage of structured data.
> {code}
> Included fabric script; if you have fabric installed, you can run
> {code}
> fab -H localhost install_cassandra
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081739#comment-13081739
 ] 

Sylvain Lebresne commented on CASSANDRA-2892:
-

Sure. Applying a counter update on the first replica has three parts: we apply 
locally (we know we are on a replica at that point), we read back (in 
cm.makeReplicationMutation, not shown in that diff), and finally we use 
sendToHintedEndpoint to send what was read to the remaining replicas. To avoid 
reapplying locally in sendToHintedEndpoint, we used to set its 
'applyMutationLocally' flag to false.

Instead, this patch removes the local node from hintedEndpoints (since we have 
already applied locally) before calling sendToHintedEndpoint. We can then just 
check whether hintedEndpoints is empty as a synonym for 'is it RF=1'.
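The idea can be sketched in a few lines. This is an illustrative Python sketch, not Cassandra's actual Java code; the function name, the mutation placeholder, and the local address are all hypothetical.

```python
# Sketch of the patch's idea: drop the local node from the replication targets
# before forwarding, so an empty target set doubles as the "RF=1, nothing to
# replicate" check.
LOCAL_NODE = "172.17.19.151"  # hypothetical local address

def replicate_counter_write(read_back_mutation, hinted_endpoints):
    """hinted_endpoints: set of replica addresses that still need the update."""
    # The mutation was already applied locally, so remove ourselves.
    remaining = hinted_endpoints - {LOCAL_NODE}
    if not remaining:
        # Empty set is synonymous with RF=1: nothing to read back and send.
        return []
    # Otherwise forward what was read back to each remaining replica.
    return [(node, read_back_mutation) for node in sorted(remaining)]

# RF=1: only the local node is a replica, so nothing is sent.
print(replicate_counter_write("mutation", {"172.17.19.151"}))  # []
# RF=2: the second replica receives the read-back value.
print(replicate_counter_write("mutation", {"172.17.19.151", "172.17.19.152"}))
```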

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081738#comment-13081738
 ] 

Jonathan Ellis commented on CASSANDRA-1608:
---

Where I was going is: if we are compacting {L2.1, L3.1, L3.2, ..., L3.11}, we 
can also compact {L2.9, L3.90, L3.91, ..., L3.99}, for instance. Because if the 
input keys are non-overlapping, we know that the output keys will be as well.  
Right?
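The observation can be made concrete with a small sketch. This is a hedged Python illustration under invented names; it only encodes the claim that two compaction groups may run in parallel when their input key ranges are disjoint, since non-overlapping inputs yield non-overlapping outputs.

```python
# Illustrative check for whether two candidate compactions can run concurrently.
def key_range(sstables):
    """Overall (min_key, max_key) covered by a group of (lo, hi) sstable ranges."""
    return min(lo for lo, _ in sstables), max(hi for _, hi in sstables)

def can_run_concurrently(group_a, group_b):
    a_lo, a_hi = key_range(group_a)
    b_lo, b_hi = key_range(group_b)
    # Disjoint iff one group's range ends before the other's begins.
    return a_hi < b_lo or b_hi < a_lo

# e.g. one group covering keys 0-119 vs another covering keys 800-999:
print(can_run_concurrently([(0, 119)], [(800, 999)]))  # True
# overlapping groups must be serialized:
print(can_run_concurrently([(0, 500)], [(400, 999)]))  # False
```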

> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v2.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower-priority maintenance on SSTables that are not often accessed.
> I also propose we limit each SSTable to a fixed size; that gives us the 
> ability to better utilize our bloom filters in a predictable manner. 
> At the moment after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.
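The proposal above (track per-SSTable read statistics and compact the hottest tables first) can be sketched as follows. This is a minimal illustration: the scoring formula and weights are invented, since the ticket only proposes tracking read counts and bloom-filter hit/miss ratios.

```python
# Hotness-prioritized compaction candidate selection (illustrative only).
from dataclasses import dataclass

@dataclass
class SSTableStats:
    name: str
    reads: int                    # how often this sstable is read
    bloom_false_positives: int    # wasted bloom-filter checks

    def score(self) -> float:
        # More reads and more wasted checks -> higher compaction priority.
        # The weight of 10 is an arbitrary, illustrative choice.
        return self.reads + 10 * self.bloom_false_positives

def pick_compaction_candidates(tables, n=4):
    """Return the n hottest SSTables as the next compaction group."""
    return sorted(tables, key=lambda t: t.score(), reverse=True)[:n]

tables = [
    SSTableStats("a", reads=900, bloom_false_positives=40),
    SSTableStats("b", reads=10, bloom_false_positives=1),
    SSTableStats("c", reads=500, bloom_false_positives=90),
]
print([t.name for t in pick_compaction_candidates(tables, n=2)])  # ['c', 'a']
```

Rarely-read tables (like "b" above) sink to the bottom of the list, which matches the idea of scheduling only lower-priority maintenance for them.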

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-08-09 Thread Benjamin Coverston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081746#comment-13081746
 ] 

Benjamin Coverston commented on CASSANDRA-1608:
---

We know the input and output keys, yes.

If we isolate the problem to concurrent compactions in the same level, and 
staggered levels {L2, L3}, {L4, L5}, it is certainly an easier problem.

> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v2.txt
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1608) Redesigned Compaction

2011-08-09 Thread Benjamin Coverston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081746#comment-13081746
 ] 

Benjamin Coverston edited comment on CASSANDRA-1608 at 8/9/11 4:36 PM:
---

We know the input and output keys, yes.

If we isolate the problem to concurrent compactions in the same level, and 
staggered levels {L2, L3}, {L4, L5} it is certainly an easier problem.

  was (Author: bcoverston):
We know the input and output keys, yes.

If we isolate the problem to concurrent compactions in the same level, and 
staggered levels {L2, L3}, {L4, L5}.
  
> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v2.txt
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081748#comment-13081748
 ] 

Chris Lohfink commented on CASSANDRA-3009:
--

Thanks!  Worked great.

> 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
> ---
>
> Key: CASSANDRA-3009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation & website
>Affects Versions: 0.8.3
> Environment: ubuntu maverick 64-bit
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: fabfile.py
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-3009:
-

Comment: was deleted

(was: wiki was updated with distribution changes)

> 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
> ---
>
> Key: CASSANDRA-3009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation & website
>Affects Versions: 0.8.3
> Environment: ubuntu maverick 64-bit
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: fabfile.py
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Closed] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink closed CASSANDRA-3009.



wiki was updated with distribution changes

> 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
> ---
>
> Key: CASSANDRA-3009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation & website
>Affects Versions: 0.8.3
> Environment: ubuntu maverick 64-bit
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: fabfile.py
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2919) CQL system test for counters is failing

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2919.
-

Resolution: Cannot Reproduce

Ok, I cannot reproduce either anymore. Probably got fixed, or I screwed up the 
first time. Sorry for that.

> CQL system test for counters is failing
> ---
>
> Key: CASSANDRA-2919
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2919
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tests
> Environment: ubuntu 11.04 64 bit
>Reporter: Sylvain Lebresne
>Assignee: Tyler Hobbs
>Priority: Minor
>  Labels: cql, test
>
> On my machine (and on current 0.8 branch) the CQL system test for counters is 
> failing. While reading the counter value, junk bytes are apparently returned 
> instead of the value (on the following excerpt it looks like a empty value, 
> but on the terminal it does show a random character):
> {noformat}
> ==
> FAIL: update statement should be able to work with counter columns
> --
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/nose/case.py", line 186, in runTest
> self.test(*self.arg)
>   File "/home/pcmanus/Git/cassandra/test/system/test_cql.py", line 1130, in 
> test_counter_column_support
> "unrecognized value '%s'" % r[1]
> AssertionError: unrecognized value ''
> --
> {noformat}
> I've checked: the server correctly fetches the right column and returns what 
> it should. So this seems to be on the Python driver side.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081773#comment-13081773
 ] 

Hudson commented on CASSANDRA-1717:
---

Integrated in Cassandra #1010 (See 
[https://builds.apache.org/job/Cassandra/1010/])
Add block level checksum for compressed data
patch by Pavel Yaskevich; reviewed by Sylvain Lebresne for CASSANDRA-1717

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155420
Files : 
* /cassandra/trunk/test/unit/org/apache/cassandra/Util.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CorruptedBlockException.java
* /cassandra/trunk/CHANGES.txt
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
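The commit above adds a per-block checksum to compressed data. A minimal Python sketch of the technique (the real implementation is Java, in CompressedSequentialWriter and CompressedRandomAccessReader; this illustration only shows the append-CRC-on-write, verify-on-read pattern):

```python
# Block-level checksumming sketch: a CRC32 is appended to each compressed
# block on write and verified on read, so bitrot is detected even when the
# corrupted bytes would otherwise still decode "successfully".
import struct
import zlib

def write_block(data: bytes) -> bytes:
    compressed = zlib.compress(data)
    checksum = zlib.crc32(compressed) & 0xFFFFFFFF
    return compressed + struct.pack(">I", checksum)

def read_block(block: bytes) -> bytes:
    compressed, stored = block[:-4], struct.unpack(">I", block[-4:])[0]
    if (zlib.crc32(compressed) & 0xFFFFFFFF) != stored:
        # Analogous in spirit to the CorruptedBlockException added by the patch.
        raise IOError("corrupted block")
    return zlib.decompress(compressed)

block = write_block(b"column data")
assert read_block(block) == b"column data"

# Flip one byte: the checksum catches the corruption before decompression.
corrupt = bytes([block[0] ^ 0xFF]) + block[1:]
try:
    read_block(corrupt)
except IOError as e:
    print("detected:", e)
```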


> Cassandra cannot detect corrupt-but-readable column data
> 
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
> CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081772#comment-13081772
 ] 

Hudson commented on CASSANDRA-2843:
---

Integrated in Cassandra #1010 (See 
[https://builds.apache.org/job/Cassandra/1010/])
Make ColumnFamily backing column map pluggable and introduce unsynchronized 
ArrayList backed map for reads
patch by slebresne; reviewed by jbellis for CASSANDRA-2843

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155426
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/IFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/db/columniterator/SSTableNamesIterator.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/NamesQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Table.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/RowRepairResolver.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/thrift/CassandraServer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ReadResponse.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Row.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/streaming/StreamingTransferTest.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/service/AntiEntropyServiceTestAbstract.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java


> better performance on long row read
> ---
>
> Key: CASSANDRA-2843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Yang Yang
>Assignee: Sylvain Lebresne
> Fix For: 1.0
>
> Attachments: 2843.patch, 2843_d.patch, 2843_g.patch, 2843_h.patch, 
> fix.diff, microBenchmark.patch, patch_timing, std_timing
>
>
> currently if a row contains > 1000 columns, the run time becomes considerably 
> slow: my test of a row with 3000 columns (standard, regular), each with 8 
> bytes in name and 40 bytes in value, takes about 16ms.
> this is all running in memory, no disk read is involved.
> through debugging we can find
> most of this time is spent on 
> [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
> int, ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
>  Iterator, int)
> [Wall Time]  
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
>  Iterator, int)
> [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
> ColumnFamily.addColumn() is slow because it inserts into an internal 
> ConcurrentSkipListMap that maps column names to values.
> this structure is slow for two reasons: it needs to do synchronization, and 
> it needs to maintain a more complex map structure.
> but if we look at the whole read path, thrift already defines the read output 
> to be a List, so it does not make sense to use a luxury map 
> data structure in the interim and finally convert it to a list. on the 
> synchronization side, since the returned CF is never going to be 
> shared/modified by other threads, we know the access is always 
> single-threaded, so no synchronization is needed.
> but these 2 features are indeed needed for ColumnFamily in other cases, parti

[jira] [Updated] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2990:


Attachment: 2990.patch

> We should refuse query for counters at CL.ANY
> -
>
> Key: CASSANDRA-2990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2990.patch
>
>
> We currently do not reject writes for counters at CL.ANY, even though this is 
> not supported (and rightly so).
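The validation the patch describes is simple to sketch. This is an illustrative Python version; the enum values and exception name are invented, not Cassandra's exact API.

```python
# Reject counter writes at ConsistencyLevel.ANY up front (illustrative sketch).
from enum import Enum

class ConsistencyLevel(Enum):
    ANY = 0
    ONE = 1
    QUORUM = 2

class InvalidRequestError(Exception):
    pass

def validate_counter_write(consistency: ConsistencyLevel) -> None:
    if consistency is ConsistencyLevel.ANY:
        # ANY permits hint-only writes, which counters cannot support: a
        # counter update must land on a live replica so it can be read back
        # and replicated.
        raise InvalidRequestError(
            "consistency level ANY is not supported for counter operations")

validate_counter_write(ConsistencyLevel.ONE)  # accepted
try:
    validate_counter_write(ConsistencyLevel.ANY)
except InvalidRequestError as e:
    print(e)
```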

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2892:
--

Attachment: 2892-v1.5.txt

v1.5 attached.  I thought I could improve it more, but couldn't. :)

Ended up just extracting counterWriteTask() to remove the 
executeOnMutationStage flag.

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892-v1.5.txt, 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081802#comment-13081802
 ] 

Sylvain Lebresne commented on CASSANDRA-2892:
-

v1.5 lgtm

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892-v1.5.txt, 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate it to.





svn commit: r1155460 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/service/StorageProxy.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 18:37:20 2011
New Revision: 1155460

URL: http://svn.apache.org/viewvc?rev=1155460&view=rev
Log:
avoid doing read for no-op replicate-on-write at CL=1
patch by slebresne and jbellis for CASSANDRA-2892

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155460&r1=1155459&r2=1155460&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 18:37:20 2011
@@ -1,6 +1,7 @@
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
+ * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java?rev=1155460&r1=1155459&r2=1155460&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
 Tue Aug  9 18:37:20 2011
@@ -96,7 +96,7 @@ public class StorageProxy implements Sto
 public void apply(IMutation mutation, Multimap hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
 assert mutation instanceof RowMutation;
-sendToHintedEndpoints((RowMutation) mutation, hintedEndpoints, 
responseHandler, localDataCenter, true, consistency_level);
+sendToHintedEndpoints((RowMutation) mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level);
 }
 };
 
@@ -110,7 +110,11 @@ public class StorageProxy implements Sto
 {
 public void apply(IMutation mutation, Multimap hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
-applyCounterMutation(mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level, false);
+if (logger.isDebugEnabled())
+logger.debug("insert writing local & replicate " + 
mutation.toString(true));
+
+Runnable runnable = counterWriteTask(mutation, 
hintedEndpoints, responseHandler, localDataCenter, consistency_level);
+runnable.run();
 }
 };
 
@@ -118,7 +122,11 @@ public class StorageProxy implements Sto
 {
 public void apply(IMutation mutation, Multimap hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
-applyCounterMutation(mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level, true);
+if (logger.isDebugEnabled())
+logger.debug("insert writing local & replicate " + 
mutation.toString(true));
+
+Runnable runnable = counterWriteTask(mutation, 
hintedEndpoints, responseHandler, localDataCenter, consistency_level);
+StageManager.getStage(Stage.MUTATION).execute(runnable);
 }
 };
 }
@@ -218,7 +226,7 @@ public class StorageProxy implements Sto
 return 
ss.getTokenMetadata().getWriteEndpoints(StorageService.getPartitioner().getToken(key),
 table, naturalEndpoints);
 }
 
-private static void sendToHintedEndpoints(final RowMutation rm, 
Multimap hintedEndpoints, IWriteResponseHandler 
responseHandler, String localDataCenter, boolean insertLocalMessages, 
ConsistencyLevel consistency_level)
+private static void sendToHintedEndpoints(final RowMutation rm, 
Multimap hintedEndpoints, IWriteResponseHandler 
responseHandler, String localDataCenter, ConsistencyLevel consistency_level)
 throws IOException
 {
 // Multimap that holds onto all the messages and addresses meant for a 
specific datacenter
@@ -237,8 +245,7 @@ public class StorageProxy implements Sto
 // unhinted writes
 if (destination.equals(FBUtilities.getLocalAddress()))
 {
-if (insertLocalMessages)
-insertLocal(rm, responseHandler);
+insertLocal(rm, responseHandler);
 }
 else
 {
@@ -425,13 +432,9 @@ public class Storag

[jira] [Resolved] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2892.
---

Resolution: Fixed
  Reviewer: jbellis

committed

> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892-v1.5.txt, 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate it to.





svn commit: r1155466 - in /cassandra/trunk: ./ contrib/ debian/ interface/thrift/gen-java/org/apache/cassandra/thrift/ redhat/ src/java/org/apache/cassandra/cli/ src/java/org/apache/cassandra/service/

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 18:40:54 2011
New Revision: 1155466

URL: http://svn.apache.org/viewvc?rev=1155466&view=rev
Log:
merge from 0.8

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/CHANGES.txt
cassandra/trunk/contrib/   (props changed)
cassandra/trunk/debian/control

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/redhat/cassandra
cassandra/trunk/src/java/org/apache/cassandra/cli/Cli.g
cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java
cassandra/trunk/src/java/org/apache/cassandra/cli/CliCompleter.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
cassandra/trunk/src/resources/org/apache/cassandra/cli/CliHelp.yaml
cassandra/trunk/test/unit/org/apache/cassandra/cli/CliTest.java

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1151306
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1154424
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1155466&r1=1155465&r2=1155466&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Tue Aug  9 18:40:54 2011
@@ -33,6 +33,8 @@
 
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
+ * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
+ * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
 
 
 0.8.3

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1151306
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1154424
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689

Modified: cassandra/trunk/debian/control
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/debian/control?rev=1155466&r1=1155465&r2=1155466&view=diff
==
--- cassandra/trunk/debian/control (original)
+++ cassandra/trunk/debian/control Tue Aug  9 18:40:54 2011
@@ -2,7 +2,7 @@ Source: cassandra
 Section: misc
 Priority: extra
 Maintainer: Eric Evans 
-Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7)
+Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7), subversion
 Homepage: http://cassandra.apache.org
 Vcs-Svn: https://svn.apache.org/repos/asf/cassandra/trunk
 Vcs-Browser: http://svn.apache.org/viewvc/cassandra/trunk

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1151306
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
-/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thr

[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081831#comment-13081831
 ] 

Jonathan Ellis commented on CASSANDRA-2990:
---

A few days ago, you said, "A counter mutation only lives long enough to be 
applied to the first replica. Once this is done, a *row* mutation is generated 
for the other replicas. That second mutation can be hinted. But that is a row 
mutation, so there should be no special casing at all for that."

Why can't we hint the first replica?

> We should refuse query for counters at CL.ANY
> -
>
> Key: CASSANDRA-2990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2990.patch
>
>
> We currently do not reject writes for counters at CL.ANY, even though this is 
> not supported (and rightly so).





buildbot failure in ASF Buildbot on cassandra-trunk

2011-08-09 Thread buildbot
The Buildbot has detected a new failure on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1503

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1155466
Blamelist: jbellis

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is >1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap 
still 80% full?") and it seems to work as well as it always has.
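The count-tracking approach described above (only report when a collection actually ran since the last poll, and the delta can exceed 1) can be sketched against the standard `GarbageCollectorMXBean` API. The class and method names below are illustrative, not Cassandra's actual GCInspector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GCInspectorSketch {
    final Map<String, Long> lastCounts = new HashMap<>();
    final Map<String, Long> lastTimes = new HashMap<>();

    // Poll all GC beans; report only those whose collection count advanced
    // since the last poll, so nothing is logged when no GC actually ran.
    void poll() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            long prevCount = lastCounts.getOrDefault(gc.getName(), 0L);
            long prevTime = lastTimes.getOrDefault(gc.getName(), 0L);
            if (count > prevCount) {
                long collections = count - prevCount;  // can be > 1 between polls
                long elapsedMs = time - prevTime;      // summed pause time, not per-GC
                System.out.printf("%s: %d collection(s), %d ms total%n",
                                  gc.getName(), collections, elapsedMs);
            }
            lastCounts.put(gc.getName(), count);
            lastTimes.put(gc.getName(), time);
        }
    }

    public static void main(String[] args) {
        GCInspectorSketch inspector = new GCInspectorSketch();
        inspector.poll();  // baseline poll; nothing reported on the first pass
        System.gc();       // request a collection...
        inspector.poll();  // ...so this poll may have a delta to report
    }
}
```

Note the worst case mentioned above: when more than one GC falls between polls, the elapsed time is the sum over all of them, which can inflate the per-collection figure.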



> Native Memory Leak
> --
>
> Key: CASSANDRA-2868
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Daniel Doubleday
>Assignee: Brandon Williams
>Priority: Minor
> Fix For: 0.8.4
>
> Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
> low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by 
> several users on the user list, which is why I'm reporting it.
> The memory consumption of the cassandra java process increases steadily until 
> it's killed by the os because of oom (with no swap)
> Our server is started with -Xmx3000M and running for around 23 days.
> pmap -x shows
> Total SST: 1961616 (mem mapped data and index files)
> Anon  RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated'.
> We will use BRAF on one of our less important nodes to check whether it is 
> related to mmap and report back.
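The "> 3G overallocated" figure follows from the numbers in the report; a quick check of the arithmetic (units are KB, as printed by `pmap -x`):

```java
public class OverallocationCheck {
    public static void main(String[] args) {
        long anonRssKb = 6_499_640;      // Anon RSS reported by pmap -x (KB)
        long heapMaxKb = 3000L * 1024;   // -Xmx3000M expressed in KB
        long beyondHeapKb = anonRssKb - heapMaxKb;
        // Anonymous memory not covered by the configured Java heap
        System.out.println(beyondHeapKb + " KB, roughly "
                           + beyondHeapKb / (1024 * 1024) + " GB beyond -Xmx");
    }
}
```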





[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
-

bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is >1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap 
still 80% full?") and it seems to work as well as it always has.



  was (Author: brandon.williams):
bq. Wouldn't it be worth indicating that how many collection have been done 
since last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is >1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap 
still 80% full?") and it seems to work as well as it always has.


  
> Native Memory Leak
> --
>
> Key: CASSANDRA-2868
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Daniel Doubleday
>Assignee: Brandon Williams
>Priority: Minor
> Fix For: 0.8.4
>
> Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
> low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by 
> several users on the user list, which is why I'm reporting it.
> The memory consumption of the cassandra java process increases steadily until 
> it's killed by the os because of oom (with no swap)
> Our server is started with -Xmx3000M and running for around 23 days.
> pmap -x shows
> Total SST: 1961616 (mem mapped data and index files)
> Anon  RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated'.
> We will use BRAF on one of our less important nodes to check whether it is 
> related to mmap and report back.





[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081854#comment-13081854
 ] 

Sylvain Lebresne commented on CASSANDRA-2990:
-

bq. Why can't we hint the first replica?

Well, actually I think we could; or at least, if we cannot, I forgot why. We 
would need to be sure we never replay a hint twice though, which I'm not sure 
is guaranteed right now. Also, we can only do this if what we store as a 
hint is the serialized mutation (in this case, the serialized CounterMutation): 
we can't apply the CounterMutation on a non-replica (partly because that would 
potentially inflate the counter context too much, partly because counter 
removes are problematic, which would probably be an issue at some point).

So it should be doable, but it's a bit of work.
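The two requirements in that comment, store the serialized mutation as an opaque payload and never replay a hint twice, can be sketched as follows. This is a hypothetical illustration; the class, `storeHint`, and `deliverOnce` are invented names, not Cassandra's hint API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class CounterHintSketch {
    private final Map<UUID, byte[]> pendingHints = new HashMap<>();
    private final Set<UUID> delivered = new HashSet<>();

    // Store the serialized mutation as an opaque payload; the non-replica
    // node never deserializes or applies the CounterMutation itself.
    public UUID storeHint(byte[] serializedMutation) {
        UUID id = UUID.randomUUID();
        pendingHints.put(id, serializedMutation);
        return id;
    }

    // Hand the payload out exactly once; a second delivery attempt returns
    // null, modelling the "never replay a hint twice" requirement.
    public byte[] deliverOnce(UUID id) {
        if (!delivered.add(id))
            return null;
        return pendingHints.remove(id);
    }
}
```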

> We should refuse query for counters at CL.ANY
> -
>
> Key: CASSANDRA-2990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2990.patch
>
>
> We currently do not reject writes for counters at CL.ANY, even though this is 
> not supported (and rightly so).





[jira] [Commented] (CASSANDRA-2892) Don't "replicate_on_write" with RF=1

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081858#comment-13081858
 ] 

Hudson commented on CASSANDRA-2892:
---

Integrated in Cassandra-0.8 #264 (See 
[https://builds.apache.org/job/Cassandra-0.8/264/])
avoid doing read for no-op replicate-on-write at CL=1
patch by slebresne and jbellis for CASSANDRA-2892

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155460
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java


> Don't "replicate_on_write" with RF=1
> 
>
> Key: CASSANDRA-2892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2892-v1.5.txt, 2892.patch
>
>
> For counters with RF=1, we still do a read to replicate, even though there is 
> nothing to replicate it to.





[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081862#comment-13081862
 ] 

Jonathan Ellis commented on CASSANDRA-2990:
---

Okay, +1 on making the validation match what is actually currently supported 
(no ANY for counters), although I'd change "not supported" to "not yet 
supported."

We can deal w/ adding ANY support if and when someone actually needs it.
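The agreed validation is simple to state; a hedged sketch of the shape of the check (the enum and the message wording are illustrative, not the actual patch):

```java
public class CounterWriteValidation {
    enum ConsistencyLevel { ANY, ONE, QUORUM, ALL }

    // Reject counter writes at CL.ANY up front: ANY implies the write may
    // only be hinted, and hinting the first counter replica is not (yet)
    // supported.
    static void validateCounterWrite(ConsistencyLevel cl) {
        if (cl == ConsistencyLevel.ANY)
            throw new IllegalArgumentException(
                "consistency level ANY is not yet supported for counter writes");
    }

    public static void main(String[] args) {
        validateCounterWrite(ConsistencyLevel.QUORUM);  // accepted
        try {
            validateCounterWrite(ConsistencyLevel.ANY);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```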

> We should refuse query for counters at CL.ANY
> -
>
> Key: CASSANDRA-2990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2990.patch
>
>
> We currently do not reject writes for counters at CL.ANY, even though this is 
> not supported (and rightly so).





[jira] [Resolved] (CASSANDRA-2518) invalid column name length 0

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2518.
---

Resolution: Duplicate

probably CASSANDRA-2675, fixed in 0.7.7

> invalid column name length 0
> 
>
> Key: CASSANDRA-2518
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2518
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.7.3
> Environment: three nodes, 
> JVM:
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn2400M 
> -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>Reporter: lichenglin
>
> one of the three nodes cassandra 0.7.3 report error after start up:
> ERROR [CompactionExecutor:1] 2011-04-16 22:18:39,281 PrecompactedRow.java 
> (line 82) Skipping row DecoratedKey(3813860378406449638560060231106122758, 
> 79616e79776275636b65743030303030303030312f6f626a303030303030323534) in 
> /opt/cassandra/data/Keyspace/cf-f-4715-Data.db
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid 
> column name length 0
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:176)
> at 
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
> at 
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
> at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
> at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
> at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> at 
> org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
> at 
> org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
> at 
> org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
> at 
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
> at 
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> and few minutes later,
> ERROR [CompactionExecutor:1] 2011-04-16 22:20:20,073 
> AbstractCassandraDaemon.java (line 114) Fatal exception in thread 
> Thread[CompactionExecutor:1,1,main]
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.cassandra.io.util.BufferedRandomAccessFile.readBytes(BufferedRandomAccessFile.java:267)
> at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:310)
> at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:267)
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:176)
> at 
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
> at 
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
> at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
> at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
> at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)

[Cassandra Wiki] Trivial Update of "Committers" by JonathanEllis

2011-08-09 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Committers" page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/Committers?action=diff&rev1=15&rev2=16

Comment:
update release manager

  ||Avinash Lakshman||Jan 2009||Facebook||Co-author of Facebook Cassandra||
  ||Prashant Malik||Jan 2009||Facebook||Co-author of Facebook Cassandra||
  ||Jonathan Ellis||Mar 2009||Datastax||Project chair||
- ||Eric Evans||Jun 2009||Rackspace||PMC member, Release manager, Debian 
packager||
+ ||Eric Evans||Jun 2009||Rackspace||PMC member, Debian packager||
  ||Jun Rao||Jun 2009||!LinkedIn||PMC member||
  ||Chris Goffinet||Sept 2009||Twitter||PMC member||
  ||Johan Oskarsson||Nov 2009||Twitter||Also a 
[[http://hadoop.apache.org/|Hadoop]] committer||
@@ -12, +12 @@

  ||Jaakko Laine||Dec 2009||?|| ||
  ||Brandon Williams||Jun 2010||Datastax||PMC member||
  ||Jake Luciani||Jan 2011||Datastax||Also a 
[[http://thrift.apache.org/|Thrift]] committer||
- ||Sylvain Lebresne||Mar 2011||Datastax||PMC member||
+ ||Sylvain Lebresne||Mar 2011||Datastax||PMC member, Release manager||
  ||Pavel Yaskevich||Aug 2011||Datastax|| ||
  


[jira] [Commented] (CASSANDRA-2993) Issues with parameters being escaped correctly in Python CQL

2011-08-09 Thread Blake Visin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081879#comment-13081879
 ] 

Blake Visin commented on CASSANDRA-2993:


Works for me too.  Thanks Tyler!

> Issues with parameters being escaped correctly in Python CQL
> 
>
> Key: CASSANDRA-2993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2993
> Project: Cassandra
>  Issue Type: Bug
> Environment: Python CQL
>Reporter: Blake Visin
>Assignee: Tyler Hobbs
>  Labels: CQL, parameter, python
> Attachments: 2993-cql-grammar.txt, 2993-pycql.txt, 
> 2993-system-test.txt
>
>
> When using parameterised queries in Python CQL, strings are not being 
> escaped correctly.
> Query and Parameters:
> {code}
> 'UPDATE sites SET :col = :val WHERE KEY = :site_id'
> {'col': 'feed_stats:1312493736688033024',
>  'site_id': '899d15e8-bd4a-11e0-bc8c-001fe14cba06',
>  'val': 
> "(dp0\nS'1'\np1\n(lp2\nI1\naI2\naI3\naI4\nasS'0'\np3\n(lp4\nI1\naI2\naI3\naI4\nasS'3'\np5\n(lp6\nI1\naI2\naI3\naI4\nasS'2'\np7\n(lp8\nI1\naI2\naI3\naI4\nas."}
> {code}
> Query that ends up being executed after parameter substitution:
> {code} 
> "UPDATE sites SET 'feed_stats:1312493736688033024' = 
> '(dp0\nS''1''\np1\n(lp2\nI1\naI2\naI3\naI4\nasS''0''\np3\n(lp4\nI1\naI2\naI3\naI4\nasS''3''\np5\n(lp6\nI1\naI2\naI3\naI4\nasS''2''\np7\n(lp8\nI1\naI2\naI3\naI4\nas.'
>  WHERE KEY = '899d15e8-bd4a-11e0-bc8c-001fe14cba06'"
> {code}





[jira] [Updated] (CASSANDRA-2993) Issues with parameters being escaped correctly in Python CQL

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2993:
--

Reviewer: xedin

> Issues with parameters being escaped correctly in Python CQL
> 
>
> Key: CASSANDRA-2993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2993
> Project: Cassandra
>  Issue Type: Bug
> Environment: Python CQL
>Reporter: Blake Visin
>Assignee: Tyler Hobbs
>  Labels: CQL, parameter, python
> Attachments: 2993-cql-grammar.txt, 2993-pycql.txt, 
> 2993-system-test.txt
>
>
> When using parameterised queries in Python CQL, strings are not being 
> escaped correctly.
> Query and Parameters:
> {code}
> 'UPDATE sites SET :col = :val WHERE KEY = :site_id'
> {'col': 'feed_stats:1312493736688033024',
>  'site_id': '899d15e8-bd4a-11e0-bc8c-001fe14cba06',
>  'val': 
> "(dp0\nS'1'\np1\n(lp2\nI1\naI2\naI3\naI4\nasS'0'\np3\n(lp4\nI1\naI2\naI3\naI4\nasS'3'\np5\n(lp6\nI1\naI2\naI3\naI4\nasS'2'\np7\n(lp8\nI1\naI2\naI3\naI4\nas."}
> {code}
> Query that ends up being executed after parameter substitution:
> {code} 
> "UPDATE sites SET 'feed_stats:1312493736688033024' = 
> '(dp0\nS''1''\np1\n(lp2\nI1\naI2\naI3\naI4\nasS''0''\np3\n(lp4\nI1\naI2\naI3\naI4\nasS''3''\np5\n(lp6\nI1\naI2\naI3\naI4\nasS''2''\np7\n(lp8\nI1\naI2\naI3\naI4\nas.'
>  WHERE KEY = '899d15e8-bd4a-11e0-bc8c-001fe14cba06'"
> {code}





[jira] [Updated] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-2325:
---

Attachment: cassandra-2325.patch.2.txt

> invalidateKeyCache / invalidateRowCache should remove saved cache files from 
> disk
> -
>
> Key: CASSANDRA-2325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 0.7.8, 0.8.2
>Reporter: Matthew F. Dennis
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt
>
>
> the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
> It seems logical that if you are clearing the caches you don't expect them to 
> be reinstantiated with the old values the next time C* starts.
> This is not a huge issue since next time the caches are saved the old values 
> will be removed.





svn commit: r1155544 - /cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 20:18:47 2011
New Revision: 1155544

URL: http://svn.apache.org/viewvc?rev=1155544&view=rev
Log:
r/m merged reference to obsolete memtable_flush_after_mins

Modified:
cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java?rev=1155544&r1=1155543&r2=1155544&view=diff
==
--- cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java Tue Aug  9 
20:18:47 2011
@@ -1671,7 +1671,6 @@ public class CliClient
 normaliseType(cfDef.key_validation_class, 
"org.apache.cassandra.db.marshal"));
 writeAttr(sb, false, "memtable_operations", 
cfDef.memtable_operations_in_millions);
 writeAttr(sb, false, "memtable_throughput", 
cfDef.memtable_throughput_in_mb);
-writeAttr(sb, false, "memtable_flush_after", 
cfDef.memtable_flush_after_mins);
 writeAttr(sb, false, "rows_cached", cfDef.row_cache_size);
 writeAttr(sb, false, "row_cache_save_period", 
cfDef.row_cache_save_period_in_seconds);
 writeAttr(sb, false, "keys_cached", cfDef.key_cache_size);




buildbot success in ASF Buildbot on cassandra-trunk

2011-08-09 Thread buildbot
The Buildbot has detected a restored build on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1504

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1155544
Blamelist: jbellis

Build succeeded!

sincerely,
 -The Buildbot



svn commit: r1155548 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cql/UpdateStatement.java src/java/org/apache/cassandra/thrift/ThriftValidation.java test/system/t

2011-08-09 Thread slebresne
Author: slebresne
Date: Tue Aug  9 20:24:17 2011
New Revision: 1155548

URL: http://svn.apache.org/viewvc?rev=1155548&view=rev
Log:
Refuse counter write at CL.ANY
patch by slebresne; reviewed by jbellis for CASSANDRA-2990

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
cassandra/branches/cassandra-0.8/test/system/test_cql.py
cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 20:24:17 2011
@@ -2,6 +2,7 @@
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
+ * refuse counter write for CL.ANY (CASSANDRA-2990)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
 Tue Aug  9 20:24:17 2011
@@ -39,6 +39,7 @@ import static org.apache.cassandra.cql.Q
 
 import static org.apache.cassandra.cql.Operation.OperationType;
 import static 
org.apache.cassandra.thrift.ThriftValidation.validateColumnFamily;
+import static 
org.apache.cassandra.thrift.ThriftValidation.validateCommutativeForWrite;
 
 /**
  * An UPDATE statement parsed from a CQL query statement.
@@ -142,6 +143,8 @@ public class UpdateStatement extends Abs
 }
 
 CFMetaData metadata = validateColumnFamily(keyspace, columnFamily, 
hasCommutativeOperation);
+if (hasCommutativeOperation)
+validateCommutativeForWrite(metadata, cLevel);
 
 QueryProcessor.validateKeyAlias(metadata, keyName);
 

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
 Tue Aug  9 20:24:17 2011
@@ -627,7 +627,11 @@ public class ThriftValidation
 
 public static void validateCommutativeForWrite(CFMetaData metadata, 
ConsistencyLevel consistency) throws InvalidRequestException
 {
-if (!metadata.getReplicateOnWrite() && consistency != 
ConsistencyLevel.ONE)
+if (consistency == ConsistencyLevel.ANY)
+{
+throw new InvalidRequestException("Consistency level ANY is not 
yet supported for counter columnfamily " + metadata.cfName);
+}
+else if (!metadata.getReplicateOnWrite() && consistency != 
ConsistencyLevel.ONE)
 {
 throw new InvalidRequestException("cannot achieve CL > CL.ONE 
without replicate_on_write on columnfamily " + metadata.cfName);
 }
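[Editorial note] The patched check can be summarized as two rules: reject CL.ANY outright for counters, and reject any CL above ONE when replicate_on_write is disabled. A Python sketch mirroring the Java logic above (names hypothetical; the authoritative check is the ThriftValidation code in the diff):

```python
def validate_commutative_for_write(cf_name, consistency, replicate_on_write):
    """Mirror of the patched validateCommutativeForWrite."""
    if consistency == "ANY":
        # New in this patch: ANY is never acceptable for counter writes.
        raise ValueError("Consistency level ANY is not yet supported for "
                         "counter columnfamily " + cf_name)
    if not replicate_on_write and consistency != "ONE":
        # Pre-existing rule: without replicate_on_write only CL.ONE works.
        raise ValueError("cannot achieve CL > CL.ONE without "
                         "replicate_on_write on columnfamily " + cf_name)
```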

Modified: cassandra/branches/cassandra-0.8/test/system/test_cql.py
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/system/test_cql.py?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- cassandra/branches/cassandra-0.8/test/system/test_cql.py (original)
+++ cassandra/branches/cassandra-0.8/test/system/test_cql.py Tue Aug  9 
20:24:17 2011
@@ -1260,6 +1260,11 @@ class TestCql(ThriftTester):
   cursor.execute,
   "UPDATE CounterCF SET count_me = count_not_me + 2 WHERE 
key = 'counter1'")
 
+# counters can't do ANY
+assert_raises(cql.ProgrammingError,
+  cursor.execute,
+  "UPDATE CounterCF USING CONSISTENCY ANY SET count_me = 
count_me + 2 WHERE key = 'counter1'")
+
 def test_key_alias_support(self):
 "should be possible to use alias instead of KEY keyword"
 cursor = init()

Modified: cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py
URL: 
http://svn.a

svn commit: r1155549 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cql/ src/java/org/apache/cassandra/thrift/ test/system/

2011-08-09 Thread slebresne
Author: slebresne
Date: Tue Aug  9 20:26:07 2011
New Revision: 1155549

URL: http://svn.apache.org/viewvc?rev=1155549&view=rev
Log:
commit from 0.8

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/CHANGES.txt
cassandra/trunk/contrib/   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/src/java/org/apache/cassandra/cql/UpdateStatement.java
cassandra/trunk/src/java/org/apache/cassandra/thrift/ThriftValidation.java
cassandra/trunk/test/system/test_cql.py
cassandra/trunk/test/system/test_thrift_server.py

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1151306
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460,1155548
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1155549&r1=1155548&r2=1155549&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Tue Aug  9 20:26:07 2011
@@ -35,6 +35,7 @@
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
+ * refuse counter write for CL.ANY (CASSANDRA-2990)
 
 
 0.8.3

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1151306
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460,1155548
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1151306
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
-/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1155460,1155548
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369
 
/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1125018
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandr

[jira] [Commented] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081901#comment-13081901
 ] 

Brandon Williams commented on CASSANDRA-3004:
-

+1

> Once a message has been dropped, cassandra logs total messages dropped and 
> tpstats every 5s forever
> ---
>
> Key: CASSANDRA-3004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.8.3
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: lhf
> Fix For: 0.8.4
>
> Attachments: 3004.txt
>
>






svn commit: r1155558 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/net/MessagingService.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 20:47:35 2011
New Revision: 1155558

URL: http://svn.apache.org/viewvc?rev=1155558&view=rev
Log:
switch back to only logging recent dropped messages
patch by jbellis; reviewed by brandonwilliams for CASSANDRA-3004

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155558&r1=1155557&r2=1155558&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 20:47:35 2011
@@ -3,6 +3,7 @@
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
  * refuse counter write for CL.ANY (CASSANDRA-2990)
+ * switch back to only logging recent dropped messages (CASSANDRA-3004)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java?rev=1155558&r1=1155557&r2=1155558&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
 Tue Aug  9 20:47:35 2011
@@ -100,18 +100,11 @@ public final class MessagingService impl
private final Map<StorageService.Verb, AtomicInteger> droppedMessages = 
new EnumMap<StorageService.Verb, AtomicInteger>(StorageService.Verb.class);
 // dropped count when last requested for the Recent api.  high concurrency 
isn't necessary here.
private final Map<StorageService.Verb, Integer> lastDropped = 
Collections.synchronizedMap(new EnumMap<StorageService.Verb, Integer>(StorageService.Verb.class));
+private final Map<StorageService.Verb, Integer> lastDroppedInternal = new 
EnumMap<StorageService.Verb, Integer>(StorageService.Verb.class);
 
private final List<ILatencySubscriber> subscribers = new 
ArrayList<ILatencySubscriber>();
 private static final long DEFAULT_CALLBACK_TIMEOUT = (long) (1.1 * 
DatabaseDescriptor.getRpcTimeout());
 
-{
-for (StorageService.Verb verb : DROPPABLE_VERBS)
-{
-droppedMessages.put(verb, new AtomicInteger());
-lastDropped.put(verb, 0);
-}
-}
-
 private static class MSHandle
 {
 public static final MessagingService instance = new MessagingService();
@@ -123,6 +116,13 @@ public final class MessagingService impl
 
 private MessagingService()
 {
+for (StorageService.Verb verb : DROPPABLE_VERBS)
+{
+droppedMessages.put(verb, new AtomicInteger());
+lastDropped.put(verb, 0);
+lastDroppedInternal.put(verb, 0);
+}
+
 listenGate = new SimpleCondition();
verbHandlers_ = new EnumMap<StorageService.Verb, IVerbHandler>(StorageService.Verb.class);
 streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", 
DatabaseDescriptor.getCompactionThreadPriority());
@@ -584,11 +584,13 @@ public final class MessagingService impl
for (Map.Entry<StorageService.Verb, AtomicInteger> entry : 
droppedMessages.entrySet())
 {
 AtomicInteger dropped = entry.getValue();
-if (dropped.get() > 0)
+StorageService.Verb verb = entry.getKey();
+int recent = dropped.get() - lastDroppedInternal.get(verb);
+if (recent > 0)
 {
 logTpstats = true;
-logger_.info("{} {} messages dropped in server lifetime",
- dropped, entry.getKey());
+logger_.info("{} {} messages dropped in server lifetime", 
recent, verb);
+lastDroppedInternal.put(verb, dropped.get());
 }
 }
 
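[Editorial note] The change above replaces the lifetime total with a delta since the last logging pass, so a node that dropped messages once does not log forever. The bookkeeping pattern can be sketched as follows (hypothetical class name; the real implementation is the Java diff above):

```python
class DroppedMessageLogger:
    """Keep a lifetime dropped counter per verb plus a snapshot taken at
    the last log pass, and report only the delta ("recent" drops)."""

    def __init__(self, verbs):
        self.dropped = {v: 0 for v in verbs}      # lifetime totals
        self.last_logged = {v: 0 for v in verbs}  # snapshot at last pass

    def record_drop(self, verb):
        self.dropped[verb] += 1

    def log_pass(self):
        """Return {verb: recent} for verbs dropped since the last pass,
        advancing the snapshot -- a quiet period reports nothing."""
        recent = {}
        for verb, total in self.dropped.items():
            delta = total - self.last_logged[verb]
            if delta > 0:
                recent[verb] = delta
                self.last_logged[verb] = total
        return recent
```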




[jira] [Updated] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3004:
--

Affects Version/s: (was: 0.8.3)
   0.8.2
   Issue Type: Improvement  (was: Bug)

> Once a message has been dropped, cassandra logs total messages dropped and 
> tpstats every 5s forever
> ---
>
> Key: CASSANDRA-3004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.2
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: lhf
> Fix For: 0.8.4
>
> Attachments: 3004.txt
>
>






[jira] [Updated] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2325:
--

Affects Version/s: (was: 0.7.8)
   (was: 0.8.2)
   0.6
Fix Version/s: 0.8.4

> invalidateKeyCache / invalidateRowCache should remove saved cache files from 
> disk
> -
>
> Key: CASSANDRA-2325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 0.6
>Reporter: Matthew F. Dennis
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.8.4
>
> Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt
>
>
> the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
> It seems logical that if you are clearing the caches you don't expect them to 
> be reinstantiated with the old values the next time C* starts.
> This is not a huge issue since next time the caches are saved the old values 
> will be removed.





[jira] [Commented] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081920#comment-13081920
 ] 

Jonathan Ellis commented on CASSANDRA-2325:
---

Shouldn't we check that the file exists first? Otherwise we log spurious 
errors.

> invalidateKeyCache / invalidateRowCache should remove saved cache files from 
> disk
> -
>
> Key: CASSANDRA-2325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 0.6
>Reporter: Matthew F. Dennis
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.8.4
>
> Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt
>
>
> the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
> It seems logical that if you are clearing the caches you don't expect them to 
> be reinstantiated with the old values the next time C* starts.
> This is not a huge issue since next time the caches are saved the old values 
> will be removed.





[jira] [Assigned] (CASSANDRA-3005) OutboundTcpConnection's sending queue goes unboundedly without any backpressure logic

2011-08-09 Thread Melvin Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Melvin Wang reassigned CASSANDRA-3005:
--

Assignee: Melvin Wang

> OutboundTcpConnection's sending queue goes unboundedly without any 
> backpressure logic
> -
>
> Key: CASSANDRA-3005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3005
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Melvin Wang
>Assignee: Melvin Wang
>
> OutboundTcpConnection's sending queue unconditionally queues up the request 
> and process them in sequence. Thinking about tagging the message coming in 
> with timestamp and drop them before actually sending it if the message stays 
> in the queue for too long, which is defined by the message's own time out 
> value.
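[Editorial note] The proposal above (tag each message with its enqueue time and drop it at send time if it has outlived its own timeout) might look like this sketch, with hypothetical names and an explicit clock passed in for clarity:

```python
import collections

class TimestampedOutboundQueue:
    """Sketch of the proposed backpressure: rather than letting the queue
    grow without bound, record the enqueue time of each message and drop
    it at send time if it has already exceeded its own timeout."""

    def __init__(self):
        self._q = collections.deque()

    def enqueue(self, msg, timeout, now):
        self._q.append((now, timeout, msg))

    def drain(self, now):
        """Return (sent_messages, dropped_count) for everything queued."""
        sent, dropped = [], 0
        while self._q:
            enqueued_at, timeout, msg = self._q.popleft()
            if now - enqueued_at > timeout:
                dropped += 1      # stale: already past its own timeout
            else:
                sent.append(msg)
        return sent, dropped
```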





[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081930#comment-13081930
 ] 

Pavel Yaskevich commented on CASSANDRA-2988:


First of all, I would like to point you to 
http://wiki.apache.org/cassandra/CodeStyle; please modify your code to follow 
the conventions listed there.

Regarding c2988-modified-buffer.patch:

 - please encapsulate your modifications: as written, comparing the code 
before and after the patch is hard to understand. I would suggest moving those 
modifications into a separate inner class (IndexReader, maybe?) and replacing 
only the RandomAccessReader initialization in the 
SSTableReader.load(...) method...
 - let's add a test comparing "getEstimatedRowSize().count();" and 
"SSTable.estimateRowsFromIndex(input);" just to be sure it works correctly.

Also, I don't quite understand the logic behind "while (buffer.remaining() > 10) {" 
in SSTableReader.loadByteBuffer; let's avoid any hardcoding, or at least 
comment on why you did that.

I'm going to take a closer look at the patch for parallel index file loading 
after we are done with the index reader patch (c2988-modified-buffer.patch).

> Improve SSTableReader.load() when loading index files
> -
>
> Key: CASSANDRA-2988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Melvin Wang
>Assignee: Melvin Wang
>Priority: Minor
> Fix For: 1.0
>
> Attachments: c2988-modified-buffer.patch, 
> c2988-parallel-load-sstables.patch
>
>
> * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
> hurts the read performance because we always process the index files 
> sequentially. Simple fix would be set it to false.
> * multiple index files of a single column family can be loaded in parallel. 
> This buys a lot when you have multiple super large index files.
> * we may also change how we buffer. By using BufferredRandomAccessFile, for 
> every read, we need bunch of checking like
>   - do we need to rebuffer?
>   - isEOF()?
>   - assertions
>   These can be simplified to some extent.  We can blindly buffer the index 
> file by chunks and process the buffer until a key lies across boundary of a 
> chunk. Then we rebuffer and start from the beginning of the partially read 
> key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
> read**() methods in BRAF.
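[Editorial note] The third bullet (blindly buffer in chunks, and rebuffer from the start of a partially read key when an entry straddles a chunk boundary) can be sketched as below. The record format here (two-byte big-endian length prefix followed by the key) is purely illustrative of the boundary handling, not Cassandra's actual index layout:

```python
import struct

def iter_index_keys(f, chunk_size=4096):
    """Blindly read fixed-size chunks and parse length-prefixed keys out
    of the buffer; when a key crosses a chunk boundary, carry the partial
    bytes forward and rebuffer from the start of that key."""
    buf = b""
    while True:
        data = f.read(chunk_size)
        buf += data
        pos = 0
        while len(buf) - pos >= 2:
            (klen,) = struct.unpack_from(">H", buf, pos)
            if len(buf) - pos - 2 < klen:
                break            # key crosses the boundary: rebuffer
            yield buf[pos + 2:pos + 2 + klen]
            pos += 2 + klen
        buf = buf[pos:]          # keep the partially read entry
        if not data:
            if buf:
                raise ValueError("truncated index entry")
            break
```

Conceptually this does the same work as BufferedRandomAccessFile, but the per-read rebuffer/EOF checks collapse into one check per entry.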





[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081941#comment-13081941
 ] 

Brandon Williams commented on CASSANDRA-2950:
-

Currently, truncate does:
* force a flush
* record the time
* delete any sstables older than the time

This isn't quite enough if the machine crashes shortly afterward, however, 
since there can be mutations present in the commitlog that were previously 
truncated and are now resurrected by CL replay.

One thing we could do is record the truncate time for the CF in the system ks 
and then ignore mutations older than that, however this would require time 
synchronization between the client and the server to be accurate.
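[Editorial note] The approach sketched above (persist a per-CF truncation time and filter replayed mutations against it) could look like the following, with hypothetical names and an in-memory dict standing in for the system keyspace; as noted, its correctness hinges on timestamp agreement:

```python
class TruncationFilter:
    """Record the truncate time per column family and ignore commitlog
    mutations stamped at or before it during replay."""

    def __init__(self):
        self.truncated_at = {}   # cf name -> truncate timestamp

    def record_truncate(self, cf, timestamp):
        self.truncated_at[cf] = timestamp

    def should_replay(self, cf, mutation_timestamp):
        # CFs that were never truncated replay everything.
        return mutation_timestamp > self.truncated_at.get(cf, float("-inf"))
```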


> Data from truncated CF reappears after server restart
> -
>
> Key: CASSANDRA-2950
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cathy Daw
>Assignee: Brandon Williams
>
> * Configure 3 node cluster
> * Ensure the java stress tool creates Keyspace1 with RF=3
> {code}
> // Run Stress Tool to generate 10 keys, 1 column
> stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 50 keys in CLI
> use Keyspace1; 
> list Standard1; 
> // TRUNCATE CF in CLI
> use Keyspace1;
> truncate counter1;
> list counter1;
> // Run stress tool and verify creation of 1 key with 10 columns
> stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 1 key in CLI
> use Keyspace1; 
> list Standard1; 
> // Restart all three nodes
> // You will see 51 keys in CLI
> use Keyspace1; 
> list Standard1; 
> {code}





[jira] [Commented] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081948#comment-13081948
 ] 

Hudson commented on CASSANDRA-3004:
---

Integrated in Cassandra-0.8 #265 (See 
[https://builds.apache.org/job/Cassandra-0.8/265/])
switch back to only logging recent dropped messages
patch by jbellis; reviewed by brandonwilliams for CASSANDRA-3004

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155558
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java


> Once a message has been dropped, cassandra logs total messages dropped and 
> tpstats every 5s forever
> ---
>
> Key: CASSANDRA-3004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8.2
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: lhf
> Fix For: 0.8.4
>
> Attachments: 3004.txt
>
>






[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081947#comment-13081947
 ] 

Hudson commented on CASSANDRA-2990:
---

Integrated in Cassandra-0.8 #265 (See 
[https://builds.apache.org/job/Cassandra-0.8/265/])
Refuse counter write at CL.ANY
patch by slebresne; reviewed by jbellis for CASSANDRA-2990

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155548
Files : 
* /cassandra/branches/cassandra-0.8/test/system/test_cql.py
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java


> We should refuse query for counters at CL.ANY
> -
>
> Key: CASSANDRA-2990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Trivial
>  Labels: counters
> Fix For: 0.8.4
>
> Attachments: 2990.patch
>
>
> We currently do not reject writes for counters at CL.ANY, even though this is 
> not supported (and rightly so).





[jira] [Updated] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-2982:
--

Attachment: 2982-v1.txt

Refactored the api; it should cover new index types. Should we consider 
removing the IndexType enum and just using the classname?

> Refactor secondary index api
> 
>
> Key: CASSANDRA-2982
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 1.0
>
> Attachments: 2982-v1.txt
>
>
> Secondary indexes currently make some bad assumptions about the underlying 
> indexes.
> 1. That they are always stored in other column families.
> 2. That there is a unique index per column
> In the case of CASSANDRA-2915 neither of these are true.  The new api should 
> abstract the search concepts and allow any search api to plug in.
> Once the code is refactored and basically pluggable we can remove the 
> IndexType enum and use class names similar to how we handle partitioners and 
> comparators.
> Basic api is to add a SecondaryIndexManager that handles different index 
> types per CF and a SecondaryIndex base class that handles a particular type 
> implementation.
> This requires major changes to ColumnFamilyStore and Table.IndexBuilder





[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081966#comment-13081966
 ] 

Jonathan Ellis commented on CASSANDRA-2950:
---

But we record the CL "context" at the time of flush in the sstable it makes, 
and on replay we ignore any mutations from before that position.

Checked, and we do wait for flush to complete in truncate.

> Data from truncated CF reappears after server restart
> -
>
> Key: CASSANDRA-2950
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cathy Daw
>Assignee: Brandon Williams
>
> * Configure 3 node cluster
> * Ensure the java stress tool creates Keyspace1 with RF=3
> {code}
> // Run stress tool to generate 50 keys, 20 columns
> stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 50 keys in CLI
> use Keyspace1; 
> list Standard1; 
> // TRUNCATE CF in CLI
> use Keyspace1;
> truncate counter1;
> list counter1;
> // Run stress tool and verify creation of 1 key with 10 columns
> stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 1 key in CLI
> use Keyspace1; 
> list Standard1; 
> // Restart all three nodes
> // You will see 51 keys in CLI
> use Keyspace1; 
> list Standard1; 
> {code}





[jira] [Commented] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081968#comment-13081968
 ] 

Jonathan Ellis commented on CASSANDRA-2982:
---

I don't think full index pluggability is a goal here.  So I don't see the point 
of that.

> Refactor secondary index api
> 
>
> Key: CASSANDRA-2982
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 1.0
>
> Attachments: 2982-v1.txt
>
>
> Secondary indexes currently make some bad assumptions about the underlying 
> indexes.
> 1. That they are always stored in other column families.
> 2. That there is a unique index per column
> In the case of CASSANDRA-2915 neither of these are true.  The new api should 
> abstract the search concepts and allow any search api to plug in.
> Once the code is refactored and basically pluggable we can remove the 
> IndexType enum and use class names similar to how we handle partitioners and 
> comparators.
> Basic api is to add a SecondaryIndexManager that handles different index 
> types per CF and a SecondaryIndex base class that handles a particular type 
> implementation.
> This requires major changes to ColumnFamilyStore and Table.IndexBuilder





[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081969#comment-13081969
 ] 

Brandon Williams commented on CASSANDRA-2950:
-

bq. but we record CL "context" at time of flush in the sstable it makes, and we 
on replay we ignore any mutations from before that position.

I think there's something wrong with that, then:

{noformat}
 INFO 21:25:15,274 Replaying 
/var/lib/cassandra/commitlog/CommitLog-1312924388053.log
DEBUG 21:25:15,290 Replaying 
/var/lib/cassandra/commitlog/CommitLog-1312924388053.log starting at 0
DEBUG 21:25:15,291 Reading mutation at 0
DEBUG 21:25:15,295 replaying mutation for system.4c: {ColumnFamily(LocationInfo 
[47656e65726174696f6e:false:4@131292438814,])}
DEBUG 21:25:15,321 Reading mutation at 89
DEBUG 21:25:15,322 replaying mutation for system.426f6f747374726170: 
{ColumnFamily(LocationInfo [42:false:1@1312924388203,])}
DEBUG 21:25:15,322 Reading mutation at 174
DEBUG 21:25:15,322 replaying mutation for system.4c: {ColumnFamily(LocationInfo 
[546f6b656e:false:16@1312924388204,])}
DEBUG 21:25:15,322 Reading mutation at 270
DEBUG 21:25:15,324 replaying mutation for Keyspace1.3030: 
{ColumnFamily(Standard1 
[C0:false:34@1312924813259,C1:false:34@1312924813260,C2:false:34@1312924813260,C3:false:34@1312924813260,C4:false:34@1312924813260,])}
{noformat}

The last entry there is the first of many errant mutations.

> Data from truncated CF reappears after server restart
> -
>
> Key: CASSANDRA-2950
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cathy Daw
>Assignee: Brandon Williams
>
> * Configure 3 node cluster
> * Ensure the java stress tool creates Keyspace1 with RF=3
> {code}
> // Run stress tool to generate 50 keys, 20 columns
> stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 50 keys in CLI
> use Keyspace1; 
> list Standard1; 
> // TRUNCATE CF in CLI
> use Keyspace1;
> truncate counter1;
> list counter1;
> // Run stress tool and verify creation of 1 key with 10 columns
> stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 1 key in CLI
> use Keyspace1; 
> list Standard1; 
> // Restart all three nodes
> // You will see 51 keys in CLI
> use Keyspace1; 
> list Standard1; 
> {code}





[jira] [Assigned] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-2950:
-

Assignee: Jonathan Ellis  (was: Brandon Williams)

> Data from truncated CF reappears after server restart
> -
>
> Key: CASSANDRA-2950
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cathy Daw
>Assignee: Jonathan Ellis
>
> * Configure 3 node cluster
> * Ensure the java stress tool creates Keyspace1 with RF=3
> {code}
> // Run stress tool to generate 50 keys, 20 columns
> stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 50 keys in CLI
> use Keyspace1; 
> list Standard1; 
> // TRUNCATE CF in CLI
> use Keyspace1;
> truncate counter1;
> list counter1;
> // Run stress tool and verify creation of 1 key with 10 columns
> stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
> --consistency-level=QUORUM --average-size-values --replication-factor=3 
> --create-index=KEYS --nodes=cathy1,cathy2
> // Verify 1 key in CLI
> use Keyspace1; 
> list Standard1; 
> // Restart all three nodes
> // You will see 51 keys in CLI
> use Keyspace1; 
> list Standard1; 
> {code}





[jira] [Created] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jonathan Ellis (JIRA)
Java CQL command-line shell
---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


We need a "real" CQL shell that:

- does not require installing additional environments
- includes "show keyspaces" and other introspection tools
- does not break existing cli scripts

I.e., it needs to be java, but it should be a new tool instead of replacing the 
existing cli.





[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081974#comment-13081974
 ] 

Jonathan Ellis commented on CASSANDRA-3010:
---

I.e., do we do "\d CF" (postgresql) or "describe CF" (mysql) or "desc CF" 
(oracle)?

> Java CQL command-line shell
> ---
>
> Key: CASSANDRA-3010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
>
> We need a "real" CQL shell that:
> - does not require installing additional environments
> - includes "show keyspaces" and other introspection tools
> - does not break existing cli scripts
> I.e., it needs to be java, but it should be a new tool instead of replacing 
> the existing cli.





[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081973#comment-13081973
 ] 

Jonathan Ellis commented on CASSANDRA-3010:
---

We should also pick a SQL command line to imitate for the introspection stuff. 
Might as well get that degree of familiarity as well since there is no reason 
not to.

> Java CQL command-line shell
> ---
>
> Key: CASSANDRA-3010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
>
> We need a "real" CQL shell that:
> - does not require installing additional environments
> - includes "show keyspaces" and other introspection tools
> - does not break existing cli scripts
> I.e., it needs to be java, but it should be a new tool instead of replacing 
> the existing cli.





[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna commented on CASSANDRA-3010:
-

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.

> Java CQL command-line shell
> ---
>
> Key: CASSANDRA-3010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
>
> We need a "real" CQL shell that:
> - does not require installing additional environments
> - includes "show keyspaces" and other introspection tools
> - does not break existing cli scripts
> I.e., it needs to be java, but it should be a new tool instead of replacing 
> the existing cli.





[jira] [Issue Comment Edited] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna edited comment on CASSANDRA-3010 at 8/9/11 10:36 PM:
--

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start would hopefully not be too onerous.

  was (Author: jeromatron):
If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.
  
> Java CQL command-line shell
> ---
>
> Key: CASSANDRA-3010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
>
> We need a "real" CQL shell that:
> - does not require installing additional environments
> - includes "show keyspaces" and other introspection tools
> - does not break existing cli scripts
> I.e., it needs to be java, but it should be a new tool instead of replacing 
> the existing cli.





[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081982#comment-13081982
 ] 

Pavel Yaskevich commented on CASSANDRA-3010:


I don't think we should choose just one, because we can support all of those 
notations using synonyms in the ANTLR grammar. It would be hard to include every 
possible synonym from the beginning, but the grammar will be designed in a way 
that makes it easy to add new synonyms as we go.
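
The synonym idea can be sketched outside the grammar as a simple normalization
table. This is a hypothetical illustration in Java; the real shell would express
the same mapping directly in its ANTLR rules:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of command-synonym normalization; the spellings come from the
// shells discussed above (postgresql "\d", mysql "describe"/"desc").
public class CommandSynonyms {
    private static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("describe", "DESCRIBE");
        SYNONYMS.put("desc", "DESCRIBE");
        SYNONYMS.put("\\d", "DESCRIBE");
        SYNONYMS.put("show", "SHOW");
    }

    /** Resolve a user-typed keyword to its canonical command, or null if unknown. */
    public static String canonical(String typed) {
        return SYNONYMS.get(typed.toLowerCase());
    }
}
```

New spellings are added by inserting one more map entry (or, in the grammar,
one more alternative in the lexer rule), which is the "easy to add as we go"
property being argued for.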

> Java CQL command-line shell
> ---
>
> Key: CASSANDRA-3010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 1.0
>
>
> We need a "real" CQL shell that:
> - does not require installing additional environments
> - includes "show keyspaces" and other introspection tools
> - does not break existing cli scripts
> I.e., it needs to be java, but it should be a new tool instead of replacing 
> the existing cli.





[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Melvin Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082008#comment-13082008
 ] 

Melvin Wang commented on CASSANDRA-2988:


bq. First of all I would like to point you to 
http://wiki.apache.org/cassandra/CodeStyle, please modify your code according 
to conventions listed in there.
Sure. This boils down to where to put the curly braces.

bq. please encapsulate your modifications because if you compare how it was and 
how it is in your patch it's hard to understand and just looks like a mess, I 
would like to suggest moving those modifications to separate inner class 
(IndexReader maybe?) and replace only RandomAccessReader initialization in the 
SSTableReader.load(...) method...
This patch changes most of the load() method, so I am not clear how we could 
change only the initialization of RandomAccessReader.

bq. Also I don't quite understand the logic behind "while (buffer.remaining() > 10) 
{" in SSTableReader.loadByteBuffer; let's avoid any hardcoding, or at least 
comment why you did that.
Sorry for the lack of comments; I will add them. However, this is not hardcoding 
in the sense that a Short consists of 2 bytes and a Long consists of 8 bytes, so 
the sum is 10 bytes. It is just a quick check for whether we have reached the end.

bq. I'm going to take a closer look at the patch for parallel index file loading 
after we are done with the index reader patch (c2988-modified-buffer.patch).
FYI, these two patches are completely independent of each other.

> Improve SSTableReader.load() when loading index files
> -
>
> Key: CASSANDRA-2988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Melvin Wang
>Assignee: Melvin Wang
>Priority: Minor
> Fix For: 1.0
>
> Attachments: c2988-modified-buffer.patch, 
> c2988-parallel-load-sstables.patch
>
>
> * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
> hurts the read performance because we always process the index files 
> sequentially. Simple fix would be set it to false.
> * multiple index files of a single column family can be loaded in parallel. 
> This buys a lot when you have multiple super large index files.
> * we may also change how we buffer. By using BufferredRandomAccessFile, for 
> every read, we need bunch of checking like
>   - do we need to rebuffer?
>   - isEOF()?
>   - assertions
>   These can be simplified to some extent.  We can blindly buffer the index 
> file by chunks and process the buffer until a key lies across boundary of a 
> chunk. Then we rebuffer and start from the beginning of the partially read 
> key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
> read**() methods in BRAF.
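
The chunk-and-rebuffer scheme in the last bullet can be sketched as follows. The
record format here (a 2-byte length prefix plus payload) is a hypothetical
stand-in for the real index entry layout, which also carries a long position:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of chunked reading with restart-on-partial-record: buffer a fixed
// chunk, consume whole records, and when a record straddles the chunk
// boundary, start the next chunk from the beginning of that record.
public class ChunkedReader {
    public static List<byte[]> readAll(byte[] file, int chunkSize) {
        List<byte[]> records = new ArrayList<>();
        int pos = 0; // absolute position of the next unread record
        while (pos < file.length) {
            int chunkEnd = Math.min(pos + chunkSize, file.length);
            int p = pos;
            while (true) {
                // does the 2-byte length prefix fit in this chunk?
                if (p + 2 > chunkEnd) break;
                int len = ((file[p] & 0xff) << 8) | (file[p + 1] & 0xff);
                // does the whole record fit? if not, rebuffer starting at p
                if (p + 2 + len > chunkEnd) break;
                byte[] rec = new byte[len];
                System.arraycopy(file, p + 2, rec, 0, len);
                records.add(rec);
                p += 2 + len;
            }
            // a record larger than the chunk: a real reader would grow the
            // buffer here, this sketch just stops
            if (p == pos) break;
            pos = p; // next chunk begins at the partially read record
        }
        return records;
    }
}
```

As the description says, this is conceptually what BRAF does, but the
per-read rebuffer/EOF/assertion checks collapse into one boundary test per
record.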





[jira] [Updated] (CASSANDRA-2777) Pig storage handler should implement LoadMetadata

2011-08-09 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-2777:


Attachment: 2777-v2.txt

v2 rebased.

> Pig storage handler should implement LoadMetadata
> -
>
> Key: CASSANDRA-2777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2777
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Contrib
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Minor
> Fix For: 0.7.9
>
> Attachments: 2777-v2.txt, 2777.txt
>
>
> The reason for this is many builtin functions like SUM won't work on longs 
> (you can work around this with LongSum, but that's lame) because the query planner 
> doesn't know about the types beforehand, even though we are casting to native 
> longs.
> There is some impact to this, though.  With LoadMetadata implemented, 
> existing scripts that specify schema will need to remove it (since LM is 
> doing it for them) and they will need to conform to LM's terminology (key, 
> columns, name, value) within the script.  This is trivial to change, however, 
> and the increased functionality is worth the switch.





[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082016#comment-13082016
 ] 

Jonathan Ellis commented on CASSANDRA-2988:
---

bq. Short consists of 2 bytes and Long consists of 8 bytes, the sum is 10 bytes

IMO that's more obvious if you leave it as "2 + 8," or use the DBConstants 
class.

> Improve SSTableReader.load() when loading index files
> -
>
> Key: CASSANDRA-2988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Melvin Wang
>Assignee: Melvin Wang
>Priority: Minor
> Fix For: 1.0
>
> Attachments: c2988-modified-buffer.patch, 
> c2988-parallel-load-sstables.patch
>
>
> * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
> hurts the read performance because we always process the index files 
> sequentially. Simple fix would be set it to false.
> * multiple index files of a single column family can be loaded in parallel. 
> This buys a lot when you have multiple super large index files.
> * we may also change how we buffer. By using BufferredRandomAccessFile, for 
> every read, we need bunch of checking like
>   - do we need to rebuffer?
>   - isEOF()?
>   - assertions
>   These can be simplified to some extent.  We can blindly buffer the index 
> file by chunks and process the buffer until a key lies across boundary of a 
> chunk. Then we rebuffer and start from the beginning of the partially read 
> key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
> read**() methods in BRAF.





[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

2011-08-09 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-2810:


Attachment: 2810-v2.txt

It looks like the final problem here is that IntegerType always returns a 
BigInteger, which pig does not like.  This is unfortunate since IntegerType 
can't be easily subclassed and overridden to return ints.

v2 instead adds a setTupleValue method that is always used for adding values to 
tuples. It houses all the special-casing currently needed and provides a spot for 
more in the future, rather than proliferating custom type converters, since I'm 
sure IntegerType won't be the only offender here.
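
The shape of that change can be sketched as follows. Pig's actual Tuple API is
not used here; a plain List stands in so the single conversion point stays
visible:

```java
import java.math.BigInteger;
import java.util.List;

// Sketch of a single setTupleValue conversion point: every value added to a
// tuple passes through here, so type special-casing lives in one place.
public class TupleValues {
    public static void setTupleValue(List<Object> tuple, Object value) {
        if (value instanceof BigInteger)
            // pig does not handle BigInteger, so narrow it to a long
            // (lossy past 64 bits, which is the known trade-off)
            tuple.add(((BigInteger) value).longValue());
        else
            tuple.add(value); // a spot for more special cases later
    }
}
```

Adding a future special case (another marshal type Pig rejects) means one more
branch here instead of a new custom type converter.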

> RuntimeException in Pig when using "dump" command on column name
> 
>
> Key: CASSANDRA-2810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.1
> Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>Reporter: Silvère Lestang
>Assignee: Brandon Williams
> Attachments: 2810-v2.txt, 2810.txt
>
>
> This bug was previously report on [Brisk bug 
> tracker|https://datastax.jira.com/browse/BRISK-232].
> In cassandra-cli:
> {code}
> [default@unknown] create keyspace Test
> with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
> and strategy_options = [{replication_factor:1}];
> [default@unknown] use Test;
> Authenticated to keyspace: Test
> [default@Test] create column family test;
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> ---
> RowKey: 726f7731
> => (column=0001, value=35, timestamp=1308744931122000)
> => (column=0002, value=36, timestamp=1308744931124000)
> => (column=0003, value=38, timestamp=1308744931125000)
> ---
> RowKey: 726f7732
> => (column=0001, value=45, timestamp=1308744931127000)
> => (column=0002, value=42, timestamp=1308744931128000)
> => (column=0003, value=33, timestamp=1308744932722000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
> Options: [replication_factor:1]
>   Column Families:
> ColumnFamily: test
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 20.0/14400
>   Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: false
>   Built indexes: []
> {code}
> In Pig command line:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS 
> (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> In /var/log/cassandra/system.log, I see this exception several times:
> {code}
> INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 
> TaskInProgress.java (line 551) Error from 
> attempt_201106210955_0051_m_00_3: java.lang.RuntimeException: Unexpected 
> data type -1 found in stream.
>   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
>   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
>   at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
>   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
>   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
>   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
>   at 
> org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
>   at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>   at 
> org.apache.hadoop.mapred.MapTask$

[jira] [Commented] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082039#comment-13082039
 ] 

Jonathan Ellis commented on CASSANDRA-2982:
---

Want to give a high-level overview of the changes here?

> Refactor secondary index api
> 
>
> Key: CASSANDRA-2982
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 1.0
>
> Attachments: 2982-v1.txt
>
>
> Secondary indexes currently make some bad assumptions about the underlying 
> indexes.
> 1. That they are always stored in other column families.
> 2. That there is a unique index per column
> In the case of CASSANDRA-2915 neither of these are true.  The new api should 
> abstract the search concepts and allow any search api to plug in.
> Once the code is refactored and basically pluggable we can remove the 
> IndexType enum and use class names similar to how we handle partitioners and 
> comparators.
> Basic api is to add a SecondaryIndexManager that handles different index 
> types per CF and a SecondaryIndex base class that handles a particular type 
> implementation.
> This requires major changes to ColumnFamilyStore and Table.IndexBuilder





[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082091#comment-13082091
 ] 

Boris Yen commented on CASSANDRA-3006:
--

Here is the test program I am using now; the hector version is 0.8.0-2.
Hope this is helpful.


import java.util.Arrays;

import me.prettyprint.cassandra.model.AllOneConsistencyLevelPolicy;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftCluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class CounterTest {
    private Logger logger = LoggerFactory.getLogger(CounterTest.class);
    private static final Integer COUNTER_NUM = 1000;
    private static final StringSerializer ss = StringSerializer.get();
    private static final String HOST = "172.17.19.151:9160";
    private ThriftCluster cluster;

    public static void main(String[] args) {
        CounterTest tc = new CounterTest();
        try {
            tc.testAlarmCounter();
        } catch (InterruptedException e) {
            // ignored
        }
    }

    public CounterTest() {
        CassandraHostConfigurator chc = new CassandraHostConfigurator(HOST);
        chc.setMaxActive(100);
        chc.setMaxIdle(10);
        chc.setCassandraThriftSocketTimeout(6);

        cluster = new ThriftCluster("Test Cluster", chc);
    }

    public void testAlarmCounter() throws InterruptedException {
        int successCounter = 0;
        int cl = 0;

        // increment the counter COUNTER_NUM times; on any failure, switch
        // the consistency level from the default (Quorum) to One
        for (int i = 0; i < COUNTER_NUM; i++) {
            try {
                Mutator<String> mutator =
                        HFactory.createMutator(getKeyspace(cl), StringSerializer.get());

                HCounterColumn<String> column =
                        HFactory.createCounterColumn("testSC", 1L);
                mutator.addCounter("sc", "testCounter",
                        HFactory.createCounterSuperColumn("testC", Arrays.asList(column), ss, ss));
                mutator.execute();

                successCounter++;
            } catch (Exception e) {
                logger.info("Error! Change consistency level to 1.", e);
                cl = 1;
            }

            Thread.sleep(50);
        }

        logger.info("\nsuccess counter: " + successCounter);
    }

    private Keyspace getKeyspace(int cl) {
        if (cl == 1)
            return HFactory.createKeyspace("test", cluster, new AllOneConsistencyLevelPolicy());
        else
            return HFactory.createKeyspace("test", cluster); // default consistency level is Quorum
    }
}

> Enormous counter 
> -
>
> Key: CASSANDRA-3006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.3
> Environment: ubuntu 10.04
>Reporter: Boris Yen
>Assignee: Sylvain Lebresne
>
> I have a two-node cluster with the following keyspace and column family 
> settings.
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions: 
>   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
> Options: [datacenter1:2]
>   Column Families:
> ColumnFamily: testCounter (Super)
> "APP status information."
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.CounterColumnType
>   Columns sorted by: 
> org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 20.0/14400
>   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Built indexes: []
> Then, I use a test program based on hector

[jira] [Commented] (CASSANDRA-2991) Add a 'load new sstables' JMX/nodetool command

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082097#comment-13082097
 ] 

Jonathan Ellis commented on CASSANDRA-2991:
---

What about the "restore snapshot" scenario?

> Add a 'load new sstables' JMX/nodetool command
> --
>
> Key: CASSANDRA-2991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2991
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Brandon Williams
>Priority: Minor
> Fix For: 0.8.4
>
>
> Sometimes people have to create a new cluster to get around a problem and 
> need to copy sstables around.  It would be convenient to be able to trigger 
> this from nodetool or JMX instead of doing a restart of the node.





[jira] [Updated] (CASSANDRA-1608) Redesigned Compaction

2011-08-09 Thread Benjamin Coverston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Coverston updated CASSANDRA-1608:
--

Attachment: 1608-v13.txt

1608 without some of the cruft

> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower priority maintenance on SSTable's not often accessed.
> I also propose we limit each SSTable to a fixed size, which gives 
> us the ability to better utilize our bloom filters in a predictable manner. 
> At the moment after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.
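
The read-driven prioritization proposed above can be sketched like this; the
stats fields and the weighting formula are hypothetical, purely illustrative of
the "track per-sstable read stats, compact the hottest first" idea:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of ordering sstables for compaction by read hotness rather than
// write pattern; fields are illustrative, not Cassandra's actual metadata.
public class HotnessOrdering {
    static final class SSTableStats {
        final String name;
        final long reads;           // how often this sstable is hit
        final double bloomMissRate; // wasted seeks from bloom false positives
        SSTableStats(String name, long reads, double bloomMissRate) {
            this.name = name;
            this.reads = reads;
            this.bloomMissRate = bloomMissRate;
        }
        double priority() {
            // hot, frequently-missed sstables benefit most from compaction
            return reads * (1.0 + bloomMissRate);
        }
    }

    /** Sort hottest-first; the tail is the low-priority maintenance work. */
    static List<SSTableStats> byPriority(List<SSTableStats> tables) {
        tables.sort(Comparator.comparingDouble(SSTableStats::priority).reversed());
        return tables;
    }
}
```

The same ordering also identifies the rarely-read tail that the proposal would
schedule as lower-priority maintenance.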




