[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

2014-04-15 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970487#comment-13970487
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Also, it seems that for some of the methods, e.g. updateDigest, delta, dataSize, 
diff, reconcile, hashCode, etc., it would be much better to have native 
implementations which work with the underlying bytes directly from day one. Some 
of them, for example, use value().remaining(), value().compareTo(), 
value().duplicate(), or name.toByteBuffer() to convert data from one 
representation to another for no real reason, so we can actually end up 
generating a lot more temporary objects than we anticipate. There is another 
concern related to the value() method, which converts a pointer to a 
DirectBuffer: the problem is that (at least in OpenJDK, and I think Oracle did 
the same) initialization of that class is synchronized and creates a 
PhantomReference, which with most collectors will only be purged by a full GC.
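To illustrate the kind of native implementation Pavel is describing, here is a minimal sketch. RawCompare is a hypothetical name, not a Cassandra class: comparing two values directly over their backing bytes avoids allocating a temporary ByteBuffer per value, as value().duplicate()/value().compareTo() would.

```java
// A sketch only -- RawCompare is a hypothetical name, not a Cassandra class.
// Comparing two values directly over their backing bytes avoids allocating a
// temporary ByteBuffer per value (as value().duplicate()/compareTo() would).
public final class RawCompare {
    // Lexicographic, unsigned-byte comparison of [aOff, aOff+aLen) vs
    // [bOff, bOff+bLen) inside one backing array.
    static int compare(byte[] mem, int aOff, int aLen, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int cmp = (mem[aOff + i] & 0xFF) - (mem[bOff + i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return aLen - bLen; // shorter value sorts first on a shared prefix
    }

    public static void main(String[] args) {
        byte[] mem = { 1, 2, 3, 1, 2, 4 }; // two 3-byte values, back to back
        System.out.println(compare(mem, 0, 3, 3, 3)); // prints -1
    }
}
```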

> Slightly More Off-Heap Memtables
> 
>
> Key: CASSANDRA-6694
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
> the on-heap overhead is still very large. It should not be tremendously 
> difficult to extend these changes so that we allocate entire Cells off-heap, 
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 
> bytes per cell on average for the btree overhead, for a total overhead of 
> around 20-22 bytes). This translates to an 8-byte object overhead, a 4-byte 
> address (we will do alignment tricks like the VM to allow us to address a 
> reasonably large memory space, although this trick is unlikely to last us 
> forever, at which point we will have to bite the bullet and accept a 24-byte 
> per cell overhead), and 4-byte object reference for maintaining our internal 
> list of allocations, which is unfortunately necessary since we cannot safely 
> (and cheaply) walk the object graph we allocate otherwise, which is necessary 
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName 
> instances so that they may be backed by native memory OR heap memory.
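The 4-byte address with alignment tricks mentioned in the description can be sketched like the JVM's compressed oops. This is an illustrative sketch, with hypothetical class and constant names: 8-byte-aligned offsets let a 32-bit reference cover a 32GB (2^32 * 8 byte) native region.

```java
// Hypothetical sketch of the 4-byte-address trick described above: with all
// allocations 8-byte aligned, a 32-bit reference can be shifted to cover a
// 32GB (2^32 * 8 bytes) native region, like the JVM's compressed oops.
public final class CompressedAddr {
    static final int SHIFT = 3; // 8-byte alignment

    static int encode(long offset) {
        if ((offset & 7) != 0)
            throw new IllegalArgumentException("unaligned offset");
        long ref = offset >>> SHIFT;
        if (ref > 0xFFFFFFFFL)
            throw new IllegalArgumentException("offset beyond 32GB");
        return (int) ref; // stored as an unsigned 32-bit reference
    }

    static long decode(int ref) {
        return (ref & 0xFFFFFFFFL) << SHIFT;
    }

    public static void main(String[] args) {
        long off = 24L * 1024 * 1024 * 1024; // a 24GB offset still fits
        System.out.println(decode(encode(off)) == off); // prints true
    }
}
```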



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

2014-04-15 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970462#comment-13970462
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


[~benedict] While working on avoiding the use of the Impl classes and looking 
closer at the code, I have a question which, given that the future is going to 
be totally off-heap, makes sense to ask now: the current Native*Cell classes 
re-use Impl code from static implementations of the interfaces, but some of the 
methods, e.g. reconcile for Counter(Update)Cell, need to generate a new object 
under certain conditions (for now we allocate a BufferCounterCell, which allows 
us to use CounterCell.Impl.reconcile for both implementations). Do you have an 
action plan for the changes required in that regard for the next step in this 
series, when we are not going to copy things back to heap? 

> Slightly More Off-Heap Memtables
> 
>
> Key: CASSANDRA-6694
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>  Labels: performance
> Fix For: 2.1 beta2
>





[jira] [Updated] (CASSANDRA-7042) Disk space growth until restart

2014-04-15 Thread Zach Aller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Aller updated CASSANDRA-7042:
--

Attachment: after.log
before.log

> Disk space growth until restart
> ---
>
> Key: CASSANDRA-7042
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7042
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 12.04
> Sun Java 7
> Cassandra 2.0.6
>Reporter: Zach Aller
>Priority: Critical
> Attachments: after.log, before.log
>
>
> Cassandra will constantly eat disk space; we're not sure what's causing it, 
> and the only thing that seems to fix it is a restart of Cassandra. This 
> happens about every 3-5 hrs: we grow from about 350GB to 650GB with no end in 
> sight. Once we restart Cassandra it usually all clears itself up and the 
> disks return to normal for a while, then something triggers it and it starts 
> climbing again. Sometimes when we restart, compactions pending skyrocket, and 
> if we restart a second time the compactions pending drop back off to a 
> normal level. One other thing to note is that the space is not freed until 
> Cassandra starts back up, not when it is shut down.
> I will get a clean log of before and after restarting next time it happens 
> and post it.
> Here is a common ERROR in our logs that might be related
> ERROR [CompactionExecutor:46] 2014-04-15 09:12:51,040 CassandraDaemon.java 
> (line 196) Exception in thread Thread[CompactionExecutor:46,1,main]
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
> (No such file or directory)
> at 
> org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1355)
> at 
> org.apache.cassandra.io.sstable.SSTableScanner.&lt;init&gt;(SSTableScanner.java:67)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1161)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1173)
> at 
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:194)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:258)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:126)
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.FileNotFoundException: 
> /local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
> (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.&lt;init&gt;(Unknown Source)
> at 
> org.apache.cassandra.io.util.RandomAccessReader.&lt;init&gt;(RandomAccessReader.java:58)
> at 
> org.apache.cassandra.io.util.ThrottledReader.&lt;init&gt;(ThrottledReader.java:35)
> at 
> org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
> ... 17 more





[jira] [Updated] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7030:


Attachment: benchmark.21.diff.txt

bq. As mentioned earlier i don't mind removing it either

Well, if it demonstrates an advantage I'd prefer to keep it still :-)

Could you try running my benchmark, so we can compare the more specific stats 
and rule out interference by CLHM? I'm particularly surprised that it is 
anything like as fast, let alone faster, given how dramatically slower it is on 
my box (36MB/s is laughable). It's possible I have an older version of jemalloc 
bundled with Ubuntu (I cannot run multi-threaded, but I think this is down to 
compile options), but I assume the only explanation for such awful performance 
is JNA.

I've attached a diff that should apply to 2.1.

> Remove JEMallocAllocator
> 
>
> Key: CASSANDRA-7030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 7030.txt, benchmark.21.diff.txt
>
>
> JEMalloc, whilst having some nice performance properties by comparison to 
> Doug Lea's standard malloc algorithm in principle, is pointless in practice 
> because of the JNA cost. In general it is around 30x more expensive to call 
> than unsafe.allocate(); malloc does not have a variability of response time 
> as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
> sensible idea. I doubt if custom JNI would make it worthwhile either.
> I propose removing it.





[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970297#comment-13970297
 ] 

Vijay commented on CASSANDRA-7030:
--

You are right: I had the synchronization in the test attached to the old ticket 
because initially we had some segfaults, which were fixed in later JEMalloc 
releases; the synchronization was never committed into the Cassandra repo 
because by then the issue was fixed.

Rerunning the test after removing the locks in the same old test classes, the 
time taken is much better with jemalloc; you might need more runs. The memory 
footprint is better too (malloc is slower and uses more memory comparatively, 
as per my tests).
http://pastebin.com/JtixVvGU

As mentioned earlier, I don't mind removing it either :)

> Remove JEMallocAllocator
> 
>
> Key: CASSANDRA-7030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 7030.txt
>





git commit: Allow cassandra to compile under java 8

2014-04-15 Thread dbrosius
Repository: cassandra
Updated Branches:
  refs/heads/trunk 2804ce994 -> 4d0691759


Allow cassandra to compile under java 8

patch by dbrosius reviewed by jmckenzie for cassandra-7028


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4d069175
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4d069175
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4d069175

Branch: refs/heads/trunk
Commit: 4d0691759a19f1faafe889d765145ae6a5096397
Parents: 2804ce9
Author: Dave Brosius 
Authored: Tue Apr 15 20:36:16 2014 -0400
Committer: Dave Brosius 
Committed: Tue Apr 15 20:38:32 2014 -0400

--
 CHANGES.txt  |   1 +
 build.xml|  11 ---
 lib/antlr-3.2.jar| Bin 1928009 -> 0 bytes
 lib/antlr-runtime-3.5.2.jar  | Bin 0 -> 167761 bytes
 lib/licenses/antlr-3.2.txt   |  27 --
 lib/licenses/antlr-runtime-3.5.2.txt |  27 ++
 lib/licenses/stringtemplate-4.0.2.txt|  27 ++
 lib/stringtemplate-4.0.2.jar | Bin 0 -> 226406 bytes
 src/java/org/apache/cassandra/cql3/Cql.g |  22 -
 9 files changed, 80 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index cbf82de..2fbf3ae 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -4,6 +4,7 @@
  * Remove CQL2 (CASSANDRA-5918)
  * Add Thrift get_multi_slice call (CASSANDRA-6757)
  * Optimize fetching multiple cells by name (CASSANDRA-6933)
+ * Allow compilation in java 8 (CASSANDRA-7028)
 
 
 2.1.0-beta2

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/build.xml
--
diff --git a/build.xml b/build.xml
index 8c4cb7b..9326424 100644
--- a/build.xml
+++ b/build.xml
@@ -190,7 +190,7 @@
 
   Building Grammar ${build.src.java}/org/apache/cassandra/cli/Cli.g  

   
  
@@ -211,7 +211,7 @@
 
   Building Grammar ${build.src.java}/org/apache/cassandra/cql3/Cql.g 
 ...
   
  
@@ -330,7 +330,9 @@
   
   
   
-  
+  
+  
+  
   
   
   
@@ -403,6 +405,7 @@

 

+
 
   
 
@@ -444,6 +447,8 @@
 
 
 
+
+
 
 
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/lib/antlr-3.2.jar
--
diff --git a/lib/antlr-3.2.jar b/lib/antlr-3.2.jar
deleted file mode 100644
index fdd167d..000
Binary files a/lib/antlr-3.2.jar and /dev/null differ

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/lib/antlr-runtime-3.5.2.jar
--
diff --git a/lib/antlr-runtime-3.5.2.jar b/lib/antlr-runtime-3.5.2.jar
new file mode 100644
index 000..d48e3e8
Binary files /dev/null and b/lib/antlr-runtime-3.5.2.jar differ

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/lib/licenses/antlr-3.2.txt
--
diff --git a/lib/licenses/antlr-3.2.txt b/lib/licenses/antlr-3.2.txt
deleted file mode 100644
index 015a53d..000
--- a/lib/licenses/antlr-3.2.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-
-Copyright (c) 2003-2006 Terence Parr
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-
- 1. Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
- 2. Redistributions in binary form must reproduce the above copyright
-notice, this list of conditions and the following disclaimer in the
-documentation and/or other materials provided with the distribution.
- 3. The name of the author may not be used to endorse or promote products
-derived from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
-IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
-OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
-INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
-NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-THEORY OF LI

[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970268#comment-13970268
 ] 

Aleksey Yeschenko commented on CASSANDRA-6572:
--

I'd say 3.0, with 2.1 being so close, and delayed.

> Workload recording / playback
> -
>
> Key: CASSANDRA-6572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core, Tools
>Reporter: Jonathan Ellis
>Assignee: Lyuben Todorov
> Fix For: 2.0.8
>
> Attachments: 6572-trunk.diff
>
>
> "Write sample mode" gets us part way to testing new versions against a real 
> world workload, but we need an easy way to test the query side as well.





[jira] [Updated] (CASSANDRA-6602) Compaction improvements to optimize time series data

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6602:


Summary: Compaction improvements to optimize time series data  (was: 
Enhancements to optimize for the storing of time series data)

> Compaction improvements to optimize time series data
> 
>
> Key: CASSANDRA-6602
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6602
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Tupshin Harper
>  Labels: performance
> Fix For: 3.0
>
>
> There are some unique characteristics of many/most time series use cases that 
> both provide challenges, as well as provide unique opportunities for 
> optimizations.
> One of the major challenges is in compaction. The existing compaction 
> strategies will tend to re-compact data on disk at least a few times over the 
> lifespan of each data point, greatly increasing the cpu and IO costs of that 
> write.
> Compaction exists to:
> 1) ensure that there aren't too many files on disk
> 2) ensure that data that should be contiguous (part of the same partition) is 
> laid out contiguously
> 3) delete data due to ttls or tombstones
> The special characteristics of time series data allow us to optimize away all 
> three.
> Time series data:
> 1) tends to be delivered in time order, with relatively constrained exceptions
> 2) often has a pre-determined and fixed expiration date
> 3) never gets deleted prior to TTL
> 4) has relatively predictable ingestion rates
> Note that I filed CASSANDRA-5561, and this ticket potentially replaces or 
> lowers the need for it. In that ticket, jbellis reasonably asks how that 
> compaction strategy is better than disabling compaction.
> Taking that to heart, here is a compaction-strategy-less approach that could 
> be extremely efficient for time-series use cases that follow the above 
> pattern.
> (For context, I'm thinking of an example use case involving lots of streams 
> of time-series data with a 5GB per day ingestion rate, and a 1000 day 
> retention with TTL, resulting in an eventual steady state of 5TB per node)
> 1) You have an extremely large memtable (preferably off heap, if/when doable) 
> for the table, and that memtable is sized to be able to hold a lengthy window 
> of time. A typical period might be one day. At the end of that period, you 
> flush the contents of the memtable to an sstable and move to the next one. 
> This is basically identical to current behaviour, but with thresholds 
> adjusted so that you can ensure flushing at predictable intervals. (Open 
> question is whether predictable intervals is actually necessary, or whether 
> just waiting until the huge memtable is nearly full is sufficient)
> 2) Combine the behaviour with CASSANDRA-5228 so that sstables will be 
> efficiently dropped once all of the columns have expired. (Another side note: 
> it might be valuable to have a modified version of CASSANDRA-3974 that 
> doesn't bother storing per-column TTL, since it is required that all columns 
> have the same TTL)
> 3) Be able to mark column families as read/write only (no explicit deletes), 
> so no tombstones.
> 4) Optionally add back an additional type of delete that would delete all 
> data earlier than a particular timestamp, resulting in immediate dropping of 
> obsoleted sstables.
> The result is that for in-order delivered data, Every cell will be laid out 
> optimally on disk on the first pass, and over the course of 1000 days and 5TB 
> of data, there will "only" be 1000 5GB sstables, so the number of filehandles 
> will be reasonable.
> For exceptions (out-of-order delivery), most cases will be caught by the 
> extended (24 hour+) memtable flush times and merged correctly automatically. 
> For those that were slightly askew at flush time, or were delivered so far 
> out of order that they go in the wrong sstable, there is relatively low 
> overhead to reading from two sstables for a time slice, instead of one, and 
> that overhead would be incurred relatively rarely unless out-of-order 
> delivery was the common case, in which case, this strategy should not be used.
> Another possible optimization to address out-of-order delivery would be to 
> maintain more than one time-centric memtable in memory at a time (e.g. two 12 
> hour ones), and then always insert into whichever of the two "owns" the 
> appropriate range of time. By delaying flushing the one that is ahead until 
> we are ready to roll writes over to a third one, we are able to avoid any 
> fragmentation as long as all deliveries come in no more than 12 hours late 
> (in this example; presumably tunable).
> Anything that triggers compactions will have to be looked at, since there 
> won't be any. The one concern I have is the ramification of 
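The two-memtable windowing idea above can be sketched as follows. This is a minimal sketch with hypothetical names, assuming fixed 12-hour windows and exactly two live memtables: a write's timestamp picks which memtable "owns" it, so late arrivals within one window length avoid fragmenting an already-flushed file.

```java
// Hypothetical sketch of the two-12-hour-memtable idea: alternate ownership
// of consecutive time windows between two live memtables, flushing one only
// when writes roll over into a third window.
public final class TimeWindows {
    static final long WINDOW_MS = 12L * 60 * 60 * 1000; // 12-hour window

    // Index (0 or 1) of the live memtable that owns this timestamp.
    static int owner(long timestampMs) {
        return (int) ((timestampMs / WINDOW_MS) % 2);
    }

    public static void main(String[] args) {
        long t0 = 0, t1 = WINDOW_MS, t2 = 2 * WINDOW_MS;
        // Consecutive windows alternate owners, so window n can keep
        // accepting late writes while window n+1 fills the other memtable.
        System.out.println(owner(t0) + " " + owner(t1) + " " + owner(t2)); // prints 0 1 0
    }
}
```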

[jira] [Updated] (CASSANDRA-6066) LHF 2i performance improvements

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6066:


Labels: performance  (was: )

> LHF 2i performance improvements
> ---
>
> Key: CASSANDRA-6066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6066
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Lyuben Todorov
>Priority: Minor
>  Labels: performance
> Fix For: 2.0.8
>
>
> We should perform more aggressive paging over the index partition (costs us 
> nothing) and also fetch the rows from the base table in one slice query (at 
> least the ones belonging to the same partition).





[jira] [Updated] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-5220:


Labels: performance repair  (was: )

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.





[jira] [Assigned] (CASSANDRA-7029) Investigate alternative transport protocols for both client and inter-server communications

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-7029:
---

Assignee: Benedict

> Investigate alternative transport protocols for both client and inter-server 
> communications
> ---
>
> Key: CASSANDRA-7029
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>  Labels: performance
> Fix For: 3.0
>
>
> There are a number of reasons to think we can do better than TCP for our 
> communications:
> 1) We can actually tolerate sporadic small message losses, so guaranteed 
> delivery isn't essential (although for larger messages it probably is)
> 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with 
> regard to TCP message delivery when the system is under load. Judging from 
> the theoretical description, this is likely to apply even when the 
> system-load is not high, but the number of processes to schedule is high. 
> Cassandra generally has a lot of threads to schedule, so this is quite 
> pertinent for us. UDP performs substantially better here.
> 3) Even when the system is not under load, UDP has a lower CPU burden, and 
> that burden is constant regardless of the number of connections it processes. 
> 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and 
> busy spinning on IO I can actually push 20-40% more throughput through 
> loopback (where TCP should be optimal, as there is no latency), even for very small 
> messages. Since we can see networking taking multiple CPUs' worth of time 
> during a stress test, using a busy-spin for ~100micros after last message 
> receipt is almost certainly acceptable, especially as we can (ultimately) 
> process inter-server and client communications on the same thread/socket in 
> this model.
> 5) We can optimise the threading model heavily: since we generally process 
> very small messages (200 bytes not at all implausible), the thread signalling 
> costs on the processing thread can actually dramatically impede throughput. 
> In general it costs ~10micros to signal (and passing the message to another 
> thread for processing in the current model requires signalling). For 200-byte 
> messages this caps our throughput at 20MB/s.
> I propose to knock up a highly naive UDP-based connection protocol with 
> super-trivial congestion control over the course of a few days, with the only 
> initial goal being maximum possible performance (not fairness, reliability, 
> or anything else), and trial it in Netty (possibly making some changes to 
> Netty to mitigate thread signalling costs). The reason for knocking up our 
> own here is to get a ceiling on what the absolute limit of potential for this 
> approach is. Assuming this pans out with performance gains in C* proper, we 
> then look to contributing to/forking the udt-java project and see how easy it 
> is to bring performance in line with what we can get with our naive approach 
> (I don't suggest starting here, as the project is using blocking old-IO, and 
> modifying it with latency in mind may be challenging, and we won't know for 
> sure what the best case scenario is).
> \[1\] 
> http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
> \[2\] 
> http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
> Further related reading:
> http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
> https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
> https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.3762&rep=rep1&type=pdf
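Point 5's throughput cap is simple arithmetic, sketched below (illustrative names only): at ~10 microseconds per thread signal and 200 bytes per message, a single processing thread tops out around 20MB/s.

```java
// Back-of-envelope check of point 5: if handing each message to another
// thread costs ~10 microseconds, one thread can process at most 100,000
// messages/sec; at 200 bytes per message that is 20,000,000 bytes/sec.
public final class SignalCap {
    static long maxBytesPerSec(long signalMicros, long messageBytes) {
        long messagesPerSec = 1_000_000L / signalMicros; // signals per second
        return messagesPerSec * messageBytes;
    }

    public static void main(String[] args) {
        System.out.println(maxBytesPerSec(10, 200)); // prints 20000000 (~20MB/s)
    }
}
```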





[jira] [Assigned] (CASSANDRA-5019) Still too much object allocation on reads

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-5019:
---

Assignee: (was: Benedict)

> Still too much object allocation on reads
> -
>
> Key: CASSANDRA-5019
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5019
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Ellis
>  Labels: performance
> Fix For: 3.0
>
>
> ArrayBackedSortedColumns was a step in the right direction but it's still 
> relatively heavyweight thanks to allocating individual Columns.





[jira] [Assigned] (CASSANDRA-6809) Compressed Commit Log

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6809:
---

Assignee: (was: Benedict)

> Compressed Commit Log
> -
>
> Key: CASSANDRA-6809
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log. 
> Doing so should improve throughput, but some care will need to be taken to 
> ensure we use as much of a segment as possible. I propose decoupling the 
> writing of the records from the segments. Basically write into a (queue of) 
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
> MB written to the CL (where X is ordinarily CLS size), and then pack as many 
> of the compressed chunks into a CLS as possible.
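A minimal sketch of the proposed scheme, with hypothetical names and sizes (java.util.zip.Deflater stands in for whatever compressor would actually be used): compress the record stream in ~64K chunks, then greedily pack the compressed chunks into fixed-size segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch of the compressed commit log idea above: records are
// buffered, compressed in ~64K chunks, and as many compressed chunks as
// possible are packed into each fixed-size segment.
public final class ChunkPacker {
    static byte[] compress(byte[] chunk) {
        Deflater d = new Deflater();
        d.setInput(chunk);
        d.finish();
        byte[] out = new byte[chunk.length + 64]; // slack for incompressible data
        int n = d.deflate(out);
        d.end();
        byte[] exact = new byte[n];
        System.arraycopy(out, 0, exact, 0, n);
        return exact;
    }

    // Greedily pack compressed chunks into segments of capacity segBytes;
    // returns how many chunks landed in each segment.
    static List<Integer> pack(List<byte[]> chunks, int segBytes) {
        List<Integer> perSegment = new ArrayList<>();
        int used = 0, count = 0;
        for (byte[] c : chunks) {
            if (used + c.length > segBytes && count > 0) {
                perSegment.add(count); // segment full: start a new one
                used = 0;
                count = 0;
            }
            used += c.length;
            count++;
        }
        if (count > 0)
            perSegment.add(count);
        return perSegment;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = new ArrayList<>();
        for (int i = 0; i < 4; i++)
            chunks.add(compress(new byte[65536])); // 64K of zeros compresses tiny
        System.out.println(pack(chunks, 128 * 1024)); // all four fit in one segment
    }
}
```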





[jira] [Assigned] (CASSANDRA-6861) Eliminate garbage in server-side native transport

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6861:
---

Assignee: (was: Benedict)

> Eliminate garbage in server-side native transport
> -
>
> Key: CASSANDRA-6861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6861
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> Now we've upgraded to Netty 4, we're generating a lot of garbage that could 
> be avoided, so we should probably stop that. Should be reasonably easy to 
> hook into Netty's pooled buffers, returning them to the pool once a given 
> message is completed.





[jira] [Assigned] (CASSANDRA-6726) Recycle CRAR/RAR buffers independently of their owners, and move them off-heap when possible

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6726:
---

Assignee: (was: Benedict)

> Recycle CRAR/RAR buffers independently of their owners, and move them 
> off-heap when possible
> 
>
> Key: CASSANDRA-6726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6726
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> Whilst CRAR and RAR are pooled, we could and probably should pool the buffers 
> independently, so that they are not tied to a specific sstable. It may be 
> possible to move the RAR buffer off-heap, and the CRAR buffer sometimes 
> (e.g. Snappy may support off-heap buffers).
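A shared buffer pool decoupled from any single reader could be sketched as below. This is an assumption-laden illustration (names and the 64K buffer size are invented, and a real pool would bound its size and handle multiple buffer lengths), not the CRAR/RAR code:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: a global pool of read buffers shared across all readers, rather
// than buffers owned by (and dying with) a single file reader.
public class BufferPoolSketch
{
    static final int BUFFER_SIZE = 64 * 1024; // hypothetical uniform buffer size
    private static final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();

    public static ByteBuffer acquire()
    {
        ByteBuffer buf = pool.poll();
        // direct (off-heap) allocation keeps buffer contents out of the Java heap
        return buf != null ? buf : ByteBuffer.allocateDirect(BUFFER_SIZE);
    }

    public static void release(ByteBuffer buf)
    {
        buf.clear();
        pool.offer(buf); // recycle for the next reader, whichever sstable it serves
    }
}
```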





[jira] [Assigned] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6755:
---

Assignee: (was: Benedict)

> Optimise CellName/Composite comparisons for NativeCell
> --
>
> Key: CASSANDRA-6755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> As discussed in CASSANDRA-6694, to reduce temporary garbage generation we 
> should minimise the incidence of CellName component extraction. The biggest 
> win will be to perform comparisons on Cell where possible, instead of 
> CellName, so that Native*Cell can use its extra information to avoid creating 
> any ByteBuffer objects.





[jira] [Assigned] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6976:
---

Assignee: (was: Benedict)

> Determining replicas to query is very slow with large numbers of nodes or 
> vnodes
> 
>
> Key: CASSANDRA-6976
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6976
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: performance
> Fix For: 2.1
>
>
> As described in CASSANDRA-6906, this can be ~100ms for a relatively small 
> cluster with vnodes, which is longer than it will spend in transit on the 
> network. This should be much faster.





[jira] [Assigned] (CASSANDRA-6935) Make clustering part of primary key a first order component in the storage engine

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6935:
---

Assignee: (was: Benedict)

> Make clustering part of primary key a first order component in the storage 
> engine
> -
>
> Key: CASSANDRA-6935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6935
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: performance
> Fix For: 3.0
>
>
> It would be helpful for a number of upcoming improvements if the clustering 
> part of the primary key were extracted from CellName, and if a ColumnFamily 
> object could store multiple ClusteredRow (or similar) instances, within which 
> each cell is keyed only by the column identifier.
> This would also, by itself, reduce comparison costs and permit memory 
> savings in memtables, by sharing the clustering part of the primary key 
> across all cells in the same row. It might also make it easier to move more 
> data off-heap, by constructing an off-heap clustered row, but keeping the 
> partition level object on-heap.
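The proposed layout — clustering stored once per row, cells keyed only by column — can be sketched with plain collections. All names here are illustrative assumptions; the real storage engine would use its own comparators and (per the last paragraph) potentially off-heap row storage:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: a partition holds ClusteredRow-like instances keyed by clustering
// prefix; within a row, cells are keyed only by the column identifier, so the
// clustering part of the primary key is shared by all cells in the row.
public class ClusteredRowSketch
{
    static class Row
    {
        final String clustering;                                    // stored once per row
        final NavigableMap<String, byte[]> cells = new TreeMap<>(); // column -> value
        Row(String clustering) { this.clustering = clustering; }
    }

    final NavigableMap<String, Row> rows = new TreeMap<>(); // clustering -> row

    void put(String clustering, String column, byte[] value)
    {
        rows.computeIfAbsent(clustering, Row::new).cells.put(column, value);
    }
}
```

Note how two cells in the same row share one `Row` object, which is where the memtable memory saving comes from.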





[jira] [Assigned] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6936:
---

Assignee: (was: Benedict)

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: performance
> Fix For: 3.0
>
>
> This could be a painful change, but is necessary for implementing a 
> trie-based index, and settling for less would be suboptimal; it also should 
> make comparisons cheaper all-round, and since comparison operations are 
> pretty much the majority of C*'s business, this should be easily felt (see 
> CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
> major performance impacts). No copying/special casing/slicing should mean 
> fewer opportunities to introduce performance regressions as well.
> Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
> changes, hopefully this shouldn't be too much more of a burden.
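The comparison primitive the ticket calls for — unsigned lexicographic ordering over raw bytes — can be sketched as below. Illustrative only: a real implementation would run against off-heap memory rather than byte[], and would likely compare eight bytes at a time.

```java
// Unsigned lexicographic comparison over raw byte representations, with no
// copying, slicing, or type-specific special cases.
public class UnsignedCompareSketch
{
    public static int compare(byte[] a, byte[] b)
    {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++)
        {
            // mask to 0..255 so the comparison is unsigned
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return a.length - b.length; // shorter prefix sorts first
    }
}
```

Types whose serialized form already sorts this way (which is what the ticket asks to guarantee) need nothing more than this loop, which is also exactly the ordering a trie-based index requires.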





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970143#comment-13970143
 ] 

Aleksey Yeschenko commented on CASSANDRA-6949:
--

Probably talking about this - 
https://github.com/apache/cassandra/blob/2804ce9945a83a696e36b4add7a684b132fdef7c/src/java/org/apache/cassandra/db/compaction/LazilyCompactedRow.java#L226-L230

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
> Attachments: 
> 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt
>
>
> CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
> workloads.  The isDeleted checks here cause a huge CPU overhead: 
> https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
> An insert workload which does perfectly fine on 1.2, pegs CPU use at 100% on 
> 2.0, with all of the mutation threads sitting in that loop.  For example:
> {noformat}
> "MutationStage:20" daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
> [0x7fb1b033]
>java.lang.Thread.State: RUNNABLE
> at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
> at 
> org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
> at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}





[jira] [Resolved] (CASSANDRA-7024) Create snapshot selectively during sequential repair

2014-04-15 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita resolved CASSANDRA-7024.
---

Resolution: Fixed

Thanks, committed.
And yes, it looks like SnapshotCommand is not used any more, but I'll leave it 
for now.

> Create snapshot selectively during sequential repair 
> -
>
> Key: CASSANDRA-7024
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7024
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Minor
> Fix For: 2.1 beta2
>
> Attachments: 
> 0001-Only-snapshot-SSTables-related-to-validating-range.patch
>
>
> When doing snapshot repair, right now we snapshot all SSTables, open them, and 
> use just part of them to build the MerkleTree.
> Instead, we can snapshot and use only the SSTables necessary to build the 
> MerkleTree of the range of interest.





[2/3] git commit: Snapshot only related SSTables when sequential repair

2014-04-15 Thread yukim
Snapshot only related SSTables when sequential repair

patch by yukim; reviewed by jmckenzie for CASSANDRA-7024


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/de8a479f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/de8a479f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/de8a479f

Branch: refs/heads/trunk
Commit: de8a479f2e1a8b536dedf2e6470301709bc3d9dc
Parents: b69f5e3
Author: Yuki Morishita 
Authored: Tue Apr 15 17:13:45 2014 -0500
Committer: Yuki Morishita 
Committed: Tue Apr 15 17:13:45 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 592eef9..9f34023 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -45,6 +45,7 @@
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
  * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
+ * Only snapshot relative SSTables when sequential repair (CASSANDRA-7024)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index ffea243..923ea5b 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -30,6 +30,7 @@ import javax.management.*;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Function;
+import com.google.common.base.Predicate;
 import com.google.common.collect.*;
 import com.google.common.util.concurrent.*;
 import com.google.common.util.concurrent.Futures;
@@ -2153,6 +2154,11 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
 
     public void snapshotWithoutFlush(String snapshotName)
     {
+        snapshotWithoutFlush(snapshotName, null);
+    }
+
+    public void snapshotWithoutFlush(String snapshotName, Predicate<SSTableReader> predicate)
+    {
         for (ColumnFamilyStore cfs : concatWithIndexes())
         {
             DataTracker.View currentView = cfs.markCurrentViewReferenced();
@@ -2161,6 +2167,11 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
             {
                 for (SSTableReader ssTable : currentView.sstables)
                 {
+                    if (predicate != null && !predicate.apply(ssTable))
+                    {
+                        continue;
+                    }
+
                     File snapshotDirectory = Directories.getSnapshotDirectory(ssTable.descriptor, snapshotName);
                     ssTable.createLinks(snapshotDirectory.getPath()); // hard links
                     if (logger.isDebugEnabled())
@@ -2190,8 +2201,13 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
      */
     public void snapshot(String snapshotName)
     {
+        snapshot(snapshotName, null);
+    }
+
+    public void snapshot(String snapshotName, Predicate<SSTableReader> predicate)
+    {
         forceBlockingFlush();
-        snapshotWithoutFlush(snapshotName);
+        snapshotWithoutFlush(snapshotName, predicate);
     }
 
     public boolean snapshotExists(String snapshotName)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java 
b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
index bb66b69..d710652 100644
--- a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
+++ b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
@@ -18,30 +18,32 @@
 package org.apache.cassandra.repair;
 
 import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
 import java.util.UUID;
 import java.util.concurrent.Future;
 
+import com.google.common.base.Predicate;
+import org.slf4j.

[1/3] git commit: Snapshot only related SSTables when sequential repair

2014-04-15 Thread yukim
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 b69f5e363 -> de8a479f2
  refs/heads/trunk fc4ae115a -> 2804ce994


Snapshot only related SSTables when sequential repair

patch by yukim; reviewed by jmckenzie for CASSANDRA-7024


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/de8a479f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/de8a479f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/de8a479f

Branch: refs/heads/cassandra-2.1
Commit: de8a479f2e1a8b536dedf2e6470301709bc3d9dc
Parents: b69f5e3
Author: Yuki Morishita 
Authored: Tue Apr 15 17:13:45 2014 -0500
Committer: Yuki Morishita 
Committed: Tue Apr 15 17:13:45 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 592eef9..9f34023 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -45,6 +45,7 @@
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
  * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
+ * Only snapshot relative SSTables when sequential repair (CASSANDRA-7024)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index ffea243..923ea5b 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -30,6 +30,7 @@ import javax.management.*;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Function;
+import com.google.common.base.Predicate;
 import com.google.common.collect.*;
 import com.google.common.util.concurrent.*;
 import com.google.common.util.concurrent.Futures;
@@ -2153,6 +2154,11 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
 
     public void snapshotWithoutFlush(String snapshotName)
     {
+        snapshotWithoutFlush(snapshotName, null);
+    }
+
+    public void snapshotWithoutFlush(String snapshotName, Predicate<SSTableReader> predicate)
+    {
         for (ColumnFamilyStore cfs : concatWithIndexes())
         {
             DataTracker.View currentView = cfs.markCurrentViewReferenced();
@@ -2161,6 +2167,11 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
             {
                 for (SSTableReader ssTable : currentView.sstables)
                 {
+                    if (predicate != null && !predicate.apply(ssTable))
+                    {
+                        continue;
+                    }
+
                     File snapshotDirectory = Directories.getSnapshotDirectory(ssTable.descriptor, snapshotName);
                     ssTable.createLinks(snapshotDirectory.getPath()); // hard links
                     if (logger.isDebugEnabled())
@@ -2190,8 +2201,13 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
      */
     public void snapshot(String snapshotName)
     {
+        snapshot(snapshotName, null);
+    }
+
+    public void snapshot(String snapshotName, Predicate<SSTableReader> predicate)
+    {
         forceBlockingFlush();
-        snapshotWithoutFlush(snapshotName);
+        snapshotWithoutFlush(snapshotName, predicate);
     }
 
     public boolean snapshotExists(String snapshotName)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java 
b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
index bb66b69..d710652 100644
--- a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
+++ b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
@@ -18,30 +18,32 @@
 package org.apache.cassandra.repair;
 
 import java.util.ArrayList;
+import java.util.Collections;
 impo

[3/3] git commit: Merge branch 'cassandra-2.1' into trunk

2014-04-15 Thread yukim
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2804ce99
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2804ce99
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2804ce99

Branch: refs/heads/trunk
Commit: 2804ce9945a83a696e36b4add7a684b132fdef7c
Parents: fc4ae11 de8a479
Author: Yuki Morishita 
Authored: Tue Apr 15 17:15:01 2014 -0500
Committer: Yuki Morishita 
Committed: Tue Apr 15 17:15:01 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2804ce99/CHANGES.txt
--



[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970130#comment-13970130
 ] 

Tyler Hobbs commented on CASSANDRA-6572:


It's a pretty safe patch, but as a non-essential feature I think it should be 
reserved for 2.1 or 3.0.

> Workload recording / playback
> -
>
> Key: CASSANDRA-6572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core, Tools
>Reporter: Jonathan Ellis
>Assignee: Lyuben Todorov
> Fix For: 2.0.8
>
> Attachments: 6572-trunk.diff
>
>
> "Write sample mode" gets us part way to testing new versions against a real 
> world workload, but we need an easy way to test the query side as well.





[jira] [Created] (CASSANDRA-7043) CommitLogArchiver thread pool name inconsistent with others

2014-04-15 Thread Chris Lohfink (JIRA)
Chris Lohfink created CASSANDRA-7043:


 Summary: CommitLogArchiver thread pool name inconsistent with 
others
 Key: CASSANDRA-7043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7043
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Chris Lohfink
Priority: Trivial
 Attachments: namechange.diff

Pretty trivial... The names of all ThreadPoolExecutors are in CamelCase except 
the CommitLogArchiver, which is named commitlog_archiver. This shows up a 
little more obviously in tpstats output:

{code}
nodetool tpstats

Pool Name                  Active   Pending   Completed   Blocked
ReadStage                       0         0      113702         0
RequestResponseStage            0         0           0         0
...
PendingRangeCalculator          0         0           1         0
commitlog_archiver              0         0           0         0
InternalResponseStage           0         0           0         0
HintedHandoff                   0         0           0         0
{code}

Seems minor enough to update this to be CommitLogArchiver, but it may mean 
changes in any monitoring applications (although I don't think this particular 
pool has had much runtime activity or monitoring need).





[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969927#comment-13969927
 ] 

Benedict commented on CASSANDRA-5863:
-

I think there are at least three issues we're contending with here, and each 
needs its own ticket (eventually). Putting historic data on slow drives is, I 
think, a different problem to putting a cache on some fast disks. Both will be 
helpful. Ideally I think we want the following tiers:

# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data

The main distinction is the added "regular data" layer: any special "fast 
disk" cache should not store the full sstable hierarchy and its related files; 
it should just store the most popular blocks (or portions of blocks).

bq. Benedict you are describing building a custom page cache impl off heap 
which is pretty ambitious. Don't you think a baby step would be to rely on the 
OS page cache to start and build a custom one as a phase II?

People get very worried when they think they're competing with the kernel 
developers. Often for good reason, but since we don't have to be all things to 
all people we get the opportunity to make economies that aren't always as 
easily available to them. But also we only need to get roughly the same 
performance so we can build on this to make inroads elsewhere. What we're 
talking about here is pretty straight forward - it's one of the less 
challenging problems. A compressed page cache is more challenging, since we 
don't have a uniform size, but it is still probably not too difficult. Take a 
look at my suggestion for a key cache in CASSANDRA-6709 for a detailed 
description of how I would build the offheap structure.

The basic approach I would probably take is this: deal with 4Kb blocks. Any 
blocks we read from disk larger than this we split up into 4Kb chunks and 
insert each into the cache separately*. The cache itself is 8- or 16-way 
associative, with 3 components: a long storing the LRU information for the 
bucket, 16-longs storing identity information for the lookup within the bucket, 
and corresponding positions in a large address space storing each of the 4Kb 
data chunks. Readers always hit the cache, and if they miss they populate the 
cache using the appropriate reader before continuing. Regrettably we don't have 
access to SIMD instructions or we could do a lot of this stuff tremendously 
efficiently, but even without that it should be pretty nippy.

*This allows us to have a greater granularity for eviction and keeps cpu-cache 
traffic when reading from the cache to a minimum. It's also a pretty optimal 
size for reading/writing to SSD if we overflow to disk, and is a sufficiently 
large amount to get good compression for an in-memory compressed cache, whilst 
still being small enough to stream and decompress from main-memory without a major 
penalty to lookup a small part of it.
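The bucket layout described above — 4Kb chunks, N-way associativity, one long of LRU state plus per-way identity slots per bucket — can be sketched as follows. Everything here is an assumption-laden illustration (names, the nibble-based LRU encoding, byte[][] standing in for the single large off-heap address space), not a proposed implementation:

```java
// Sketch of a 16-way set-associative chunk cache: per bucket, one long of LRU
// ordering (4 bits per way) and 16 identity slots, with chunk data held in a
// flat array indexed the same way as the identities.
public class PageCacheSketch
{
    static final int WAYS = 16;
    static final int CHUNK = 4096; // granularity of caching and eviction

    final int buckets;
    final long[] lru;    // LRU ordering for each bucket, 4 bits per way
    final long[] ids;    // identity (e.g. file id + chunk offset) per way
    final byte[][] data; // stand-in for the off-heap chunk address space

    PageCacheSketch(int buckets)
    {
        this.buckets = buckets;
        this.lru = new long[buckets];
        this.ids = new long[buckets * WAYS];
        this.data = new byte[buckets * WAYS][];
    }

    int bucketFor(long id)
    {
        return (Long.hashCode(id) & 0x7FFFFFFF) % buckets;
    }

    byte[] get(long id)
    {
        int base = bucketFor(id) * WAYS;
        for (int way = 0; way < WAYS; way++)
            if (data[base + way] != null && ids[base + way] == id)
                return data[base + way]; // hit; a real cache would also touch LRU
        return null; // miss: the reader populates the cache, then retries
    }

    void put(long id, byte[] chunk)
    {
        int bucket = bucketFor(id);
        int victim = (int) (lru[bucket] & 0xF); // low nibble names the LRU way
        ids[bucket * WAYS + victim] = id;
        data[bucket * WAYS + victim] = chunk;
        // rotate the ordering: the just-filled way becomes most-recently-used
        lru[bucket] = (lru[bucket] >>> 4) | ((long) victim << 60);
    }
}
```

The lookup within a bucket is a linear scan over 16 longs, which is the part SIMD would accelerate if it were available from Java.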

As to having a fast disk cache, I also think this is a great idea. But I think 
it fits in as an extension of this and any compressed in-memory cache, as we 
build a tiered-cache architecture.

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: Pavel Yaskevich
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data. Initially this could be a off heap cache but it would 
> be great to put these decompressed chunks onto a SSD so the hot data lives on 
> a fast disk similar to https://github.com/facebook/flashcache.





[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969891#comment-13969891
 ] 

Benedict commented on CASSANDRA-7034:
-

Your statement is that your files are 32MB in size. This is correct. On all VMs 
they should be 32MB in size, and there should be at most 32 of them on a 64-bit 
architecture, except when the data directories are behind the commit log, in 
which case there can be more. On a 32-bit architecture there would be only one 
commit log file.

> commitlog files are 32MB in size, even with a 64bit  OS and jvm
> ---
>
> Key: CASSANDRA-7034
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Donald Smith
>
> We did a rpm install of cassandra 2.0.6 on CentOS 6.4 running 
> {noformat}
> > java -version
> Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
> {noformat}
> That is the version of java CassandraDaemon is using.
> We used the default setting (None) in cassandra.yaml for 
> commitlog_total_space_in_mb:
> {noformat}
> # Total space to use for commitlogs.  Since commitlog segments are
> # mmapped, and hence use up address space, the default size is 32
> # on 32-bit JVMs, and 1024 on 64-bit JVMs.
> #
> # If space gets above this value (it will round up to the next nearest
> # segment multiple), Cassandra will flush every dirty CF in the oldest
> # segment and remove it.  So a small total commitlog space will tend
> # to cause more flush activity on less-active columnfamilies.
> # commitlog_total_space_in_mb: 4096
> {noformat}
> But our commitlog files are 32MB in size, not 1024MB.
> OpsCenter confirms that commitlog_total_space_in_mb is None.
> I don't think the problem is in cassandra-env.sh, because when I run it 
> manually and echo the  values of the version variables I get:
> {noformat}
> jvmver=1.7.0_40
> JVM_VERSION=1.7.0
> JVM_ARCH=64-Bit
> {noformat}





[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969886#comment-13969886
 ] 

Donald Smith commented on CASSANDRA-7034:
-

Benedict, I'm aware that *commitlog_total_space_in_mb* has that purpose. What 
I'm raising is that this comment in cassandra.yaml is now wrong: "the default 
size is 32 on 32-bit JVMs, and 1024 on 64-bit JVMs." That's no longer being 
enforced.

> commitlog files are 32MB in size, even with a 64bit  OS and jvm
> ---
>
> Key: CASSANDRA-7034
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Donald Smith
>
> We did a rpm install of cassandra 2.0.6 on CentOS 6.4 running 
> {noformat}
> > java -version
> Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
> {noformat}
> That is the version of java CassandraDaemon is using.
> We used the default setting (None) in cassandra.yaml for 
> commitlog_total_space_in_mb:
> {noformat}
> # Total space to use for commitlogs.  Since commitlog segments are
> # mmapped, and hence use up address space, the default size is 32
> # on 32-bit JVMs, and 1024 on 64-bit JVMs.
> #
> # If space gets above this value (it will round up to the next nearest
> # segment multiple), Cassandra will flush every dirty CF in the oldest
> # segment and remove it.  So a small total commitlog space will tend
> # to cause more flush activity on less-active columnfamilies.
> # commitlog_total_space_in_mb: 4096
> {noformat}
> But our commitlog files are 32MB in size, not 1024MB.
> OpsCenter confirms that commitlog_total_space_in_mb is None.
> I don't think the problem is in cassandra-env.sh, because when I run it 
> manually and echo the  values of the version variables I get:
> {noformat}
> jvmver=1.7.0_40
> JVM_VERSION=1.7.0
> JVM_ARCH=64-Bit
> {noformat}





[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969882#comment-13969882
 ] 

Benedict commented on CASSANDRA-7040:
-

bq.  I don't think that's necessarily blocked by this work

Sure - and if you want to start building one right now, go to town :)

I only mean that I think it builds on the work here and in 5863, as they both 
involve intercepting the points at which we perform disk accesses and inserting 
some (minimal) coordination in between them. Swapping those interception points 
for something more intelligent is probably more straightforward once we've done 
that, and having a cache in which to deposit the result is _probably_ helpful 
too (definitely none of this is 100% essential though).

> Replace read/write stage with per-disk access coordination
> --
>
> Key: CASSANDRA-7040
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: performance
> Fix For: 3.0
>
>
> As discussed in CASSANDRA-6995, current coordination of access to disk is 
> suboptimal: instead of ensuring disk accesses alone are coordinated, we 
> instead coordinate at the level of operations that may touch the disks, 
> ensuring only so many are proceeding at once. As such, tuning is difficult, 
> and we incur unnecessary delays for operations that would not touch the 
> disk(s).
> Ideally we would instead simply use a shared coordination primitive to gate 
> access to the disk when we perform a rebuffer. This work would dovetail very 
> nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
> context switching for data that we know to be cached. It also, as far as I 
> can tell, obviates the need for CASSANDRA-5239.





[jira] [Commented] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969880#comment-13969880
 ] 

Benedict commented on CASSANDRA-6487:
-

I suggest using the ColumnFamily.dataSize() method as Aleksey suggested: in the 
BatchStatement.executeWithConditions() and executeWithoutConditions() methods 
we have access to the fully constructed ColumnFamily objects we will apply. In 
the former we construct a single CF _updates_, and in the latter we can iterate 
over each of the IMutations and call _getColumnFamilies()_.

Warning on the prepared size is probably not meaningful, because it does not 
say anything about how big the data we're applying is.
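A minimal sketch of that check, with stand-in names for the Cassandra types (the `SizedUpdate` interface and `shouldWarn` helper are illustrative, modeling `ColumnFamily.dataSize()`, not the project's actual code):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: sum the serialized data size of every update a batch
// will apply, and warn once the total crosses the configurable threshold.
public class BatchSizeWarning {
    static final long WARN_THRESHOLD_BYTES = 5 * 1024; // the "5k" yaml default

    // Stand-in for ColumnFamily.dataSize()
    interface SizedUpdate { long dataSize(); }

    static long totalDataSize(Iterable<SizedUpdate> updates) {
        long total = 0;
        for (SizedUpdate u : updates)
            total += u.dataSize();
        return total;
    }

    static boolean shouldWarn(Iterable<SizedUpdate> updates) {
        return totalDataSize(updates) > WARN_THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        List<SizedUpdate> batch = Arrays.asList(() -> 3000L, () -> 4000L);
        if (shouldWarn(batch))
            System.out.println("WARN: batch of " + totalDataSize(batch)
                               + " bytes exceeds " + WARN_THRESHOLD_BYTES);
    }
}
```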

> Log WARN on large batch sizes
> -
>
> Key: CASSANDRA-6487
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Patrick McFadin
>Assignee: Lyuben Todorov
>Priority: Minor
> Fix For: 2.0.8
>
> Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
> cassandra-2.0-6487.diff
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose 
> adding a WARN log entry if batch sizes go beyond a configurable size. This 
> will give more visibility to operators on something that can happen on the 
> developer side. 
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it 
> can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}





[jira] [Updated] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6572:
--

Reviewer: Tyler Hobbs

WDYT [~thobbs], is this non-invasive enough to make it into 2.0?

> Workload recording / playback
> -
>
> Key: CASSANDRA-6572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core, Tools
>Reporter: Jonathan Ellis
>Assignee: Lyuben Todorov
> Fix For: 2.0.8
>
> Attachments: 6572-trunk.diff
>
>
> "Write sample mode" gets us part way to testing new versions against a real 
> world workload, but we need an easy way to test the query side as well.





[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969839#comment-13969839
 ] 

Jonathan Ellis commented on CASSANDRA-6572:
---

How do you deal w/ prepared vs non-prepared queries?  Thinking of 
CASSANDRA-7021 here.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969831#comment-13969831
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

I do think having a set of fast disks for hot data that doesn't fit into 
memory is key, because in a large per-node deployment you want:

1.  Memory (Really hot data)
2.  SSD (Hot data that doesn't fit in memory)
3.  Spinning disk (Historic cold data) 

[~benedict] you are describing building a custom page cache impl off heap which 
is pretty ambitious.  Don't you think a baby step would be to rely on the OS 
page cache to start and build a custom one as a phase II?

What would the page size be for uncompressed data? For compressed data, the 
chunk size (conceptually) fits nicely.
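The core of such a chunk cache can be sketched in a few lines. This is a hypothetical illustration (not the patch under discussion): decompressed chunks keyed by file and chunk offset, evicted in LRU order, which works equally well whether the unit is a compression chunk or a fixed-size uncompressed page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an in-process chunk cache: decompressed chunks keyed
// by (file, chunk offset), evicted in least-recently-used order.
public class ChunkCache {
    private final Map<String, byte[]> lru;

    public ChunkCache(int maxChunks) {
        // access-order LinkedHashMap gives LRU eviction via removeEldestEntry
        this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > maxChunks;
            }
        };
    }

    private static String key(String file, long chunkOffset) {
        return file + "@" + chunkOffset;
    }

    public synchronized byte[] get(String file, long chunkOffset) {
        return lru.get(key(file, chunkOffset));
    }

    public synchronized void put(String file, long chunkOffset, byte[] decompressed) {
        lru.put(key(file, chunkOffset), decompressed);
    }
}
```

A real implementation would of course need off-heap storage, size-based (not count-based) eviction, and concurrency beyond a single lock; the sketch only shows the keying and eviction idea.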

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: Pavel Yaskevich
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data. Initially this could be a off heap cache but it would 
> be great to put these decompressed chunks onto a SSD so the hot data lives on 
> a fast disk similar to https://github.com/facebook/flashcache.





[jira] [Commented] (CASSANDRA-6985) ReadExecutors should not rely on static StorageProxy

2014-04-15 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969798#comment-13969798
 ] 

Yuki Morishita commented on CASSANDRA-6985:
---

Ed,

I don't see a reason to pass StorageProxy to AbstractReadExecutor at all: it 
is only used in getExecutor to obtain the live sorted endpoints, so why not 
just pass that List?

I think the only reason the StorageProxy singleton instance exists right now 
is for JMX. Would it be more reasonable (for now?) to leave StorageProxy as a 
utility class/API and separate its management aspect into another class?
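The suggestion amounts to plain constructor injection. A minimal sketch (all names here are illustrative stand-ins, not Cassandra's actual classes): the executor receives only the sorted endpoint list it needs, instead of a reference to a static singleton.

```java
import java.util.List;

// Illustrative sketch of the suggestion: inject the live sorted endpoints
// directly, so the executor has no dependency on a static StorageProxy.
public class ReadExecutorSketch {
    private final List<String> liveSortedEndpoints;

    public ReadExecutorSketch(List<String> liveSortedEndpoints) {
        this.liveSortedEndpoints = liveSortedEndpoints;
    }

    // The executor consults only the injected list; no static lookup needed,
    // which also makes the class trivially unit-testable.
    public String closestReplica() {
        return liveSortedEndpoints.get(0);
    }
}
```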


> ReadExecutors should not rely on static StorageProxy
> 
>
> Key: CASSANDRA-6985
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6985
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 3.0
>
> Attachments: CASSANDRA_6985.1.patch
>
>
> All the Read Executor child classes require use of the Storage Proxy to carry 
> out read. We can pass the StorageProxy along in the constructor.





[jira] [Commented] (CASSANDRA-7028) Allow C* to compile under java 8

2014-04-15 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969765#comment-13969765
 ] 

Joshua McKenzie commented on CASSANDRA-7028:


Ah - good call on the runtime libraries.

v4 lost the full index, so file deletion failed, and the diff has references 
to the new .jar files, which prevents it from applying either with or without 
the files in place. I've attached a v5 that cleans up some whitespace 
complaints and includes the binaries, both deletions and additions. We should 
be able to just apply this to trunk and get all the changes in one shot, with 
no need to download libraries separately and place them for the committer.

The diff syntax I used to build this was 'git diff --full-index --binary 
 '. Even with --full-index, if you don't include the --binary flag 
it won't generate the data that goes with the new files you've added, and you 
end up with an invalid patch: it has markers to add the files but no binary 
data to place in them.
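The effect of the flag is easy to see in a throwaway repo (the file names below are illustrative): with `--binary`, the patch embeds the file contents as a "GIT binary patch" section.

```shell
#!/bin/sh
# Demonstration of --full-index --binary in a throwaway repo: the generated
# patch embeds the binary file's contents, so it can be applied as-is.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.email=ci@example -c user.name=ci commit -q --allow-empty -m base
printf '\000\001\002' > lib-stub.jar      # stand-in for a new binary .jar
git add lib-stub.jar
git -c user.email=ci@example -c user.name=ci commit -q -m 'add jar'
git diff --full-index --binary HEAD~1 HEAD > with-binary.patch
grep 'GIT binary patch' with-binary.patch  # present: payload is embedded
```

Without `--binary`, the same diff would only say the binary files differ, which is exactly the "markers but no data" failure described above.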

I reran the tests on Linux against this just to confirm the changes to 
resolve HintedHandOffTest didn't break anything else, and it all looks good 
on jdk7. I'm +1 on the v5 patch; give it a run against trunk and let me know 
how it works for you.

> Allow C* to compile under java 8
> 
>
> Key: CASSANDRA-7028
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7028
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Dave Brosius
>Assignee: Dave Brosius
>Priority: Minor
> Fix For: 3.0
>
> Attachments: 7028.txt, 7028_v2.txt, 7028_v3.txt, 7028_v4.txt, 
> 7028_v5.patch
>
>
> antlr 3.2 has a problem with java 8, as described here: 
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8015656
> updating to antlr 3.5.2 solves this, however they have split up the jars 
> differently, which adds some changes, but also the generation of 
> CqlParser.java causes a method to be too large, so i needed to split that 
> method to reduce the size of it.
> (patch against trunk)





[jira] [Updated] (CASSANDRA-7028) Allow C* to compile under java 8

2014-04-15 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-7028:
---

Attachment: 7028_v5.patch



[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969753#comment-13969753
 ] 

Jason Brown commented on CASSANDRA-7040:


CASSANDRA-5863 could be legit, as well :).

As to an intelligent "storage manager", I don't think that's necessarily 
blocked by this work, but I do agree it's a non-trivial undertaking.



[jira] [Created] (CASSANDRA-7042) Disk space growth until restart

2014-04-15 Thread Zach Aller (JIRA)
Zach Aller created CASSANDRA-7042:
-

 Summary: Disk space growth until restart
 Key: CASSANDRA-7042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7042
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
Sun Java 7
Cassandra 2.0.6

Reporter: Zach Aller
Priority: Critical


Cassandra constantly eats disk space; we're not sure what's causing it, and 
the only thing that seems to fix it is a restart of Cassandra. This happens 
about every 3-5 hrs: we grow from about 350GB to 650GB with no end in sight. 
Once we restart Cassandra it usually all clears itself up and disks return to 
normal for a while, then something triggers it and the usage starts climbing 
again. Sometimes when we restart, pending compactions skyrocket, and if we 
restart a second time the pending compactions drop back to a normal level. 
One other thing to note is that the space is not freed until Cassandra starts 
back up, not when it is shut down.

I will get a clean log from before and after restarting the next time it 
happens and post it.

Here is a common ERROR in our logs that might be related

ERROR [CompactionExecutor:46] 2014-04-15 09:12:51,040 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:46,1,main]
java.lang.RuntimeException: java.io.FileNotFoundException: 
/local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
(No such file or directory)
at 
org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
at 
org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1355)
at 
org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:67)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1161)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1173)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:194)
at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:258)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:126)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: 
/local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(Unknown Source)
at 
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
at 
org.apache.cassandra.io.util.ThrottledReader.<init>(ThrottledReader.java:35)
at 
org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
... 17 more







[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969736#comment-13969736
 ] 

Benedict commented on CASSANDRA-7040:
-

bq. You could add in helpers like mincore (and row cache) to help inform you

Or CASSANDRA-5863 :-)

As to batching - that's another step further along: it would be interesting to 
experiment with an intelligent "storage manager" that requests are submitted 
to, and coordinated by, but I think that comes after 5863 + this. There are 
lots of ways we might be able to get improved performance with that approach, 
but I'm not absolutely sure they'll pan out, and they'd be a non-trivial 
undertaking.



[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969726#comment-13969726
 ] 

Jason Brown commented on CASSANDRA-7040:


Martin Thompson mentions batching IO events in a talk at React Conf 2014: 
https://www.youtube.com/watch?v=4dfk3ucthN8 . The idea seems reasonable, but 
I haven't investigated it yet.

bq. that may touch the disks

Yeah, the key word here is *may*. You could add in helpers like mincore (and 
the row cache) to help determine whether you have nothing in memory and will 
actually be going to disk.



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969724#comment-13969724
 ] 

Benedict commented on CASSANDRA-6949:
-

bq. Only until a compaction, which will also remove stale entries.

Does it? I don't see how...

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
> Attachments: 
> 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt
>
>
> CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
> workloads.  The isDeleted checks here cause a huge CPU overhead: 
> https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
> An insert workload which does perfectly fine on 1.2, pegs CPU use at 100% on 
> 2.0, with all of the mutation threads sitting in that loop.  For example:
> {noformat}
> "MutationStage:20" daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
> [0x7fb1b033]
>java.lang.Thread.State: RUNNABLE
> at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
> at 
> org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
> at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969716#comment-13969716
 ] 

Benedict commented on CASSANDRA-6949:
-

It's worth pointing out that a sensible intersection implementation over two 
ordered sets can be quite efficient and a fairly low computational burden, 
which is possibly a good middle ground. But if there's no real risk to getting 
rid of it, that's probably best.
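That intersection is a single merge-style pass over both sorted inputs, O(m + n) with no temporary objects beyond the result. A sketch of the idea (illustrative only, not the RangeTombstoneList code under discussion):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative merge-style intersection of two sorted arrays: advance the
// pointer of whichever side holds the smaller element, collecting matches.
public class MergeIntersection {
    public static List<Long> intersect(long[] a, long[] b) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;              // a is behind: advance it
            else if (a[i] > b[j]) j++;         // b is behind: advance it
            else { out.add(a[i]); i++; j++; }  // common element
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new long[]{1, 3, 5, 7},
                                     new long[]{3, 4, 7, 9}));
    }
}
```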



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969715#comment-13969715
 ] 

Sam Tunnicliffe commented on CASSANDRA-6949:


Only until a compaction, which will also remove stale entries.



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969709#comment-13969709
 ] 

Benedict commented on CASSANDRA-6949:
-

I assume the only real risk with reverting is that if there are no reads we can 
get uncontrolled growth of the 2i?



[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6949:
---

Reviewer: Jonathan Ellis  (was: Sam Tunnicliffe)

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
> Attachments: 
> 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt
>
>
> CASSANDRA-5614 causes a huge performance regression in tombstone-heavy 
> workloads.  The isDeleted checks here cause significant CPU overhead: 
> https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
> An insert workload which does perfectly fine on 1.2 pegs CPU use at 100% on 
> 2.0, with all of the mutation threads sitting in that loop.  For example:
> {noformat}
> "MutationStage:20" daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
> [0x7fb1b033]
>java.lang.Thread.State: RUNNABLE
> at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
> at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
> at 
> org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
> at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
> at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969692#comment-13969692
 ] 

Sam Tunnicliffe commented on CASSANDRA-6949:


That will help in the simple case where there are no indexes defined for the 
table, but it won't make a difference if there are. In other words, if the 
table has any indexes defined (including PerRowSecondaryIndexes, for which the 
specifics of the update are meaningless), we'll still iterate over every cell 
in that partition in the memtable to check that it's not covered by the range 
tombstone.

Personally, I'd prefer to revert the change to AtomicSortedColumns from 
CASSANDRA-5614 completely. It isn't necessary for correctness in either 
KeysIndex or CompositesIndex, as the repair-on-read behaviour cleans up any 
stale index entries (as does compaction). Given that, it doesn't seem worth 
the performance hit to keep the 2i absolutely in sync like this.

Attaching a patch against 2.0 to remove the ASC changes from 5614.
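The per-cell scan described above can be sketched in isolation (the class and method names below are illustrative stand-ins, not Cassandra's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class RangeTombstoneScanSketch {
    // Illustrative stand-in for a range tombstone over integer cell names
    // (not Cassandra's actual RangeTombstone class).
    static class RangeTombstone {
        final int start, end;
        RangeTombstone(int start, int end) { this.start = start; this.end = end; }
        boolean covers(int cellName) { return cellName >= start && cellName <= end; }
    }

    // The regression being discussed: when a mutation carries a range tombstone
    // and the table has indexes, every existing cell in the memtable partition
    // is compared against the tombstone, so the cost is O(partition size) per write.
    static int cellsScanned(List<Integer> partitionCells, RangeTombstone rt) {
        int scanned = 0;
        for (int cell : partitionCells) {
            scanned++;                       // each iteration is a comparator call
            if (rt.covers(cell)) {
                // the real code would notify the index updater here
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        List<Integer> cells = new ArrayList<>();
        for (int i = 0; i < 10000; i++)
            cells.add(i);
        // The whole partition is walked even though the tombstone covers few cells.
        System.out.println(cellsScanned(cells, new RangeTombstone(100, 200)));
    }
}
```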


> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Jeremiah Jordan
> Attachments: 6949.txt
>
>





[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6949:
---

Attachment: 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Jeremiah Jordan
> Attachments: 
> 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt
>
>





[jira] [Assigned] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-6949:
--

Assignee: Sam Tunnicliffe  (was: Jeremiah Jordan)

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
> Attachments: 
> 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt
>
>





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sergio Bossa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969655#comment-13969655
 ] 

Sergio Bossa commented on CASSANDRA-6949:
-

That's not enough: a PRSI doesn't get notified of column-level deletes (it 
doesn't need to be), so there would still be a performance regression in that 
case, even with that extra check.

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Jeremiah Jordan
> Attachments: 6949.txt
>
>





[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-15 Thread Richard Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969613#comment-13969613
 ] 

Richard Low commented on CASSANDRA-5220:


It's going to be a lot slower when there's little data, because there is 
num_tokens times as much work to do. But when there is lots of data, the times 
should be pretty much independent of num_tokens, because most of repair is 
spent reading data and hashing. I ran some tests while we were developing 
vnodes (sorry, I no longer have the data available) and this was the case. 
Something might have regressed since then, though.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears to be at least in part because repair uses a 
> session per range and processes them sequentially.  This generates a lot of 
> log spam with vnodes, and while the sequential approach is gentler and 
> lighter on hard-disk deployments, SSD-based deployments would often prefer 
> that repair be as fast as possible.





[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-6949:
---

Attachment: 6949.txt

Looks like we actually added that check in 2.1.  I don't know if there is more 
we want to do, but is it valid to just check
{noformat}
if (indexer != SecondaryIndexManager.nullUpdater && 
cm.deletionInfo().hasRanges())
{noformat}
instead of
{noformat}
if (cm.deletionInfo().hasRanges())
{noformat}
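The proposed short-circuit can be illustrated with a self-contained sketch (the stand-in names below are hypothetical, not Cassandra's actual classes):

```java
public class GuardSketch {
    // Stand-in for SecondaryIndexManager.nullUpdater: the no-op updater used
    // when the table has no indexes.
    static final Object NULL_UPDATER = new Object();

    static int expensiveChecks = 0;  // counts simulated per-cell isDeleted calls

    static void apply(Object indexer, boolean hasRanges, int cellCount) {
        // Proposed condition: only walk the partition when a real index updater
        // is present AND the mutation actually carries range tombstones.
        if (indexer != NULL_UPDATER && hasRanges) {
            for (int i = 0; i < cellCount; i++)
                expensiveChecks++;  // stands in for deletionInfo.isDeleted(cell)
        }
    }

    public static void main(String[] args) {
        apply(NULL_UPDATER, true, 1000);      // no indexes: scan skipped entirely
        System.out.println(expensiveChecks);  // 0
        apply(new Object(), true, 1000);      // real indexer: scan still happens
        System.out.println(expensiveChecks);  // 1000
    }
}
```

As the follow-up comments note, this only helps tables without indexes; with a real index updater the per-cell scan still runs.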

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
> Attachments: 6949.txt
>
>





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969587#comment-13969587
 ] 

Jeremiah Jordan commented on CASSANDRA-6949:


This code doesn't seem to check whether there are actually indexes on the 
columns before doing all the range-tombstone and isDeleted checks.  If all 
those checks are really needed, can we at least only do them if there is 
actually a 2i of some sort on the table?

> Performance regression in tombstone heavy workloads
> ---
>
> Key: CASSANDRA-6949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Sam Tunnicliffe
>





[jira] [Created] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-15 Thread Ngoc Minh Vo (JIRA)
Ngoc Minh Vo created CASSANDRA-7041:
---

 Summary: Select query returns inconsistent result
 Key: CASSANDRA-7041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7041
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra v2.0.6 (upgraded from v2.0.3)
4-node cluster: Windows7, 12GB JVM
Reporter: Ngoc Minh Vo
Priority: Critical


Hello,

We are running into an issue with C* v2.0.x: CQL queries randomly return empty 
results.
Here is the scenario:
1. Schema:
{noformat}
CREATE TABLE string_values (
  date int,
  field text,
  value text,
  PRIMARY KEY ((date, field), value)
) WITH
  bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
{noformat}

2. There is no new data imported to the cluster during the test.

3. CQL query:
{noformat}
select * from string_values where date=20140122 and field='SCONYKSP1';
{noformat}

4. In cqlsh, we executed the same query several times within a short interval 
(~1-2 seconds). The first few results are empty, then we get the data, and 
from that point on we always get the correct result:
{noformat}
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
... ...
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
 date | field | value
--+---+-
 20140122 | SCONYKSP1 | 201401220251826297a_0_3
(1 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
 date | field | value
--+---+-
 20140122 | SCONYKSP1 | 201401220251826297a_0_3
(1 rows)
{noformat}

5. It might be related to some kind of "warmup" process. We tried disabling 
key/data caching, but it did not help.

Upgrading the cluster from v2.0.3 to v2.0.6 did not fix the issue (hence it is 
not related to CASSANDRA-6555).

Some time ago, we posted a report on the Java Driver JIRA: 
https://datastax-oss.atlassian.net/browse/JAVA-217. But it seems that the 
issue is on the server side.

Best regards,
Minh





[jira] [Updated] (CASSANDRA-6802) Row cache improvements

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6802:


Labels: performance  (was: )

> Row cache improvements
> --
>
> Key: CASSANDRA-6802
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6802
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: performance
> Fix For: 3.0
>
>
> There are a few things we could do:
> * Start using the native memory constructs from CASSANDRA-6694 to avoid 
> serialization/deserialization costs and to minimize the on-heap overhead
> * Stop invalidating cached rows on writes (update on write instead).





[jira] [Updated] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-5863:


Summary: In process (uncompressed) page cache  (was: Create a Decompressed 
Chunk [block] Cache)

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: Pavel Yaskevich
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[], and verifies a 
> checksum.
> This process is where the majority of time is spent in a read request.
> Before compression, we had zero-copy reads and could respond directly from 
> the page cache.
> It would be useful to have some kind of chunk cache that could speed up this 
> process for hot data. Initially this could be an off-heap cache, but it 
> would be great to put these decompressed chunks onto an SSD so the hot data 
> lives on a fast disk, similar to https://github.com/facebook/flashcache.





[jira] [Comment Edited] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969501#comment-13969501
 ] 

Lyuben Todorov edited comment on CASSANDRA-6487 at 4/15/14 1:05 PM:


Just noticed that we're actually already using the memory meter to check a 
batch's size when it might get placed into the prepared statement cache, so 
why not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}})? As for non-prepared batch 
statements, we can enforce a limit based on the number of statements.


was (Author: lyubent):
Just noticed that we're actually already using the memory meter for checking 
batch size when it might get placed into the prepared statement cache, so why 
not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}}). 

> Log WARN on large batch sizes
> -
>
> Key: CASSANDRA-6487
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Patrick McFadin
>Assignee: Lyuben Todorov
>Priority: Minor
> Fix For: 2.0.8
>
> Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
> cassandra-2.0-6487.diff
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose 
> adding a WARN log entry if batch sizes go beyond a configurable size. This 
> will give more visibility to operators on something that can happen on the 
> developer side. 
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it 
> can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}
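The yaml knob above suggests a check of roughly this shape (a hedged sketch; the threshold parsing and message format here are assumptions, not the actual patch):

```java
public class BatchSizeWarn {
    // Parse values like "5k" into bytes (illustrative; real yaml parsing differs).
    static long parseThreshold(String s) {
        s = s.trim().toLowerCase();
        if (s.endsWith("k"))
            return Long.parseLong(s.substring(0, s.length() - 1)) * 1024;
        return Long.parseLong(s);
    }

    // Return a warning line when the batch exceeds the threshold, else null.
    static String check(long batchSizeBytes, long thresholdBytes) {
        if (batchSizeBytes > thresholdBytes)
            return "WARN: batch size " + batchSizeBytes
                 + " bytes exceeds warn threshold " + thresholdBytes + " bytes";
        return null;
    }

    public static void main(String[] args) {
        long t = parseThreshold("5k");       // 5120 bytes
        System.out.println(check(4 * 1024, t));  // under threshold: no warning
        System.out.println(check(8 * 1024, t));  // over threshold: warning emitted
    }
}
```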





[jira] [Commented] (CASSANDRA-6995) Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to read stage

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969515#comment-13969515
 ] 

Benedict commented on CASSANDRA-6995:
-

I've split my suggestion out into another ticket: CASSANDRA-7040

> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to 
> read stage
> 
>
> Key: CASSANDRA-6995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: performance
> Fix For: 2.0.7
>
> Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to a coordinator node, AbstractReadExecutor will 
> create a new SP.LocalReadRunnable and drop it into the read stage for 
> asynchronous execution. If you are using a client that intelligently routes 
> read requests to a node holding the data for a given request, and are using 
> CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the 
> context switches (and possible NUMA misses) adds unnecessary latency. We can 
> reduce that latency and improve throughput by avoiding the queueing and 
> thread context switching by simply executing the SP.LocalReadRunnable 
> synchronously in the request thread. Testing on a three node cluster (each 
> with 32 cpus, 132 GB ram) yields ~10% improvement in throughput and ~20% 
> speedup on avg/95/99 percentiles (99.9% was about 5-10% improvement).
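
The dispatch decision described above can be sketched as follows. This is a hypothetical illustration, not Cassandra's actual classes: `LocalReadDispatch`, `dataIsLocal`, and `clIsOne` are invented names standing in for the real routing and consistency checks.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: when the coordinator itself holds the data and the consistency
// level is ONE/LOCAL_ONE, run the read inline on the request thread instead
// of enqueuing it on the read stage.
public class LocalReadDispatch {
    private final ExecutorService readStage = Executors.newFixedThreadPool(4);

    public Future<String> read(Callable<String> readTask, boolean dataIsLocal, boolean clIsOne)
            throws Exception {
        if (dataIsLocal && clIsOne) {
            // Synchronous path: no queueing, no thread context switch.
            return CompletableFuture.completedFuture(readTask.call());
        }
        // Fall back to asynchronous execution on the read stage.
        return readStage.submit(readTask);
    }

    public void shutdown() { readStage.shutdown(); }
}
```

The synchronous path trades request-thread occupancy for latency, which is only a win when the request is cheap and already local.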



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969513#comment-13969513
 ] 

Jason Brown commented on CASSANDRA-4718:


OK, will give it a shot today. Also, just noticed I did not tune 
native_transport_max_threads at all (so I have the default of 128). Might play 
with that a bit, as well.

> More-efficient ExecutorService for improved throughput
> --
>
> Key: CASSANDRA-4718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: performance
> Fix For: 2.1
>
> Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
> costs of various queues.ods, stress op rate with various queues.ods, 
> v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)
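
The bulk-dequeue idea from the description can be sketched with a plain `BlockingQueue`. This is not Cassandra's implementation; `BulkDrainExecutor` and the batch size of 32 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: a consumer loop that uses drainTo to dequeue tasks in bulk,
// reducing producer/consumer contention versus taking one task per poll.
public class BulkDrainExecutor {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    public void submit(Runnable task) { queue.add(task); }

    public void stop() { running = false; }

    // Block briefly for one task, then drain up to 31 more in a single call.
    public void runConsumer() throws InterruptedException {
        List<Runnable> batch = new ArrayList<>(32);
        while (running || !queue.isEmpty()) {
            Runnable first = queue.poll(10, TimeUnit.MILLISECONDS);
            if (first == null) continue;
            batch.add(first);
            queue.drainTo(batch, 31);          // bulk dequeue
            for (Runnable r : batch) r.run();
            batch.clear();
        }
    }
}
```

As the ticket notes, no JDK `ExecutorService` exposes this pattern, which is why a custom service would be needed to wire it into the read/write stages.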



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969514#comment-13969514
 ] 

Benedict commented on CASSANDRA-7040:
-

Further, once we have this, we can experiment with periodically locking access 
to the disks (for short periods, say 20-50ms) in order to let 
compactions/flushes catch up with any outstanding work, if they appear to be 
getting behind.

> Replace read/write stage with per-disk access coordination
> --
>
> Key: CASSANDRA-7040
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: performance
> Fix For: 3.0
>
>
> As discussed in CASSANDRA-6995, current coordination of access to disk is 
> suboptimal: instead of ensuring disk accesses alone are coordinated, we 
> instead coordinate at the level of operations that may touch the disks, 
> ensuring only so many are proceeding at once. As such, tuning is difficult, 
> and we incur unnecessary delays for operations that would not touch the 
> disk(s).
> Ideally we would instead simply use a shared coordination primitive to gate 
> access to the disk when we perform a rebuffer. This work would dovetail very 
> nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
> context switching for data that we know to be cached. It also, as far as I 
> can tell, obviates the need for CASSANDRA-5239.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)
Benedict created CASSANDRA-7040:
---

 Summary: Replace read/write stage with per-disk access coordination
 Key: CASSANDRA-7040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0


As discussed in CASSANDRA-6995, current coordination of access to disk is 
suboptimal: instead of ensuring disk accesses alone are coordinated, we instead 
coordinate at the level of operations that may touch the disks, ensuring only 
so many are proceeding at once. As such, tuning is difficult, and we incur 
unnecessary delays for operations that would not touch the disk(s).

Ideally we would instead simply use a shared coordination primitive to gate 
access to the disk when we perform a rebuffer. This work would dovetail very 
nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
context switching for data that we know to be cached. It also, as far as I can 
tell, obviates the need for CASSANDRA-5239.
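
The proposal amounts to gating only the rebuffer, not the whole operation. A minimal sketch, assuming a semaphore as the shared coordination primitive (the class and method names here are invented for illustration, and `readChunk` stands in for the actual disk read):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch: bound concurrent disk accesses rather than in-flight operations,
// acquiring a permit only around the rebuffer. Reads served from cache never
// block or context-switch.
public class DiskAccessGate {
    private final Semaphore permits;

    public DiskAccessGate(int concurrentAccesses) {
        permits = new Semaphore(concurrentAccesses);
    }

    public byte[] rebuffer(boolean cached, Supplier<byte[]> readChunk)
            throws InterruptedException {
        if (cached)
            return readChunk.get();   // data known to be cached: skip the gate
        permits.acquire();            // gate only the real disk access
        try {
            return readChunk.get();
        } finally {
            permits.release();
        }
    }
}
```

In a per-disk variant there would be one such gate per data directory, so a slow disk cannot starve accesses to the others.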



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969501#comment-13969501
 ] 

Lyuben Todorov commented on CASSANDRA-6487:
---

Just noticed that we're actually already using the memory meter for checking 
batch size when it might get placed into the prepared statement cache, so why 
not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}}). 

> Log WARN on large batch sizes
> -
>
> Key: CASSANDRA-6487
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Patrick McFadin
>Assignee: Lyuben Todorov
>Priority: Minor
> Fix For: 2.0.8
>
> Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
> cassandra-2.0-6487.diff
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose 
> adding a WARN log entry if batch sizes go beyond a configurable size. This 
> will give more visibility to operators on something that can happen on the 
> developer side. 
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it 
> can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}
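
The proposed threshold check is simple to sketch. The class and method names below are illustrative, not Cassandra's; the 5k default mirrors the suggested `batch_size_warn_threshold` yaml setting, and the measured size would come from something like `BatchStatement#measureForPreparedCache` as discussed above.

```java
import java.util.logging.Logger;

// Sketch: log a WARN when a batch's measured size exceeds a configurable
// threshold, giving operators visibility into oversized client batches.
public class BatchSizeWarner {
    private static final Logger logger = Logger.getLogger("BatchSizeWarner");
    private final long warnThresholdBytes;   // e.g. 5 * 1024 for the 5k default

    public BatchSizeWarner(long warnThresholdBytes) {
        this.warnThresholdBytes = warnThresholdBytes;
    }

    // Returns true if a warning was emitted, so the check is easy to test.
    public boolean check(long measuredBatchSizeBytes) {
        if (measuredBatchSizeBytes > warnThresholdBytes) {
            logger.warning("Batch size " + measuredBatchSizeBytes
                    + " bytes exceeds warn threshold " + warnThresholdBytes + " bytes");
            return true;
        }
        return false;
    }
}
```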



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969500#comment-13969500
 ] 

Benedict commented on CASSANDRA-6755:
-

An ideal solution would probably be modelled on the util.FastByteOperations 
class

> Optimise CellName/Composite comparisons for NativeCell
> --
>
> Key: CASSANDRA-6755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> As discussed in CASSANDRA-6694, to reduce temporary garbage generation we 
> should minimise the incidence of CellName component extraction. The biggest 
> win will be to perform comparisons on Cell where possible, instead of 
> CellName, so that Native*Cell can use its extra information to avoid creating 
> any ByteBuffer objects



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6755:


Summary: Optimise CellName/Composite comparisons for NativeCell  (was: 
Minimise extraction of CellName components)

> Optimise CellName/Composite comparisons for NativeCell
> --
>
> Key: CASSANDRA-6755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> As discussed in CASSANDRA-6694, to reduce temporary garbage generation we 
> should minimise the incidence of CellName component extraction. The biggest 
> win will be to perform comparisons on Cell where possible, instead of 
> CellName, so that Native*Cell can use its extra information to avoid creating 
> any ByteBuffer objects



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7039:


Fix Version/s: 3.0

> DirectByteBuffer compatible LZ4 methods
> ---
>
> Key: CASSANDRA-7039
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 3.0
>
>
> As we move more things off-heap, it's becoming more and more essential to be 
> able to use DirectByteBuffer (or native pointers) in various places. 
> Unfortunately LZ4 doesn't currently support this operation, despite being JNI 
> based - this means we not only have to perform unnecessary copies to 
> de/compress data from a DBB, but we can also stall GC, as any JNI method 
> operating over a java array using GetPrimitiveArrayCritical enters a critical 
> section that prevents GC for its duration. This means STWs will be at least 
> as long as any running compression/decompression (and no GC will happen 
> until they complete, so it's additive).
> We should temporarily fork (and then resubmit upstream) jpountz-lz4 to 
> support operating over a native pointer, so that we can pass a DBB or a raw 
> pointer we have allocated ourselves. This will help improve performance when 
> flushing the new offheap memtables, as well as enable us to implement 
> CASSANDRA-6726 and finish CASSANDRA-4338.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (CASSANDRA-5020) Time to switch back to byte[] internally?

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-5020.
-

Resolution: Not a Problem

This has most likely become "not a problem" as a result of movement towards 
off-heap memtables + cells, which bring the overheads down as low as we can go 
with a per-cell data structure.

> Time to switch back to byte[] internally?
> -
>
> Key: CASSANDRA-5020
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5020
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: T Jake Luciani
>  Labels: performance
> Fix For: 3.0
>
>
> We switched to ByteBuffer for column names and values back in 0.7, which gave 
> us a short term performance boost on mmap'd reads, but we gave that up when 
> we switched to refcounted sstables in 1.0.  (refcounting all the way up the 
> read path would be too painful, so we copy into an on-heap buffer when 
> reading from an sstable, then release the reference.)
> A HeapByteBuffer wastes a lot of memory compared to a byte[] (5 more ints, a 
> long, and a boolean).
> The hard problem here is how to do the arena allocation we do on writes, 
> which has been very successful in reducing STW CMS from heap fragmentation.  
> ByteBuffer is a good fit there.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-04-15 Thread Benedict (JIRA)
Benedict created CASSANDRA-7039:
---

 Summary: DirectByteBuffer compatible LZ4 methods
 Key: CASSANDRA-7039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Priority: Minor


As we move more things off-heap, it's becoming more and more essential to be 
able to use DirectByteBuffer (or native pointers) in various places. 
Unfortunately LZ4 doesn't currently support this operation, despite being JNI 
based - this means we not only have to perform unnecessary copies to 
de/compress data from a DBB, but we can also stall GC, as any JNI method 
operating over a java array using GetPrimitiveArrayCritical enters a critical 
section that prevents GC for its duration. This means STWs will be at least as 
long as any running compression/decompression (and no GC will happen until 
they complete, so it's additive).

We should temporarily fork (and then resubmit upstream) jpountz-lz4 to support 
operating over a native pointer, so that we can pass a DBB or a raw pointer we 
have allocated ourselves. This will help improve performance when flushing the 
new offheap memtables, as well as enable us to implement CASSANDRA-6726 and 
finish CASSANDRA-4338.
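
The "unnecessary copy" the ticket describes looks roughly like this. The LZ4 API itself is omitted; this sketch (with the invented helper `toHeap`) shows only the extra heap copy that an array-only de/compression interface forces on data held in a DirectByteBuffer.

```java
import java.nio.ByteBuffer;

// Sketch: data living off-heap in a DirectByteBuffer must be copied onto the
// heap before it can be handed to an API that only accepts byte[].
public class DirectBufferCopy {
    public static byte[] toHeap(ByteBuffer direct) {
        // duplicate() so the caller's position/limit are untouched
        ByteBuffer view = direct.duplicate();
        byte[] heapCopy = new byte[view.remaining()];
        view.get(heapCopy);               // the unnecessary copy
        return heapCopy;
    }
}
```

A fork supporting native pointers would let the compressor read the off-heap bytes directly, eliminating both this copy and the GetPrimitiveArrayCritical critical section.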



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[1/2] git commit: Clean up IndexInfo on keyspace/table drops

2014-04-15 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/trunk 6e97178a5 -> fc4ae115a


Clean up IndexInfo on keyspace/table drops

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-6924


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b69f5e36
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b69f5e36
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b69f5e36

Branch: refs/heads/trunk
Commit: b69f5e363b75543429a25b0909b45dff735c64b2
Parents: 6658a6e
Author: beobal 
Authored: Mon Apr 14 20:08:31 2014 +0100
Committer: Aleksey Yeschenko 
Committed: Tue Apr 15 15:17:58 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index d7c6e71..592eef9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -44,6 +44,7 @@
  * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
+ * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/CFMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java 
b/src/java/org/apache/cassandra/config/CFMetaData.java
index e930de4..72a0fc5 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -1585,6 +1585,12 @@ public final class CFMetaData
 for (TriggerDefinition td : triggers.values())
 td.deleteFromSchema(mutation, cfName, timestamp);
 
+for (String indexName : Keyspace.open(this.ksName).getColumnFamilyStore(this.cfName).getBuiltIndexes())
+{
+ColumnFamily indexCf = mutation.addOrGet(IndexCf);
+indexCf.addTombstone(indexCf.getComparator().makeCellName(indexName), ldt, timestamp);
+}
+
+
 return mutation;
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/KSMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/KSMetaData.java 
b/src/java/org/apache/cassandra/config/KSMetaData.java
index 3d1edb6..d0cb613 100644
--- a/src/java/org/apache/cassandra/config/KSMetaData.java
+++ b/src/java/org/apache/cassandra/config/KSMetaData.java
@@ -242,6 +242,7 @@ public final class KSMetaData
 mutation.delete(SystemKeyspace.SCHEMA_COLUMNS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_TRIGGERS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_USER_TYPES_CF, timestamp);
+mutation.delete(SystemKeyspace.INDEX_CF, timestamp);
 
 return mutation;
 }



[2/2] git commit: Merge branch 'cassandra-2.1' into trunk

2014-04-15 Thread aleksey
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fc4ae115
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fc4ae115
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fc4ae115

Branch: refs/heads/trunk
Commit: fc4ae115ac94b1599d308956590672eaca49e64d
Parents: 6e97178 b69f5e3
Author: Aleksey Yeschenko 
Authored: Tue Apr 15 15:23:12 2014 +0300
Committer: Aleksey Yeschenko 
Committed: Tue Apr 15 15:23:12 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc4ae115/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc4ae115/src/java/org/apache/cassandra/config/CFMetaData.java
--



git commit: Clean up IndexInfo on keyspace/table drops

2014-04-15 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 6658a6e03 -> b69f5e363


Clean up IndexInfo on keyspace/table drops

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-6924


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b69f5e36
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b69f5e36
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b69f5e36

Branch: refs/heads/cassandra-2.1
Commit: b69f5e363b75543429a25b0909b45dff735c64b2
Parents: 6658a6e
Author: beobal 
Authored: Mon Apr 14 20:08:31 2014 +0100
Committer: Aleksey Yeschenko 
Committed: Tue Apr 15 15:17:58 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index d7c6e71..592eef9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -44,6 +44,7 @@
  * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
+ * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/CFMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java 
b/src/java/org/apache/cassandra/config/CFMetaData.java
index e930de4..72a0fc5 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -1585,6 +1585,12 @@ public final class CFMetaData
 for (TriggerDefinition td : triggers.values())
 td.deleteFromSchema(mutation, cfName, timestamp);
 
+for (String indexName : Keyspace.open(this.ksName).getColumnFamilyStore(this.cfName).getBuiltIndexes())
+{
+ColumnFamily indexCf = mutation.addOrGet(IndexCf);
+indexCf.addTombstone(indexCf.getComparator().makeCellName(indexName), ldt, timestamp);
+}
+
+
 return mutation;
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/KSMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/KSMetaData.java 
b/src/java/org/apache/cassandra/config/KSMetaData.java
index 3d1edb6..d0cb613 100644
--- a/src/java/org/apache/cassandra/config/KSMetaData.java
+++ b/src/java/org/apache/cassandra/config/KSMetaData.java
@@ -242,6 +242,7 @@ public final class KSMetaData
 mutation.delete(SystemKeyspace.SCHEMA_COLUMNS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_TRIGGERS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_USER_TYPES_CF, timestamp);
+mutation.delete(SystemKeyspace.INDEX_CF, timestamp);
 
 return mutation;
 }



[jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969476#comment-13969476
 ] 

Benedict commented on CASSANDRA-4718:
-

[~jasobrown]: Could you upload the full stress outputs for these runs? And also 
try running a separate stress run with a fixed high threadcount and op count?

In particular for CQL, the results in the file are a little bit weird. That 
said, given their consistency for thrift I don't doubt the result is 
meaningful, but it would be good to understand what we're incorporating a bit 
better before committing.

> More-efficient ExecutorService for improved throughput
> --
>
> Key: CASSANDRA-4718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: performance
> Fix For: 2.1
>
> Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
> costs of various queues.ods, stress op rate with various queues.ods, 
> v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6924) Data Inserted Immediately After Secondary Index Creation is not Indexed

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6924:
---

Attachment: 6924-2.1.txt

> Data Inserted Immediately After Secondary Index Creation is not Indexed
> ---
>
> Key: CASSANDRA-6924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6924
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
> Fix For: 2.0.7
>
> Attachments: 6924-2.1.txt, repro.py
>
>
> The head of the cassandra-1.2 branch (currently 1.2.16-tentative) contains a 
> regression from 1.2.15.  Data that is inserted immediately after secondary 
> index creation may never get indexed.
> You can reproduce the issue with a [pycassa integration 
> test|https://github.com/pycassa/pycassa/blob/master/tests/test_autopacking.py#L793]
>  by running:
> {noformat}
> nosetests tests/test_autopacking.py:TestKeyValidators.test_get_indexed_slices
> {noformat}
> from the pycassa directory.
> The operation order goes like this:
> # create CF
> # create secondary index
> # insert data
> # query secondary index
> If a short sleep is added in between steps 2 and 3, the data gets indexed and 
> the query is successful.
> If a sleep is only added in between steps 3 and 4, some of the data is never 
> indexed and the query will return incomplete results.  This appears to be the 
> case even if the sleep is relatively long (30s), which makes me think the 
> data may never get indexed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6924) Data Inserted Immediately After Secondary Index Creation is not Indexed

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969428#comment-13969428
 ] 

Sam Tunnicliffe commented on CASSANDRA-6924:


This doesn't seem like a regression as the repro script fails for me just as 
consistently on 1.2.15 as it does on later versions.

The issue appears to be that when a ks or cf is dropped, we don't update 
system.IndexInfo to remove the entry for the 2i. Then when the ks/cf & index 
are recreated, we treat the index creation not as a brand new index, but as if 
we're restarting and linking in an existing index to the cf. So we skip the 
buildIndexAsync call that we should make which is what causes some entries to 
never get indexed. 

Fixing this so that we do clean up IndexInfo leads to us running into 
CASSANDRA-5202 on pre-2.1 branches. On 2.1, we see the issues mentioned in 
CASSANDRA-6959, so as Sylvain suggests there, the test needs to be changed to 
wait for schema agreement. This can be achieved with a 1s wait, or by actively 
testing for agreement. Now that the buildIndexAsync call is happening on index 
initialisation, we can insert this wait in one of two places: between the index 
creation and the inserts, or between the inserts and the reads. I've updated 
the dtest accordingly and added another variant which drops just the cf, rather 
than the entire ks (https://github.com/riptano/cassandra-dtest/pull/40). I do 
still see the errors from {{CommitLogSegmentManager}} on 2.1 detailed on 
CASSANDRA-6959 even after applying the patch attached to that issue.

Likewise, using Tyler's original repro script, a 1s sleep before commencing the 
reads is now enough to ensure the run succeeds (on the 2.1 branch).

On trunk, I get completely different errors running both the dtest & repro.py, 
both with and without the IndexInfo fix:
{code}
ERROR [Thrift:1] 2014-04-14 15:45:10,714 CustomTThreadPoolServer.java:212 - 
Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411) 
~[main/:na]
at 
org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:281)
 ~[main/:na]
at 
org.apache.cassandra.service.MigrationManager.announceColumnFamilyUpdate(MigrationManager.java:242)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.CreateIndexStatement.announceMigration(CreateIndexStatement.java:141)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:71)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:180)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:214) 
~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) 
~[main/:na]
at 
org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1973)
 ~[main/:na]
at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
 ~[thrift/:na]
at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
 ~[thrift/:na]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[libthrift-0.9.1.jar:0.9.1]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
~[libthrift-0.9.1.jar:0.9.1]
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
 ~[main/:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
~[na:1.7.0_51]
at java.util.concurrent.FutureTask.get(FutureTask.java:188) 
~[na:1.7.0_51]
at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407) 
~[main/:na]
... 16 common frames omitted
Caused by: java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at java.util.TimSort.rangeCheck(TimSort.java:921) ~[na:1.7.0_51]
at java.util.TimSort.sort(TimSort.java:182) ~[na:1.7.0_51]
at java.util.Arrays.sort(Arrays.java:727) ~[na:1.7.0_51]
at 
org.apache.cassandra.db.ArrayBackedSortedColumns.sortCells(ArrayBackedSortedColumns.java:113)
 ~[main/:na]
at 
org.apache.cassandra.db.ArrayBackedSortedColumns.maybeSortCells(ArrayBackedSortedColumns.j

[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969419#comment-13969419
 ] 

Benedict commented on CASSANDRA-7030:
-

bq. This leads to the CLHM not obeying its limits as readily as it is asked to

Confirmed that the problem I am seeing with concurrent execution (and that I 
would guess is leading to your test results) is down to CLHM. By replacing the 
CLHM with an AtomicReferenceArray to guarantee the bounds I get:

{noformat}
concurrent malloc:
Total Elapsed: 9.708s
Allocate Elapsed: 21.271s
Free Elapsed: 26.023s
Total Allocated: 62483Mb
Rate: 1.290Gb/s
Live Allocated: 1020Mb
VM total:117
vsz: 3149
rsz: 1280

synchronized malloc:
Total Elapsed: 36.526s
Allocate Elapsed: 134.114s
Free Elapsed: 128.416s
Total Allocated: 62483Mb
Rate: 0.232Gb/s
Live Allocated: 1020Mb
VM total:117
vsz: 3213
rsz: 1427

synchronized jemalloc:
Total Elapsed: 217.113s
Allocate Elapsed: 162.753s
Free Elapsed: 1531.215s
Total Allocated: 62483Mb
Rate: 0.036Gb/s
Live Allocated: 1020Mb
VM total:70
vsz: 4084
rsz: 1410
{noformat}

Can you rerun your test with either synchronised malloc, or with an 
AtomicReferenceArray instead of the CLHM, to confirm?
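For clarity, the AtomicReferenceArray substitution looks roughly like this (a minimal sketch, not the actual harness; the class name, slot-hashing, and caller-frees-evicted convention are illustrative):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: a fixed-size slot table. Unlike the CLHM, which enforces its
// weighted capacity lazily and can temporarily hold several times its limit
// under concurrent load, this structure can never exceed `slots.length()`
// live entries: a put into an occupied slot atomically swaps the old value
// out, and the caller frees it immediately.
final class BoundedSlotCache<V>
{
    private final AtomicReferenceArray<V> slots;

    BoundedSlotCache(int size)
    {
        slots = new AtomicReferenceArray<>(size);
    }

    /** Stores v in the slot for key's hash; returns the evicted value, or null. */
    V put(Object key, V v)
    {
        int idx = Math.floorMod(key.hashCode(), slots.length());
        return slots.getAndSet(idx, v); // previous occupant must be freed by the caller
    }

    V get(Object key)
    {
        return slots.get(Math.floorMod(key.hashCode(), slots.length()));
    }
}
```

Because eviction happens synchronously in put(), the memory bound is strict, which is what makes it a fair apples-to-apples harness for timing free().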

Note I have reverted my position back to "let's get rid of jemalloc", absent more 
evidence to the contrary. The test I was running that initiated the creation of 
this ticket measured elapsed time for both allocate() *and* free(); I dropped 
the latter from the tests based on your benchmark because the free() calls are 
difficult to time (they live in the eviction listener). Now that I am timing 
both, you can see the real-elapsed and per-CPU elapsed times are dramatically 
higher for jemalloc once both are included. The cost of calling free() appears 
to be disproportionately higher for jemalloc.

Note the throughput rate for jemalloc: 36Mb/s. This is really really pathetic!

> Remove JEMallocAllocator
> 
>
> Key: CASSANDRA-7030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 7030.txt
>
>
> JEMalloc, whilst having some nice performance properties by comparison to 
> Doug Lea's standard malloc algorithm in principle, is pointless in practice 
> because of the JNA cost. In general it is around 30x more expensive to call 
> than unsafe.allocate(); malloc does not have a variability of response time 
> as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
> sensible idea. I doubt if custom JNI would make it worthwhile either.
> I propose removing it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: (was: 7038-1.2.txt)

> Add Support for Composite Secondary Indexes
> ---
>
> Key: CASSANDRA-3680
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Sylvain Lebresne
>  Labels: cql3, secondary_index
> Fix For: 1.2.0 beta 1
>
> Attachments: 0001-Secondary-indexes-on-composite-columns.txt
>
>
> CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
> differently, for efficiency and functionality secondary index api needs to be 
> altered to allow composite indexes.  
> I think this will require the IndexManager api to have a 
> maybeIndex(ByteBuffer column) method that SS can call and implement a 
> PerRowSecondaryIndex per column, break the composite into parts and index 
> specific bits, also including the base rowkey.
> Then a search against a TRANSPOSED row or DOCUMENT will be possible.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: (was: 7038-2.1.txt)

> Add Support for Composite Secondary Indexes
> ---
>
> Key: CASSANDRA-3680
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Sylvain Lebresne
>  Labels: cql3, secondary_index
> Fix For: 1.2.0 beta 1
>
> Attachments: 0001-Secondary-indexes-on-composite-columns.txt
>
>
> CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
> differently, for efficiency and functionality secondary index api needs to be 
> altered to allow composite indexes.  
> I think this will require the IndexManager api to have a 
> maybeIndex(ByteBuffer column) method that SS can call and implement a 
> PerRowSecondaryIndex per column, break the composite into parts and index 
> specific bits, also including the base rowkey.
> Then a search against a TRANSPOSED row or DOCUMENT will be possible.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7038) Nodetool rebuild_index requires named indexes argument

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-7038:
---

Attachment: 7038-2.1.txt
7038-1.2.txt

> Nodetool rebuild_index requires named indexes argument
> --
>
> Key: CASSANDRA-7038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Trivial
> Attachments: 7038-1.2.txt, 7038-2.1.txt
>
>
> In addition to explicitly listing the indexes to be rebuilt, nodetool 
> rebuild_indexes will also accept just keyspace & columnfamily arguments, 
> indicating that all indexes for that ks/cf should be rebuilt.
> This doesn't actually work as CFS.rebuildSecondaryIndex requires the explicit 
> list. In the 2 arg version, nodetool just passes an empty list here and so 
> the rebuild becomes a no-op. As this has been the case since CASSANDRA-3860 
> (AFAICT, 80ea03f is the commit that removed this) we may as well just remove 
> the option from nodetool, patch attached to do that. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: 7038-2.1.txt
7038-1.2.txt

> Add Support for Composite Secondary Indexes
> ---
>
> Key: CASSANDRA-3680
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Sylvain Lebresne
>  Labels: cql3, secondary_index
> Fix For: 1.2.0 beta 1
>
> Attachments: 0001-Secondary-indexes-on-composite-columns.txt
>
>
> CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
> differently, for efficiency and functionality secondary index api needs to be 
> altered to allow composite indexes.  
> I think this will require the IndexManager api to have a 
> maybeIndex(ByteBuffer column) method that SS can call and implement a 
> PerRowSecondaryIndex per column, break the composite into parts and index 
> specific bits, also including the base rowkey.
> Then a search against a TRANSPOSED row or DOCUMENT will be possible.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7038) Nodetool rebuild_index requires named indexes argument

2014-04-15 Thread Sam Tunnicliffe (JIRA)
Sam Tunnicliffe created CASSANDRA-7038:
--

 Summary: Nodetool rebuild_index requires named indexes argument
 Key: CASSANDRA-7038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7038
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Trivial


In addition to explicitly listing the indexes to be rebuilt, nodetool 
rebuild_indexes will also accept just keyspace & columnfamily arguments, 
indicating that all indexes for that ks/cf should be rebuilt.
This doesn't actually work as CFS.rebuildSecondaryIndex requires the explicit 
list. In the 2 arg version, nodetool just passes an empty list here and so the 
rebuild becomes a no-op. As this has been the case since CASSANDRA-3860 
(AFAICT, 80ea03f is the commit that removed this) we may as well just remove 
the option from nodetool, patch attached to do that. 
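The no-op can be seen with a toy version of the rebuild dispatch (illustrative only; this is not CFS.rebuildSecondaryIndex, and the names are mine):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class IndexRebuild
{
    // When the two-arg nodetool form passes no index names, the intersection
    // below is empty and the loop body never runs: the "rebuild everything"
    // invocation silently does nothing.
    static int rebuild(Set<String> existingIndexes, String... requested)
    {
        Set<String> toRebuild = new HashSet<>(Arrays.asList(requested));
        toRebuild.retainAll(existingIndexes);
        int rebuilt = 0;
        for (String idx : toRebuild)
            rebuilt++; // real code would rebuild idx here
        return rebuilt;
    }
}
```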




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reopened CASSANDRA-7030:
-


I think there are actually a couple of questions we should answer before closing 
the ticket:

1) Without JNI, should we be supporting jemalloc at all (it is slower and has 
higher overheads in all comparable workloads we can test)?
2) Should we be synchronising on malloc/free for jemalloc, or do we simply hope 
the user has compiled jemalloc in a manner that avoids the issue?
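Option 2 could be as small as a synchronising decorator over the allocator. A rough sketch (the Allocator interface here is a stand-in I made up for the real IAllocator shape, not the actual class):

```java
// Sketch: serialise all malloc/free calls behind one monitor. This sidesteps
// any doubt about how the user's libjemalloc was built, at the cost of the
// throughput collapse visible in the synchronized numbers above.
interface Allocator
{
    long allocate(long size);
    void free(long peer);
}

final class SynchronizedAllocator implements Allocator
{
    private final Allocator delegate;

    SynchronizedAllocator(Allocator delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public synchronized long allocate(long size)
    {
        return delegate.allocate(size);
    }

    @Override
    public synchronized void free(long peer)
    {
        delegate.free(peer);
    }
}
```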

> Remove JEMallocAllocator
> 
>
> Key: CASSANDRA-7030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 7030.txt
>
>
> JEMalloc, whilst having some nice performance properties by comparison to 
> Doug Lea's standard malloc algorithm in principle, is pointless in practice 
> because of the JNA cost. In general it is around 30x more expensive to call 
> than unsafe.allocate(); malloc does not have a variability of response time 
> as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
> sensible idea. I doubt if custom JNI would make it worthwhile either.
> I propose removing it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969377#comment-13969377
 ] 

Benedict commented on CASSANDRA-7030:
-

FTR, though, I think the problem with your test is that jemalloc is 
synchronised and malloc is not. This leads to the CLHM not obeying its limits 
as readily as it is asked to (seems to keep ~ 3x as much data around in my 
test):

{noformat}
concurrent malloc:
Elapsed: 55.433s
Allocated: 2973Mb
VM total:177
vsz: 6221
rsz: 4501

synchronized malloc:
Elapsed: 96.507s
Allocated: 1026Mb
VM total:187
vsz: 3341
rsz: 1681

synchronized jemalloc:
Elapsed: 263.686s
Allocated: 1027Mb
VM total:192
vsz: 3628
rsz: 1525
{noformat}

and for posterity, the code I was running:

{code}
public static void main(String[] args) throws InterruptedException, IOException
{
    String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
    final IAllocator allocator = new NativeAllocator();
    final AtomicLong total = new AtomicLong();
    EvictionListener<UUID, Memory> listener = new EvictionListener<UUID, Memory>()
    {
        public void onEviction(UUID k, Memory mem)
        {
            total.addAndGet(-mem.size());
            mem.free(allocator);
        }
    };

    final Map<UUID, Memory> map = new ConcurrentLinkedHashMap.Builder<UUID, Memory>()
            .weigher(Weighers.<Memory>singleton())
            .initialCapacity(8 * 65536)
            .maximumWeightedCapacity(2 * 65536)
            .listener(listener)
            .build();
    final AtomicLong elapsed = new AtomicLong();
    final AtomicLong count = new AtomicLong();
    final ExecutorService exec = Executors.newFixedThreadPool(8);
    for (int i = 0 ; i < 8 ; i++)
    {
        final Random rand = new Random(i);
        exec.execute(new Runnable()
        {
            public void run()
            {
                byte[] keyBytes = new byte[16];
                for (int i = 0; i < 100; i++)
                {
                    int size = rand.nextInt(128 * 128);
                    if (size <= 0)
                        continue;
                    rand.nextBytes(keyBytes);
                    long start = System.nanoTime();
                    Memory mem = new Memory(allocator, size);
                    elapsed.addAndGet(System.nanoTime() - start);
                    mem.setMemory(0, mem.size(), (byte) 2);
                    Memory r = map.put(UUID.nameUUIDFromBytes(keyBytes), mem);
                    if (r != null)
                        r.free();
                    total.addAndGet(size);
                    if (count.incrementAndGet() % 1000 == 0)
                        System.out.println("1M");
                }
            }
        });
    }

    exec.shutdown();
    exec.awaitTermination(1L, TimeUnit.HOURS);
    // summed per-thread nanoseconds -> seconds
    System.out.println(String.format("Elapsed: %.3fs", elapsed.get() * 1.0e-9d));
    System.out.println(String.format("Allocated: %.0fMb", total.get() / (double) (1 << 20)));
    System.out.println(String.format("VM total:%.0f", Runtime.getRuntime().totalMemory() / (double) (1 << 20)));
    memuse("vsz", pid);
    memuse("rsz", pid);
    Thread.sleep(100);
}

private static void memuse(String type, String pid) throws IOException
{
    Process p = new ProcessBuilder().command("ps", "-o", type, pid).redirectErrorStream(true).start();
    BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
    reader.readLine();
    System.out.println(String.format("%s: %.0f", type, Integer.parseInt(reader.readLine()) / 1024d));
}
{code}

> Remove JEMallocAllocator
> 
>
> Key: CASSANDRA-7030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Minor
>  Labels: performance
> Fix For: 2.1 beta2
>
> Attachments: 7030.txt
>
>
> JEMalloc, whilst having some nice performance properties by comparison to 
> Doug Lea's standard malloc algorithm in principle, is pointless in practice 
> because of the JNA cost. In general it is around 30x more expensive to call 
> than unsafe.allocate(); malloc does not have a variability of response time 
> as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
> sensible idea. I doubt if custom JNI would make it worthwhile either.
> I propose removing it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.

2014-04-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969333#comment-13969333
 ] 

Marcus Eriksson commented on CASSANDRA-6696:


pushed a new version to 
https://github.com/krummas/cassandra/commits/marcuse/6696-3 which:

* adds a nodetool command to rebalance data over disks, so that users can do 
this whenever they want (e.g. after manually adding sstables to the data 
directories)
* removes the disk-aware writer from everything but streams and the rebalancing 
command
* makes the flush executor an array of executors
* splits ranges based on the total partitioner range, and makes this feature 
vnodes-only
* supports the old way of doing things for non-vnodes setups (and ordered 
partitioners)

There are still some of my config changes left in, as I expect there will be 
more comments on this.
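The range-splitting step amounts to dividing the partitioner's full token span evenly across the data directories. A simplified sketch for a Murmur3-style signed-long token space (the class name and boundary arithmetic are illustrative, not lifted from the branch):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class DiskBoundaries
{
    // Splits [min, max] into `disks` contiguous sub-ranges and returns the
    // upper boundary token of each disk. BigInteger is used so the full
    // Long.MIN_VALUE..Long.MAX_VALUE span cannot overflow.
    static List<Long> upperBounds(long min, long max, int disks)
    {
        BigInteger lo = BigInteger.valueOf(min);
        BigInteger span = BigInteger.valueOf(max).subtract(lo).add(BigInteger.ONE);
        List<Long> bounds = new ArrayList<>();
        for (int i = 1; i <= disks; i++)
        {
            BigInteger b = lo.add(span.multiply(BigInteger.valueOf(i))
                                      .divide(BigInteger.valueOf(disks)))
                             .subtract(BigInteger.ONE);
            bounds.add(b.longValueExact());
        }
        return bounds;
    }
}
```

An sstable is then written to the directory whose boundary first covers its partition's token, which is why the even split only makes sense with vnodes (and a random partitioner): without them, token ownership is too lopsided for equal spans to imply equal data.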

> Drive replacement in JBOD can cause data to reappear. 
> --
>
> Key: CASSANDRA-6696
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
> Fix For: 3.0
>
>
> In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
> empty one and repair is run. 
> This can cause deleted data to come back in some cases. This is also true for 
> corrupt sstables, where we delete the corrupt sstable and run repair. 
> Here is an example:
> Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. 
> row=sankalp col=sankalp is written 20 days back and successfully went to all 
> three nodes. 
> Then a delete/tombstone was written successfully for the same row column 15 
> days back. 
> Since this tombstone is more than gc grace, it got compacted in Nodes A and B 
> since it got compacted with the actual data. So there is no trace of this row 
> column in node A and B.
> Now in node C, say the original data is in drive1 and tombstone is in drive2. 
> Compaction has not yet reclaimed the data and tombstone.  
> Drive2 becomes corrupt and was replaced with new empty drive. 
> Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
> has come back to life. 
> Now after replacing the drive we run repair. This data will be propagated to 
> all nodes. 
> Note: This is still a problem even if we run repair every gc grace. 
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)