cassandra git commit: Prevent AssertionError from SizeEstimatesRecorder
Repository: cassandra
Updated Branches: refs/heads/cassandra-2.1 b66092f90 -> bd4842410

Prevent AssertionError from SizeEstimatesRecorder

patch by Carl Yeksigian; reviewed by Stefania Alborghetti for CASSANDRA-9034

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd484241
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd484241
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd484241

Branch: refs/heads/cassandra-2.1
Commit: bd4842410e73574dff8f3a51bd95e414f76ed506
Parents: b66092f
Author: Carl Yeksigian c...@apache.org
Authored: Thu Mar 26 14:58:46 2015 -0400
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Mar 27 10:08:54 2015 +0300

--
 CHANGES.txt                                                 | 1 +
 src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java | 6 ++
 2 files changed, 7 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd484241/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index dba397c..3f5571e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.4
+ * Prevent AssertionError from SizeEstimatesRecorder (CASSANDRA-9034)
  * Avoid overwriting index summaries for sstables with an older format that
    does not support downsampling; rebuild summaries on startup when this
    is detected (CASSANDRA-8993)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd484241/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
--
diff --git a/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java b/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
index 1472c11..13d9c60 100644
--- a/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
+++ b/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
@@ -55,6 +55,12 @@ public class SizeEstimatesRecorder extends MigrationListener implements Runnable
     public void run()
     {
+        if (StorageService.instance.isStarting())
+        {
+            logger.debug("Node has not yet joined; not recording size estimates");
+            return;
+        }
+
         logger.debug("Recording size estimates");

         // find primary token ranges for the local node.
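The guard's effect can be modeled in isolation: a scheduled task that returns early while the node is still starting never reaches the `getLocalTokens()` assertion that fired before the patch. The class below is a stand-in sketch (names are illustrative), not Cassandra code.

```java
// Minimal stand-in model of the guard added in bd48424: the periodic task
// bails out while the node is starting, and only records once it has joined.
public class SizeEstimatesGuardSketch {
    // stands in for StorageService.instance.isStarting()
    public static boolean starting = true;
    public static int runsRecorded = 0;

    public static void run() {
        if (starting)
            return; // the real patch logs "Node has not yet joined" here
        runsRecorded++; // stands in for the actual size-estimate recording
    }

    public static void main(String[] args) {
        run();            // fires before the node joins: skipped, no AssertionError
        starting = false; // node joins the ring
        run();            // now recording proceeds
        System.out.println("runs recorded: " + runsRecorded); // prints 1
    }
}
```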
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383404#comment-14383404 ]

Aleksey Yeschenko commented on CASSANDRA-9048:
--

We've just recently committed CASSANDRA-8225, which should increase {{COPY}} performance roughly 10x over CASSANDRA-7405, which was itself another 10x improvement over {{COPY}} in 2.0. That's a total improvement of ~100x between 2.0 and 2.1.4. We'll have 2.1.4 released soon.

I believe it should be sufficiently fast for now, until the new Spark-based loader arrives. But if it isn't, there are at least a few more low-hanging-fruit multipliers left there if we add prepared statements with single-partition batching and use more than one node.

bq. Hmm. My expectation was slightly different – I was looking for Spark to handle Cassandra-to-Cassandra transformations (CREATE TABLE AS, INSERT INTO ... SELECT FROM) and for COPY to continue simple load-from-file operations. Unless we get file parsing for free via Spark somehow (I don't think we do) then I don't see that as a particularly natural fit.

I'm looking at Spark for both, as an input backend. We do get it all for free: https://github.com/databricks/spark-csv and https://github.com/datastax/spark-cassandra-connector

Delimited File Bulk Loader
--
Key: CASSANDRA-9048
URL: https://issues.apache.org/jira/browse/CASSANDRA-9048
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Brian Hess
Fix For: 3.0
Attachments: CASSANDRA-9048.patch

There is a strong need for bulk loading data from delimited files into Cassandra. Starting with delimited files means that the data is not currently in the SSTable format, and therefore cannot immediately leverage Cassandra's bulk loading tool, sstableloader, directly. A tool that accepts delimited files matches the format of the data far more often than the SSTable format does, so a tool that loads from delimited files is very useful.
In order for this bulk loader to be more generally useful to customers, it should handle a number of options at a minimum:
- support specifying the input file, or reading the data from stdin (so other command-line programs can pipe into the loader)
- supply the CQL schema for the input data
- support all data types other than collections (collections are a stretch goal/need)
- an option to specify the delimiter
- an option to specify comma as the decimal delimiter (for international use cases)
- an option to specify how NULL values are represented in the file (e.g., the empty string or the string NULL)
- an option to specify how BOOLEAN values are represented in the file (e.g., TRUE/FALSE or 0/1)
- an option to specify the Date and Time format
- an option to skip some number of rows at the beginning of the file
- an option to only read in some number of rows from the file
- an option to indicate how many parse errors to tolerate
- an option to specify a file that will contain all the lines that did not parse correctly (up to the maximum number of parse errors)
- an option to specify the CQL port to connect to (with 9042 as the default)

Additional options would be useful, but this set of options/features is a start.

A word on COPY: COPY comes via CQLSH, which requires the client to be the same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, etc.). This tool should be able to connect to any version of Cassandra (within reason); for example, it should be able to handle 2.0.x and 2.1.x. Moreover, CQLSH's COPY command does not support a number of the options above. Lastly, the performance of COPY in 2.0.x is not high enough to be considered a bulk ingest tool.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
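Two of the requested options, a configurable delimiter and a configurable NULL marker, are easy to sketch. `DelimitedParser` and its fields are hypothetical names chosen for illustration, not part of any existing or proposed tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of two of the requested options: split one input line
// on a configurable delimiter and map a configurable NULL marker to a Java
// null, ready to bind to a prepared INSERT statement.
public class DelimitedParser {
    public final String delimiter;
    public final String nullString;

    public DelimitedParser(String delimiter, String nullString) {
        this.delimiter = delimiter;
        this.nullString = nullString;
    }

    // Pattern.quote treats the delimiter literally (so "|" works); the -1
    // limit keeps trailing empty fields instead of silently dropping them.
    public List<String> parse(String line) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        String[] out = new String[fields.length];
        for (int i = 0; i < fields.length; i++)
            out[i] = fields[i].equals(nullString) ? null : fields[i];
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        DelimitedParser p = new DelimitedParser("|", "NULL");
        System.out.println(p.parse("domain.com|alice|NULL"));
    }
}
```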
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383436#comment-14383436 ]

Aleksey Yeschenko commented on CASSANDRA-9048:
--

bq. Spark certainly also makes sense for the CREATE-TABLE-AS-SELECT (which is what CASSANDRA-8234 is about, not about loading).

That's how the ticket is currently worded, but it's meant to do more. Perhaps I should open a new one, to detail everything explicitly.

bq. Aleksey Yeschenko - The use case here is that a client machine has a pile of delimited files that need to be loaded in bulk to Cassandra - a common use case we see. In the Spark-based tool, you would have to have Spark on the client (perhaps that's okay) but moreover it would be reading files from the local filesystem (or stdin), not from a distributed file system, so there would be no parallelism from Spark.

Spark will be bundled with C* itself in a near version. Running the tool in Spark local mode would be the equivalent of today's COPY.
[jira] [Commented] (CASSANDRA-9034) AssertionError in SizeEstimatesRecorder
[ https://issues.apache.org/jira/browse/CASSANDRA-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383449#comment-14383449 ]

Aleksey Yeschenko commented on CASSANDRA-9034:
--

bq. Aleksey Yeschenko if you are happy could you take care of committing please?

Doubt I'll ever be happy, so let's not wait for that. Committed to both 2.1 and 3.0, as {{bd48424}}, thanks.

AssertionError in SizeEstimatesRecorder
---
Key: CASSANDRA-9034
URL: https://issues.apache.org/jira/browse/CASSANDRA-9034
Project: Cassandra
Issue Type: Bug
Environment: Trunk (52ddfe412a)
Reporter: Stefania
Assignee: Carl Yeksigian
Priority: Minor
Fix For: 3.0
Attachments: 9034-trunk.txt

One of the dtests of CASSANDRA-8236 (https://github.com/stef1927/cassandra-dtest/tree/8236) raises the following exception unless I set {{-Dcassandra.size_recorder_interval=0}}:

{code}
ERROR [OptionalTasks:1] 2015-03-25 12:58:47,015 CassandraDaemon.java:179 - Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.AssertionError: null
	at org.apache.cassandra.service.StorageService.getLocalTokens(StorageService.java:2235) ~[main/:na]
	at org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:61) ~[main/:na]
	at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:82) ~[main/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_76]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_76]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_76]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_76]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_76]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_76]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_76]
INFO [RMI TCP Connection(2)-127.0.0.1] 2015-03-25 12:59:23,189 StorageService.java:863 - Joining ring by operator request
{code}

The test is {{start_node_without_join_test}} in _pushed_notifications_test.py_ but starting a node that won't join the ring might be sufficient to reproduce the exception (I haven't tried though).
[1/2] cassandra git commit: Prevent AssertionError from SizeEstimatesRecorder
Repository: cassandra
Updated Branches: refs/heads/trunk 5d9574fc0 -> 04f351d57

Prevent AssertionError from SizeEstimatesRecorder

patch by Carl Yeksigian; reviewed by Stefania Alborghetti for CASSANDRA-9034

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd484241
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd484241
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd484241

Branch: refs/heads/trunk
Commit: bd4842410e73574dff8f3a51bd95e414f76ed506
Parents: b66092f
Author: Carl Yeksigian c...@apache.org
Authored: Thu Mar 26 14:58:46 2015 -0400
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Mar 27 10:08:54 2015 +0300

--
 CHANGES.txt                                                 | 1 +
 src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java | 6 ++
 2 files changed, 7 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd484241/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index dba397c..3f5571e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.4
+ * Prevent AssertionError from SizeEstimatesRecorder (CASSANDRA-9034)
  * Avoid overwriting index summaries for sstables with an older format that
    does not support downsampling; rebuild summaries on startup when this
    is detected (CASSANDRA-8993)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd484241/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
--
diff --git a/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java b/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
index 1472c11..13d9c60 100644
--- a/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
+++ b/src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java
@@ -55,6 +55,12 @@ public class SizeEstimatesRecorder extends MigrationListener implements Runnable
     public void run()
     {
+        if (StorageService.instance.isStarting())
+        {
+            logger.debug("Node has not yet joined; not recording size estimates");
+            return;
+        }
+
         logger.debug("Recording size estimates");

         // find primary token ranges for the local node.
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/04f351d5
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/04f351d5
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/04f351d5

Branch: refs/heads/trunk
Commit: 04f351d57422b91dd1be3822fa28a3220a42056a
Parents: 5d9574f bd48424
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Mar 27 10:11:31 2015 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Mar 27 10:11:31 2015 +0300

--
 CHANGES.txt                                                 | 1 +
 src/java/org/apache/cassandra/db/SizeEstimatesRecorder.java | 6 ++
 2 files changed, 7 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/04f351d5/CHANGES.txt
--
diff --cc CHANGES.txt
index 1780249,3f5571e..4a25079
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,83 -1,5 +1,84 @@@
+3.0
+ * Compressed Commit Log (CASSANDRA-6809)
+ * Optimise IntervalTree (CASSANDRA-8988)
+ * Add a key-value payload for third party usage (CASSANDRA-8553)
+ * Bump metrics-reporter-config dependency for metrics 3.0 (CASSANDRA-8149)
+ * Partition intra-cluster message streams by size, not type (CASSANDRA-8789)
+ * Add WriteFailureException to native protocol, notify coordinator of
+   write failures (CASSANDRA-8592)
+ * Convert SequentialWriter to nio (CASSANDRA-8709)
+ * Add role based access control (CASSANDRA-7653, 8650, 7216, 8760, 8849, 8761, 8850)
+ * Record client ip address in tracing sessions (CASSANDRA-8162)
+ * Indicate partition key columns in response metadata for prepared
+   statements (CASSANDRA-7660)
+ * Merge UUIDType and TimeUUIDType parse logic (CASSANDRA-8759)
+ * Avoid memory allocation when searching index summary (CASSANDRA-8793)
+ * Optimise (Time)?UUIDType Comparisons (CASSANDRA-8730)
+ * Make CRC32Ex into a separate maven dependency (CASSANDRA-8836)
+ * Use preloaded jemalloc w/ Unsafe (CASSANDRA-8714)
+ * Avoid accessing partitioner through StorageProxy (CASSANDRA-8244, 8268)
+ * Upgrade Metrics library and remove deprecated metrics (CASSANDRA-5657)
+ * Serializing Row cache alternative, fully off heap (CASSANDRA-7438)
+ * Duplicate rows returned when in clause has repeated values (CASSANDRA-6707)
+ * Make CassandraException unchecked, extend RuntimeException (CASSANDRA-8560)
+ * Support direct buffer decompression for reads (CASSANDRA-8464)
+ * DirectByteBuffer compatible LZ4 methods (CASSANDRA-7039)
+ * Group sstables for anticompaction correctly (CASSANDRA-8578)
+ * Add ReadFailureException to native protocol, respond
+   immediately when replicas encounter errors while handling
+   a read request (CASSANDRA-7886)
+ * Switch CommitLogSegment from RandomAccessFile to nio (CASSANDRA-8308)
+ * Allow mixing token and partition key restrictions (CASSANDRA-7016)
+ * Support index key/value entries on map collections (CASSANDRA-8473)
+ * Modernize schema tables (CASSANDRA-8261)
+ * Support for user-defined aggregation functions (CASSANDRA-8053)
+ * Fix NPE in SelectStatement with empty IN values (CASSANDRA-8419)
+ * Refactor SelectStatement, return IN results in natural order instead
+   of IN value list order and ignore duplicate values in partition key IN restrictions (CASSANDRA-7981)
+ * Support UDTs, tuples, and collections in user-defined
+   functions (CASSANDRA-7563)
+ * Fix aggregate fn results on empty selection, result column name,
+   and cqlsh parsing (CASSANDRA-8229)
+ * Mark sstables as repaired after full repair (CASSANDRA-7586)
+ * Extend Descriptor to include a format value and refactor reader/writer
+   APIs (CASSANDRA-7443)
+ * Integrate JMH for microbenchmarks (CASSANDRA-8151)
+ * Keep sstable levels when bootstrapping (CASSANDRA-7460)
+ * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838)
+ * Support for aggregation functions (CASSANDRA-4914)
+ * Remove cassandra-cli (CASSANDRA-7920)
+ * Accept dollar quoted strings in CQL (CASSANDRA-7769)
+ * Make assassinate a first class command (CASSANDRA-7935)
+ * Support IN clause on any partition key column (CASSANDRA-7855)
+ * Support IN clause on any clustering column (CASSANDRA-4762)
+ * Improve compaction logging (CASSANDRA-7818)
+ * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917)
+ * Do anticompaction in groups (CASSANDRA-6851)
+ * Support user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929,
+   7924, 7812, 8063, 7813, 7708)
+ * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416)
+ * Move sstable RandomAccessReader to nio2, which allows using the
+   FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050)
+ * Remove CQL2
[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383541#comment-14383541 ]

Marcus Eriksson commented on CASSANDRA-9045:
--

[~r0mant] did you see the "Compacting large row" message for the row you deleted in cqlsh.txt between 18:07 and 19:39?

Deleted columns are resurrected after repair in wide rows
---
Key: CASSANDRA-9045
URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
Fix For: 2.0.14
Attachments: cqlsh.txt

Hey guys,

After almost a week of researching the issue and trying out multiple things with (almost) no luck, I was suggested (on the user@cass list) to file a report here.

h5. Setup

Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if it goes away)
Multi datacenter 12+6 nodes cluster.

h5. Schema

{code}
cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  timestamp bigint,
  PRIMARY KEY (domainid, address)
) WITH
  bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
{code}

h5. Use case

Each row (defined by a domainid) can have many, many columns (bounce entries), so rows can get pretty wide. In practice, most of the rows are not that big, but some of them contain hundreds of thousands and even millions of columns.

Columns are not TTL'ed but can be deleted using the following CQL3 statement:

{code}
delete from bounces where domainid = 'domain.com' and address = 'al...@example.com';
{code}

All queries are performed using LOCAL_QUORUM CL.

h5. Problem

We weren't very diligent about running repairs on the cluster initially, but shortly after we started doing it we noticed that some of the previously deleted columns (bounce entries) are there again, as if tombstones have disappeared.

I have run this test multiple times via cqlsh, on the row of the customer who originally reported the issue:
* delete an entry
* verify it's not returned even with CL=ALL
* run repair on nodes that own this row's key
* the columns reappear and are returned even with CL=ALL

I tried the same test on another row with much less data and everything was correctly deleted and didn't reappear after repair.

h5. Other steps I've taken so far

Made sure NTP is running on all servers and clocks are synchronized.

Increased gc_grace_seconds to 100 days, ran full repair (on the affected keyspace) on all nodes, then changed it back to the default 10 days again. Didn't help.

Performed one more test. Updated one of the resurrected columns, then deleted it and ran repair again. This time the updated version of the column reappeared.

Finally, I noticed these log entries for the row in question:

{code}
INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
{code}

Figuring it may be related, I bumped in_memory_compaction_limit_in_mb to 512MB so the row fits into it, deleted the entry and ran repair once again. The log entry for this row was gone and the columns didn't reappear. We have a lot of rows much larger than 512MB, so we can't increase this parameter forever, if that is the issue.

Please let me know if you need more information on the case or if I can run more experiments. Thanks!
Roman
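For background, the classic way deletes resurrect after repair, sketched below as a toy model, is a tombstone being purged after gc_grace_seconds on the replicas that saw the delete while another replica missed it; repair then copies the still-live cell back. This is not necessarily what happened in this ticket (the reporter also tested with a 100-day gc_grace_seconds), and none of the names below come from the Cassandra code.

```java
// Toy model of tombstone expiry and resurrection, not Cassandra code.
public class TombstoneResurrectionSketch {
    public static boolean resurrects(long deletionTime,
                                     long compactionTime,
                                     long gcGraceSeconds,
                                     boolean someReplicaMissedDelete) {
        // A tombstone older than gc_grace_seconds is eligible for purge
        // during compaction on the replicas that hold it.
        boolean tombstonePurged = (compactionTime - deletionTime) > gcGraceSeconds;
        // After the purge, repair compares "live cell" against "nothing",
        // and the live cell wins on every replica.
        return tombstonePurged && someReplicaMissedDelete;
    }

    public static void main(String[] args) {
        long gcGrace = 864000;          // 10 days, the table default above
        long deletedAt = 1_000_000;     // arbitrary epoch seconds
        // Repair run within gc_grace: the tombstone still exists, delete holds.
        System.out.println(resurrects(deletedAt, deletedAt + 100, gcGrace, true));
        // Repair run after the tombstone was purged: the delete is undone.
        System.out.println(resurrects(deletedAt, deletedAt + gcGrace + 1, gcGrace, true));
    }
}
```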
[jira] [Updated] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8717:
--
Reviewer: Aleksey Yeschenko

Top-k queries with custom secondary indexes
---
Key: CASSANDRA-8717
URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
Labels: 2i, secondary_index, sort, sorting, top-k
Fix For: 3.0
Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch

As presented in [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc.

Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sorting them to obtain the k globally best rows. For doing that, we propose two additional methods in class SecondaryIndexSearcher:

{code:java}
public boolean requiresFullScan(List<IndexExpression> clause)
{
    return false;
}

public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
{
    return rows;
}
{code}

The first one indicates if a query performed against the index requires querying all the nodes in the ring. It is necessary in top-k queries because we do not know which nodes hold the best results. The second method specifies how to sort all the partial node results according to the query. Then we add two similar methods to the class AbstractRangeCommand:

{code:java}
this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

public boolean requiresFullScan()
{
    return searcher == null ? false : searcher.requiresFullScan(rowFilter);
}

public List<Row> combine(List<Row> rows)
{
    return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
}
{code}

Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as shown in the attached patch. We think that the proposed approach provides very useful functionality with minimal impact on the current codebase.
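A toy illustration of the proposed contract, using simplified stand-ins for Cassandra's Row and IndexExpression types (this is not the attached patch): each token range returns its local best rows, the searcher's sort method orders the merged partial results, and trimming to k keeps the global best.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in sketch of the two-method top-k contract from the ticket.
public class TopKSearcherSketch {
    public static class Row {
        public final String key;
        public final double score; // e.g. a relevance score from a custom 2i
        public Row(String key, double score) { this.key = key; this.score = score; }
    }

    // Analogue of SecondaryIndexSearcher.requiresFullScan(clause): a top-k
    // query must visit every token range, since any node may hold a winner.
    public static boolean requiresFullScan() { return true; }

    // Analogue of SecondaryIndexSearcher.sort(clause, rows): order the union
    // of per-range results so that trimming to k keeps the global best.
    public static List<Row> sort(List<Row> rows) {
        return rows.stream()
                   .sorted(Comparator.comparingDouble((Row r) -> -r.score))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> merged = List.of(new Row("a", 0.2), new Row("b", 0.9), new Row("c", 0.5));
        List<Row> best = sort(merged).subList(0, 2); // analogue of trim() with k = 2
        System.out.println(best.get(0).key + "," + best.get(1).key); // prints b,c
    }
}
```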
[jira] [Commented] (CASSANDRA-9036) disk full when running cleanup (on a far from full disk)
[ https://issues.apache.org/jira/browse/CASSANDRA-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383628#comment-14383628 ]

Erik Forsberg commented on CASSANDRA-9036:
--

After applying the patch:

{noformat}
INFO [CompactionExecutor:12] 2015-03-27 10:16:38,930 CompactionManager.java (line 564) Cleaning up SSTableReader(path='/cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db')
DEBUG [CompactionExecutor:12] 2015-03-27 10:16:39,423 Directories.java (line 265) removing candidate /cassandra, usable=732825808896, requested=933404582552
ERROR [CompactionExecutor:12] 2015-03-27 10:16:39,424 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:12,1,main]
java.io.IOException: disk full
	at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:567)
	at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:63)
	at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:281)
	at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

The number it reports as usable corresponds quite well with the output from df:

{noformat}
# df /cassandra
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda7 1893666392 1178016188 715650204 63% /cassandra
{noformat}

The number it reports as requested doesn't at all correspond with the actual file size:

{noformat}
# ls -l /cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db
-rw-r--r-- 1 cassandra cassandra 234667877465 Mar 21 04:42 /cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db
{noformat}

The file is compressed; we're using DeflateCompressor:

{noformat}
# sstablemetadata /cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db | grep Compr
Compression ratio: 0.21589549046598225
{noformat}

No quota. Filesystem is XFS. Is the estimation of space needed for compaction taking compression into account?

disk full when running cleanup (on a far from full disk)
---
Key: CASSANDRA-9036
URL: https://issues.apache.org/jira/browse/CASSANDRA-9036
Project: Cassandra
Issue Type: Bug
Reporter: Erik Forsberg
Assignee: Robert Stupp

I'm trying to run cleanup, but get this:

{noformat}
INFO [CompactionExecutor:18] 2015-03-25 10:29:16,355 CompactionManager.java (line 564) Cleaning up SSTableReader(path='/cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db')
ERROR [CompactionExecutor:18] 2015-03-25 10:29:16,664 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:18,1,main]
java.io.IOException: disk full
	at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:567)
	at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:63)
	at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:281)
	at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

Now that's odd, since:
* Disk has some 680G left
* The sstable it's trying to clean up is far less than 680G:

{noformat}
# ls -lh *4345750*
-rw-r--r-- 1 cassandra cassandra 64M Mar 21 04:42 production-Data_daily-jb-4345750-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 219G Mar 21 04:42 production-Data_daily-jb-4345750-Data.db
-rw-r--r-- 1 cassandra cassandra 503M Mar 21 04:42 production-Data_daily-jb-4345750-Filter.db
-rw-r--r-- 1 cassandra cassandra 42G Mar 21 04:42 production-Data_daily-jb-4345750-Index.db
-rw-r--r-- 1 cassandra cassandra 5.9K Mar 21 04:42 production-Data_daily-jb-4345750-Statistics.db
-rw-r--r-- 1 cassandra cassandra 81M Mar 21 04:42 production-Data_daily-jb-4345750-Summary.db
-rw-r--r-- 1 cassandra cassandra 79 Mar 21 04:42 production-Data_daily-jb-4345750-TOC.txt
{noformat}

Sure, it's large, but it's not 680G. No other compactions are running on that server. I'm getting this on 12 / 56 servers right now.
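A quick sanity check on the reporter's numbers supports the suspicion. This is illustrative arithmetic, not the actual estimator code: at the reported 0.216 compression ratio, the 219 GB on disk corresponds to roughly 1.09 TB of uncompressed data, and the "requested" figure from the debug log is about 4x the compressed size, in that uncompressed ballpark.

```java
// Back-of-the-envelope check of the numbers in the comment above
// (an illustration, not Cassandra's actual space estimator).
public class CleanupSpaceCheck {
    public static void main(String[] args) {
        long onDiskBytes = 234_667_877_465L;            // ls -l of the Data.db file
        double compressionRatio = 0.21589549046598225;  // from sstablemetadata
        long requestedBytes = 933_404_582_552L;         // from the debug log

        double uncompressedBytes = onDiskBytes / compressionRatio; // ~1.09e12
        System.out.printf("on disk: %d, uncompressed est.: %.0f, requested: %d%n",
                          onDiskBytes, uncompressedBytes, requestedBytes);
        // "requested" is roughly 4x the compressed size, which is what you
        // would expect if the estimate ignores compression.
    }
}
```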
[jira] [Commented] (CASSANDRA-8808) CQLSSTableWriter: close does not work + more than one table throws ex
[ https://issues.apache.org/jira/browse/CASSANDRA-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383660#comment-14383660 ]

Sebastian YEPES FERNANDEZ commented on CASSANDRA-8808:
--

For anyone interested, I have just created a related issue that was introduced in 2.1.3: https://issues.apache.org/jira/browse/CASSANDRA-9052

CQLSSTableWriter: close does not work + more than one table throws ex
---
Key: CASSANDRA-8808
URL: https://issues.apache.org/jira/browse/CASSANDRA-8808
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Sebastian YEPES FERNANDEZ
Assignee: Benjamin Lerer
Labels: cql
Fix For: 2.1.4, 2.0.14
Attachments: CASSANDRA-8808-2.0-V2.txt, CASSANDRA-8808-2.0.txt, CASSANDRA-8808-2.1-V2.txt, CASSANDRA-8808-2.1.txt, CASSANDRA-8808-trunk-V2.txt, CASSANDRA-8808-trunk.txt

I have encountered the following two issues:
- When closing the CQLSSTableWriter it just hangs the process and does nothing. (https://issues.apache.org/jira/browse/CASSANDRA-8281)
- When writing more than one table it throws an exception. (https://issues.apache.org/jira/browse/CASSANDRA-8251)

These issues can be reproduced with the following code:

{code:title=test.java|borderStyle=solid}
import org.apache.cassandra.config.Config;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public static void main(String[] args) {
    Config.setClientMode(true);

    CQLSSTableWriter w1 = CQLSSTableWriter.builder()
        .inDirectory("/tmp/kspc/t1")
        .forTable("CREATE TABLE kspc.t1 ( id int, PRIMARY KEY (id));")
        .using("INSERT INTO kspc.t1 (id) VALUES ( ? );")
        .build();
    CQLSSTableWriter w2 = CQLSSTableWriter.builder()
        .inDirectory("/tmp/kspc/t2")
        .forTable("CREATE TABLE kspc.t2 ( id int, PRIMARY KEY (id));")
        .using("INSERT INTO kspc.t2 (id) VALUES ( ? );")
        .build();

    try {
        w1.addRow(1);
        w2.addRow(1);
        w1.close();
        w2.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}
{code}

{code:title=The error|borderStyle=solid}
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:324)
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:277)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:96)
	at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:101)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:226)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:145)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:120)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
	at com.allthingsmonitoring.utils.BulkDataLoader.main(BulkDataLoader.groovy:415)
Caused by: java.lang.NullPointerException
	at org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1053)
	at org.apache.cassandra.db.ColumnFamilyStore.<clinit>(ColumnFamilyStore.java:85)
	... 18 more
{code}

I have just tested in the cassandra-2.1 branch and the issue still persists.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383642#comment-14383642 ] Stefania commented on CASSANDRA-8180:
-------------------------------------
Using {{MergeIterator}} is a great idea. I still have some details to iron out, but it's already looking much better. I have one question: the iterator is over atoms, but the sstable min and max column names are lists of ByteBuffer, which I can compare with atoms using the ClusteringComparator. It would be nice to convert the lower bound to an atom, so we could have only one generic type (the {{In}} type) in the MergeIterator specialization, which must feed atoms upstream. Is there a way to do this, or do I just have to settle for having two different (comparable) types in MergeIterator?

Optimize disk seek using min/max column name meta data when the LIMIT clause is used
Key: CASSANDRA-8180
URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: Cassandra 2.0.10
Reporter: DOAN DuyHai
Assignee: Stefania
Priority: Minor
Fix For: 3.0

I was working on an example of a sensor data table (timeseries) and faced a use case where C* does not optimize reads on disk.
{code}
cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) WITH CLUSTERING ORDER BY (col DESC);
cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
...
nodetool flush test test
...
cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
...
nodetool flush test test
...
cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
...
nodetool flush test test
{code}
After that, I activate request tracing:
{code}
cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 execute_cql3_query | 23:48:46,498 | 127.0.0.1 | 0
 Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 | 74
 Preparing statement | 23:48:46,499 | 127.0.0.1 | 253
 Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 | 930
 Acquiring sstable references | 23:48:46,499 | 127.0.0.1 | 943
 Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 | 1032
 Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 | 1160
 Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1173
 Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 | 1889
 Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1901
 Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 | 2373
 Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 | 2384
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 | 2768
 Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 | 2784
 Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 | 2976
 Request complete | 23:48:46,501 | 127.0.0.1 | 3551
{code}
We can clearly see that C* hits 3 SSTables on disk instead of just one, although it has the min/max column meta data to decide which SSTable contains the most recent data. Funny enough, if we add a clause on the clustering column to the select, this time C* optimizes the read path:
{code}
cqlsh:test> SELECT * FROM test WHERE id=1 AND col < 25 LIMIT 1;

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 execute_cql3_query | 23:52:31,888 | 127.0.0.1 | 0
{code}
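The {{MergeIterator}} idea discussed in the comment above is, at its core, a k-way merge over sorted per-sstable streams. A self-contained sketch of that shape (plain JDK, no Cassandra classes; names are illustrative, not Cassandra's real API):

```java
import java.util.*;

public class KWayMerge {
    // Merge several sorted iterators into one sorted stream using a priority
    // queue keyed on each source's current head element. This is the core
    // pattern that a MergeIterator generalizes; illustrative sketch only.
    public static <T extends Comparable<T>> List<T> merge(List<Iterator<T>> sources) {
        PriorityQueue<AbstractMap.SimpleEntry<T, Iterator<T>>> heap =
                new PriorityQueue<>((a, b) -> a.getKey().compareTo(b.getKey()));
        for (Iterator<T> it : sources)
            if (it.hasNext())
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            AbstractMap.SimpleEntry<T, Iterator<T>> e = heap.poll();
            out.add(e.getKey());              // emit the smallest head
            if (e.getValue().hasNext())       // refill from the same source
                heap.add(new AbstractMap.SimpleEntry<>(e.getValue().next(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> srcs = Arrays.asList(
                Arrays.asList(1, 4, 7).iterator(),
                Arrays.asList(2, 5).iterator(),
                Arrays.asList(3, 6).iterator());
        System.out.println(merge(srcs)); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

The point relevant to this ticket: a merge consumes one element per source at a time, so a cheap synthetic "lower bound" element per sstable can defer opening the sstable until the merge actually reaches its range.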
[jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383652#comment-14383652 ] Sylvain Lebresne commented on CASSANDRA-8180:
---------------------------------------------
bq. it would be nice to convert the lower bound to an atom
I suspect you don't need an Atom, only a Clusterable, in which case you can convert the lower bound to a Clustering with something like {{new SimpleClustering(sstable.minClusteringValues.toArray(new ByteBuffer\[metadata.clusteringColumns().size()\]))}}.

Optimize disk seek using min/max column name meta data when the LIMIT clause is used
Key: CASSANDRA-8180
URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: Cassandra 2.0.10
Reporter: DOAN DuyHai
Assignee: Stefania
Priority: Minor
Fix For: 3.0
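Sylvain's suggestion — wrap the sstable's min clustering values in the same comparable supertype the merge consumes, so only one generic type is needed — can be sketched with plain JDK types. Every name below (Clusterable, Atom, LowerBound) is an illustrative stand-in, not Cassandra's real API.

```java
import java.util.*;

public class LowerBoundSketch {
    // Stand-in for Clusterable: anything with a sort position.
    interface Clusterable { int sortKey(); }

    // Stand-in for an Atom carrying real data.
    static class Atom implements Clusterable {
        final int key; final String value;
        Atom(int key, String value) { this.key = key; this.value = value; }
        public int sortKey() { return key; }
    }

    // Stand-in for a synthetic lower bound built from sstable min clustering
    // values: no data attached, just a sort position.
    static class LowerBound implements Clusterable {
        final int key;
        LowerBound(int key) { this.key = key; }
        public int sortKey() { return key; }
    }

    // Both kinds flow through one comparator, so a merge iterator needs a
    // single generic type instead of two.
    static final Comparator<Clusterable> CMP = Comparator.comparingInt(Clusterable::sortKey);

    public static void main(String[] args) {
        List<Clusterable> stream = new ArrayList<>();
        stream.add(new Atom(20, "b"));
        stream.add(new LowerBound(10)); // sstable's min clustering, wrapped
        stream.add(new Atom(30, "c"));
        stream.sort(CMP);
        System.out.println(stream.get(0) instanceof LowerBound); // true
    }
}
```

The consumer then simply skips LowerBound instances when producing results, paying only a comparison per sstable instead of a disk seek.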
[jira] [Created] (CASSANDRA-9052) CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
Sebastian YEPES FERNANDEZ created CASSANDRA-9052:
------------------------------------------------
Summary: CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
Key: CASSANDRA-9052
URL: https://issues.apache.org/jira/browse/CASSANDRA-9052
Project: Cassandra
Issue Type: Bug
Environment: cassandra-all:2.1.2 cassandra-all:2.1.3
Reporter: Sebastian YEPES FERNANDEZ

Hello, I have just noticed that the latest C* version, 2.1.3, reintroduced an old bug, CASSANDRA-8281. Closing the CQLSSTableWriter after adding rows generates the following exception:
{code:title=Exception|borderStyle=solid}
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:324)
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:277)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:96)
	at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:101)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:225)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:144)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:119)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:122)
	at BulkDataLoader.main(BulkDataLoader.groovy:26)
Caused by: java.lang.NullPointerException
	at org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1053)
	at org.apache.cassandra.db.ColumnFamilyStore.<clinit>(ColumnFamilyStore.java:85)
	... 18 more
{code}
Note that this works correctly in 2.1.2 but not with 2.1.3. We can reproduce this issue with the following code:
{code:title=test.java|borderStyle=solid}
import org.apache.cassandra.config.Config;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public static void main(String[] args) {
    Config.setClientMode(true);
    // These folders must exist: mkdir -p /tmp/kspc/t1
    CQLSSTableWriter w1 = CQLSSTableWriter.builder()
            .inDirectory("/tmp/kspc/t1")
            .forTable("CREATE TABLE kspc.t1 ( id int, PRIMARY KEY (id));")
            .using("INSERT INTO kspc.t1 (id) VALUES ( ? );")
            .build();
    try {
        w1.addRow(1);
        w1.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}
{code}
[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383613#comment-14383613 ] Aleksey Yeschenko commented on CASSANDRA-8717:
----------------------------------------------
On second thought, this looks reasonable enough for at least 3.0 inclusion - especially if this eventually allows you guys to get rid of that C* fork. Still, I want to hear from [~slebresne] and [~beobal], the latter having been planning some C* API refactoring for a while, before proceeding.

Top-k queries with custom secondary indexes
-------------------------------------------
Key: CASSANDRA-8717
URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
Labels: 2i, secondary_index, sort, sorting, top-k
Fix For: 3.0
Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch

As presented at [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc. Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sorting them in order to obtain the k globally best rows. To do that, we propose two additional methods in class SecondaryIndexSearcher:
{code:java}
public boolean requiresFullScan(List<IndexExpression> clause)
{
    return false;
}

public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
{
    return rows;
}
{code}
The first one indicates whether a query performed against the index requires querying all the nodes in the ring. It is necessary for top-k queries because we do not know which nodes have the best results. The second method specifies how to sort all the partial node results according to the query. Then we add two similar methods to the class AbstractRangeCommand:
{code:java}
this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

public boolean requiresFullScan()
{
    return searcher == null ? false : searcher.requiresFullScan(rowFilter);
}

public List<Row> combine(List<Row> rows)
{
    return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
}
{code}
Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as shown in the attached patch. We think that the proposed approach provides very useful functionality with minimal impact on the current codebase.
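The combine step the proposal describes — merge per-token-range partial results, re-sort by the query's ordering, trim to k — can be sketched independently of Cassandra's classes. Names, the score representation, and the use of String row keys are all illustrative assumptions.

```java
import java.util.*;

public class TopKCombine {
    // Merge the partial top-k lists returned by each token range, sort by
    // score (best first), and keep only the k globally best rows. This is a
    // sketch of the "sort then trim" combine step, not Cassandra's real code.
    public static List<String> combine(List<List<Map.Entry<String, Double>>> partials, int k) {
        List<Map.Entry<String, Double>> all = new ArrayList<>();
        for (List<Map.Entry<String, Double>> p : partials)
            all.addAll(p);
        all.sort((a, b) -> Double.compare(b.getValue(), a.getValue())); // descending score
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, all.size()); i++)
            out.add(all.get(i).getKey());
        return out;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Double>>> partials = List.of(
                List.of(Map.entry("r1", 0.9), Map.entry("r2", 0.4)),  // range A's top 2
                List.of(Map.entry("r3", 0.7), Map.entry("r4", 0.6))); // range B's top 2
        System.out.println(combine(partials, 2)); // [r1, r3]
    }
}
```

This also shows why `requiresFullScan` matters: the globally best rows can come from any range, so every range must contribute its local top k before trimming.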
[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine
[ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383633#comment-14383633 ] Benjamin Lerer commented on CASSANDRA-8099:
-------------------------------------------
As the patch is relatively large, I have chosen to split my review of the CQL layer into chunks and give my comments for each chunk as soon as I have finished reviewing it. I think it will make things more manageable for Sylvain and me. For the first chunk I focused on the {{restrictions}}:
* I am not a big fan of big class hierarchies, but I wonder whether it would not be better for {{PrimaryKeyRestrictionSet}} to have two sub-classes, one for the partition key and one for the clustering columns, rather than a boolean variable.
* In {{PrimaryKeyRestrictionSet}}, the method {{addColumnFilterTo}} can be simplified based on the fact that we know whether the restrictions are on the partition key components or on the clustering key columns.
* The {{AbstractPrimaryKeyRestrictions.toByteBuffers}} method can be moved down, as it is only used in {{PrimaryKeyRestrictionSet}}.
* In {{MultiColumnRestriction}}, the method {{isPartitionKey()}} is not used (in case you have forgotten: {{MultiColumnRestriction}} only applies to clustering key columns).
* I understand why you renamed {{?Restriction.Slice}} to {{?Restriction.SliceRestriction}}, but now the class names look a bit inconsistent. Maybe we should rename the other classes too.
* In {{ColumnFilter}}, the {{add(Expression expression)}} method is not used.
* In {{Operator}}, the {{reverse}} method is not needed anymore and can be removed.
* In {{StatementRestrictions}}, I do not understand the use of {{useFiltering}}. My understanding was that we should return an error message specifying that {{ALLOW FILTERING}} is required, and that this problem should have been handled by {{checkNeedsFiltering}} in {{SelectStatement}}. Could you explain?
* In {{StatementRestrictions}}, the {{nonPKRestrictedColumns}} method looks wrong to me, as it can return some primary key columns.

Refactor and modernize the storage engine
-----------------------------------------
Key: CASSANDRA-8099
URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
Project: Cassandra
Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Fix For: 3.0
Attachments: 8099-nit

The current storage engine (which for this ticket I'll loosely define as the code implementing the read/write path) is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows. This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL result set), because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be. Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex hoops {{AbstractQueryPager}} has to jump through simply to remove the last query result. So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things:
# Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, they should be an iterable list of rows (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
# Make the engine more iterative.
What I mean here is that in the read path, we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially. Please note that such a refactor should provide some performance improvements right off the bat, but that is not its primary goal. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations.
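The "more iterative" point above — stream rows from disk to the network instead of materializing them all in memory — comes down to composing lazy iterator transforms. A minimal sketch with plain JDK types (illustrative only, not Cassandra code):

```java
import java.util.*;
import java.util.function.Function;

public class LazyPipeline {
    // Wrap a source iterator with a per-element transform, evaluated lazily:
    // nothing is buffered, so each element can flow disk -> transform ->
    // network one at a time instead of being collected into a big container.
    public static <A, B> Iterator<B> map(Iterator<A> src, Function<A, B> f) {
        return new Iterator<B>() {
            public boolean hasNext() { return src.hasNext(); }
            public B next() { return f.apply(src.next()); }
        };
    }

    public static void main(String[] args) {
        // "rows" stands in for rows read off disk; the transform stands in
        // for serialization toward the network.
        Iterator<String> rows = Arrays.asList("a", "b", "c").iterator();
        Iterator<String> serialized = map(rows, r -> "row:" + r);
        while (serialized.hasNext())
            System.out.println(serialized.next()); // one element in flight at a time
    }
}
```

Chaining several such wrappers gives a whole read path where memory use is bounded by one element per stage, which is where the GC reduction comes from.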
[jira] [Assigned] (CASSANDRA-9052) CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
[ https://issues.apache.org/jira/browse/CASSANDRA-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer reassigned CASSANDRA-9052:
-----------------------------------------
Assignee: Benjamin Lerer

CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
---------------------------------------------------------------------
Key: CASSANDRA-9052
URL: https://issues.apache.org/jira/browse/CASSANDRA-9052
Project: Cassandra
Issue Type: Bug
Environment: cassandra-all:2.1.2 cassandra-all:2.1.3
Reporter: Sebastian YEPES FERNANDEZ
Assignee: Benjamin Lerer
Labels: API, CQL, SSTableWriter
[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383640#comment-14383640 ] Robbie Strickland commented on CASSANDRA-8717:
----------------------------------------------
FWIW, I spoke with several other teams at Spark Summit last week that would really like this patch for the same reason.

Top-k queries with custom secondary indexes
-------------------------------------------
Key: CASSANDRA-8717
URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
Labels: 2i, secondary_index, sort, sorting, top-k
Fix For: 3.0
Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch
[jira] [Commented] (CASSANDRA-9036) disk full when running cleanup (on a far from full disk)
[ https://issues.apache.org/jira/browse/CASSANDRA-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383648#comment-14383648 ] Robert Stupp commented on CASSANDRA-9036:
-----------------------------------------
OK, thanks for trying the patch. Then there's a bug in the calculation of the sstable size for compaction - it requests ~870GB while only ~680GB is usable ({{usable=732825808896, requested=933404582552}}). So it is correct to ignore that directory candidate - [~krummas]?

disk full when running cleanup (on a far from full disk)
--------------------------------------------------------
Key: CASSANDRA-9036
URL: https://issues.apache.org/jira/browse/CASSANDRA-9036
Project: Cassandra
Issue Type: Bug
Reporter: Erik Forsberg
Assignee: Robert Stupp

I'm trying to run cleanup, but get this:
{noformat}
INFO [CompactionExecutor:18] 2015-03-25 10:29:16,355 CompactionManager.java (line 564) Cleaning up SSTableReader(path='/cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db')
ERROR [CompactionExecutor:18] 2015-03-25 10:29:16,664 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:18,1,main]
java.io.IOException: disk full
	at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:567)
	at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:63)
	at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:281)
	at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
Now that's odd, since:
* The disk has some 680G left
* The sstable it's trying to clean up is far less than 680G:
{noformat}
# ls -lh *4345750*
-rw-r--r-- 1 cassandra cassandra  64M Mar 21 04:42 production-Data_daily-jb-4345750-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 219G Mar 21 04:42 production-Data_daily-jb-4345750-Data.db
-rw-r--r-- 1 cassandra cassandra 503M Mar 21 04:42 production-Data_daily-jb-4345750-Filter.db
-rw-r--r-- 1 cassandra cassandra  42G Mar 21 04:42 production-Data_daily-jb-4345750-Index.db
-rw-r--r-- 1 cassandra cassandra 5.9K Mar 21 04:42 production-Data_daily-jb-4345750-Statistics.db
-rw-r--r-- 1 cassandra cassandra  81M Mar 21 04:42 production-Data_daily-jb-4345750-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Mar 21 04:42 production-Data_daily-jb-4345750-TOC.txt
{noformat}
Sure, it's large, but it's not 680G. No other compactions are running on that server. I'm getting this on 12 / 56 servers right now. Could it be some bug in the calculation of the expected size of the new sstable, perhaps?
[jira] [Commented] (CASSANDRA-9036) disk full when running cleanup (on a far from full disk)
[ https://issues.apache.org/jira/browse/CASSANDRA-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383665#comment-14383665 ] Robert Stupp commented on CASSANDRA-9036:
-----------------------------------------
Right - the requested size is the _uncompressed_ size. Will provide a patch that multiplies it by the (previous) compression ratio soon. (So yes, it is related to CASSANDRA-7386.)

disk full when running cleanup (on a far from full disk)
--------------------------------------------------------
Key: CASSANDRA-9036
URL: https://issues.apache.org/jira/browse/CASSANDRA-9036
Project: Cassandra
Issue Type: Bug
Reporter: Erik Forsberg
Assignee: Robert Stupp
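The fix sketched in the comment — scale the uncompressed request by the sstable's compression ratio before the free-space check — is simple arithmetic. Below is an illustrative sketch using the numbers from this ticket; the method name and the 0.25 ratio (roughly the 219G data file vs the ~870GB uncompressed request) are assumptions, not Cassandra's actual code.

```java
public class DiskSpaceCheck {
    // Estimate the on-disk size of a cleanup/compaction result by scaling the
    // uncompressed request by the observed compression ratio, then compare
    // against usable space in the candidate directory.
    public static boolean fits(long requestedUncompressed, double compressionRatio, long usable) {
        long estimatedOnDisk = (long) (requestedUncompressed * compressionRatio);
        return estimatedOnDisk <= usable;
    }

    public static void main(String[] args) {
        long requested = 933_404_582_552L; // ~870 GB uncompressed (from the ticket)
        long usable = 732_825_808_896L;    // ~680 GB free (from the ticket)
        // Raw uncompressed request is rejected, reproducing the "disk full":
        System.out.println(fits(requested, 1.0, usable));  // false
        // Scaled by an assumed ~0.25 compression ratio, it comfortably fits:
        System.out.println(fits(requested, 0.25, usable)); // true
    }
}
```

This is exactly the discrepancy in the report: the data file is 219G compressed, so the real output would fit, but the unscaled estimate exceeds free space.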
[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383698#comment-14383698 ] Andrés de la Peña commented on CASSANDRA-8717:
----------------------------------------------
On our side, we would very much like to abandon the fork and distribute our index as a plugin once you guys agree that the proposed changes regarding top-k queries are a go.

Top-k queries with custom secondary indexes
-------------------------------------------
Key: CASSANDRA-8717
URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
Labels: 2i, secondary_index, sort, sorting, top-k
Fix For: 3.0
Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch
[jira] [Issue Comment Deleted] (CASSANDRA-8150) Re-evaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-8150:
-----------------------------------------
Comment: was deleted (was: an out-of-office auto-reply from Hans van der Linde, including a standard ING confidentiality disclaimer)

Re-evaluate Default JVM tuning parameters
-----------------------------------------
Key: CASSANDRA-8150
URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
Project: Cassandra
Issue Type: Improvement
Components: Config
Reporter: Matt Stump
Assignee: Ryan McGuire
Attachments: upload.png

It's been found that the old Twitter recommendation of 100m per core up to 800m is harmful and should no longer be used. Instead, the formula should be 1/3 or 1/4 of max heap, with a max of 2G. 1/3 vs 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 1/3 is probably better for releases greater than 2.1.
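The two sizing rules compared in the ticket above are easy to put side by side. A sketch of the arithmetic (method names are invented; whether the fraction is 1/3 or 1/4 is exactly the open question in the ticket):

```java
public class HeapTuning {
    static final long MB = 1024L * 1024;
    static final long GB = 1024 * MB;

    // Old rule of thumb: 100 MB per core, capped at 800 MB.
    public static long oldYoungGen(int cores) {
        return Math.min(cores * 100 * MB, 800 * MB);
    }

    // Proposed rule: a fraction (1/3 or 1/4) of max heap, capped at 2 GB.
    public static long newYoungGen(long maxHeapBytes, int fraction) {
        return Math.min(maxHeapBytes / fraction, 2 * GB);
    }

    public static void main(String[] args) {
        long heap = 8 * GB;
        System.out.println(oldYoungGen(8) / MB);       // 800  (MB, old rule, 8 cores)
        System.out.println(newYoungGen(heap, 4) / MB); // 2048 (MB, 1/4 of 8 GB, capped at 2 GB)
        System.out.println(newYoungGen(heap, 3) / MB); // 2048 (1/3 of 8 GB also hits the 2 GB cap)
    }
}
```

On a typical 8 GB heap the proposed rule gives a young generation more than twice the old 800 MB ceiling, which is the substance of the change being proposed.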
[jira] [Created] (CASSANDRA-9051) Error in cqlsh command line while querying
Naresh Palaiya created CASSANDRA-9051: - Summary: Error in cqlsh command line while querying Key: CASSANDRA-9051 URL: https://issues.apache.org/jira/browse/CASSANDRA-9051 Project: Cassandra Issue Type: Bug Components: Core Reporter: Naresh Palaiya Priority: Critical Fix For: 2.1.2 Aggregation queries on Cassandra cluster results in the following error. Even after increasing the read_request_timeout_in_ms and range_request_timeout_in_ms parameters. For more information on the bug. You can refer the this stack overflow link. http://stackoverflow.com/questions/29205005/error-in-cqlsh-command-line-while-querying errors={}, last_host=localhost Statement trace did not complete within 10 seconds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6680) Clock skew detection via gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383548#comment-14383548 ] Sergio Bossa commented on CASSANDRA-6680: - Hybrid Logical Clocks could be relevant for this issue: http://www.cse.buffalo.edu/tech-reports/2014-04.pdf Clock skew detection via gossip --- Key: CASSANDRA-6680 URL: https://issues.apache.org/jira/browse/CASSANDRA-6680 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Brandon Williams Assignee: Stefania Priority: Minor Fix For: 3.0 Gossip's HeartbeatState keeps the generation (local timestamp the node was started) and version (monotonically increasing per gossip interval) which could be used to roughly calculate the node's current time, enabling detection of gossip messages too far in the future for the clocks to be synced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
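The heartbeat-based estimate described in the ticket can be sketched as follows. This is a hedged illustration with hypothetical names, not Cassandra's actual API: it assumes the gossip generation is the node's start time in epoch seconds and that the version increments roughly once per one-second gossip interval.

```java
// Hypothetical sketch of CASSANDRA-6680's idea: HeartBeatState's generation is
// the node's start time (epoch seconds) and its version increments roughly once
// per gossip interval (~1s), so generation + version approximates the peer's
// current time, which lets us flag messages from badly skewed clocks.
final class ClockSkewEstimator
{
    private static final long GOSSIP_INTERVAL_MS = 1000;

    /** Rough estimate of the peer's current time in epoch milliseconds. */
    static long estimatePeerTimeMillis(int generationSeconds, int heartbeatVersion)
    {
        return generationSeconds * 1000L + heartbeatVersion * GOSSIP_INTERVAL_MS;
    }

    /** Positive skew means the peer's clock appears ahead of the local clock. */
    static long skewMillis(int generationSeconds, int heartbeatVersion, long localNowMillis)
    {
        return estimatePeerTimeMillis(generationSeconds, heartbeatVersion) - localNowMillis;
    }
}
```

The estimate is only as precise as the gossip interval, which is why the ticket describes it as enabling detection of messages "too far in the future" rather than exact synchronization.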
[jira] [Updated] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh Palaiya updated CASSANDRA-9051: -- Description: Aggregation queries (select count(*) from TABLE_NAME ) on Cassandra cluster results in the following error. Even after increasing the read_request_timeout_in_ms and range_request_timeout_in_ms parameters. For more information on the bug. You can refer the this stack overflow link. http://stackoverflow.com/questions/29205005/error-in-cqlsh-command-line-while-querying errors={}, last_host=localhost Statement trace did not complete within 10 seconds was: Aggregation queries on Cassandra cluster results in the following error. Even after increasing the read_request_timeout_in_ms and range_request_timeout_in_ms parameters. For more information on the bug. You can refer the this stack overflow link. http://stackoverflow.com/questions/29205005/error-in-cqlsh-command-line-while-querying errors={}, last_host=localhost Statement trace did not complete within 10 seconds Error in cqlsh command line while querying -- Key: CASSANDRA-9051 URL: https://issues.apache.org/jira/browse/CASSANDRA-9051 Project: Cassandra Issue Type: Bug Components: Core Reporter: Naresh Palaiya Priority: Critical Fix For: 2.1.2 Aggregation queries (select count(*) from TABLE_NAME ) on Cassandra cluster results in the following error. Even after increasing the read_request_timeout_in_ms and range_request_timeout_in_ms parameters. For more information on the bug. You can refer the this stack overflow link. http://stackoverflow.com/questions/29205005/error-in-cqlsh-command-line-while-querying errors={}, last_host=localhost Statement trace did not complete within 10 seconds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state
[ https://issues.apache.org/jira/browse/CASSANDRA-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383602#comment-14383602 ] Benedict commented on CASSANDRA-8984: - Rebased to trunk [here|https://github.com/belliottsmith/cassandra/tree/8984] Introduce Transactional API for behaviours that can corrupt system state Key: CASSANDRA-8984 URL: https://issues.apache.org/jira/browse/CASSANDRA-8984 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.4 Attachments: 8984_windows_timeout.txt As a penultimate (and probably final for 2.1, if we agree to introduce it there) round of changes to the internals managing sstable writing, I've introduced a new API called Transactional that I hope will make it much easier to write correct behaviour. As things stand we conflate a lot of behaviours into methods like close - the recent changes unpicked some of these, but didn't go far enough. My proposal here introduces an interface designed to support four actions (on top of their normal function): * prepareToCommit * commit * abort * cleanup In normal operation, once we have finished constructing a state change we call prepareToCommit; once all such state changes are prepared, we call commit. If at any point anything fails, abort is called. In _either_ case, cleanup is called last. These transactional objects are all AutoCloseable, with the behaviour being to roll back any changes unless commit has completed successfully. The changes are actually less invasive than it might sound, since we did recently introduce abort in some places, as well as have commit-like methods. This simply formalises the behaviour, and makes it consistent between all objects that interact in this way. Much of the code change is boilerplate, such as moving an object into a try-declaration, although the change is still non-trivial. 
What it _does_ do is eliminate a _lot_ of special casing that we have had since 2.1 was released. The data tracker API changes and compaction leftover cleanups should finish the job with making this much easier to reason about, but this change I think is worthwhile considering for 2.1, since we've just overhauled this entire area (and not released these changes), and this change is essentially just the finishing touches, so the risk is minimal and the potential gains reasonably significant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
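The four-phase lifecycle described above can be sketched as follows. This is a minimal illustration of the semantics only, not Cassandra's actual Transactional class: commit runs prepareToCommit then the commit action; close() rolls back via abort unless commit completed, and cleanup always runs last.

```java
// Minimal sketch (hypothetical, not Cassandra's implementation) of the
// four-phase Transactional lifecycle: prepareToCommit -> commit on success,
// abort on failure, cleanup always; close() rolls back unless committed.
abstract class Transactional implements AutoCloseable
{
    private boolean committed;

    protected abstract void prepareToCommit();
    protected abstract void doCommit();
    protected abstract void doAbort();
    protected void cleanup() {}

    public final void commit()
    {
        prepareToCommit();
        doCommit();
        committed = true;
    }

    public final void abort()
    {
        doAbort();
    }

    @Override
    public final void close()
    {
        try
        {
            if (!committed)
                abort();   // roll back any uncommitted state change
        }
        finally
        {
            cleanup();     // always runs, success or failure
        }
    }
}
```

Because the object is AutoCloseable, placing it in a try-with-resources declaration gives rollback-by-default: forgetting to commit can no longer leave partially applied state behind.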
[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect
[ https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383561#comment-14383561 ] Benedict commented on CASSANDRA-8993: - I hate it when the explanation is simply that I'm an idiot (or if I'm charitable, that I forgot the reason). Given this explanation, I think either piece of code is as good as the other, so since it's your baby perhaps you can decide which you prefer? EffectiveIndexInterval calculation is incorrect --- Key: CASSANDRA-8993 URL: https://issues.apache.org/jira/browse/CASSANDRA-8993 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Priority: Blocker Fix For: 2.1.4 Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt I'm not familiar enough with the calculation itself to understand why this is happening, but see discussion on CASSANDRA-8851 for the background. I've introduced a test case to look for this during downsampling, but it seems to pass just fine, so it may be an artefact of upgrading. The problem was, unfortunately, not manifesting directly because it would simply result in a failed lookup. This was only exposed when early opening used firstKeyBeyond, which does not use the effective interval, and provided the result to getPosition(). I propose a simple fix that ensures a bug here cannot break correctness. Perhaps [~thobbs] can follow up with an investigation as to how it actually went wrong? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9052) CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
[ https://issues.apache.org/jira/browse/CASSANDRA-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383787#comment-14383787 ] Benjamin Lerer commented on CASSANDRA-9052: --- This ticket has nothing to do with CASSANDRA-8281. In CASSANDRA-8281 the problem was that a non-daemon thread was preventing the JVM from shutting down; there was no Exception. According to the stack trace, the Exception is triggered by the call to {{addRow}}, not by the call to {{close}}. This ticket is in fact a duplicate of the other ticket that you have opened, CASSANDRA-8808. CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281 - Key: CASSANDRA-9052 URL: https://issues.apache.org/jira/browse/CASSANDRA-9052 Project: Cassandra Issue Type: Bug Environment: cassandra-all:2.1.2 cassandra-all:2.1.3 Reporter: Sebastian YEPES FERNANDEZ Assignee: Benjamin Lerer Labels: API, CQL, SSTableWriter Hello, I have just noticed that the latest C* version, 2.1.3, reintroduced an old bug, CASSANDRA-8281. 
When closing the CQLSSTableWriter after adding rows it generated the following Exception: {code:title=Exception|borderStyle=solid} Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:324) at org.apache.cassandra.db.Keyspace.init(Keyspace.java:277) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:96) at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:101) at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:225) at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:144) at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:122) at BulkDataLoader.main(BulkDataLoader.groovy:26) Caused by: java.lang.NullPointerException at org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1053) at org.apache.cassandra.db.ColumnFamilyStore.<clinit>(ColumnFamilyStore.java:85) ... 
18 more {code} Note that this works correctly in 2.1.2 but not with 2.1.3. We can reproduce this issue with the following code: {code:title=test.java|borderStyle=solid} import org.apache.cassandra.config.Config; import org.apache.cassandra.io.sstable.CQLSSTableWriter; public static void main(String[] args) { Config.setClientMode(true); // These folders must exist: mkdir -p /tmp/kspc/t1 CQLSSTableWriter w1 = CQLSSTableWriter.builder() .inDirectory("/tmp/kspc/t1") .forTable("CREATE TABLE kspc.t1 ( id int, PRIMARY KEY (id));") .using("INSERT INTO kspc.t1 (id) VALUES ( ? );") .build(); try { w1.addRow(1); w1.close(); } catch (Exception e) { System.out.println(e); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383704#comment-14383704 ] Sylvain Lebresne commented on CASSANDRA-8717: - I don't have a problem with this in theory, at least in 3.0 (I tend to agree with Aleksey on that part), though I could argue that what you fundamentally ask for is not specific to indexing. What you want is a way to transform the result of internal queries. It's rather close to aggregation except that instead of transforming multiple rows into a single one, you want to transform some rows into other rows (sorting them being just one particular use case of that). The fact that the results you want to transform are the result of your custom index is kind of incidental. So I do feel that implementing this as the more general concept of results transformation would be cleaner (and more generic). However, doing so is probably a little bit more involved, so I'm happy to hijack the 2ndary index API for that in the short term and leave generalization to later, provided we agree that we may generalize that better and thus slightly break those new APIs. Now on the patch, I do think {{requiresFullScan}} somewhat breaks the {{concurrencyFactor}} computation in {{getRangeSlice}}, as {{remainingRows}} can become negative. This is not a huge deal in the sense that the code ensures the {{concurrentFactor}} is never smaller than 1, but it still is kind of wrong in principle. In fact, that method is really about modifying the query limit internally (up until the combine method has been applied), and that's imo the proper way to expose it. Another nit is that we should rename the {{sort}} method to something more generic (as said above, sorting is somewhat of a special case and there's no reason to imply a limitation to that). It could be renamed {{combine}} or, imo a bit better, something like {{postReconciliationProcessing}}. 
Top-k queries with custom secondary indexes --- Key: CASSANDRA-8717 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, secondary_index, sort, sorting, top-k Fix For: 3.0 Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch As presented in [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc. Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sorting them in order to obtain the k globally best rows. For doing that, we propose two additional methods in class SecondaryIndexSearcher: {code:java} public boolean requiresFullScan(List<IndexExpression> clause) { return false; } public List<Row> sort(List<IndexExpression> clause, List<Row> rows) { return rows; } {code} The first one indicates if a query performed in the index requires querying all the nodes in the ring. It is necessary in top-k queries because we do not know which nodes hold the best results. The second method specifies how to sort all the partial node results according to the query. Then we add two similar methods to the class AbstractRangeCommand: {code:java} this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter); public boolean requiresFullScan() { return searcher == null ? false : searcher.requiresFullScan(rowFilter); } public List<Row> combine(List<Row> rows) { return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows)); } {code} Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as shown in the attached patch. 
We think that the proposed approach provides very useful functionality with minimum impact in current codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
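The coordinator-side merge that the proposed combine method enables can be illustrated with a small standalone sketch. This is not the attached patch; scores standing in for index-provided relevance values are an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration of why top-k needs a post-reconciliation combine step: each
// token range returns only its local best k results, so the coordinator must
// merge all partial results and trim again to get the global top k.
final class TopKCombiner
{
    static List<Double> combineTopK(List<List<Double>> perRangeScores, int k)
    {
        List<Double> all = new ArrayList<>();
        for (List<Double> partial : perRangeScores)
            all.addAll(partial);                 // gather every range's local top k
        all.sort(Collections.reverseOrder());    // best (highest) score first
        return all.subList(0, Math.min(k, all.size())); // global trim to k
    }
}
```

This also shows why requiresFullScan matters: the local best k of one token range can beat the local best of another, so no range can be skipped.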
[jira] [Resolved] (CASSANDRA-9052) CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281
[ https://issues.apache.org/jira/browse/CASSANDRA-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer resolved CASSANDRA-9052. --- Resolution: Duplicate CQLSSTableWriter close does not work - Regression bug: CASSANDRA-8281 - Key: CASSANDRA-9052 URL: https://issues.apache.org/jira/browse/CASSANDRA-9052 Project: Cassandra Issue Type: Bug Environment: cassandra-all:2.1.2 cassandra-all:2.1.3 Reporter: Sebastian YEPES FERNANDEZ Assignee: Benjamin Lerer Labels: API, CQL, SSTableWriter Hello, I have just noticed that the latest C* version, 2.1.3, reintroduced an old bug, CASSANDRA-8281. When closing the CQLSSTableWriter after adding rows it generated the following Exception: {code:title=Exception|borderStyle=solid} Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:324) at org.apache.cassandra.db.Keyspace.init(Keyspace.java:277) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:96) at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:101) at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:225) at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:144) at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45) at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:122) at BulkDataLoader.main(BulkDataLoader.groovy:26) Caused by: java.lang.NullPointerException at org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1053) at org.apache.cassandra.db.ColumnFamilyStore.<clinit>(ColumnFamilyStore.java:85) ... 18 more {code} Note that this works correctly in 2.1.2 but not with 2.1.3. We can reproduce this issue with the following code: {code:title=test.java|borderStyle=solid} import org.apache.cassandra.config.Config; import org.apache.cassandra.io.sstable.CQLSSTableWriter; public static void main(String[] args) { Config.setClientMode(true); // These folders must exist: mkdir -p /tmp/kspc/t1 CQLSSTableWriter w1 = CQLSSTableWriter.builder() .inDirectory("/tmp/kspc/t1") .forTable("CREATE TABLE kspc.t1 ( id int, PRIMARY KEY (id));") .using("INSERT INTO kspc.t1 (id) VALUES ( ? );") .build(); try { w1.addRow(1); w1.close(); } catch (Exception e) { System.out.println(e); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383812#comment-14383812 ] Andrés de la Peña commented on CASSANDRA-8717: -- I agree with your idea about doing this for the short term and leaving generalization for later. We can deal with future API changes without problems. What we would need at the 2i level is some way to specify that we need to scan all the nodes, and the aforementioned method to combine the partial results. Indeed, sort is not the most fortunate name for this method... Top-k queries with custom secondary indexes --- Key: CASSANDRA-8717 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, secondary_index, sort, sorting, top-k Fix For: 3.0 Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch As presented in [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc. Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sorting them in order to obtain the k globally best rows. For doing that, we propose two additional methods in class SecondaryIndexSearcher: {code:java} public boolean requiresFullScan(List<IndexExpression> clause) { return false; } public List<Row> sort(List<IndexExpression> clause, List<Row> rows) { return rows; } {code} The first one indicates if a query performed in the index requires querying all the nodes in the ring. It is necessary in top-k queries because we do not know which nodes hold the best results. The second method specifies how to sort all the partial node results according to the query. 
Then we add two similar methods to the class AbstractRangeCommand: {code:java} this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter); public boolean requiresFullScan() { return searcher == null ? false : searcher.requiresFullScan(rowFilter); } public List<Row> combine(List<Row> rows) { return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows)); } {code} Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as shown in the attached patch. We think that the proposed approach provides very useful functionality with minimal impact on the current codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383844#comment-14383844 ] Sylvain Lebresne commented on CASSANDRA-8374: - So, I think it's time to decide where we go here. As I said in my previous comment, my preference here is to go with no default and an explicit choice. And doing so still leaves the ability to change our mind later and add a default, so I suggest we implement that and if someone feels strongly about a particular default, he can open a follow up ticket. [~snazy] can you rebase your branch, make sure it implements what's above and remind us where that branch actually is? Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Labels: client-impacting, cql3.3, docs-impacting, udf Fix For: 3.0 Attachments: 8374-3.txt, 8473-1.txt, 8473-2.txt Currently, every function needs to deal with its arguments potentially being {{null}}. There are very many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without having this crash as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
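The default null-handling semantics proposed in the ticket can be sketched with a small wrapper. This is a hypothetical helper for illustration, not Cassandra's implementation: the point is that the UDF body is skipped entirely for null input.

```java
import java.util.function.Function;

// Sketch of the proposed default: if the argument is null, do not invoke the
// UDF body at all and return null, so a body like 'return val + 2;' never
// sees a null argument. (Hypothetical helper, not Cassandra's code.)
final class NullSafeUdf
{
    static <A, R> R invoke(Function<A, R> body, A arg)
    {
        return arg == null ? null : body.apply(arg); // skip body on null input
    }
}
```

A function opting into explicit null handling (the suggested ALLOW NULLS) would simply bypass this wrapper and receive the raw, possibly-null argument.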
[jira] [Updated] (CASSANDRA-7807) Push notification when tracing completes for an operation
[ https://issues.apache.org/jira/browse/CASSANDRA-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-7807: Attachment: 7807-v3.txt I worked in all of your comments. {{SimpleClient}}: Regarding {{TransportException}} - unfortunately it’s an interface - not an (unchecked) exception class. {{Event}}: I decided to check the version at the top-level methods (so removed the additional, paranoid checks in the individual implementations) {{TraceState}}: Added functionality to {{Connection}}/{{ServerConnection}} that checks for registration. {{TraceCompleteTest}}: removed that unnecessary stuff (so it doesn’t waste time) {{MessagePayloadTest}}: Apologies for that. Push notification when tracing completes for an operation - Key: CASSANDRA-7807 URL: https://issues.apache.org/jira/browse/CASSANDRA-7807 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Tyler Hobbs Assignee: Robert Stupp Priority: Minor Labels: client-impacting, protocolv4 Fix For: 3.0 Attachments: 7807-v2.txt, 7807-v3.txt, 7807.txt Tracing is an asynchronous operation, and drivers currently poll to determine when the trace is complete (in a loop with sleeps). Instead, the server could push a notification to the driver when the trace completes. I'm guessing that most of the work for this will be around pushing notifications to a single connection instead of all connections that have registered listeners for a particular event type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9036) disk full when running cleanup (on a far from full disk)
[ https://issues.apache.org/jira/browse/CASSANDRA-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-9036: Attachment: 9036-3.0.txt 9036-2.1.txt 9036-2.0.txt The attached patches fix the issue in {{ColumnFamilyStore.getExpectedCompactedFileSize}}, which returned the on-disk size for non-CLEANUP compactions but the uncompressed size for CLEANUP compactions. [~krummas] can you review? disk full when running cleanup (on a far from full disk) -- Key: CASSANDRA-9036 URL: https://issues.apache.org/jira/browse/CASSANDRA-9036 Project: Cassandra Issue Type: Bug Reporter: Erik Forsberg Assignee: Robert Stupp Attachments: 9036-2.0.txt, 9036-2.1.txt, 9036-3.0.txt I'm trying to run cleanup, but get this: {noformat} INFO [CompactionExecutor:18] 2015-03-25 10:29:16,355 CompactionManager.java (line 564) Cleaning up SSTableReader(path='/cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db') ERROR [CompactionExecutor:18] 2015-03-25 10:29:16,664 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:18,1,main] java.io.IOException: disk full at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:567) at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:63) at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:281) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:225) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Now that's odd, since: * Disk has some 680G left * The sstable it's trying to cleanup is far less than 680G: {noformat} # ls -lh *4345750* -rw-r--r-- 1 cassandra cassandra 64M Mar 21 04:42 
production-Data_daily-jb-4345750-CompressionInfo.db -rw-r--r-- 1 cassandra cassandra 219G Mar 21 04:42 production-Data_daily-jb-4345750-Data.db -rw-r--r-- 1 cassandra cassandra 503M Mar 21 04:42 production-Data_daily-jb-4345750-Filter.db -rw-r--r-- 1 cassandra cassandra 42G Mar 21 04:42 production-Data_daily-jb-4345750-Index.db -rw-r--r-- 1 cassandra cassandra 5.9K Mar 21 04:42 production-Data_daily-jb-4345750-Statistics.db -rw-r--r-- 1 cassandra cassandra 81M Mar 21 04:42 production-Data_daily-jb-4345750-Summary.db -rw-r--r-- 1 cassandra cassandra 79 Mar 21 04:42 production-Data_daily-jb-4345750-TOC.txt {noformat} Sure, it's large, but it's not 680G. No other compactions are running on that server. I'm getting this on 12 / 56 servers right now. Could it be some bug in the calculation of the expected size of the new sstable, perhaps? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7304) Ability to distinguish between NULL and UNSET values in Prepared Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-7304: Reviewer: Benjamin Lerer (was: Sylvain Lebresne) Ability to distinguish between NULL and UNSET values in Prepared Statements --- Key: CASSANDRA-7304 URL: https://issues.apache.org/jira/browse/CASSANDRA-7304 Project: Cassandra Issue Type: Sub-task Reporter: Drew Kutcharian Assignee: Oded Peer Labels: cql, protocolv4 Fix For: 3.0 Attachments: 7304-03.patch, 7304-04.patch, 7304-2.patch, 7304.patch Currently Cassandra inserts tombstones when a value of a column is bound to NULL in a prepared statement. At higher insert rates managing all these tombstones becomes an unnecessary overhead. This limits the usefulness of the prepared statements since developers have to either create multiple prepared statements (each with a different combination of column names, which at times is just unfeasible because of the sheer number of possible combinations) or fall back to using regular (non-prepared) statements. This JIRA is here to explore the possibility of either: A. Have a flag on prepared statements that once set, tells Cassandra to ignore null columns or B. Have an UNSET value which makes Cassandra skip the null columns and not tombstone them Basically, in the context of a prepared statement, a null value means delete, but we don’t have anything that means ignore (besides creating a new prepared statement without the ignored column). Please refer to the original conversation on DataStax Java Driver mailing list for more background: https://groups.google.com/a/lists.datastax.com/d/topic/java-driver-user/cHE3OOSIXBU/discussion *EDIT 18/12/14 - [~odpeer] Implementation Notes:* The motivation hasn't changed. Protocol version 4 specifies that bind variables do not require having a value when executing a statement. Bind variables without a value are called 'unset'. 
The 'unset' bind variable is serialized as the int value '-2' without following bytes. \\ \\ * An unset bind variable in an EXECUTE or BATCH request ** On a {{value}} does not modify the value and does not create a tombstone ** On the {{ttl}} clause is treated as 'unlimited' ** On the {{timestamp}} clause is treated as 'now' ** On a map key or a list index throws {{InvalidRequestException}} ** On a {{counter}} increment or decrement operation does not change the counter value, e.g. {{UPDATE my_tab SET c = c - ? WHERE k = 1}} does not change the value of counter {{c}} ** On a tuple field or UDT field throws {{InvalidRequestException}} * An unset bind variable in a QUERY request ** On a partition column, clustering column or index column in the {{WHERE}} clause throws {{InvalidRequestException}} ** On the {{limit}} clause is treated as 'unlimited' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
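The wire framing described above can be sketched as follows. This is a hedged illustration of the [value] encoding (a 4-byte length prefix, with -1 for null and -2 for unset), using a hypothetical helper rather than any driver's actual codec.

```java
import java.nio.ByteBuffer;

// Sketch of the native-protocol v4 [value] framing described above: a set
// value is a 4-byte length followed by that many bytes, null is length -1,
// and the new 'unset' marker is length -2 with no trailing bytes.
// (Hypothetical helper for illustration only.)
final class ValueCodec
{
    static final int NULL_LENGTH = -1;
    static final int UNSET_LENGTH = -2;

    static ByteBuffer writeValue(byte[] valueOrNull, boolean unset)
    {
        if (unset)
            return (ByteBuffer) ByteBuffer.allocate(4).putInt(UNSET_LENGTH).flip();
        if (valueOrNull == null)
            return (ByteBuffer) ByteBuffer.allocate(4).putInt(NULL_LENGTH).flip();
        ByteBuffer out = ByteBuffer.allocate(4 + valueOrNull.length);
        out.putInt(valueOrNull.length).put(valueOrNull);
        return (ByteBuffer) out.flip();
    }
}
```

Because unset reuses the length field as a sentinel, no extra flag bytes are needed and older servers that only understand -1 simply never see -2 (the client must negotiate protocol v4 first).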
[jira] [Commented] (CASSANDRA-7304) Ability to distinguish between NULL and UNSET values in Prepared Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383826#comment-14383826 ] Sylvain Lebresne commented on CASSANDRA-7304: - I apologize for not getting back on this one, it kind of slipped out of my review list, but we need to get that ready for 3.0. [~odpeer] would you have the time to rebase this to trunk? Ability to distinguish between NULL and UNSET values in Prepared Statements --- Key: CASSANDRA-7304 URL: https://issues.apache.org/jira/browse/CASSANDRA-7304 Project: Cassandra Issue Type: Sub-task Reporter: Drew Kutcharian Assignee: Oded Peer Labels: cql, protocolv4 Fix For: 3.0 Attachments: 7304-03.patch, 7304-04.patch, 7304-2.patch, 7304.patch Currently Cassandra inserts tombstones when a value of a column is bound to NULL in a prepared statement. At higher insert rates managing all these tombstones becomes an unnecessary overhead. This limits the usefulness of the prepared statements since developers have to either create multiple prepared statements (each with a different combination of column names, which at times is just unfeasible because of the sheer number of possible combinations) or fall back to using regular (non-prepared) statements. This JIRA is here to explore the possibility of either: A. Have a flag on prepared statements that once set, tells Cassandra to ignore null columns or B. Have an UNSET value which makes Cassandra skip the null columns and not tombstone them Basically, in the context of a prepared statement, a null value means delete, but we don’t have anything that means ignore (besides creating a new prepared statement without the ignored column). Please refer to the original conversation on DataStax Java Driver mailing list for more background: https://groups.google.com/a/lists.datastax.com/d/topic/java-driver-user/cHE3OOSIXBU/discussion *EDIT 18/12/14 - [~odpeer] Implementation Notes:* The motivation hasn't changed. 
Protocol version 4 specifies that bind variables do not require having a value when executing a statement. Bind variables without a value are called 'unset'. The 'unset' bind variable is serialized as the int value '-2' without following bytes. \\ \\ * An unset bind variable in an EXECUTE or BATCH request ** On a {{value}} does not modify the value and does not create a tombstone ** On the {{ttl}} clause is treated as 'unlimited' ** On the {{timestamp}} clause is treated as 'now' ** On a map key or a list index throws {{InvalidRequestException}} ** On a {{counter}} increment or decrement operation does not change the counter value, e.g. {{UPDATE my_tab SET c = c - ? WHERE k = 1}} does not change the value of counter {{c}} ** On a tuple field or UDT field throws {{InvalidRequestException}} * An unset bind variable in a QUERY request ** On a partition column, clustering column or index column in the {{WHERE}} clause throws {{InvalidRequestException}} ** On the {{limit}} clause is treated as 'unlimited' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
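The framing described above is compact enough to sketch. Below is a minimal, self-contained illustration of how a v4 bind value could be framed as [int n][n bytes], with n = -1 meaning null and n = -2 meaning unset; the class and method names are invented for this example and are not Cassandra's or any driver's actual API:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of native-protocol v4 value framing (not the real serializer).
public class BindValue {
    public static final int NULL = -1;   // explicit null: creates a tombstone
    public static final int UNSET = -2;  // unset: leaves the column untouched

    // Frame a bind value: a 4-byte signed length, then that many payload bytes.
    // Unset and null are encoded purely in the length field; no bytes follow.
    public static ByteBuffer frame(byte[] payload, boolean unset) {
        if (unset) {
            ByteBuffer out = ByteBuffer.allocate(4).putInt(UNSET);
            out.flip();
            return out;
        }
        if (payload == null) {
            ByteBuffer out = ByteBuffer.allocate(4).putInt(NULL);
            out.flip();
            return out;
        }
        ByteBuffer out = ByteBuffer.allocate(4 + payload.length);
        out.putInt(payload.length).put(payload);
        out.flip();
        return out;
    }
}
```

A client that distinguishes null from unset simply chooses between the two negative length markers instead of always sending -1 for a missing value.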
[jira] [Commented] (CASSANDRA-9036) disk full when running cleanup (on a far from full disk)
[ https://issues.apache.org/jira/browse/CASSANDRA-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383840#comment-14383840 ] Erik Forsberg commented on CASSANDRA-9036: -- Applied the 2.0 version and put in production on one of my nodes. Cleanup of my rather-large-file now started without exception. So I'm now officially happy! :-) disk full when running cleanup (on a far from full disk) -- Key: CASSANDRA-9036 URL: https://issues.apache.org/jira/browse/CASSANDRA-9036 Project: Cassandra Issue Type: Bug Reporter: Erik Forsberg Assignee: Robert Stupp Attachments: 9036-2.0.txt, 9036-2.1.txt, 9036-3.0.txt I'm trying to run cleanup, but get this: {noformat} INFO [CompactionExecutor:18] 2015-03-25 10:29:16,355 CompactionManager.java (line 564) Cleaning up SSTableReader(path='/cassandra/production/Data_daily/production-Data_daily-jb-4345750-Data.db') ERROR [CompactionExecutor:18] 2015-03-25 10:29:16,664 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:18,1,main] java.io.IOException: disk full at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:567) at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:63) at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:281) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:225) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Now that's odd, since: * Disk has some 680G left * The sstable it's trying to cleanup is far less than 680G: {noformat} # ls -lh *4345750* -rw-r--r-- 1 cassandra cassandra 64M Mar 21 04:42 production-Data_daily-jb-4345750-CompressionInfo.db -rw-r--r-- 1 cassandra 
cassandra 219G Mar 21 04:42 production-Data_daily-jb-4345750-Data.db -rw-r--r-- 1 cassandra cassandra 503M Mar 21 04:42 production-Data_daily-jb-4345750-Filter.db -rw-r--r-- 1 cassandra cassandra 42G Mar 21 04:42 production-Data_daily-jb-4345750-Index.db -rw-r--r-- 1 cassandra cassandra 5.9K Mar 21 04:42 production-Data_daily-jb-4345750-Statistics.db -rw-r--r-- 1 cassandra cassandra 81M Mar 21 04:42 production-Data_daily-jb-4345750-Summary.db -rw-r--r-- 1 cassandra cassandra 79 Mar 21 04:42 production-Data_daily-jb-4345750-TOC.txt {noformat} Sure, it's large, but it's not 680G. No other compactions are running on that server. I'm getting this on 12 / 56 servers right now. Could it be some bug in the calculation of the expected size of the new sstable, perhaps? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
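If the failure really is an over-estimate of the output size, the guard involved is conceptually simple: cleanup rewrites a single sstable and can only drop data, so the estimate should never exceed the input size, and the check compares that estimate against the disk's usable space. A sketch with hypothetical names (not the actual CompactionManager code):

```java
import java.io.File;

// Sketch of the free-space guard a cleanup compaction needs; names are illustrative.
public class CleanupSpaceCheck {
    // Cleanup drops rows the node no longer owns, so the output can be at most
    // the input size; clamping the estimate avoids a spurious "disk full".
    public static long estimateCleanupOutput(long inputDataLength, double keepRatio) {
        return (long) Math.min(inputDataLength, inputDataLength * keepRatio);
    }

    // The estimate is then checked against the usable space of the target data directory.
    public static boolean hasRoomFor(File dataDir, long estimatedBytes) {
        return dataDir.getUsableSpace() >= estimatedBytes;
    }
}
```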
[jira] [Commented] (CASSANDRA-8241) Use javac instead of javassist
[ https://issues.apache.org/jira/browse/CASSANDRA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383859#comment-14383859 ] Sylvain Lebresne commented on CASSANDRA-8241: - For what it's worth, I do think the lack of auto-boxing will be a pain in the ass. Not to say that I have the perfect solution. Use javac instead of javassist -- Key: CASSANDRA-8241 URL: https://issues.apache.org/jira/browse/CASSANDRA-8241 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Robert Stupp Assignee: Robert Stupp Labels: udf Fix For: 3.0 Attachments: 8241-ecj.txt, udf-java-javac.txt Using JDK's built-in Java-Compiler API has some advantages over javassist. Although compilation feels a bit slower, Java compiler API has some advantages: * boxing + unboxing works * generics work * compiler error messages are better (or at least known) and have line/column numbers The implementation does not use any temp files. Everything's in memory. Patch attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7304) Ability to distinguish between NULL and UNSET values in Prepared Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383897#comment-14383897 ] Oded Peer commented on CASSANDRA-7304: -- Yes. I will rebase next week. Thanks
[jira] [Comment Edited] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383873#comment-14383873 ] Philip Thompson edited comment on CASSANDRA-9051 at 3/27/15 2:21 PM: - The error you are seeing is just a timeout. For non-trivial datasets, {{select count(\*) from table}} will time out. It's just a result of the C* architecture. You will need to use hadoop or spark analytic jobs to get row counts at scale. was (Author: philipthompson): The error you are seeing is just a timeout. For non-trivial datasets, {{select count(*) from table}} will time out. It's just a result of the C* architecture. You will need to use hadoop or spark analytic jobs to get row counts at scale. Error in cqlsh command line while querying -- Key: CASSANDRA-9051 URL: https://issues.apache.org/jira/browse/CASSANDRA-9051 Project: Cassandra Issue Type: Bug Components: Core Reporter: Naresh Palaiya Priority: Minor Fix For: 2.1.2 Aggregation queries (select count(*) from TABLE_NAME) on a Cassandra cluster result in the following error, even after increasing the read_request_timeout_in_ms and range_request_timeout_in_ms parameters. For more information on the bug, you can refer to this Stack Overflow link: http://stackoverflow.com/questions/29205005/error-in-cqlsh-command-line-while-querying errors={}, last_host=localhost Statement trace did not complete within 10 seconds
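For what "at scale" looks like in practice: analytic tools count by issuing one bounded query per token sub-range (e.g. {{SELECT count(*) FROM t WHERE token(pk) > ? AND token(pk) <= ?}}) rather than a single full-ring {{count(*)}}. A sketch of splitting the Murmur3 token ring into even sub-ranges; the class name is illustrative, and real tools split along the cluster's actual token ownership rather than evenly:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch: split the Murmur3 ring [Long.MIN_VALUE, Long.MAX_VALUE] into even
// sub-ranges so a count can be issued per range with a short timeout each.
public class TokenRangeSplitter {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Returns half-open ranges (start, end]; BigInteger avoids overflow on the span.
    public static List<long[]> split(int parts) {
        BigInteger span = MAX.subtract(MIN).divide(BigInteger.valueOf(parts));
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 0; i < parts; i++) {
            BigInteger end = (i == parts - 1) ? MAX : start.add(span);
            ranges.add(new long[]{ start.longValue(), end.longValue() });
            start = end;
        }
        return ranges;
    }
}
```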
[jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383923#comment-14383923 ] Robert Stupp commented on CASSANDRA-8374: - Sure Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Labels: client-impacting, cql3.3, docs-impacting, udf Fix For: 3.0 Attachments: 8374-3.txt, 8473-1.txt, 8473-2.txt Currently, every function needs to deal with its argument potentially being {{null}}. There are many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without having this crash as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax.
[jira] [Comment Edited] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383926#comment-14383926 ] Roman Tkachenko edited comment on CASSANDRA-9045 at 3/27/15 2:36 PM: - I did: {code} INFO [ValidationExecutor:8] 2015-03-26 18:53:41,404 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279555898 bytes) incrementally {code} was (Author: r0mant): I did: INFO [ValidationExecutor:8] 2015-03-26 18:53:41,404 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279555898 bytes) incrementally Deleted columns are resurrected after repair in wide rows - Key: CASSANDRA-9045 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045 Project: Cassandra Issue Type: Bug Components: Core Reporter: Roman Tkachenko Assignee: Marcus Eriksson Priority: Critical Fix For: 2.0.14 Attachments: cqlsh.txt Hey guys, After almost a week of researching the issue and trying out multiple things with (almost) no luck I was suggested (on the user@cass list) to file a report here. h5. Setup Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if it goes away) Multi datacenter 12+6 nodes cluster. h5. 
Schema {code} cqlsh describe keyspace blackbook; CREATE KEYSPACE blackbook WITH replication = { 'class': 'NetworkTopologyStrategy', 'IAD': '3', 'ORD': '3' }; USE blackbook; CREATE TABLE bounces ( domainid text, address text, message text, timestamp bigint, PRIMARY KEY (domainid, address) ) WITH bloom_filter_fp_chance=0.10 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.10 AND gc_grace_seconds=864000 AND index_interval=128 AND read_repair_chance=0.00 AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND speculative_retry='99.0PERCENTILE' AND memtable_flush_period_in_ms=0 AND compaction={'class': 'LeveledCompactionStrategy'} AND compression={'sstable_compression': 'LZ4Compressor'}; {code} h5. Use case Each row (defined by a domainid) can have many many columns (bounce entries) so rows can get pretty wide. In practice, most of the rows are not that big but some of them contain hundreds of thousands and even millions of columns. Columns are not TTL'ed but can be deleted using the following CQL3 statement: {code} delete from bounces where domainid = 'domain.com' and address = 'al...@example.com'; {code} All queries are performed using LOCAL_QUORUM CL. h5. Problem We weren't very diligent about running repairs on the cluster initially, but shortly after we started doing it we noticed that some of the previously deleted columns (bounce entries) are there again, as if tombstones have disappeared. I have run this test multiple times via cqlsh, on the row of the customer who originally reported the issue: * delete an entry * verify it's not returned even with CL=ALL * run repair on nodes that own this row's key * the columns reappear and are returned even with CL=ALL I tried the same test on another row with much less data and everything was correctly deleted and didn't reappear after repair. h5. Other steps I've taken so far Made sure NTP is running on all servers and clocks are synchronized.
Increased gc_grace_seconds to 100 days, ran full repair (on the affected keyspace) on all nodes, then changed it back to the default 10 days again. Didn't help. Performed one more test. Updated one of the resurrected columns, then deleted it and ran repair again. This time the updated version of the column reappeared. Finally, I noticed these log entries for the row in question: {code} INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally {code} Figuring it may be related, I bumped in_memory_compaction_limit_in_mb to 512MB so the row fits into it, deleted the entry and ran repair once again. The log entry for this row was gone and the columns didn't reappear. We have a lot of rows much larger than 512MB so can't increase this parameter forever, if that is the issue. Please let me know if you need more information on the case or if I can run more experiments. Thanks! Roman
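The resurrection symptom is consistent with tombstones being dropped (or missed) somewhere along the repair/compaction path, so the basic purge rule is worth stating: a tombstone may only be discarded by compaction once gc_grace_seconds has elapsed since the deletion, because discarding it earlier lets a replica that never saw the delete re-introduce the data. A simplified sketch of that rule (not the actual compaction code):

```java
// Sketch of tombstone purge eligibility, simplified; times are in epoch seconds.
public class TombstonePurge {
    // A tombstone is purgeable only after gc_grace_seconds have elapsed since
    // its local deletion time; purging earlier risks exactly the resurrection
    // described in this ticket if some replica missed the delete.
    public static boolean isPurgeable(int localDeletionTime, int gcGraceSeconds, int nowSeconds) {
        return localDeletionTime + gcGraceSeconds < nowSeconds;
    }
}
```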
[jira] [Resolved] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson resolved CASSANDRA-9051. Resolution: Not a Problem
[jira] [Updated] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9051: --- Priority: Minor (was: Critical) The error you are seeing is just a timeout. For non-trivial datasets, {{select count(*) from table}} will time out. It's just a result of the C* architecture. You will need to use hadoop or spark analytic jobs to get row counts at scale.
[jira] [Commented] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383895#comment-14383895 ] Naresh Palaiya commented on CASSANDRA-9051: --- [~philipthompson] I don'tr get this error. While executing the same query from Datastax driver for c#. I get the proper number, why is it so?
[jira] [Comment Edited] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383895#comment-14383895 ] Naresh Palaiya edited comment on CASSANDRA-9051 at 3/27/15 2:20 PM: [~philipthompson] I don't get this error while executing the same query from Datastax driver for c#. I get the proper number, why is it so? was (Author: palaiya): [~philipthompson] I don'tr get this error. While executing the same query from Datastax driver for c#. I get the proper number, why is it so?
[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383926#comment-14383926 ] Roman Tkachenko commented on CASSANDRA-9045: I did: INFO [ValidationExecutor:8] 2015-03-26 18:53:41,404 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279555898 bytes) incrementally
[jira] [Commented] (CASSANDRA-8241) Use javac instead of javassist
[ https://issues.apache.org/jira/browse/CASSANDRA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383933#comment-14383933 ] Robert Stupp commented on CASSANDRA-8241: - Using Oracle's {{JavaCompiler}} API implementation requires installation of a JDK (not a good option). We can bundle the Eclipse thing (which btw also implements the {{JavaCompiler}} API beside its native API). I see three options for this one (preferably option 3): # stick with javassist (effectively resolving this ticket as 'later') # use ecj with its native API (stick with ecj) # use ecj with {{JavaCompiler}} API (allows us to use ecj and the JDK)
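For context, option 3 (the standard {{JavaCompiler}} API) compiles straight from a source string with no temp source files, which is the approach the ticket description advocates. A rough, self-contained sketch; the class names are illustrative, and {{ToolProvider.getSystemJavaCompiler()}} returning null on a JRE-only install is exactly the JDK requirement mentioned above:

```java
import javax.tools.*;
import java.net.URI;
import java.util.*;

// Sketch: compile a source string via the javax.tools.JavaCompiler API.
// Source stays in memory; compiled classes are written under java.io.tmpdir here.
public class InMemoryCompile {
    // A JavaFileObject backed by a String instead of a file on disk.
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Returns compiler error messages with line/column numbers; empty list == success.
    public static List<String> compile(String className, String source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) // JRE-only install: no system compiler available
            return Collections.singletonList("no system compiler (JDK required)");
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        List<String> options = Arrays.asList("-d", System.getProperty("java.io.tmpdir"));
        JavaCompiler.CompilationTask task =
            javac.getTask(null, null, diags, options, null,
                          Collections.singletonList(new StringSource(className, source)));
        boolean ok = task.call();
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diags.getDiagnostics())
            if (d.getKind() == Diagnostic.Kind.ERROR)
                errors.add(d.getLineNumber() + ":" + d.getColumnNumber() + " " + d.getMessage(null));
        return ok ? Collections.<String>emptyList() : errors;
    }
}
```

The line/column numbers on each diagnostic are the advantage over javassist that the ticket description calls out.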
[jira] [Commented] (CASSANDRA-9051) Error in cqlsh command line while querying
[ https://issues.apache.org/jira/browse/CASSANDRA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383948#comment-14383948 ] Philip Thompson commented on CASSANDRA-9051: Discussion of expected behavior is better taken up on the user mailing list or on IRC. Unless you encounter server errors in your system.log, timeouts on {{select count(\*) from table}} will not be considered a bug, regardless of whether they occur consistently or not.
[jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383946#comment-14383946 ] Robert Stupp commented on CASSANDRA-8374: - So (just for my understanding)... It should look like this (requiring the user to choose either {{RETURNS NULL ON NULL INPUT}} or {{CALLED ON NULL INPUT}} without a default): {code} CREATE FUNCTION foo ... [ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ] ... {code}
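Whichever way the syntax is spelled, the {{RETURNS NULL ON NULL INPUT}} behavior amounts to a wrapper that skips the function body entirely when any argument is null, which is exactly what the ticket proposes as the default. A sketch with a hypothetical helper (not the Cassandra implementation):

```java
import java.util.function.Function;

// Sketch of RETURNS NULL ON NULL INPUT semantics; the helper name is invented.
public class NullSafeUdf {
    // Wrap a UDF body so it is never invoked with a null argument:
    // if any argument is null, the result is null and the body is not called.
    public static Function<Object[], Object> returnsNullOnNullInput(Function<Object[], Object> body) {
        return args -> {
            for (Object a : args)
                if (a == null)
                    return null; // skip the body entirely
            return body.apply(args);
        };
    }
}
```

Under this scheme the {{addTwo}} example from the description never sees a null {{val}}, so its body can stay a plain {{return val + 2;}}.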
[jira] [Commented] (CASSANDRA-8241) Use javac instead of javassist
[ https://issues.apache.org/jira/browse/CASSANDRA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383950#comment-14383950 ] Aleksey Yeschenko commented on CASSANDRA-8241: -- Before going any further w/ Eclipse, make sure we can bundle it at all, license-wise. Eclipse Distribution License is ASF-compatible (see https://www.apache.org/legal/resolved.html#category-a), but there is nothing there about Eclipse Public License. I'm not sure which one the Eclipse thing is covered by, but if it's not EDL, it might have to be vetted, though ultimately I expect it to be compatible.
[jira] [Comment Edited] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383081#comment-14383081 ] Jonathan Ellis edited comment on CASSANDRA-9048 at 3/27/15 2:50 PM: How performant is this compared to CASSANDRA-7405? (Edit: thanks to Aleksey for the reminder that I meant CASSANDRA-8225.) was (Author: jbellis): How performant is this compared to CASSANDRA-7405? Delimited File Bulk Loader -- Key: CASSANDRA-9048 URL: https://issues.apache.org/jira/browse/CASSANDRA-9048 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Brian Hess Fix For: 3.0 Attachments: CASSANDRA-9048.patch There is a strong need for bulk loading data from delimited files into Cassandra. Starting with delimited files means that the data is not currently in the SSTable format, and therefore cannot immediately leverage Cassandra's bulk loading tool, sstableloader, directly. A tool supporting delimited files much closer matches the format of the data more often than the SSTable format itself, and a tool that loads from delimited files is very useful. 
In order for this bulk loader to be more generally useful to customers, it should handle a number of options at a minimum: - support specifying the input file or reading the data from stdin (so other command-line programs can pipe into the loader) - supply the CQL schema for the input data - support all data types other than collections (collections is a stretch goal/need) - an option to specify the delimiter - an option to specify comma as the decimal delimiter (for international use cases) - an option to specify how NULL values are specified in the file (e.g., the empty string or the string NULL) - an option to specify how BOOLEAN values are specified in the file (e.g., TRUE/FALSE or 0/1) - an option to specify the Date and Time format - an option to skip some number of rows at the beginning of the file - an option to only read in some number of rows from the file - an option to indicate how many parse errors to tolerate - an option to specify a file that will contain all the lines that did not parse correctly (up to the maximum number of parse errors) - an option to specify the CQL port to connect to (with 9042 as the default). Additional options would be useful, but this set of options/features is a start. A word on COPY. COPY comes via CQLSH, which requires the client to be the same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, etc). This tool should be able to connect to any version of Cassandra (within reason). For example, it should be able to handle 2.0.x and 2.1.x. Moreover, CQLSH's COPY command does not support a number of the options above. Lastly, the performance of COPY in 2.0.x is not high enough to be considered a bulk ingest tool.
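Two of the options above (the delimiter and the spelling of NULL) are easy to sketch. A minimal illustration with hypothetical names, not the proposed patch:

```java
import java.util.regex.Pattern;

// Sketch of a loader's per-line parsing with a configurable delimiter and NULL token.
public class DelimitedParser {
    private final char delimiter;
    private final String nullToken; // how NULL is spelled in the file, e.g. "" or "NULL"

    public DelimitedParser(char delimiter, String nullToken) {
        this.delimiter = delimiter;
        this.nullToken = nullToken;
    }

    // Split one line into fields; fields equal to the null token become Java null,
    // which the loader would then bind as null (or unset, per CASSANDRA-7304).
    public String[] parse(String line) {
        // limit -1 keeps trailing empty fields; quote() escapes regex metacharacters like '|'
        String[] fields = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        for (int i = 0; i < fields.length; i++)
            if (fields[i].equals(nullToken))
                fields[i] = null;
        return fields;
    }
}
```

A real loader would layer the remaining options (header skipping, row limits, error files, date formats) on top of this same per-line step.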
[jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383960#comment-14383960 ] Sylvain Lebresne commented on CASSANDRA-8374: - Yes. Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Labels: client-impacting, cql3.3, docs-impacting, udf Fix For: 3.0 Attachments: 8374-3.txt, 8473-1.txt, 8473-2.txt Currently, every function needs to deal with its arguments potentially being {{null}}. There are many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without having it crash as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
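The default behavior proposed above (skip the function body entirely when an argument is {{null}} and return {{null}}) can be modeled with a plain Java wrapper. This is an illustrative sketch, not Cassandra's actual UDF machinery, and `returnsNullOnNull` is a hypothetical helper name:

```java
import java.util.function.Function;

public class NullSafeUdf {
    // Wraps a single-argument function so a null input short-circuits to
    // null instead of invoking the body (which would typically throw an NPE
    // on unboxing, as in the addTwo example above).
    public static <A, R> Function<A, R> returnsNullOnNull(Function<A, R> body) {
        return arg -> arg == null ? null : body.apply(arg);
    }
}
```

Under {{ALLOW NULLS}} semantics, only functions that opt in would receive the raw {{null}} and handle it themselves.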
cassandra git commit: Clearer logic for first index summary entry in builder
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 bd4842410 -> f7856c225 Clearer logic for first index summary entry in builder Patch by Tyler Hobbs; reviewed by Benedict Elliot Smith as a follow up for CASSANDRA-8993 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f7856c22 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f7856c22 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f7856c22 Branch: refs/heads/cassandra-2.1 Commit: f7856c225857e8bb975289f46e2b9b001888ea0c Parents: bd48424 Author: Tyler Hobbs tylerho...@apache.org Authored: Fri Mar 27 10:00:37 2015 -0500 Committer: Tyler Hobbs tylerho...@apache.org Committed: Fri Mar 27 10:02:44 2015 -0500 -- .../org/apache/cassandra/io/sstable/IndexSummaryBuilder.java| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f7856c22/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java -- diff --git a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java index 54e8dd2..ff06c10 100644 --- a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java +++ b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java @@ -109,7 +109,10 @@ public class IndexSummaryBuilder implements AutoCloseable maxExpectedEntries = Math.max(1, (maxExpectedEntries * samplingLevel) / BASE_SAMPLING_LEVEL); offsets = new SafeMemoryWriter(4 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); entries = new SafeMemoryWriter(40 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); -setNextSamplePosition(-minIndexInterval); + +// the summary will always contain the first index entry (downsampling will never remove it) +nextSamplePosition = 0; +indexIntervalMatches++; } // the index file has been flushed to the provided position; stash it and use that to recalculate our max readable boundary
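The effect of the change above (initializing `nextSamplePosition` to 0 rather than calling `setNextSamplePosition(-minIndexInterval)`) is that the first index entry is unconditionally taken as a sample, matching the comment that downsampling never removes it. A toy model of the sampling walk, heavily simplified from the real builder and using hypothetical names, shows the guarantee:

```java
import java.util.ArrayList;
import java.util.List;

public class SamplingSketch {
    // Returns the index positions that would land in the summary when a
    // sample is taken every `interval` entries, starting at position 0.
    public static List<Integer> sampledPositions(int entries, int interval) {
        List<Integer> sampled = new ArrayList<>();
        int nextSamplePosition = 0; // the first entry is always a sample
        for (int i = 0; i < entries; i++) {
            if (i == nextSamplePosition) {
                sampled.add(i);
                nextSamplePosition += interval;
            }
        }
        return sampled;
    }
}
```

Whatever the interval, position 0 is always the first element of the result, which is the explicit invariant the commit encodes.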
[1/2] cassandra git commit: Clearer logic for first index summary entry in builder
Repository: cassandra Updated Branches: refs/heads/trunk 04f351d57 -> db900a374 Clearer logic for first index summary entry in builder Patch by Tyler Hobbs; reviewed by Benedict Elliot Smith as a follow up for CASSANDRA-8993 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f7856c22 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f7856c22 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f7856c22 Branch: refs/heads/trunk Commit: f7856c225857e8bb975289f46e2b9b001888ea0c Parents: bd48424 Author: Tyler Hobbs tylerho...@apache.org Authored: Fri Mar 27 10:00:37 2015 -0500 Committer: Tyler Hobbs tylerho...@apache.org Committed: Fri Mar 27 10:02:44 2015 -0500 -- .../org/apache/cassandra/io/sstable/IndexSummaryBuilder.java| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f7856c22/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java -- diff --git a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java index 54e8dd2..ff06c10 100644 --- a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java +++ b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java @@ -109,7 +109,10 @@ public class IndexSummaryBuilder implements AutoCloseable maxExpectedEntries = Math.max(1, (maxExpectedEntries * samplingLevel) / BASE_SAMPLING_LEVEL); offsets = new SafeMemoryWriter(4 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); entries = new SafeMemoryWriter(40 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); -setNextSamplePosition(-minIndexInterval); + +// the summary will always contain the first index entry (downsampling will never remove it) +nextSamplePosition = 0; +indexIntervalMatches++; } // the index file has been flushed to the provided position; stash it and use that to recalculate our max readable boundary
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Conflicts: src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/db900a37 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/db900a37 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/db900a37 Branch: refs/heads/trunk Commit: db900a37465d4e3a0c915d16725c890f3ee375e9 Parents: 04f351d f7856c2 Author: Tyler Hobbs tylerho...@apache.org Authored: Fri Mar 27 10:04:06 2015 -0500 Committer: Tyler Hobbs tylerho...@apache.org Committed: Fri Mar 27 10:04:06 2015 -0500 -- .../org/apache/cassandra/io/sstable/IndexSummaryBuilder.java| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/db900a37/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java -- diff --cc src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java index 2e96d03,ff06c10..696bbf8 --- a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java +++ b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java @@@ -107,9 -107,12 +107,12 @@@ public class IndexSummaryBuilder implem // for initializing data structures, adjust our estimates based on the sampling level maxExpectedEntries = Math.max(1, (maxExpectedEntries * samplingLevel) / BASE_SAMPLING_LEVEL); -offsets = new SafeMemoryWriter(4 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); -entries = new SafeMemoryWriter(40 * maxExpectedEntries).withByteOrder(ByteOrder.nativeOrder()); +offsets = new SafeMemoryWriter(4 * maxExpectedEntries).order(ByteOrder.nativeOrder()); +entries = new SafeMemoryWriter(40 * maxExpectedEntries).order(ByteOrder.nativeOrder()); - setNextSamplePosition(-minIndexInterval); + + // the summary will always contain the first index entry (downsampling will never remove it) + nextSamplePosition = 0; + indexIntervalMatches++; } // the index file has been flushed to the provided position; stash it and use that to recalculate our max readable boundary
[jira] [Commented] (CASSANDRA-9050) Add debug level logging to Directories.getWriteableLocation()
[ https://issues.apache.org/jira/browse/CASSANDRA-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383971#comment-14383971 ] Yuki Morishita commented on CASSANDRA-9050: --- +1 Add debug level logging to Directories.getWriteableLocation() - Key: CASSANDRA-9050 URL: https://issues.apache.org/jira/browse/CASSANDRA-9050 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 2.0.14 Attachments: 9050-2.0.txt, 9050-2.1.txt Add some debug level logging to log * blacklisted directories that are excluded * directories not matching requested size -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8241) Use javac instead of javassist
[ https://issues.apache.org/jira/browse/CASSANDRA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383972#comment-14383972 ] Robert Stupp commented on CASSANDRA-8241: - Damn - yes, EPL is not EDL. Probably EPL1.0 is not compatible with ASF2.0 (according to [this comparison|https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses]). Tried to find a definitive answer on the web but had no luck. Use javac instead of javassist -- Key: CASSANDRA-8241 URL: https://issues.apache.org/jira/browse/CASSANDRA-8241 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Robert Stupp Assignee: Robert Stupp Labels: udf Fix For: 3.0 Attachments: 8241-ecj.txt, udf-java-javac.txt Using the JDK's built-in Java-Compiler API has some advantages over javassist. Although compilation feels a bit slower, the Java compiler API has some advantages: * boxing + unboxing works * generics work * compiler error messages are better (or at least known) and have line/column numbers The implementation does not use any temp files. Everything's in memory. Patch attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect
[ https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383976#comment-14383976 ] Tyler Hobbs commented on CASSANDRA-8993: I'm partial to the more explicit {{nextSamplePosition = 0}}, so I've committed that as f7856c22. Thanks, I think that should wrap this ticket up. EffectiveIndexInterval calculation is incorrect --- Key: CASSANDRA-8993 URL: https://issues.apache.org/jira/browse/CASSANDRA-8993 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Priority: Blocker Fix For: 2.1.4 Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt I'm not familiar enough with the calculation itself to understand why this is happening, but see discussion on CASSANDRA-8851 for the background. I've introduced a test case to look for this during downsampling, but it seems to pass just fine, so it may be an artefact of upgrading. The problem was, unfortunately, not manifesting directly because it would simply result in a failed lookup. This was only exposed when early opening used firstKeyBeyond, which does not use the effective interval, and provided the result to getPosition(). I propose a simple fix that ensures a bug here cannot break correctness. Perhaps [~thobbs] can follow up with an investigation as to how it actually went wrong? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6096) Look into a Pig Macro to url encode URLs passed to CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384158#comment-14384158 ] Philip Thompson commented on CASSANDRA-6096: [~brandon.williams], is this patch still desired? I'll move to {{Patch Available}} Look into a Pig Macro to url encode URLs passed to CqlStorage - Key: CASSANDRA-6096 URL: https://issues.apache.org/jira/browse/CASSANDRA-6096 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Jeremy Hanna Priority: Minor Labels: lhf Attachments: trunk-6096.txt In the evolution of CqlStorage, the URL went from non-encoded to encoded. It would be great to somehow keep the URL readable, perhaps using the Pig macro interface to do expansion: http://pig.apache.org/docs/r0.9.2/cont.html#macros See also CASSANDRA-6073 and CASSANDRA-5867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8314) C* 2.1.1: AssertionError: stream can only read forward
[ https://issues.apache.org/jira/browse/CASSANDRA-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384175#comment-14384175 ] Philip Thompson commented on CASSANDRA-8314: Do you still have this problem when running upgradesstables with nodetool from 2.1.3? C* 2.1.1: AssertionError: stream can only read forward - Key: CASSANDRA-8314 URL: https://issues.apache.org/jira/browse/CASSANDRA-8314 Project: Cassandra Issue Type: Bug Components: Core Reporter: Donald Smith Fix For: 2.1.4 I see this on multiple nodes on a 2.1.1 cluster running on CentOS 6.4: {noformat} ERROR [STREAM-IN-/10.6.1.104] 2014-11-13 14:13:16,565 StreamSession.java (line 470) [Stream #45bdfe30-6b81-11e4-a7ca-b150b4554347] Streaming error occurred java.io.IOException: Too many retries for Header (cfId: aaefa7d7-9d72-3d18-b5f0-02b30cee5bd7, #29, version: jb, estimated keys: 12672, transfer size: 130005779, compressed?: true, repairedAt: 0) at org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:594) [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:53) [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55) [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245) [apache-cassandra-2.1.1.jar:2.1.1] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60] Caused by: java.lang.AssertionError: stream can only read forward.
at org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:107) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:85) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48) [apache-cassandra-2.1.1.jar:2.1.1] ... 4 common frames omitted {noformat} We couldn't upgrade SStables due to exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8472) Streams hang in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8472: --- Fix Version/s: 2.0.14 Streams hang in repair -- Key: CASSANDRA-8472 URL: https://issues.apache.org/jira/browse/CASSANDRA-8472 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell Fix For: 2.0.14 Attachments: errlogs In general streaming is working much better in 2.0.x than before, but we still get occasional hanging stream sessions. One of the nodes, the follower, throws IOException: Broken pipe, causing all streams to fail with the initiator node. But the initiator node still thinks its sending and receiving files from the follower, causing the streaming to hang forever. Relevant lines from the logs of the follower attached. There's nothing relevant in the logs on the initiator node. There are no indications of retry attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8534) The default configuration URL does not have the required file:// prefix and throws an exception if cassandra.config is not set.
[ https://issues.apache.org/jira/browse/CASSANDRA-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384184#comment-14384184 ] Philip Thompson commented on CASSANDRA-8534: [~mikea], am I correct in thinking that based on your response, this is not a problem? The default configuration URL does not have the required file:// prefix and throws an exception if cassandra.config is not set. --- Key: CASSANDRA-8534 URL: https://issues.apache.org/jira/browse/CASSANDRA-8534 Project: Cassandra Issue Type: Bug Components: Config, Core Environment: Ubuntu 14.04 64-bit C* 2.1.2 Reporter: Andrew Trimble Priority: Minor Fix For: 2.1.4 Attachments: error.txt In the class org.apache.cassandra.config.YamlConfigurationLoader, the DEFAULT_CONFIGURATION is set to cassandra.yaml. This is improperly formatted as it does not contain the prefix file://. If this value is used, a ConfigurationException is thrown (see line 73 of the same class). A solution is to set the cassandra.config system property, but this does not solve the underlying problem. A vanilla Cassandra installation will throw this error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
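The report above is easy to reproduce with `java.net.URL` directly: a bare name like `cassandra.yaml` carries no scheme, so constructing a URL from it throws `MalformedURLException`, while a `file://`-prefixed location parses fine. A minimal check, illustrative only and not the loader's actual code:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ConfigUrl {
    // True when `location` includes a scheme that java.net.URL accepts;
    // a bare file name like "cassandra.yaml" does not, which is why the
    // default configuration location fails without the file:// prefix.
    public static boolean parsesAsUrl(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Setting the cassandra.config system property to a full `file://` URL works around the problem, but as noted above it does not fix the bare default itself.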
[jira] [Commented] (CASSANDRA-7814) enable describe on indices
[ https://issues.apache.org/jira/browse/CASSANDRA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384189#comment-14384189 ] Benjamin Lerer commented on CASSANDRA-7814: --- After removing the existing driver that was on my System path I ran into the following error: {quote}Python Cassandra driver not installed, or not on PYTHONPATH. You might try pip install cassandra-driver.{quote} I printed out the path that cqlsh was using to load the cassandra driver and found: {{myPathToTheBinDirectory..\lib\cassandra-driver-internal-only-2.1.4.post.3d578f9b69.zip\cassandra-driver-2.1.4.post.3d578f9b69}} I looked into the zip file and found out that the root directory was: {{cassandra-driver-2.1.4.post0}}, which explains why the driver could not be loaded. enable describe on indices -- Key: CASSANDRA-7814 URL: https://issues.apache.org/jira/browse/CASSANDRA-7814 Project: Cassandra Issue Type: Improvement Components: Core Reporter: radha Assignee: Stefania Priority: Minor Fix For: 2.1.4 Describe index should be supported; right now, the only way is to export the schema and find what it really is before updating/dropping the index. verified in [cqlsh 3.1.8 | Cassandra 1.2.18.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8951) Add smallint (and possibly byte) type
[ https://issues.apache.org/jira/browse/CASSANDRA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8951: Fix Version/s: 3.0 Add smallint (and possibly byte) type - Key: CASSANDRA-8951 URL: https://issues.apache.org/jira/browse/CASSANDRA-8951 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Fix For: 3.0 We have {{int}} and {{bigint}}, but we don't have a {{smallint}} (2 bytes). I see no reason not to add it. And while we're at it, it doesn't cost much to add a {{byte}} type either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8951) Add smallint (and possibly byte) type
[ https://issues.apache.org/jira/browse/CASSANDRA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8951: Assignee: Benjamin Lerer Add smallint (and possibly byte) type - Key: CASSANDRA-8951 URL: https://issues.apache.org/jira/browse/CASSANDRA-8951 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Benjamin Lerer Fix For: 3.0 We have {{int}} and {{bigint}}, but we don't have a {{smallint}} (2 bytes). I see no reason not to add it. And while we're at it, it doesn't cost much to add a {{byte}} type either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-5330) make it possible to run unittests in any order
[ https://issues.apache.org/jira/browse/CASSANDRA-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-5330: --- Issue Type: Test (was: Bug) make it possible to run unittests in any order -- Key: CASSANDRA-5330 URL: https://issues.apache.org/jira/browse/CASSANDRA-5330 Project: Cassandra Issue Type: Test Components: Tests Reporter: Marcus Eriksson Priority: Minor SchemaLoader does its thing @BeforeClass and @AfterClass; it should ideally be done @Before and @After, otherwise tests will see leftovers from earlier unit tests. Guessing this caused most of the issues in CASSANDRA-5315 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-5683) Nodetool command for modifying rpc_timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-5683: --- Summary: Nodetool command for modifying rpc_timeout (was: Write timeout in multi-dc environment ) Nodetool command for modifying rpc_timeout -- Key: CASSANDRA-5683 URL: https://issues.apache.org/jira/browse/CASSANDRA-5683 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.9 Environment: apache cassandra 1.1.9 Reporter: Boole Guo Labels: write Fix For: 3.0 When writing in a multi-dc environment, there are many timeout exceptions. As I understand it, this version does well at reducing network bandwidth. Can we assign the rpc timeout when writing, like the consistency level? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8951) Add smallint (and possibly byte) type
[ https://issues.apache.org/jira/browse/CASSANDRA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384162#comment-14384162 ] Aleksey Yeschenko commented on CASSANDRA-8951: -- Would it make sense to at least make them non-emptiable? I remember the issue of null in primary key columns, but I'm not sure if that's that much of a deal, or if allowing null there would be a better overall solution if it is. Add smallint (and possibly byte) type - Key: CASSANDRA-8951 URL: https://issues.apache.org/jira/browse/CASSANDRA-8951 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Benjamin Lerer Fix For: 3.0 We have {{int}} and {{bigint}}, but we don't have a {{smallint}} (2 bytes). I see no reason not to add it. And while we're at it, it doesn't cost much to add a {{byte}} type either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6702) Upgrading node uses the wrong port in gossiping
[ https://issues.apache.org/jira/browse/CASSANDRA-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384160#comment-14384160 ] Philip Thompson commented on CASSANDRA-6702: Has this been reproduced on an upgrade from 1.2 to 2.0, or only from 1.1 to 1.2? Upgrading node uses the wrong port in gossiping --- Key: CASSANDRA-6702 URL: https://issues.apache.org/jira/browse/CASSANDRA-6702 Project: Cassandra Issue Type: Bug Components: Core Environment: 1.1.7, AWS, Ec2MultiRegionSnitch Reporter: Minh Do Priority: Minor Fix For: 2.0.14 When upgrading a node in a 1.1.7 (or 1.1.11) cluster to 1.2.15 and inspecting the gossip information on port/IP, I could see that the upgrading node (1.2 version) communicates to one other node in the same region using the public IP and non-encrypted port. For the rest, the upgrading node uses the correct ports and IPs to communicate in this manner: same region: private IP and non-encrypted port; different region: public IP and encrypted port. Because there is one node like this (or 2 out of a 12-node cluster in which nodes are split equally across 2 AWS regions), we have to modify the Security Group to allow the new traffic. Without modifying the SG, the 95th and 99th latencies for both reads and writes in the cluster are very bad (due to RPC timeout). Inspecting closer, that upgraded node (1.2 node) is contributing to all of the high latencies whenever it acts as a coordinator node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8472) Streams hang in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384198#comment-14384198 ] Yuki Morishita commented on CASSANDRA-8472: --- Yes, and I'm also working on CASSANDRA-8621, which can be a fix for this as well. Streams hang in repair -- Key: CASSANDRA-8472 URL: https://issues.apache.org/jira/browse/CASSANDRA-8472 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell Fix For: 2.0.14 Attachments: errlogs In general streaming is working much better in 2.0.x than before, but we still get occasional hanging stream sessions. One of the nodes, the follower, throws IOException: Broken pipe, causing all streams to fail with the initiator node. But the initiator node still thinks its sending and receiving files from the follower, causing the streaming to hang forever. Relevant lines from the logs of the follower attached. There's nothing relevant in the logs on the initiator node. There are no indications of retry attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8472) Streams hang in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita reassigned CASSANDRA-8472: - Assignee: Yuki Morishita Streams hang in repair -- Key: CASSANDRA-8472 URL: https://issues.apache.org/jira/browse/CASSANDRA-8472 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell Assignee: Yuki Morishita Fix For: 2.0.14 Attachments: errlogs In general streaming is working much better in 2.0.x than before, but we still get occasional hanging stream sessions. One of the nodes, the follower, throws IOException: Broken pipe, causing all streams to fail with the initiator node. But the initiator node still thinks its sending and receiving files from the follower, causing the streaming to hang forever. Relevant lines from the logs of the follower attached. There's nothing relevant in the logs on the initiator node. There are no indications of retry attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8951) Add smallint (and possibly byte) type
[ https://issues.apache.org/jira/browse/CASSANDRA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384216#comment-14384216 ] Sylvain Lebresne commented on CASSANDRA-8951: - bq. Would it make sense to at least make them non-emptiable? That's trivial to do and it's not very risky either, since if we find a case where that's problematic, we can always change our minds and allow it. The only downside is the lack of consistency with the rest of the types (so I'd have preferred a more general solution to this problem), but again, happy to start with non-emptiable and reconsider if this confuses too many people. Add smallint (and possibly byte) type - Key: CASSANDRA-8951 URL: https://issues.apache.org/jira/browse/CASSANDRA-8951 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Benjamin Lerer Fix For: 3.0 We have {{int}} and {{bigint}}, but we don't have a {{smallint}} (2 bytes). I see no reason not to add it. And while we're at it, it doesn't cost much to add a {{byte}} type either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
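The non-emptiable behavior discussed here can be sketched as a strict decoder that accepts exactly two bytes and rejects everything else, including the empty buffer. This is an illustrative model with a hypothetical class name, not Cassandra's actual type code:

```java
import java.nio.ByteBuffer;

public class SmallintCodec {
    // Decodes a 2-byte smallint; a non-emptiable type rejects the empty
    // buffer (and any other length) instead of mapping it to a value.
    public static short decode(ByteBuffer bytes) {
        if (bytes.remaining() != 2)
            throw new IllegalArgumentException(
                "smallint requires exactly 2 bytes, got " + bytes.remaining());
        return bytes.getShort(bytes.position()); // absolute read; buffer state untouched
    }
}
```

The contrast with the emptiable legacy types is exactly the first branch: for those, an empty buffer is treated as a legal (empty) value rather than an error.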
[jira] [Updated] (CASSANDRA-4339) Add TTL support to ColumnFamilyRecordWriter and Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-4339: --- Component/s: Hadoop Fix Version/s: 3.0 Add TTL support to ColumnFamilyRecordWriter and Pig --- Key: CASSANDRA-4339 URL: https://issues.apache.org/jira/browse/CASSANDRA-4339 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Jeremy Hanna Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9054) Break DatabaseDescriptor up into multiple classes.
[ https://issues.apache.org/jira/browse/CASSANDRA-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9054: --- Fix Version/s: 3.0 Break DatabaseDescriptor up into multiple classes. -- Key: CASSANDRA-9054 URL: https://issues.apache.org/jira/browse/CASSANDRA-9054 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jeremiah Jordan Fix For: 3.0 Right now to get at Config stuff you go through DatabaseDescriptor. But when you instantiate DatabaseDescriptor it actually opens system tables and such, which triggers commit log replays, and other things if the right flags aren't set ahead of time. This makes getting at config stuff from tools annoying, as you have to be very careful about instantiation orders. It would be nice if we could break DatabaseDescriptor up into multiple classes, so that getting at config stuff from tools wasn't such a pain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8951) Add smallint (and possibly byte) type
[ https://issues.apache.org/jira/browse/CASSANDRA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384157#comment-14384157 ] Sylvain Lebresne commented on CASSANDRA-8951: - I suggest we try to get this in 3.0 because it's reasonably trivial and it'll allow us to add a code in the v4 protocol for this/those types. bq. Just a note to start the ball rolling with byte-order comparability for smallint... I suggest leaving that to whatever patch will change it for other types. I don't think it makes sense to have an encoding of {{smallint}} that is inconsistent with the other current encodings, at least in the native protocol, and bothering with conversion between internal and external encoding in this ticket doesn't make sense. And once we handle the other types, handling {{smallint}} at the same time will be a completely negligible overhead, so it's easier to leave that aside for now. Add smallint (and possibly byte) type - Key: CASSANDRA-8951 URL: https://issues.apache.org/jira/browse/CASSANDRA-8951 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Benjamin Lerer Fix For: 3.0 We have {{int}} and {{bigint}}, but we don't have a {{smallint}} (2 bytes). I see no reason not to add it. And while we're at it, it doesn't cost much to add a {{byte}} type either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7113) Unhandled ClassCastException Around Migrations
[ https://issues.apache.org/jira/browse/CASSANDRA-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384170#comment-14384170 ] Philip Thompson commented on CASSANDRA-7113: [~thobbs], is this still an issue? Assigning to you to find out. Unhandled ClassCastException Around Migrations -- Key: CASSANDRA-7113 URL: https://issues.apache.org/jira/browse/CASSANDRA-7113 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Fix For: 2.1.4 I'm having trouble reproducing this, but a run of the pycassa integration tests against the latest 2.1 left this in the logs: {noformat} INFO [MigrationStage:1] 2014-04-29 18:42:22,088 DefsTables.java:388 - Loading org.apache.cassandra.config.CFMetaData@52f225c1[cfId=e8c08650-cff7-11e3-8109-6b09a6cc3d5a,ksName=PycassaTestKeyspace,cfName=SingleComposite,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.IntegerType),comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.BytesType,keyValidator=org.apache.cassandra.db.marshal.BytesType,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata={java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=key, type=org.apache.cassandra.db.marshal.BytesType, kind=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]=ColumnDefinition{name=value, type=org.apache.cassandra.db.marshal.BytesType, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=column1, type=org.apache.cassandra.db.marshal.IntegerType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}},compactionStrategyClass=class 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=null,memtableFlushPeriod=0,caching={keys:ALL, rows_per_partition:NONE},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=NONE,droppedColumns={},triggers={}] INFO [MigrationStage:1] 2014-04-29 18:42:22,090 ColumnFamilyStore.java:285 - Initializing PycassaTestKeyspace.SingleComposite INFO [CompactionExecutor:9] 2014-04-29 18:42:22,096 CompactionTask.java:252 - Compacted 4 sstables to [/var/lib/cassandra/data/system/schema_columns-296e9c049bec3085827dc17d3df2122a/system-schema_columns-ka-169,]. 18,036 bytes to 17,507 (~97% of original) in 23ms = 0.725912MB/s. 7 total partitions merged to 4. Partition merge counts were {1:3, 4:1, } ERROR [Thrift:24] 2014-04-29 18:42:22,109 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:34] 2014-04-29 18:42:22,130 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:25] 2014-04-29 18:42:22,173 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:11] 2014-04-29 18:42:22,258 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:16] 2014-04-29 18:42:22,422 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:26] 2014-04-29 18:42:22,747 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. 
java.lang.ClassCastException: null INFO [Thrift:7] 2014-04-29 18:42:22,790 MigrationManager.java:220 - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@461a7048[cfId=e9380040-cff7-11e3-8109-6b09a6cc3d5a,ksName=PycassaTestKeyspace,cfName=UUIDComposite,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.IntegerType),org.apache.cassandra.db.marshal.TimeUUIDType),comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.UTF8Type,keyValidator=org.apache.cassandra.db.marshal.TimeUUIDType,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata={java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=key, type=org.apache.cassandra.db.marshal.TimeUUIDType, kind=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=column2, type=org.apache.cassandra.db.marshal.TimeUUIDType, kind=CLUSTERING_COLUMN,
[jira] [Updated] (CASSANDRA-7113) Unhandled ClassCastException Around Migrations
[ https://issues.apache.org/jira/browse/CASSANDRA-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-7113: --- Assignee: Tyler Hobbs Unhandled ClassCastException Around Migrations -- Key: CASSANDRA-7113 URL: https://issues.apache.org/jira/browse/CASSANDRA-7113 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Tyler Hobbs Fix For: 2.1.4 I'm having trouble reproducing this, but a run of the pycassa integration tests against the latest 2.1 left this in the logs: {noformat} INFO [MigrationStage:1] 2014-04-29 18:42:22,088 DefsTables.java:388 - Loading org.apache.cassandra.config.CFMetaData@52f225c1[cfId=e8c08650-cff7-11e3-8109-6b09a6cc3d5a,ksName=PycassaTestKeyspace,cfName=SingleComposite,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.IntegerType),comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.BytesType,keyValidator=org.apache.cassandra.db.marshal.BytesType,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata={java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=key, type=org.apache.cassandra.db.marshal.BytesType, kind=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]=ColumnDefinition{name=value, type=org.apache.cassandra.db.marshal.BytesType, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=column1, type=org.apache.cassandra.db.marshal.IntegerType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}},compactionStrategyClass=class 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=null,memtableFlushPeriod=0,caching={keys:ALL, rows_per_partition:NONE},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=NONE,droppedColumns={},triggers={}] INFO [MigrationStage:1] 2014-04-29 18:42:22,090 ColumnFamilyStore.java:285 - Initializing PycassaTestKeyspace.SingleComposite INFO [CompactionExecutor:9] 2014-04-29 18:42:22,096 CompactionTask.java:252 - Compacted 4 sstables to [/var/lib/cassandra/data/system/schema_columns-296e9c049bec3085827dc17d3df2122a/system-schema_columns-ka-169,]. 18,036 bytes to 17,507 (~97% of original) in 23ms = 0.725912MB/s. 7 total partitions merged to 4. Partition merge counts were {1:3, 4:1, } ERROR [Thrift:24] 2014-04-29 18:42:22,109 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:34] 2014-04-29 18:42:22,130 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:25] 2014-04-29 18:42:22,173 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:11] 2014-04-29 18:42:22,258 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:16] 2014-04-29 18:42:22,422 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. java.lang.ClassCastException: null ERROR [Thrift:26] 2014-04-29 18:42:22,747 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. 
java.lang.ClassCastException: null INFO [Thrift:7] 2014-04-29 18:42:22,790 MigrationManager.java:220 - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@461a7048[cfId=e9380040-cff7-11e3-8109-6b09a6cc3d5a,ksName=PycassaTestKeyspace,cfName=UUIDComposite,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.IntegerType),org.apache.cassandra.db.marshal.TimeUUIDType),comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.UTF8Type,keyValidator=org.apache.cassandra.db.marshal.TimeUUIDType,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata={java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=key, type=org.apache.cassandra.db.marshal.TimeUUIDType, kind=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=column2, type=org.apache.cassandra.db.marshal.TimeUUIDType, kind=CLUSTERING_COLUMN, componentIndex=1, indexName=null, indexType=null},
[jira] [Commented] (CASSANDRA-8481) ghost node in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384181#comment-14384181 ] Philip Thompson commented on CASSANDRA-8481: What Cassandra version are you running? ghost node in gossip Key: CASSANDRA-8481 URL: https://issues.apache.org/jira/browse/CASSANDRA-8481 Project: Cassandra Issue Type: Bug Reporter: Alexey Larkov Priority: Minor After inaccurately removing nodes from the cluster, nodetool gossipinfo and jmx org.apache.cassandra.net.FailureDetector.AllEndpointsStates show the node status as LEFT. Name Value TypeDisplay NameUpdate Interval Description /192.168.58.75 generation:3 heartbeat:0 REMOVAL_COORDINATOR:REMOVER,f9a28f8c-3244-42d1-986e-592aafe1406c STATUS:LEFT,-3361705224534889554,141446785 jmx org.apache.cassandra.net.FailureDetector.DownEndpointCount is 1. Node 58.75 is absent from nodetool status and the system.peers table. Before the node got LEFT status it was in the REMOVING state. I've done unsafeassassinateendpoint and its status became LEFT, but DownEndpointCount is still 1, and org.apache.cassandra.net.FailureDetector.SimpleStates is still DOWN. How do I remove this node from gossip? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8564) Harmless exception logged as ERROR
[ https://issues.apache.org/jira/browse/CASSANDRA-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384223#comment-14384223 ] Tyler Hobbs commented on CASSANDRA-8564: Linking this to CASSANDRA-8996, as it's one of the few remaining causes for failing dtests against trunk Harmless exception logged as ERROR -- Key: CASSANDRA-8564 URL: https://issues.apache.org/jira/browse/CASSANDRA-8564 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Benedict Priority: Minor Fix For: 2.1.4 After CASSANDRA-8474, when running the dtest counter_test.py:TestCounters.upgrade_test, we now see the following in the log: {code} ERROR [CompactionExecutor:2] 2015-01-05 13:59:51,003 CassandraDaemon.java:170 - Exception in thread Thread[CompactionExecutor:2,1,main] java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@5e8ea989 rejected from org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@7fc92f94[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) ~[na:1.7.0_67] at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) ~[na:1.7.0_67] at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) ~[na:1.7.0_67] at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:619) ~[na:1.7.0_67] at org.apache.cassandra.io.sstable.SSTableReader.scheduleTidy(SSTableReader.java:638) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableReader.tidy(SSTableReader.java:619) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1650) ~[main/:na] at 
org.apache.cassandra.db.DataTracker.replaceReaders(DataTracker.java:409) ~[main/:na] at org.apache.cassandra.db.DataTracker.replaceWithNewInstances(DataTracker.java:303) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableRewriter.moveStarts(SSTableRewriter.java:254) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableRewriter.abort(SSTableRewriter.java:180) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:205) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75) ~[main/:na] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:226) ~[main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67] Suppressed: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@681c91de rejected from org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@7fc92f94[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) ~[na:1.7.0_67] at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) ~[na:1.7.0_67] at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) ~[na:1.7.0_67] at 
java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) ~[na:1.7.0_67] at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:619) ~[na:1.7.0_67] at org.apache.cassandra.io.sstable.SSTableReader.scheduleTidy(SSTableReader.java:638) ~[main/:na] at
[jira] [Commented] (CASSANDRA-8381) CFStats should record keys of largest N requests for time interval
[ https://issues.apache.org/jira/browse/CASSANDRA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384233#comment-14384233 ] Philip Thompson commented on CASSANDRA-8381: Bump. [~mstump], should we close as a duplicate of CASSANDRA-7974? CFStats should record keys of largest N requests for time interval -- Key: CASSANDRA-8381 URL: https://issues.apache.org/jira/browse/CASSANDRA-8381 Project: Cassandra Issue Type: Improvement Reporter: Matt Stump Priority: Critical Isolating the problem partition for a CF is right now incredibly difficult. If we could keep the primary key of the largest N read or write requests for the previous interval, or since the counter was last cleared, it would be extremely useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
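The bookkeeping the ticket asks for amounts to a bounded top-N tracker keyed by request size. The following sketch is purely illustrative (it is not Cassandra's implementation, and the class and method names are invented for this example); it shows how such a tracker can stay O(N) in memory using a min-heap:

```python
import heapq

class TopNRequests:
    """Illustrative sketch: keep the partition keys of the largest N
    requests seen since the tracker was last cleared."""

    def __init__(self, n):
        self.n = n
        self._heap = []   # min-heap of (size, seq, key); smallest tracked entry on top
        self._seq = 0     # tie-breaker so keys themselves are never compared

    def record(self, key, size_bytes):
        entry = (size_bytes, self._seq, key)
        self._seq += 1
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # New request is larger than the smallest tracked one: replace it.
            heapq.heapreplace(self._heap, entry)

    def largest(self):
        # Return (key, size) pairs, largest first.
        return [(k, s) for s, _, k in sorted(self._heap, reverse=True)]

    def clear(self):
        self._heap.clear()
```

Each `record` call is O(log N) regardless of request volume, which is what makes exposing this through a per-CF metric cheap enough to leave on by default.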
[jira] [Updated] (CASSANDRA-8751) C* should always listen to both ssl/non-ssl ports
[ https://issues.apache.org/jira/browse/CASSANDRA-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8751: --- Fix Version/s: 3.0 C* should always listen to both ssl/non-ssl ports - Key: CASSANDRA-8751 URL: https://issues.apache.org/jira/browse/CASSANDRA-8751 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Minh Do Assignee: Minh Do Priority: Critical Fix For: 3.0 Since there is always one thread dedicated to the server socket listener, and it does not use much resource, we should always have both listeners up no matter what users set for internode_encryption. The reason behind this is that we need to switch back and forth between different internode_encryption modes, and we need C* servers to keep running in a transient state or during mode switching. Currently this is not possible. For example, we have an internode_encryption=dc cluster in a multi-region AWS environment and want to set internode_encryption=all by rolling-restarting C* nodes. However, a node with internode_encryption=all does not listen on the non-ssl port. As a result, we have a split-brain cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9054) Break DatabaseDescriptor up into multiple classes.
Jeremiah Jordan created CASSANDRA-9054: -- Summary: Break DatabaseDescriptor up into multiple classes. Key: CASSANDRA-9054 URL: https://issues.apache.org/jira/browse/CASSANDRA-9054 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jeremiah Jordan Right now to get at Config stuff you go through DatabaseDescriptor. But when you instantiate DatabaseDescriptor it actually opens system tables and such, which triggers commit log replays, and other things if the right flags aren't set ahead of time. This makes getting at config stuff from tools annoying, as you have to be very careful about instantiation orders. It would be nice if we could break DatabaseDescriptor up into multiple classes, so that getting at config stuff from tools wasn't such a pain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-7645) cqlsh: show partial trace if incomplete after max_trace_wait
[ https://issues.apache.org/jira/browse/CASSANDRA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian reassigned CASSANDRA-7645: - Assignee: Carl Yeksigian (was: Tyler Hobbs) cqlsh: show partial trace if incomplete after max_trace_wait Key: CASSANDRA-7645 URL: https://issues.apache.org/jira/browse/CASSANDRA-7645 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Tyler Hobbs Assignee: Carl Yeksigian Priority: Trivial Fix For: 2.1.4 If a trace hasn't completed within {{max_trace_wait}}, cqlsh will say the trace is unavailable and not show anything. It (and the underlying python driver) determines when the trace is complete by checking if the {{duration}} column in {{system_traces.sessions}} is non-null. If {{duration}} is null but we still have some trace events when the timeout is hit, cqlsh should print whatever trace events we have along with a warning about it being incomplete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
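The proposed cqlsh behaviour can be sketched as a small polling loop. This is an assumption-laden illustration, not cqlsh's or the python driver's actual API: `fetch_session` and `fetch_events` stand in for queries against `system_traces.sessions` and `system_traces.events`, and the function returns a `complete` flag so the caller can attach the incompleteness warning:

```python
import time

def get_trace(fetch_session, fetch_events, max_trace_wait,
              poll_interval=0.5, sleep=time.sleep):
    """Sketch of the behaviour proposed in CASSANDRA-7645 (names are
    illustrative): poll until the trace session's `duration` column is
    non-null; on timeout, return whatever partial events exist together
    with complete=False instead of giving up entirely."""
    deadline = time.time() + max_trace_wait
    while time.time() < deadline:
        if fetch_session().get('duration') is not None:
            return fetch_events(), True    # trace finished normally
        sleep(poll_interval)
    # Timed out: duration is still null, but any events already written
    # to system_traces.events are still worth showing, with a warning.
    return fetch_events(), False
```

The design point is that `duration` being null only means the coordinator hasn't finalized the session; individual events are written independently, so a partial listing is still meaningful.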
[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384030#comment-14384030 ] Marcus Eriksson commented on CASSANDRA-9045: I'm grasping at straws here, but have you made sure that the clocks are synced on all the nodes? Are the nodes agreeing on schemas? Deleted columns are resurrected after repair in wide rows - Key: CASSANDRA-9045 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045 Project: Cassandra Issue Type: Bug Components: Core Reporter: Roman Tkachenko Assignee: Marcus Eriksson Priority: Critical Fix For: 2.0.14 Attachments: cqlsh.txt Hey guys, After almost a week of researching the issue and trying out multiple things with (almost) no luck, I was advised (on the user@cass list) to file a report here. h5. Setup Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if it goes away) Multi-datacenter 12+6 node cluster. h5. Schema {code} cqlsh describe keyspace blackbook; CREATE KEYSPACE blackbook WITH replication = { 'class': 'NetworkTopologyStrategy', 'IAD': '3', 'ORD': '3' }; USE blackbook; CREATE TABLE bounces ( domainid text, address text, message text, timestamp bigint, PRIMARY KEY (domainid, address) ) WITH bloom_filter_fp_chance=0.10 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.10 AND gc_grace_seconds=864000 AND index_interval=128 AND read_repair_chance=0.00 AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND speculative_retry='99.0PERCENTILE' AND memtable_flush_period_in_ms=0 AND compaction={'class': 'LeveledCompactionStrategy'} AND compression={'sstable_compression': 'LZ4Compressor'}; {code} h5. Use case Each row (defined by a domainid) can have many, many columns (bounce entries) so rows can get pretty wide. In practice, most of the rows are not that big but some of them contain hundreds of thousands and even millions of columns. 
Columns are not TTL'ed but can be deleted using the following CQL3 statement: {code} delete from bounces where domainid = 'domain.com' and address = 'al...@example.com'; {code} All queries are performed using LOCAL_QUORUM CL. h5. Problem We weren't very diligent about running repairs on the cluster initially, but shortly after we started doing it we noticed that some of the previously deleted columns (bounce entries) are there again, as if the tombstones had disappeared. I have run this test multiple times via cqlsh, on the row of the customer who originally reported the issue: * delete an entry * verify it's not returned even with CL=ALL * run repair on nodes that own this row's key * the columns reappear and are returned even with CL=ALL I tried the same test on another row with much less data and everything was correctly deleted and didn't reappear after repair. h5. Other steps I've taken so far Made sure NTP is running on all servers and clocks are synchronized. Increased gc_grace_seconds to 100 days, ran full repair (on the affected keyspace) on all nodes, then changed it back to the default 10 days again. Didn't help. Performed one more test. Updated one of the resurrected columns, then deleted it and ran repair again. This time the updated version of the column reappeared. Finally, I noticed these log entries for the row in question: {code} INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally {code} Figuring it may be related, I bumped in_memory_compaction_limit_in_mb to 512MB so the row fits into it, deleted the entry and ran repair once again. The log entry for this row was gone and the columns didn't reappear. We have a lot of rows much larger than 512MB so we can't increase this parameter forever, if that is the issue. Please let me know if you need more information on the case or if I can run more experiments. Thanks! 
Roman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
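The classic mechanism behind this kind of resurrection is a tombstone being purged before every replica has seen it: once the tombstone is gone, last-write-wins reconciliation during repair has nothing to shade the stale live cell, so the cell streams back. The sketch below is a simplified model of those two rules, not Cassandra's actual code; the function names and the `(timestamp, is_tombstone)` cell representation are invented for illustration:

```python
def tombstone_purgeable(local_deletion_time, gc_grace_seconds, now):
    """Simplified purge rule: a tombstone may only be garbage-collected
    once gc_grace_seconds have elapsed since the deletion, precisely so
    that repair still has time to propagate it to lagging replicas."""
    return now >= local_deletion_time + gc_grace_seconds

def reconcile(cell_a, cell_b):
    """Last-write-wins reconciliation between two versions of a cell.
    Cells are (timestamp, is_tombstone) tuples: the newer timestamp
    wins, whether or not it is a tombstone."""
    return max(cell_a, cell_b, key=lambda cell: cell[0])
```

In this model, if a replica misses the delete and the other replicas purge the tombstone after gc_grace, repair compares the stale live cell against nothing at all, and the delete is undone, which is why the report above (tombstones vanishing despite gc_grace_seconds of 10 days) points at something interfering with tombstone propagation rather than at the purge rule itself.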
[jira] [Updated] (CASSANDRA-5683) Nodetool command for modifying rpc_timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-5683: --- Fix Version/s: 3.0 Issue Type: New Feature (was: Bug) Nodetool command for modifying rpc_timeout -- Key: CASSANDRA-5683 URL: https://issues.apache.org/jira/browse/CASSANDRA-5683 Project: Cassandra Issue Type: New Feature Components: Core Affects Versions: 1.1.9 Environment: apache cassandra 1.1.9 Reporter: Boole Guo Labels: write Fix For: 3.0 When writing in a multi-dc environment, there are many timeout exceptions. As I understand it, this version has done well at reducing network bandwidth. Can we assign an rpc timeout per write, the way we assign a consistency level? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8472) Streams hang in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384177#comment-14384177 ] Philip Thompson commented on CASSANDRA-8472: [~yukim], do we consider this a bug, since stream timeout can be set, preventing hanging on failed repairs? Streams hang in repair -- Key: CASSANDRA-8472 URL: https://issues.apache.org/jira/browse/CASSANDRA-8472 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell Fix For: 2.0.14 Attachments: errlogs In general streaming is working much better in 2.0.x than before, but we still get occasional hanging stream sessions. One of the nodes, the follower, throws IOException: Broken pipe, causing all streams to fail with the initiator node. But the initiator node still thinks it is sending and receiving files from the follower, causing the streaming to hang forever. Relevant lines from the logs of the follower are attached. There's nothing relevant in the logs on the initiator node. There are no indications of retry attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8475) Altering Table's tombstone_threshold stalls compaction until restart
[ https://issues.apache.org/jira/browse/CASSANDRA-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8475: --- Description: Compaction won't move forward on the table until a restart takes place and the temp table is ignored. My hunch is that running CompactionTasks are killed and there are still pre-opened temp files ref'd but they get deleted with the CompactionTask dies? {code} Exception: 2014-12-12_22:03:19.84572 ERROR 22:03:19 Exception in thread Thread[CompactionExecutor:671,1,main] 2014-12-12_22:03:19.84576 java.lang.RuntimeException: java.io.FileNotFoundException: /data/cassandra/data/ks1/DataByUserID_007/ks1-DataByUserID_007-tmplink-ka-21801-Data.db (No such file or directory) 2014-12-12_22:03:19.84576 at org.apache.cassandra.io.compress.CompressedThrottledReader.open(CompressedThrottledReader.java:52) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84577 at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1895) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84578 at org.apache.cassandra.io.sstable.SSTableScanner.init(SSTableScanner.java:67) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84579 at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1681) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84579 at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1693) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84580 at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:181) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84581 at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84581 at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:340) 
~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84582 at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:151) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84583 at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84583 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84583 at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84584 at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84584 at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:232) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84585 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65] 2014-12-12_22:03:19.84586 at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_65] 2014-12-12_22:03:19.84586 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_65] 2014-12-12_22:03:19.84587 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65] 2014-12-12_22:03:19.84587 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65] 2014-12-12_22:03:19.84587 Caused by: java.io.FileNotFoundException: /data/cassandra/data/ks1/DataByUserID_007/ks1-DataByUserID_007-tmplink-ka-21801-Data.db (No such file or directory) 2014-12-12_22:03:19.84588 at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_65] 2014-12-12_22:03:19.84588 at java.io.RandomAccessFile.init(RandomAccessFile.java:241) ~[na:1.7.0_65] 2014-12-12_22:03:19.84589 at org.apache.cassandra.io.util.RandomAccessReader.init(RandomAccessReader.java:58) 
~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84590 at org.apache.cassandra.io.compress.CompressedRandomAccessReader.init(CompressedRandomAccessReader.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84590 at org.apache.cassandra.io.compress.CompressedThrottledReader.init(CompressedThrottledReader.java:34) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84591 at org.apache.cassandra.io.compress.CompressedThrottledReader.open(CompressedThrottledReader.java:48) ~[apache-cassandra-2.1.2.jar:2.1.2] 2014-12-12_22:03:19.84591 ... 18 common frames omitted{code} was: Compaction won't move forward on the table until a restart takes place and the temp table is ignored. My hunch is that running CompactionTasks are
[jira] [Commented] (CASSANDRA-9037) Terminal UDFs evaluated at prepare time throw protocol version error
[ https://issues.apache.org/jira/browse/CASSANDRA-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384227#comment-14384227 ] Tyler Hobbs commented on CASSANDRA-9037: Linking to CASSANDRA-8996, as this is responsible for the failure of the {{user_functions_test.TestUserFunctions.udf_overload_test}} dtest.

Terminal UDFs evaluated at prepare time throw protocol version error

Key: CASSANDRA-9037
URL: https://issues.apache.org/jira/browse/CASSANDRA-9037
Project: Cassandra
Issue Type: Bug
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Fix For: 3.0

When a pure function with only terminal arguments (or with no arguments) is used in a where clause, it's executed at prepare time and {{Server.CURRENT_VERSION}} is passed as the protocol version for serialization purposes. For native functions this isn't a problem, but UDFs use classes in the bundled java-driver-core jar for (de)serialization of args and return values. When {{Server.CURRENT_VERSION}} is greater than the highest version supported by the bundled java driver, the execution fails with the following exception:

{noformat}
ERROR [SharedPool-Worker-1] 2015-03-24 18:10:59,391 QueryMessage.java:132 - Unexpected error during query
org.apache.cassandra.exceptions.FunctionExecutionException: execution of 'ks.overloaded[text]' failed: java.lang.IllegalArgumentException: No protocol version matching integer version 4
	at org.apache.cassandra.exceptions.FunctionExecutionException.create(FunctionExecutionException.java:35) ~[main/:na]
	at org.apache.cassandra.cql3.udf.gen.Cksoverloaded_1.execute(Cksoverloaded_1.java) ~[na:na]
	at org.apache.cassandra.cql3.functions.FunctionCall.executeInternal(FunctionCall.java:78) ~[main/:na]
	at org.apache.cassandra.cql3.functions.FunctionCall.access$200(FunctionCall.java:34) ~[main/:na]
	at org.apache.cassandra.cql3.functions.FunctionCall$Raw.execute(FunctionCall.java:176) ~[main/:na]
	at org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:161) ~[main/:na]
	at org.apache.cassandra.cql3.SingleColumnRelation.toTerm(SingleColumnRelation.java:108) ~[main/:na]
	at org.apache.cassandra.cql3.SingleColumnRelation.newEQRestriction(SingleColumnRelation.java:143) ~[main/:na]
	at org.apache.cassandra.cql3.Relation.toRestriction(Relation.java:127) ~[main/:na]
	at org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:126) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:787) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:740) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:488) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:252) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:246) ~[main/:na]
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[main/:na]
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:475) [main/:na]
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:371) [main/:na]
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [main/:na]
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.IllegalArgumentException: No protocol version matching integer version 4
	at com.datastax.driver.core.ProtocolVersion.fromInt(ProtocolVersion.java:89) ~[cassandra-driver-core-2.1.2.jar:na]
	at org.apache.cassandra.cql3.functions.UDFunction.compose(UDFunction.java:177) ~[main/:na]
	... 25 common frames omitted
{noformat}
[jira] [Comment Edited] (CASSANDRA-9037) Terminal UDFs evaluated at prepare time throw protocol version error
[ https://issues.apache.org/jira/browse/CASSANDRA-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384227#comment-14384227 ] Tyler Hobbs edited comment on CASSANDRA-9037 at 3/27/15 5:46 PM: - Linking to CASSANDRA-8996, as this is responsible for the failure of the {{user_functions_test.TestUserFunctions.udf_overload_test}} dtest. was (Author: thobbs): Linking to CASSANDRA-8996, as this is responsible for the failure of the {{ user_functions_test.TestUserFunctions.udf_overload_test}} dtest.
[jira] [Updated] (CASSANDRA-5322) Make dtest logging more granular
[ https://issues.apache.org/jira/browse/CASSANDRA-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-5322: --- Fix Version/s: 3.0 Make dtest logging more granular - Key: CASSANDRA-5322 URL: https://issues.apache.org/jira/browse/CASSANDRA-5322 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Ryan McGuire Fix For: 3.0 From Brandon: We need a way (it might need to go in ccm, I haven't looked) to set just one class to DEBUG or TRACE, like we'd do in conf/log4j-server.properties, but preferably with an env var so I can control it via buildbot, since it's better at reproducing some issues than I am sometimes, but I don't want to run the full-hammer debug all the time. Also, a way to set Tester.allow_log_errors to false via an env var, since sometimes there's an error there that takes a while to fix but is cosmetic, and in the meantime I want to catch new failures so we don't fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
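The per-class override Brandon describes could be prototyped in the dtest harness (which is Python) along these lines. This is only a sketch: the {{logger=LEVEL}} format and the {{DTEST_ALLOW_LOG_ERRORS}} variable name are invented for illustration and are not part of ccm or the dtests.

```python
import os

def parse_log_overrides(env_value):
    """Parse comma-separated "logger=LEVEL" pairs, e.g.
    "org.apache.cassandra.db=DEBUG,org.apache.cassandra.gms=TRACE",
    into a {logger_name: level_name} dict (hypothetical format)."""
    overrides = {}
    for pair in env_value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, _, level = pair.partition("=")
        overrides[name.strip()] = level.strip().upper()
    return overrides

# The second ask: toggling Tester.allow_log_errors off via an env var.
allow_log_errors = os.environ.get("DTEST_ALLOW_LOG_ERRORS", "true").lower() != "false"

overrides = parse_log_overrides("org.apache.cassandra.db=DEBUG, org.apache.cassandra.gms=trace")
```

With this shape, buildbot could export a single env var per run instead of editing log4j-server.properties inside every node's config.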
[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384107#comment-14384107 ] Roman Tkachenko commented on CASSANDRA-9045: Yep. I triple-checked that clocks are synchronized. I also checked multiple times that the schema version is the same for all nodes in the cluster. Were you able to reproduce the issue? FWIW I did some more research and it *seems* like it affects only certain columns in the row. In yesterday's test (the one I attached cqlsh output from) I also removed one more column and it did not reappear after repair. In the tracing logs for it I did not see the digest mismatch, unlike the other one that did reappear. Not sure if it's related at all. Deleted columns are resurrected after repair in wide rows - Key: CASSANDRA-9045 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045 Project: Cassandra Issue Type: Bug Components: Core Reporter: Roman Tkachenko Assignee: Marcus Eriksson Priority: Critical Fix For: 2.0.14 Attachments: cqlsh.txt Hey guys, After almost a week of researching the issue and trying out multiple things with (almost) no luck, I was advised (on the user@cass list) to file a report here. h5. Setup Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if it goes away). Multi-datacenter 12+6 node cluster. h5.
Schema

{code}
cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  timestamp bigint,
  PRIMARY KEY (domainid, address)
) WITH bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
{code}

h5. Use case

Each row (defined by a domainid) can have very many columns (bounce entries), so rows can get pretty wide. In practice most of the rows are not that big, but some of them contain hundreds of thousands and even millions of columns. Columns are not TTL'ed but can be deleted using the following CQL3 statement:

{code}
delete from bounces where domainid = 'domain.com' and address = 'al...@example.com';
{code}

All queries are performed using LOCAL_QUORUM CL.

h5. Problem

We weren't very diligent about running repairs on the cluster initially, but shortly after we started doing it we noticed that some previously deleted columns (bounce entries) are there again, as if tombstones have disappeared. I have run this test multiple times via cqlsh, on the row of the customer who originally reported the issue:
* delete an entry
* verify it's not returned even with CL=ALL
* run repair on nodes that own this row's key
* the columns reappear and are returned even with CL=ALL

I tried the same test on another row with much less data and everything was correctly deleted and didn't reappear after repair.

h5. Other steps I've taken so far

Made sure NTP is running on all servers and clocks are synchronized. Increased gc_grace_seconds to 100 days, ran full repair (on the affected keyspace) on all nodes, then changed it back to the default 10 days again. Didn't help. Performed one more test: updated one of the resurrected columns, then deleted it and ran repair again. This time the updated version of the column reappeared. Finally, I noticed these log entries for the row in question:

{code}
INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
{code}

Figuring it may be related, I bumped in_memory_compaction_limit_in_mb to 512MB so the row fits into it, deleted the entry and ran repair once again. The log entry for this row was gone and the columns didn't reappear. We have a lot of rows much larger than 512MB, so we can't increase this parameter forever, if that is the issue. Please let me know if you need more information on the case or if I can run more experiments. Thanks! Roman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8970) Allow custom time_format on cqlsh COPY TO
[ https://issues.apache.org/jira/browse/CASSANDRA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian updated CASSANDRA-8970: -- Reviewer: Carl Yeksigian (was: Tyler Hobbs) Allow custom time_format on cqlsh COPY TO - Key: CASSANDRA-8970 URL: https://issues.apache.org/jira/browse/CASSANDRA-8970 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Aaron Ploetz Priority: Trivial Labels: cqlsh Fix For: 2.1.4 Attachments: CASSANDRA-8970.patch Original Estimate: 4h Remaining Estimate: 4h When executing a COPY TO from cqlsh, the user currently has no control over the format of exported timestamp columns. If the user has indicated a {{time_format}} in their cqlshrc file, that format will be used. Otherwise, the system default format will be used. The problem comes into play when the timestamp format used on a COPY TO is not valid when the data is sent back into Cassandra with a COPY FROM. For instance, if a user has {{time_format = %Y-%m-%d %H:%M:%S%Z}} specified in their cqlshrc, COPY TO will format timestamp columns like this: {{userid|posttime|postcontent}} {{0|2015-03-14 14:59:00CDT|rtyeryerweh}} {{0|2015-03-14 14:58:00CDT|sdfsdfsdgfjdsgojr}} {{0|2015-03-12 14:27:00CDT|sdgfjdsgojr}} Executing a COPY FROM on that same file will produce an "unable to coerce to formatted date (long)" error. Right now, the only way to change the way timestamps are formatted is to exit cqlsh, modify the {{time_format}} property in cqlshrc, and restart cqlsh. The ability to specify a COPY option of TIME_FORMAT with a Python strftime format would allow the user to quickly alter the timestamp format for export without reconfiguring cqlsh. {{aploetz@cqlsh:stackoverflow> COPY posts1 TO '/home/aploetz/posts1.csv' WITH DELIMITER='|' AND HEADER=true AND TIME_FORMAT='%Y-%m-%d %H:%M:%S%z';}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
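The coercion failure is easy to demonstrate with Python's own strftime/strptime (cqlsh is Python): a numeric {{%z}} offset round-trips cleanly, while a {{%Z}} zone name such as {{CDT}} generally does not parse back. A minimal sketch:

```python
from datetime import datetime, timezone

fmt = "%Y-%m-%d %H:%M:%S%z"  # numeric UTC offset, e.g. +0000
ts = datetime(2015, 3, 14, 14, 59, tzinfo=timezone.utc)

exported = ts.strftime(fmt)                    # what a COPY TO would write
reimported = datetime.strptime(exported, fmt)  # what a COPY FROM must parse

assert reimported == ts
print(exported)  # 2015-03-14 14:59:00+0000
```

A per-COPY TIME_FORMAT option would let users pick a round-trippable format like the one above for export without touching cqlshrc.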
[jira] [Updated] (CASSANDRA-7212) Allow to switch user within CQLSH session
[ https://issues.apache.org/jira/browse/CASSANDRA-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian updated CASSANDRA-7212: -- Reviewer: Carl Yeksigian (was: Tyler Hobbs) Allow to switch user within CQLSH session - Key: CASSANDRA-7212 URL: https://issues.apache.org/jira/browse/CASSANDRA-7212 Project: Cassandra Issue Type: Improvement Components: API Environment: [cqlsh 4.1.1 | Cassandra 2.0.7.31 | CQL spec 3.1.1 | Thrift protocol 19.39.0] Reporter: Jose Martinez Poblete Labels: cqlsh Attachments: 7212_1.patch Once a user is logged into CQLSH, it is not possible to switch to another user without exiting and relaunching. This is a feature offered in postgres and probably other databases: http://secure.encivasolutions.com/knowledgebase.php?action=display&articleid=1126 Perhaps this could be implemented in CQLSH as part of the USE directive: USE Keyspace [USER] [password] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9037) Terminal UDFs evaluated at prepare time throw protocol version error
[ https://issues.apache.org/jira/browse/CASSANDRA-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384113#comment-14384113 ] Tyler Hobbs commented on CASSANDRA-9037: It just occurred to me that we can't execute terminal functions at prepare time if they return a collection. The reason is that the protocol version isn't known yet, and the serialization of the collection depends on the protocol version. So, we should either avoid executing if the return type is a collection type (or a tuple or UDT that contains a collection), or execute anyway and handle re-serialization at execute time if the protocol version is not v3+. I'm guessing the first option is significantly simpler to implement.
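The first option above reduces to a recursive walk over the function's return type. A toy Python model of that check (purely illustrative: Cassandra's actual type hierarchy is {{AbstractType}} in Java, and this tuple encoding is invented):

```python
# A CQL type is modeled as (kind, child_types); e.g. map<text, list<int>>
# becomes ("map", [("text", []), ("list", [("int", [])])]).
COLLECTION_KINDS = {"list", "set", "map"}

def contains_collection(cql_type):
    kind, children = cql_type
    if kind in COLLECTION_KINDS:
        return True
    # Tuples and UDTs are not collections themselves but may nest one,
    # so their serialization is still protocol-version dependent.
    return any(contains_collection(c) for c in children)

udt_with_list = ("udt", [("text", []), ("list", [("int", [])])])
plain_tuple = ("tuple", [("int", []), ("text", [])])

assert contains_collection(udt_with_list)    # must be deferred to execute time
assert not contains_collection(plain_tuple)  # safe to evaluate at prepare time
```

Any type for which this predicate is true would skip prepare-time evaluation, sidestepping the unknown-protocol-version problem entirely.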
[jira] [Assigned] (CASSANDRA-8051) Add SERIAL and LOCAL_SERIAL consistency levels to cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian reassigned CASSANDRA-8051: - Assignee: Carl Yeksigian (was: Tyler Hobbs) Add SERIAL and LOCAL_SERIAL consistency levels to cqlsh --- Key: CASSANDRA-8051 URL: https://issues.apache.org/jira/browse/CASSANDRA-8051 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Nicolas Favre-Felix Assignee: Carl Yeksigian Priority: Minor Labels: cqlsh cqlsh does not support setting the serial consistency level. The default CL.SERIAL does not let users safely execute LWT alongside an app that runs at LOCAL_SERIAL, and can prevent any LWT from running when a DC is down (e.g. with 2 DCs, RF=3 in each.) Implementing this well is a bit tricky. A user setting the serial CL will probably not want all of their statements to have a serial CL attached, but only the conditional updates. At the same time it would be useful to support serial reads. WITH CONSISTENCY LEVEL used to provide this flexibility. I believe that it is currently impossible to run a SELECT at SERIAL or LOCAL_SERIAL; the only workaround seems to be to run a conditional update with a predicate that always resolves to False, and to rely on the CAS response to read the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9053) Convert dtests that use cassandra-cli to Thrift API
Tyler Hobbs created CASSANDRA-9053: -- Summary: Convert dtests that use cassandra-cli to Thrift API Key: CASSANDRA-9053 URL: https://issues.apache.org/jira/browse/CASSANDRA-9053 Project: Cassandra Issue Type: Test Components: Tests Reporter: Tyler Hobbs Fix For: 3.0 The following dtests need to be changed to use the Thrift API directly instead of going through cassandra-cli: * {{global_row_key_cache_test.TestGlobalRowKeyCache.functional_test}} * {{cql_tests.TestCQL.cql3_insert_thrift_test}} * {{cql_tests.TestCQL.rename_test}} * {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}} * {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384118#comment-14384118 ] Roman Tkachenko commented on CASSANDRA-9045: And it also does not explain why those original zombie columns were finally purged when I increased the in-memory compaction limit setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8241) Use javac instead of javassist
[ https://issues.apache.org/jira/browse/CASSANDRA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383979#comment-14383979 ] Jonathan Ellis commented on CASSANDRA-8241: --- Probably best to ask the legal team: https://issues.apache.org/jira/browse/LEGAL Use javac instead of javassist -- Key: CASSANDRA-8241 URL: https://issues.apache.org/jira/browse/CASSANDRA-8241 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Robert Stupp Assignee: Robert Stupp Labels: udf Fix For: 3.0 Attachments: 8241-ecj.txt, udf-java-javac.txt Using the JDK's built-in Java Compiler API has some advantages over javassist. Although compilation feels a bit slower: * boxing + unboxing works * generics work * compiler error messages are better (or at least familiar) and have line/column numbers The implementation does not use any temp files; everything is in memory. Patch attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
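As a loose analogy to the approach in the ticket (the actual patch uses the Java Compiler API via javax.tools; this just illustrates the idea of compiling UDF source from a string entirely in memory, with real error positions and no temp files):

```python
# Compile a source string in memory; nothing is written to disk.
src = "def f(x):\n    return x * 2\n"
code = compile(src, "<udf>", "exec")
ns = {}
exec(code, ns)
assert ns["f"](21) == 42

# A syntax error carries a line number, like a javac diagnostic does:
try:
    compile("def g(:\n", "<udf>", "exec")
    error_line = None
except SyntaxError as e:
    error_line = e.lineno
assert error_line == 1
```

The line/column information in the diagnostic is exactly what javassist lacked and what makes user-facing UDF error messages usable.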
[jira] [Assigned] (CASSANDRA-8985) java.lang.AssertionError: Added column does not sort as the last column
[ https://issues.apache.org/jira/browse/CASSANDRA-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian reassigned CASSANDRA-8985: - Assignee: Carl Yeksigian (was: Tyler Hobbs) java.lang.AssertionError: Added column does not sort as the last column --- Key: CASSANDRA-8985 URL: https://issues.apache.org/jira/browse/CASSANDRA-8985 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.0.13 OracleJDK1.7 Debian 7.8 Reporter: Maxim Assignee: Carl Yeksigian Fix For: 2.0.14 After upgrading Cassandra from 2.0.12 to 2.0.13 I began to receive an error:

{code}
ERROR [ReadStage:1823] 2015-03-18 09:03:27,091 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:1823,5,main]
java.lang.AssertionError: Added column does not sort as the last column
	at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
	at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
	at org.apache.cassandra.db.ColumnFamily.addIfRelevant(ColumnFamily.java:115)
	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:211)
	at org.apache.cassandra.db.filter.ExtendedFilter$WithClauses.prune(ExtendedFilter.java:290)
	at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1792)
	at org.apache.cassandra.db.index.keys.KeysSearcher.search(KeysSearcher.java:54)
	at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:551)
	at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:1755)
	at org.apache.cassandra.db.RangeSliceCommand.executeLocally(RangeSliceCommand.java:135)
	at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:39)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)