[jira] [Comment Edited] (CASSANDRA-10363) NullPointerException returned with select ttl(value), IN, ORDER BY and paging off

2015-10-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957009#comment-14957009
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-10363 at 10/15/15 10:44 AM:


I've attached a patch backporting this to 2.0, not for committing, but so that 
those unable to upgrade just yet can patch their own systems if necessary. The 
test changes the expectations for a few scenarios relative to the 2.1+ version 
because CASSANDRA-4911 isn't in 2.0, and so {{ORDER BY}} can only contain 
columns in the selection.

[branch|https://github.com/beobal/cassandra/tree/10363-2.0], 
[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-testall/],
 
[dtests|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-dtest/]
 (test runs pending)

Edit: there are a few dtest failures in the run above, but checking them 
against the 2.0 baseline shows no new failures.


was (Author: beobal):
I've attached a patch backporting this to 2.0, not for actually committing but 
so those unable to upgrade just yet can patch their own systems if necessary. 
The test changes the expectations for a few scenarios from the 2.1+ version 
because CASSANDRA-4911 isn't in 2.0 & so {{ORDER BY}} can only contain columns 
in the selection.

[branch|https://github.com/beobal/cassandra/tree/10363-2.0], 
[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-testall/],
 
[dtests|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-dtest/]
 (test runs pending)

> NullPointerException returned with select ttl(value), IN, ORDER BY and paging 
> off
> -
>
> Key: CASSANDRA-10363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10363
> Project: Cassandra
>  Issue Type: Bug
> Environment: Apache Cassandra 2.1.8.689
>Reporter: Sucwinder Bassi
>Assignee: Benjamin Lerer
>Priority: Minor
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
> Attachments: 10363-2.0-c4de752.txt
>
>
> Running this query with paging off returns a NullPointerException:
> cqlsh:test> SELECT value, ttl(value), last_modified FROM test where 
> useruid='userid1' AND direction IN ('out','in') ORDER BY last_modified; 
> ServerError: <ErrorMessage code=0000 [Server error] message="java.lang.NullPointerException">
> Here's the stack trace from the system.log:
> ERROR [SharedPool-Worker-1] 2015-09-17 13:11:03,937  ErrorMessage.java:251 - 
> Unexpected exception during request
> java.lang.NullPointerException: null
> at 
> org.apache.cassandra.db.marshal.LongType.compareLongs(LongType.java:41) 
> ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.db.marshal.TimestampType.compare(TimestampType.java:48) 
> ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.db.marshal.TimestampType.compare(TimestampType.java:38) 
> ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$SingleColumnComparator.compare(SelectStatement.java:2419)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$SingleColumnComparator.compare(SelectStatement.java:2406)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at java.util.TimSort.countRunAndMakeAscending(TimSort.java:351) 
> ~[na:1.8.0_40]
> at java.util.TimSort.sort(TimSort.java:216) ~[na:1.8.0_40]
> at java.util.Arrays.sort(Arrays.java:1512) ~[na:1.8.0_40]
> at java.util.ArrayList.sort(ArrayList.java:1454) ~[na:1.8.0_40]
> at java.util.Collections.sort(Collections.java:175) ~[na:1.8.0_40]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.orderResults(SelectStatement.java:1400)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1255)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:276)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238)
>  ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at 
> com.datastax.bdp.cassandra.cql3.DseQueryHandler$StatementExecution.execute(DseQueryHandler.java:291)

[jira] [Commented] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958603#comment-14958603
 ] 

Stefania commented on CASSANDRA-10509:
--

CI looks reasonable.

Regarding {{SELECT *}}, it's the driver's token-aware load balancing policy 
that prevents this from being reproducible: we always send the final page 
request to the node where the last key is local, and there {{ExcludingBounds}} 
does work. The problem can be reproduced with {{SELECT *}} provided we use an 
{{exclusive_cql_connection}}.


> Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
> -
>
> Key: CASSANDRA-10509
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10509
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Paulo Motta
>Assignee: Stefania
> Fix For: 2.2.x
>
>
> Test failing on 2.2 after fixing CASSANDRA-10507:
> http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10532) Allow LWT operation on static column with only partition keys

2015-10-15 Thread DOAN DuyHai (JIRA)
DOAN DuyHai created CASSANDRA-10532:
---

 Summary: Allow LWT operation on static column with only partition 
keys
 Key: CASSANDRA-10532
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10532
 Project: Cassandra
  Issue Type: Bug
  Components: API
 Environment: C* 2.2.0
Reporter: DOAN DuyHai


Schema

{code:sql}
CREATE TABLE IF NOT EXISTS achilles_embedded.entity_with_static_column(
id bigint,
uuid uuid,
static_col text static,
value text,
PRIMARY KEY(id, uuid));
{code}

When trying to prepare the following query

{code:sql}
DELETE static_col FROM achilles_embedded.entity_with_static_column WHERE 
id=:id_Eq IF static_col=:static_col;
{code}

I got the error *DELETE statements must restrict all PRIMARY KEY columns with 
equality relations in order to use IF conditions, but column 'uuid' is not 
restricted*

Since the mutation only impacts the static column and the CAS check is on the 
static column, it makes sense to require only the partition key.





[jira] [Commented] (CASSANDRA-10518) initialDirectories passed into ColumnFamilyStore contructor

2015-10-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958812#comment-14958812
 ] 

Marcus Eriksson commented on CASSANDRA-10518:
-

Ok, this lgtm (with a tiny nit), will commit once CI is done;
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-10518-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-10518-testall/

> initialDirectories passed into ColumnFamilyStore contructor
> ---
>
> Key: CASSANDRA-10518
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10518
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 3.0.0 rc2
>
> Attachments: 10518-v2.txt, initialDirectoriesFixV1.patch
>
>
> One of the goals of CASSANDRA-8671 was to let compaction strategies write to 
> directories not used by normal tables, and the field 
> {{ColumnFamilyStore.initialDirectories}} was added to make sstables in those 
> directories discoverable on cfs instantiation.
> Unfortunately, in my patch, I passed the full list of directories 
> {{initialDirectories}} into the ColumnFamilyStore constructor, effectively 
> making these directories usable by any table. The attached patch fixes this 
> and elaborates on the correct usage of 
> {{ColumnFamilyStore.addInitialDirectories}} in its comment.





[jira] [Commented] (CASSANDRA-10468) Fix class-casting error in mixed clusters for 2.2->3.0 upgrades

2015-10-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958673#comment-14958673
 ] 

Sylvain Lebresne commented on CASSANDRA-10468:
--

Correct, there are problems with the reversed case (on top of the one you 
mention, we also don't properly reverse the name comparator). I've pushed a fix 
[here|https://github.com/pcmanus/cassandra/commits/10468-followup] that also 
includes a unit test for all of this.

> Fix class-casting error in mixed clusters for 2.2->3.0 upgrades
> ---
>
> Key: CASSANDRA-10468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10468
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> Three upgrade tests:
> - {{upgrade_tests/cql_tests.py:TestCQL.cas_and_list_index_test}}
> - {{upgrade_tests/cql_tests.py:TestCQL.collection_and_regular_test}}
> - {{upgrade_tests/cql_tests.py:TestCQL.composite_index_collections_test}}
> fail on the upgrade path from 2.2 to 3.0. The failures can be found on CassCI 
> here:
> [cas_and_list_index_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/cas_and_list_index_test/]
> [collection_and_regular_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/collection_and_regular_test/]
> [composite_index_collections_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/composite_index_collections_test/]
> You can run these tests with the following command:
> {code}
> SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 
> nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.cas_and_list_index_test 
> upgrade_tests/cql_tests.py:TestCQL.collection_and_regular_test 
> upgrade_tests/cql_tests.py:TestCQL.composite_index_collections_test
> {code}
> Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is 
> merged, these tests should also run with this upgrade path on normal 3.0 jobs.
> EDIT: the following test seems to fail with the same error:
> http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/41/testReport/upgrade_tests.cql_tests/TestCQL/null_support_test/





[jira] [Comment Edited] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958603#comment-14958603
 ] 

Stefania edited comment on CASSANDRA-10509 at 10/15/15 10:04 AM:
-

CI looks reasonable.

Regarding {{SELECT *}}, it's the driver's token-aware load balancing policy 
that prevents this from being reproducible: we always send the final page 
request to the node where the last key is local, and there {{ExcludingBounds}} 
does work. The problem can be reproduced with {{SELECT *}} provided we use an 
{{exclusive_cql_connection}}.



was (Author: stefania):
CI looks reasonable.

Regarding {{SELECT *}}, it's the driver token aware load balancing policy that 
prevents this from being reproducible. Because we always send the final page 
request to the node where the last key is local and where therefore 
{{ExclusiveBounds}} does work. The problem can be reproduced with {{SELECT *}} 
provided we use an {{exclusive_cql_connection}}.


> Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
> -
>
> Key: CASSANDRA-10509
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10509
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Paulo Motta
>Assignee: Stefania
> Fix For: 2.2.x
>
>
> Test failing on 2.2 after fixing CASSANDRA-10507:
> http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/





[jira] [Created] (CASSANDRA-10533) Allowing to have static columns attached to clustering columns

2015-10-15 Thread DOAN DuyHai (JIRA)
DOAN DuyHai created CASSANDRA-10533:
---

 Summary: Allowing to have static columns attached to clustering 
columns
 Key: CASSANDRA-10533
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10533
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: DOAN DuyHai


Now that [CASSANDRA-8099] is done, can we look again into the idea of having 
*static columns* relative to a clustering column?

I have a very relevant use-case for a customer. They want to store a 
hierarchy of data for user expenses:

{code:sql}
CREATE TABLE user_expenses(
 user_id bigint,
 firstname text static,
 lastname text static,
 report_id uuid,
 report_title text,
 report_amount double,
 report_xxx 
 ...,
 line_id uuid,
 line_item text,
 line_amount double,
 ...
 PRIMARY KEY((user_id), report_id, line_id)
)
{code}

So basically we have 2 levels of nesting:
 1 user - N reports
 1 report - N lines

With the above data model, all report data is *duplicated* for each line, so 
any update to report_title or another report property requires the 
*read-before-write anti-pattern*:

 1. Select all line_id for this report_id
 2. For each line_id, perform the update

One possible trick is to use a static map, but it's far from elegant, not to 
say dirty.

So I believe that there is definitely a need for static columns that are 
*relative* to a clustering column. 
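The read-before-write fan-out described above can be sketched in plain Python (illustrative only; the dict stands in for the {{user_expenses}} table, and all names are hypothetical): every line row carries its own copy of the report attributes, so renaming a report means first reading all line_ids, then rewriting each duplicate.

```python
# Minimal model of the denormalized table: report_title is duplicated
# onto every line row under the same (user_id, report_id) prefix.
rows = {}  # (user_id, report_id, line_id) -> row dict

def insert_line(user_id, report_id, line_id, title, item):
    rows[(user_id, report_id, line_id)] = {
        "report_title": title,  # duplicated per line
        "line_item": item,
    }

def rename_report(user_id, report_id, new_title):
    # Step 1: read all line_ids for this report (the extra read).
    keys = [k for k in rows if k[0] == user_id and k[1] == report_id]
    # Step 2: rewrite every duplicated copy of the title.
    for k in keys:
        rows[k]["report_title"] = new_title
    return len(keys)

insert_line(1, "r1", "a", "Q1 travel", "taxi")
insert_line(1, "r1", "b", "Q1 travel", "hotel")
assert rename_report(1, "r1", "Q1 trips") == 2  # two rows rewritten
```

With report-level static columns, step 1 would disappear: a single write to the report-scoped column would update the title for all lines at once.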






[jira] [Commented] (CASSANDRA-7953) RangeTombstones not merging during compaction

2015-10-15 Thread J.P. Eiti Kimura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958871#comment-14958871
 ] 

J.P. Eiti Kimura commented on CASSANDRA-7953:
-

Hello guys, when are you planning to release this patch? 
We are facing the same problem that [~fhsgoncalves] described above. 

> RangeTombstones not merging during compaction
> -
>
> Key: CASSANDRA-7953
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7953
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.1
>Reporter: Marcus Olsson
>Assignee: Branimir Lambov
>Priority: Minor
>  Labels: compaction, deletes, tombstone
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 0001-7953-v2.patch, CASSANDRA-7953-1.patch, 
> CASSANDRA-7953.patch
>
>
> When performing a compaction on two sstables that contain the same 
> RangeTombstone with different timestamps, the tombstones are not merged in 
> the new sstable.
> This has been tested using cassandra 2.1 with the following table:
> {code}
> CREATE TABLE test(
>   key text,
>   column text,
>   data text,
>   PRIMARY KEY(key, column)
> );
> {code}
> And then doing the following:
> {code}
> INSERT INTO test (key, column, data) VALUES ('1', '1', '1'); // If the 
> sstable only contains tombstones during compaction it seems that the sstable 
> either gets removed or isn't created (but that could probably be a separate 
> JIRA issue).
> INSERT INTO test (key, column, data) VALUES ('1', '2', '2'); // The inserts 
> are not actually needed, since the deletes create tombstones either way.
> DELETE FROM test WHERE key='1' AND column='2';
> nodetool flush
> INSERT INTO test (key, column, data) VALUES ('1', '2', '2');
> DELETE FROM test WHERE key='1' AND column='2';
> nodetool flush
> nodetool compact
> {code}
> When checking with the SSTableExport tool two tombstones exists in the 
> compacted sstable. This can be repeated, resulting in more and more 
> tombstones.





[jira] [Commented] (CASSANDRA-9484) Inconsistent select count

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958663#comment-14958663
 ] 

Stefania commented on CASSANDRA-9484:
-

[~philipthompson] can you see if you can still reproduce it with the 2.2 patch 
of 10509? 

> Inconsistent select count
> -
>
> Key: CASSANDRA-9484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9484
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Benjamin Lerer
> Fix For: 3.x, 2.2.x
>
>
> I am running the dtest simultaneous_bootstrap_test located at 
> https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding 
> that at the final data verification step, the query {{SELECT COUNT (*) FROM 
> keyspace1.standard1}} alternated between correctly returning 500,000 rows and 
> returning 500,001 rows. Running cleanup or compaction does not affect the 
> behavior. I have verified with sstable2json that there are exactly 500k rows 
> on disk between the two nodes in the cluster.
> I am reproducing this on trunk currently. It is not happening on 2.1-head.





[jira] [Commented] (CASSANDRA-7953) RangeTombstones not merging during compaction

2015-10-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958849#comment-14958849
 ] 

Fernando Gonçalves commented on CASSANDRA-7953:
---

We have experienced the same behaviour described in 
[https://issues.apache.org/jira/browse/CASSANDRA-10505]: "Once this happens in 
multiple sstables, compacting them causes the duplication to grow. The more 
this occurs, the worse the problem gets."

Basically, when we run a repair on a node, the compaction process starts and 
never ends: many pending tasks, and the number of sstables of one table grows 
exponentially (reaching 34k sstables). We use one map column with 
LeveledCompactionStrategy and many updates to that column. 
Memory consumption grows a lot too. We decided to stop the repair process and 
kill the node, because latency had also grown a lot and was impacting the 
whole cluster. But we then needed to run repair again, because we had killed 
one node and added a new one to the cluster, and another node was hit by this 
bug, so we had to repeat the process: kill the repair, kill the node, start a 
new node.

So we created another table using a blob column instead of the map collection 
and migrated all the data to it. We are fine now: the repair process and 
compaction finished successfully without a big impact on performance.

Please give this ticket some attention; I think it's a major issue!

> RangeTombstones not merging during compaction
> -
>
> Key: CASSANDRA-7953
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7953
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.1
>Reporter: Marcus Olsson
>Assignee: Branimir Lambov
>Priority: Minor
>  Labels: compaction, deletes, tombstone
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 0001-7953-v2.patch, CASSANDRA-7953-1.patch, 
> CASSANDRA-7953.patch
>
>
> When performing a compaction on two sstables that contain the same 
> RangeTombstone with different timestamps, the tombstones are not merged in 
> the new sstable.
> This has been tested using cassandra 2.1 with the following table:
> {code}
> CREATE TABLE test(
>   key text,
>   column text,
>   data text,
>   PRIMARY KEY(key, column)
> );
> {code}
> And then doing the following:
> {code}
> INSERT INTO test (key, column, data) VALUES ('1', '1', '1'); // If the 
> sstable only contains tombstones during compaction it seems that the sstable 
> either gets removed or isn't created (but that could probably be a separate 
> JIRA issue).
> INSERT INTO test (key, column, data) VALUES ('1', '2', '2'); // The inserts 
> are not actually needed, since the deletes create tombstones either way.
> DELETE FROM test WHERE key='1' AND column='2';
> nodetool flush
> INSERT INTO test (key, column, data) VALUES ('1', '2', '2');
> DELETE FROM test WHERE key='1' AND column='2';
> nodetool flush
> nodetool compact
> {code}
> When checking with the SSTableExport tool two tombstones exists in the 
> compacted sstable. This can be repeated, resulting in more and more 
> tombstones.





[jira] [Assigned] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2015-10-15 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov reassigned CASSANDRA-10520:
---

Assignee: Branimir Lambov

> Compressed writer and reader should support non-compressed data.
> 
>
> Key: CASSANDRA-10520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 3.0.x
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables 
> during stress-tests, results in chunks larger than 64k which are a problem 
> for the buffer pooling mechanisms employed by the 
> {{CompressedRandomAccessReader}}. This results in non-negligible performance 
> issues due to excessive memory allocation.
> To solve this problem and avoid decompression delays in the cases where it 
> does not provide benefits, I think we should allow compressed files to store 
> uncompressed chunks as an alternative to compressed data. Such a chunk could be 
> written after compression returns a buffer larger than, for example, 90% of 
> the input, and would not result in additional delays in writing. On reads it 
> could be recognized by size (using a single global threshold constant in the 
> compression metadata) and data could be directly transferred into the 
> decompressed buffer, skipping the decompression step and ensuring a 64k 
> buffer for compressed data always suffices.
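The scheme proposed above can be sketched roughly in Python (illustrative only: the 0.9 cutoff is the "90% of the input" example from the ticket, and the function names are assumptions, not Cassandra's implementation):

```python
import os
import zlib

# Assumed 90% cutoff: store raw bytes when compression saves less than 10%,
# and recognize an uncompressed chunk on read purely from its stored size.
RATIO = 0.9

def write_chunk(data: bytes) -> bytes:
    compressed = zlib.compress(data)
    # Fall back to the raw bytes if compression gained too little.
    return compressed if len(compressed) < len(data) * RATIO else data

def read_chunk(stored: bytes, original_len: int) -> bytes:
    # A stored size at or above the threshold marks an uncompressed chunk:
    # copy it straight through and skip the decompression step.
    if len(stored) >= original_len * RATIO:
        return stored
    return zlib.decompress(stored)

compressible = b"abab" * 4096
incompressible = os.urandom(4096)  # random data barely compresses
assert read_chunk(write_chunk(compressible), len(compressible)) == compressible
assert write_chunk(incompressible) == incompressible  # stored verbatim
```

The size-based detection works because the writer never stores a compressed chunk at or above the threshold, so the two cases cannot collide.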





[jira] [Commented] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958464#comment-14958464
 ] 

Stefania commented on CASSANDRA-10509:
--

It seems we get an extra row in the count if we cross the page boundary. For 
example, we can reproduce this problem about 50% of the time with as little as 
1000 entries if we set the page size to 1000 using 
{{self.session.default_fetch_size = 1000}}. Using the current dtest value of 
100K entries is slower but more reliable in reproducing the problem.

{{AbstractQueryPager.fetchPage}} retrieves an extra live row the last time it 
is called. It has a mechanism to exclude the first row if it is the same as the 
last row in the previous page by calling {{containsPreviousLast}}, which is 
implemented by the sub-classes. {{RangeNamesQueryPager.containsPreviousLast}} 
however always returns false because the corresponding {{queryNextPage}} uses 
{{ExcludingBounds}} to set the range in the read command. Therefore, the last 
queried key should never be included. However, as far as I can see 
{{ExcludingBounds}} is serialized as {{Bounds}}, which does include the 
endpoints. So, first we query (MIN, MIN) to get the entire range, then the next 
page will query (LAST_KEY,MIN), where LAST_KEY is the key of the last partition 
retrieved by the previous page, but if the last key is not local we are 
actually querying \[LAST_KEY, MIN\] and we retrieve the partition for LAST_KEY 
again.
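The failure mode and the fix can be sketched as follows (hypothetical helper names, not Cassandra's pager API): when the next-page bounds are effectively inclusive of the previous page's last key, an extra row appears unless a {{containsPreviousLast}}-style check drops it.

```python
def fetch_page(rows, start_after, size, inclusive_bug=False):
    """Return one page of sorted keys, starting after `start_after`."""
    if start_after is None:
        window = rows
    elif inclusive_bug:
        # ExcludingBounds degraded to Bounds: [last_key, ...] is queried.
        window = [r for r in rows if r >= start_after]
    else:
        # Intended behaviour: (last_key, ...] excludes the previous last key.
        window = [r for r in rows if r > start_after]
    return window[:size]

def paged_count(rows, size, inclusive_bug, dedup):
    total, last = 0, None
    while True:
        page = fetch_page(rows, last, size, inclusive_bug)
        full = len(page) == size
        if dedup and page and page[0] == last:
            page = page[1:]  # containsPreviousLast: drop the repeated row
        total += len(page)
        if not full:
            break
        if page:
            last = page[-1]
    return total

rows = list(range(10))
assert paged_count(rows, 4, inclusive_bug=False, dedup=False) == 10
assert paged_count(rows, 4, inclusive_bug=True, dedup=False) == 13  # extras
assert paged_count(rows, 4, inclusive_bug=True, dedup=True) == 10   # fixed
```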

It is not clear why it could not be reproduced with {{SELECT *}}.

Tentative [patch|https://github.com/stef1927/cassandra/tree/10509-2.2] attached.

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10509-2.2-dtest
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10509-2.2-testall

> Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
> -
>
> Key: CASSANDRA-10509
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10509
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Paulo Motta
>Assignee: Stefania
> Fix For: 2.2.x
>
>
> Test failing on 2.2 after fixing CASSANDRA-10507:
> http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/





[1/2] cassandra git commit: Reduce contention getting instances of CompositeType

2015-10-15 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 5f5e9602d -> b42a0cfe8


Reduce contention getting instances of CompositeType

patch by schlosna; reviewed by slebresne for CASSANDRA-10433


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe

Branch: refs/heads/cassandra-3.0
Commit: bee48ebe206bd02c231266858e9ae137a928689d
Parents: 7875326
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:50:40 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:50:40 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c02e2fa..9a0baaa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.4
+ * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 Merged from 2.1:
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--
diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java 
b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
index 0218411..9892118 100644
--- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
@@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
 import java.util.ArrayList;
-import java.util.HashMap;
+import java.util.Arrays;
 import java.util.List;
-import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 
 import com.google.common.collect.ImmutableList;
 
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
 import org.apache.cassandra.serializers.MarshalException;
@@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType
     public final List<AbstractType<?>> types;
 
 // interning instances
-    private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>();
+    private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>();
 
 public static CompositeType getInstance(TypeParser parser) throws 
ConfigurationException, SyntaxException
 {
@@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType
 return true;
 }
 
-    public static synchronized CompositeType getInstance(List<AbstractType<?>> types)
+    public static CompositeType getInstance(List<AbstractType<?>> types)
 {
 assert types != null && !types.isEmpty();
 
@@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType
         if (ct == null)
         {
             ct = new CompositeType(types);
-            instances.put(types, ct);
+            CompositeType previous = instances.putIfAbsent(types, ct);
+            if (previous != null)
+            {
+                ct = previous;
+            }
         }
         return ct;
     }
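The Java patch above swaps a synchronized HashMap for {{ConcurrentHashMap.putIfAbsent}}. The same race-safe interning idiom can be sketched in Python (a hypothetical {{CompositeKey}} stand-in, not Cassandra code), where {{dict.setdefault}} plays the role of {{putIfAbsent}}:

```python
# Race-safe interning without a global lock: if two callers construct an
# instance for the same key concurrently, setdefault (like putIfAbsent)
# keeps whichever landed first, so everyone shares one canonical object.
_instances = {}

class CompositeKey:  # hypothetical stand-in for CompositeType
    def __init__(self, types):
        self.types = types

def get_instance(types):
    key = tuple(types)
    ct = _instances.get(key)  # fast path: already interned
    if ct is None:
        # Another caller may win the race; keep whichever was stored first.
        ct = _instances.setdefault(key, CompositeKey(key))
    return ct

a = get_instance(["int", "text"])
b = get_instance(["int", "text"])
assert a is b  # both callers see the same interned instance
```

Losing the method-level lock is what reduces contention: readers on the fast path no longer serialize behind writers creating new instances.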



[3/3] cassandra git commit: Merge branch 'cassandra-3.0' into trunk

2015-10-15 Thread slebresne
Merge branch 'cassandra-3.0' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/29576a44
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/29576a44
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/29576a44

Branch: refs/heads/trunk
Commit: 29576a44d073b6aa07655aa9f9fdbbaac5a5322a
Parents: d87aab9 b42a0cf
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:54:39 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:54:39 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/29576a44/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/29576a44/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--



[2/3] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2015-10-15 Thread slebresne
Merge branch 'cassandra-2.2' into cassandra-3.0

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b42a0cfe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b42a0cfe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b42a0cfe

Branch: refs/heads/trunk
Commit: b42a0cfe87d175b9d5a053bcb91a9fc70a0c241e
Parents: 5f5e960 bee48eb
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:53:48 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:53:48 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/CHANGES.txt
--
diff --cc CHANGES.txt
index 66e34b6,9a0baaa..fa74539
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,22 -1,5 +1,23 @@@
 -2.2.4
 +3.0-rc2
 + * Revert CASSANDRA-7486, make CMS default GC, move GC config to
 +   conf/jvm.options (CASSANDRA-10403)
 + * Fix TeeingAppender causing some logs to be truncated/empty 
(CASSANDRA-10447)
 + * Allow EACH_QUORUM for reads (CASSANDRA-9602)
 + * Fix potential ClassCastException while upgrading (CASSANDRA-10468)
 + * Fix NPE in MVs on update (CASSANDRA-10503)
 + * Only include modified cell data in indexing deltas (CASSANDRA-10438)
 + * Do not load keyspace when creating sstable writer (CASSANDRA-10443)
 + * If node is not yet gossiping write all MV updates to batchlog only 
(CASSANDRA-10413)
 + * Re-populate token metadata after commit log recovery (CASSANDRA-10293)
 + * Provide additional metrics for materialized views (CASSANDRA-10323)
 + * Flush system schema tables after local schema changes (CASSANDRA-10429)
 +Merged from 2.2:
+  * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 + * Fix the regression when using LIMIT with aggregates (CASSANDRA-10487)
 + * Avoid NoClassDefFoundError during DataDescriptor initialization on windows 
(CASSANDRA-10412)
 + * Preserve case of quoted Role & User names (CASSANDRA-10394)
 + * cqlsh pg-style-strings broken (CASSANDRA-10484)
 + * cqlsh prompt includes name of keyspace after failed `use` statement 
(CASSANDRA-10369)
  Merged from 2.1:
   * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
   * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--



[jira] [Commented] (CASSANDRA-10529) Channel.size() is costly, mutually exclusive, and on the critical path

2015-10-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958523#comment-14958523
 ] 

Benedict commented on CASSANDRA-10529:
--

That is very strange, but given standard is as high as old mmap it's probably 
fine. We need to fix the variability in cstar. For future reference, it's worth 
at least disabling vnodes to ensure we have an identical cluster until cstar 
supports sets of predefined token rings.

I did not examine this exhaustively, but I saw a meaningful uptick (>20%) when 
profiling a single-node cluster after making this change. However, that may have 
been down to interactions with the specific profiler I was using at the time 
(which did require safepoints), which may have worsened the problem of mutual 
exclusivity. Either way, the change is worth making.

> Channel.size() is costly, mutually exclusive, and on the critical path
> --
>
> Key: CASSANDRA-10529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Stefania
> Fix For: 3.0.0 rc2
>
>
> [~stefania_alborghetti] mentioned this already on another ticket, but I have 
> lost track of exactly where. While benchmarking it became apparent this was a 
> noticeable bottleneck for small in-memory workloads with few files, 
> especially with RF=1. We should probably fix this soon, since it is trivial 
> to do so, and the call is only to impose an assertion that our requested 
> length is less than the file size. It isn't possible to safely memoize a 
> value anywhere we can guarantee to be able to safely refer to it without some 
> refactoring, so I suggest simply removing the assertion for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10531) ColumnFilter should have unit tests

2015-10-15 Thread Sylvain Lebresne (JIRA)
Sylvain Lebresne created CASSANDRA-10531:


 Summary: ColumnFilter should have unit tests
 Key: CASSANDRA-10531
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10531
 Project: Cassandra
  Issue Type: Test
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 3.x


{{ColumnFilter}} should be decently tested indirectly, but there is no reason 
not to cover it more directly with simple unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10365) Consider storing types by their CQL names in schema tables instead of fully-qualified internal class names

2015-10-15 Thread Olivier Michallat (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958593#comment-14958593
 ] 

Olivier Michallat commented on CASSANDRA-10365:
---

The Java driver already has this kind of indirection, for example for state and 
final functions in an aggregate's metadata. It will make things more brittle in 
the face of potential event propagation bugs, but if it's a deliberate choice 
I'm fine with it.

> Consider storing types by their CQL names in schema tables instead of 
> fully-qualified internal class names
> --
>
> Key: CASSANDRA-10365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10365
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>  Labels: client-impacting
> Fix For: 3.0.0 rc2
>
>
> Consider saving CQL types names for column, UDF/UDA arguments and return 
> types, and UDT components.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/3] cassandra git commit: Reduce contention getting instances of CompositeType

2015-10-15 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/trunk d87aab987 -> 29576a44d


Reduce contention getting instances of CompositeType

patch by schlosna; reviewed by slebresne for CASSANDRA-10433


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe

Branch: refs/heads/trunk
Commit: bee48ebe206bd02c231266858e9ae137a928689d
Parents: 7875326
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:50:40 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:50:40 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c02e2fa..9a0baaa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.4
+ * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 Merged from 2.1:
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--
diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java 
b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
index 0218411..9892118 100644
--- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
@@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
 import java.util.ArrayList;
-import java.util.HashMap;
+import java.util.Arrays;
 import java.util.List;
-import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 
 import com.google.common.collect.ImmutableList;
 
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
 import org.apache.cassandra.serializers.MarshalException;
@@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType
     public final List<AbstractType<?>> types;
 
     // interning instances
-    private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>();
+    private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>();
 
     public static CompositeType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException
     {
@@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType
         return true;
     }
 
-    public static synchronized CompositeType getInstance(List<AbstractType<?>> types)
+    public static CompositeType getInstance(List<AbstractType<?>> types)
     {
         assert types != null && !types.isEmpty();
 
@@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType
         if (ct == null)
         {
             ct = new CompositeType(types);
-            instances.put(types, ct);
+            CompositeType previous = instances.putIfAbsent(types, ct);
+            if (previous != null)
+            {
+                ct = previous;
+            }
         }
         return ct;
     }
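For illustration, the lock-free interning idiom the patch above adopts (a
{{ConcurrentMap.putIfAbsent}} race instead of a globally synchronized factory
method) can be sketched as a standalone class; {{InternDemo}} is a hypothetical
stand-in, not Cassandra code:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InternDemo
{
    // Interning cache: one canonical instance per distinct key list
    private static final ConcurrentMap<List<String>, InternDemo> instances =
        new ConcurrentHashMap<>();

    public final List<String> types;

    private InternDemo(List<String> types)
    {
        this.types = types;
    }

    public static InternDemo getInstance(List<String> types)
    {
        InternDemo ct = instances.get(types);
        if (ct == null)
        {
            ct = new InternDemo(types);
            // Two threads may race to intern the same key; putIfAbsent
            // guarantees exactly one instance wins and both callers see it.
            InternDemo previous = instances.putIfAbsent(types, ct);
            if (previous != null)
                ct = previous;
        }
        return ct;
    }

    public static void main(String[] args)
    {
        InternDemo a = getInstance(List.of("Int32Type", "UTF8Type"));
        InternDemo b = getInstance(List.of("Int32Type", "UTF8Type"));
        // Equal key lists yield the same canonical instance
        System.out.println(a == b); // prints "true"
    }
}
```

The trade-off is that a losing thread constructs a throwaway instance under a
race, which is cheap compared with funnelling every lookup through a single
lock.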



[2/2] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2015-10-15 Thread slebresne
Merge branch 'cassandra-2.2' into cassandra-3.0

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b42a0cfe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b42a0cfe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b42a0cfe

Branch: refs/heads/cassandra-3.0
Commit: b42a0cfe87d175b9d5a053bcb91a9fc70a0c241e
Parents: 5f5e960 bee48eb
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:53:48 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:53:48 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/CHANGES.txt
--
diff --cc CHANGES.txt
index 66e34b6,9a0baaa..fa74539
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,22 -1,5 +1,23 @@@
 -2.2.4
 +3.0-rc2
 + * Revert CASSANDRA-7486, make CMS default GC, move GC config to
 +   conf/jvm.options (CASSANDRA-10403)
 + * Fix TeeingAppender causing some logs to be truncated/empty 
(CASSANDRA-10447)
 + * Allow EACH_QUORUM for reads (CASSANDRA-9602)
 + * Fix potential ClassCastException while upgrading (CASSANDRA-10468)
 + * Fix NPE in MVs on update (CASSANDRA-10503)
 + * Only include modified cell data in indexing deltas (CASSANDRA-10438)
 + * Do not load keyspace when creating sstable writer (CASSANDRA-10443)
 + * If node is not yet gossiping write all MV updates to batchlog only 
(CASSANDRA-10413)
 + * Re-populate token metadata after commit log recovery (CASSANDRA-10293)
 + * Provide additional metrics for materialized views (CASSANDRA-10323)
 + * Flush system schema tables after local schema changes (CASSANDRA-10429)
 +Merged from 2.2:
+  * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 + * Fix the regression when using LIMIT with aggregates (CASSANDRA-10487)
 + * Avoid NoClassDefFoundError during DataDescriptor initialization on windows 
(CASSANDRA-10412)
 + * Preserve case of quoted Role & User names (CASSANDRA-10394)
 + * cqlsh pg-style-strings broken (CASSANDRA-10484)
 + * cqlsh prompt includes name of keyspace after failed `use` statement 
(CASSANDRA-10369)
  Merged from 2.1:
   * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
   * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--



[jira] [Commented] (CASSANDRA-9484) Inconsistent select count

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958502#comment-14958502
 ] 

Stefania commented on CASSANDRA-9484:
-

This looks similar to CASSANDRA-10509.

> Inconsistent select count
> -
>
> Key: CASSANDRA-9484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9484
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Benjamin Lerer
> Fix For: 3.x, 2.2.x
>
>
> I am running the dtest simultaneous_bootstrap_test located at 
> https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding 
> that at the final data verification step, the query {{SELECT COUNT (*) FROM 
> keyspace1.standard1}} alternated between correctly returning 500,000 rows and 
> returning 500,001 rows. Running cleanup or compaction does not affect the 
> behavior. I have verified with sstable2json that there are exactly 500k rows 
> on disk between the two nodes in the cluster.
> I am reproducing this on trunk currently. It is not happening on 2.1-head.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10471) fix flapping empty_in_test dtest

2015-10-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958561#comment-14958561
 ] 

Sylvain Lebresne commented on CASSANDRA-10471:
--

bq. As a reviewer I didn't figure out how to verify that this statement is true 
or why. I mucked about with StatementRestrictions and family and found where IN 
restrictions are expressed, but it's all pretty big in scope. Do you have any 
pointers?

The query that triggers the problem is
{noformat}
SELECT v FROM test_compact WHERE k1 = 0 AND k2 IN ()
{noformat}
That query explicitly requests no row (it's a query by name with an empty list 
of names). One would expect such a valid but somewhat uninteresting query to be 
dealt with at the CQL layer (by returning nothing), yielding no internal query, 
and that is what happens on 3.0 (see 
{{SelectStatement.makeClusteringIndexFilter}}: in the case of a query by names, 
the code checks if {{getRequestedRows}} returns an empty list and returns 
{{null}} if so, which is code for "we know the query returns nothing"). That's 
the reason why I initially made {{ColumnFilter.Builder}} reject the case where 
nothing was selected, but that was misguided especially since the code is 
perfectly fine dealing with an empty {{ColumnFilter}}.

And it happens that this optimization isn't done in 2.2. Or rather, it used to 
be done but was "broken" by CASSANDRA-7981. If you look at the equivalent code 
on 2.2, in {{SelectStatement.makeFilter()}}, it assumes an {{IN ()}} would 
result in {{getRequestedColumns}} returning {{null}}, but it's easy to see it's 
not the case anymore (and it's equally easy to see that CASSANDRA-7981 is the 
culprit for that). So on the 2.2 node, the query simply generates an empty list 
in {{SelectStatement.getRequestedColumns()}} (because 
{{SingleColumnRestriction.InWithValues.getValues()}} does nothing special if 
its list of terms is empty, it just returns an empty list) and queries with 
that. That's where, during an upgrade, the 3.0 node receives a query by name 
with an empty list of names, and that triggers my misguided assertion.
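The 3.0-side short-circuit described above can be illustrated with a toy
sketch; {{EmptyInDemo}} and {{makeFilter}} are hypothetical names, not
Cassandra's actual code:

```java
import java.util.Collections;
import java.util.List;

public class EmptyInDemo
{
    // Mirrors the convention described above: returning null signals
    // "we already know this query returns nothing", so no internal
    // query is issued for an IN () restriction.
    static List<String> makeFilter(List<String> requestedNames)
    {
        if (requestedNames.isEmpty())
            return null;
        return requestedNames;
    }

    public static void main(String[] args)
    {
        System.out.println(makeFilter(Collections.emptyList()) == null); // prints "true"
        System.out.println(makeFilter(List.of("k2")) == null);           // prints "false"
    }
}
```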

I'll note that I'm fine "fixing" that broken optimization in 2.2 and I've 
pushed a very trivial fix to do so 
[here|https://github.com/pcmanus/cassandra/commits/10471-2.2], but I don't 
really care whether we do it or not since 1) it's pretty inconsequential for 
2.2 users and 2) even if we do commit that 2.2 patch, we still need to fix 3.0 
for users who might upgrade from a 2.2 version that doesn't have that fix.

bq. As far as I can tell null isn't used as a signal anywhere

Well, it is. It signals that we don't want to skip any value when {{isFetchAll}} is 
set (paraphrasing the comment on the declaration of {{selection}} here). If 
{{isFetchAll == true}}, {{selection}} is the subset of columns for which we 
want to include the values (the ones whose values are not skipped). As the case 
where we don't skip any value is common, we use {{null}} to signal it. The 
equivalent if we were to not use {{null}} would be to have {{selection}} be all 
the columns, but that would mean a slightly less efficient {{canSkipValue}} in 
that common case.
I'll note that I feel all this is reasonably well explained in the class 
javadoc and the comments around the class field declarations, but I'm open to 
improvement suggestions.
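The null-as-default convention can be sketched like so; {{FilterDemo}} is a
hypothetical class whose fields only loosely follow {{ColumnFilter}}:

```java
import java.util.Set;

public class FilterDemo
{
    private final boolean fetchAll;
    // null means "include the value of every column" -- the common case,
    // which avoids a set lookup per cell
    private final Set<String> selection;

    FilterDemo(boolean fetchAll, Set<String> selection)
    {
        this.fetchAll = fetchAll;
        this.selection = selection;
    }

    // A value may be skipped only when we fetch all columns but a subset
    // was explicitly selected and this column is not in it.
    boolean canSkipValue(String column)
    {
        return fetchAll && selection != null && !selection.contains(column);
    }

    public static void main(String[] args)
    {
        FilterDemo all = new FilterDemo(true, null);
        FilterDemo some = new FilterDemo(true, Set.of("a"));
        System.out.println(all.canSkipValue("b"));  // prints "false": no value skipped
        System.out.println(some.canSkipValue("b")); // prints "true": column not selected
    }
}
```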

bq. canSkipValue might have depended on it before this change

Why only before this change? {{selection == null}} only matters in 
{{canSkipValue}} if {{isFetchAll == true}}, and 
{{ColumnFilter.Builder.build()}} explicitly only forces 
{{PartitionColumns.NONE}} if {{!isFetchAll}}.

bq. There is no unit test for ColumnFilter

There isn't and I've created CASSANDRA-10531 to change that.



> fix flapping empty_in_test dtest
> 
>
> Key: CASSANDRA-10471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10471
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> {{upgrade_tests/cql_tests.py:TestCQL.empty_in_test}} fails about half the 
> time on the upgrade path from 2.2 to 3.0:
> http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/42/testReport/upgrade_tests.cql_tests/TestCQL/empty_in_test/history/
> Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is 
> merged, these tests should also run with this upgrade path on normal 3.0 
> jobs. Until then, you can run it with the following command:
> {code}
> SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 
> nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.empty_in_test
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958534#comment-14958534
 ] 

Stefania commented on CASSANDRA-10461:
--

It seems the second pull request fixing the line separator is ineffective. We 
need to see what the tool is outputting on Jenkins, pull request 
[here|https://github.com/riptano/cassandra-dtest/pull/609].

> Fix sstableverify_test dtest
> 
>
> Key: CASSANDRA-10461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10461
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Stefania
>  Labels: test
> Fix For: 3.0.0 rc2
>
>
> The dtest for sstableverify is failing:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/
> It fails in the same way when I run it on OpenStack, so I don't think it's 
> just a CassCI problem.
> [~slebresne] Looks like you made changes to this test recently:
> https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880
> Could you have a look at the failure? I'm assigning you for triage, but feel 
> free to reassign.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10433) Reduce contention in CompositeType instance interning

2015-10-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-10433:
-
Assignee: David Schlosnagle

> Reduce contention in CompositeType instance interning
> -
>
> Key: CASSANDRA-10433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10433
> Project: Cassandra
>  Issue Type: Improvement
> Environment: Cassandra 2.2.1 running on 6 AWS c3.4xlarge nodes, 
> CentOS 6.6
>Reporter: David Schlosnagle
>Assignee: David Schlosnagle
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 
> 0001-Avoid-contention-in-CompositeType-instance-interning.patch
>
>
> While running some workload tests on Cassandra 2.2.1 and profiling with 
> flight recorder in a test environment, we have noticed significant contention 
> on the static synchronized 
> org.apache.cassandra.db.marshal.CompositeType.getInstance(List) method.
> We are seeing threads blocked for 22.828 seconds from a 60 second snapshot 
> while under a mix of reads and writes from a Thrift based client.
> I would propose to reduce contention in 
> org.apache.cassandra.db.marshal.CompositeType.getInstance(List) by using a 
> ConcurrentHashMap for the instances cache.
> {code}
> Contention Back Trace
> org.apache.cassandra.db.marshal.CompositeType.getInstance(List)
>   
> org.apache.cassandra.db.composites.AbstractCompoundCellNameType.asAbstractType()
> org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, boolean)
>   org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, 
> ByteBuffer)
> 
> org.apache.cassandra.thrift.ThriftValidation.validateColumnNames(CFMetaData, 
> ByteBuffer, Iterable)
>   
> org.apache.cassandra.thrift.ThriftValidation.validateColumnPath(CFMetaData, 
> ColumnPath)
> 
> org.apache.cassandra.thrift.ThriftValidation.validateColumnOrSuperColumn(CFMetaData,
>  ByteBuffer, ColumnOrSuperColumn)
>   
> org.apache.cassandra.thrift.ThriftValidation.validateMutation(CFMetaData, 
> ByteBuffer, Mutation)
> 
> org.apache.cassandra.thrift.CassandraServer.createMutationList(ConsistencyLevel,
>  Map, boolean)
>   
> org.apache.cassandra.thrift.CassandraServer.batch_mutate(Map, 
> ConsistencyLevel)
> 
> org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra$Iface,
>  Cassandra$batch_mutate_args)
> 
> org.apache.cassandra.thrift.ThriftValidation.validateRange(CFMetaData, 
> ColumnParent, SliceRange)
>   
> org.apache.cassandra.thrift.ThriftValidation.validatePredicate(CFMetaData, 
> ColumnParent, SlicePredicate)
> 
> org.apache.cassandra.thrift.CassandraServer.get_range_slices(ColumnParent, 
> SlicePredicate, KeyRange, ConsistencyLevel)
>   
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra$Iface,
>  Cassandra$get_range_slices_args)
> 
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Object,
>  TBase)
>   org.apache.thrift.ProcessFunction.process(int, TProtocol, 
> TProtocol, Object)
> org.apache.thrift.TBaseProcessor.process(TProtocol, 
> TProtocol)
>   
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run()
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
>   java.util.concurrent.ThreadPoolExecutor$Worker.run()
> 
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(String, 
> List, ColumnParent, long, SlicePredicate, ConsistencyLevel, ClientState)
>   
> org.apache.cassandra.thrift.CassandraServer.multiget_slice(List, 
> ColumnParent, SlicePredicate, ConsistencyLevel)
> 
> org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Cassandra$Iface,
>  Cassandra$multiget_slice_args)
>   
> org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Object,
>  TBase)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Reduce contention getting instances of CompositeType

2015-10-15 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.2 78753263e -> bee48ebe2


Reduce contention getting instances of CompositeType

patch by schlosna; reviewed by slebresne for CASSANDRA-10433


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe

Branch: refs/heads/cassandra-2.2
Commit: bee48ebe206bd02c231266858e9ae137a928689d
Parents: 7875326
Author: Sylvain Lebresne 
Authored: Thu Oct 15 09:50:40 2015 +0200
Committer: Sylvain Lebresne 
Committed: Thu Oct 15 09:50:40 2015 +0200

--
 CHANGES.txt |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20 
 2 files changed, 13 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c02e2fa..9a0baaa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.4
+ * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 Merged from 2.1:
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
--
diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java 
b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
index 0218411..9892118 100644
--- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
@@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
 import java.util.ArrayList;
-import java.util.HashMap;
+import java.util.Arrays;
 import java.util.List;
-import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 
 import com.google.common.collect.ImmutableList;
 
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
 import org.apache.cassandra.serializers.MarshalException;
@@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType
     public final List<AbstractType<?>> types;
 
     // interning instances
-    private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>();
+    private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>();
 
     public static CompositeType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException
     {
@@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType
         return true;
     }
 
-    public static synchronized CompositeType getInstance(List<AbstractType<?>> types)
+    public static CompositeType getInstance(List<AbstractType<?>> types)
    {
         assert types != null && !types.isEmpty();
 
@@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType
         if (ct == null)
         {
             ct = new CompositeType(types);
-            instances.put(types, ct);
+            CompositeType previous = instances.putIfAbsent(types, ct);
+            if (previous != null)
+            {
+                ct = previous;
+            }
         }
         return ct;
     }



[jira] [Commented] (CASSANDRA-10519) RepairException: [repair #... on .../..., (...,...]] Validation failed in /w.x.y.z

2015-10-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958499#comment-14958499
 ] 

Gábor Auth commented on CASSANDRA-10519:


At the moment it works right.

I upgraded from 2.1.5 to 2.2.2 a few days ago and, after the full upgrade, ran 
the repair on each node (-full -pr, on each node one-by-one, not simultaneously). 
I've started a daily repair on my test cluster; if it comes up again, I will 
comment on this issue.

> RepairException: [repair #... on .../..., (...,...]] Validation failed in 
> /w.x.y.z
> --
>
> Key: CASSANDRA-10519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10519
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS 7, JDK 8u60, Cassandra 2.2.2 (upgraded from 2.1.5)
>Reporter: Gábor Auth
>
> Sometimes the repair fails:
> {code}
> ERROR [Repair#3:1] 2015-10-14 06:22:56,490 CassandraDaemon.java:185 - 
> Exception in thread Thread[Repair#3:1,5,RMI Runtime]
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,27880539413409
> 54029]] Validation failed in /w.y.x.z
> at 
> com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
>  ~[guava-16.0.jar:na]
> at 
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) 
> ~[guava-16.0.jar:na]
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) 
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z
> at 
> org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> ... 3 common frames omitted
> {code}
> And here is the w.y.x.z side:
> {code}
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 
> CompactionManager.java:1053 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 Validator.java:246 - 
> Failed creating a merkle tree for [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,2788053941340954029]], /a.b.c.d (see log for details)
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,488 CassandraDaemon.java:185 
> - Exception in thread Thread[ValidationExecutor:7,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1054)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:86)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:652)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ...
> ERROR [Reference-Reaper:1] 2015-10-14 06:23:21,439 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@74fc054a) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1949471967:/home/cassandra/dsc-cassandra-2.2.2/bin/../data/data/keyspace/table-b15521b062e4bbedcdee5e027297/la-1195-big
>  was not released before the reference was garbage collected
> 

cassandra git commit: Skip redundant tombstones on compaction.

2015-10-15 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 02f88e38e -> a61fc01f4


Skip redundant tombstones on compaction.

Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f

Branch: refs/heads/cassandra-2.1
Commit: a61fc01f418426847e3aad133127da3615813236
Parents: 02f88e3
Author: Branimir Lambov 
Authored: Wed Oct 7 14:46:24 2015 +0300
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:28:42 2015 +0200

--
 CHANGES.txt |   1 +
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 4 files changed, 218 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b16acb5..68b44ed 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.12
+ * Merge range tombstones during compaction (CASSANDRA-7953)
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)
  * Don't allow startup if the node's rack has changed (CASSANDRA-10242)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java 
b/src/java/org/apache/cassandra/db/ColumnIndex.java
index d9d6a9c..0ea5c87 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -180,14 +180,24 @@ public class ColumnIndex
 firstColumn = column;
 startPosition = endPosition;
 // TODO: have that use the firstColumn as min + make sure we 
optimize that on read
-endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, 
output, atomSerializer);
+endPosition += 
tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer);
 blockSize = 0; // We don't count repeated tombstone marker in 
the block size, to avoid a situation
// where we wouldn't make any progress because 
a block is filled by said marker
+
+maybeWriteRowHeader();
 }
 
-long size = atomSerializer.serializedSizeForSSTable(column);
-endPosition += size;
-blockSize += size;
+if (tombstoneTracker.update(column, false))
+{
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+size += atomSerializer.serializedSizeForSSTable(column);
+endPosition += size;
+blockSize += size;
+
+atomSerializer.serializeForSSTable(column, output);
+}
+
+lastColumn = column;
 
 // if we hit the column index size that we have to index after, go 
ahead and index it.
 if (blockSize >= DatabaseDescriptor.getColumnIndexSize())
@@ -197,14 +207,6 @@ public class ColumnIndex
 firstColumn = null;
 lastBlockClosing = column;
 }
-
-maybeWriteRowHeader();
-atomSerializer.serializeForSSTable(column, output);
-
-// TODO: Should deal with removing unneeded tombstones
-tombstoneTracker.update(column, false);
-
-lastColumn = column;
 }
 
 private void maybeWriteRowHeader() throws IOException
@@ -216,12 +218,16 @@ public class ColumnIndex
 }
 }
 
-public ColumnIndex build()
+public ColumnIndex build() throws IOException
 {
 // all columns were GC'd after all
 if (lastColumn == null)
 return ColumnIndex.EMPTY;
 
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+endPosition += size;
+blockSize += size;
+
 // the last column may have fallen on an index boundary already.  
if not, index it explicitly.
 if (result.columnsIndex.isEmpty() || lastBlockClosing != 
lastColumn)
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java
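The core idea of the patch above can be sketched outside Cassandra: a range tombstone that is entirely covered by an already-kept tombstone with an equal or newer timestamp is redundant and need not be written to the compacted sstable. The RT class and skipRedundant method below are illustrative only, not the actual RangeTombstone.Tracker API.

```java
import java.util.ArrayList;
import java.util.List;

public class TombstoneMerge {
    // Simplified model: a range tombstone deletes clustering
    // positions in [start, end] as of a given timestamp.
    static final class RT {
        final int start, end;
        final long timestamp;
        RT(int start, int end, long timestamp) {
            this.start = start; this.end = end; this.timestamp = timestamp;
        }
    }

    // Drop any tombstone fully covered by an already-kept one whose
    // timestamp is at least as new: writing it again is redundant.
    static List<RT> skipRedundant(List<RT> sorted) {
        List<RT> out = new ArrayList<>();
        for (RT t : sorted) {
            boolean covered = false;
            for (RT kept : out) {
                if (kept.start <= t.start && t.end <= kept.end
                        && kept.timestamp >= t.timestamp) {
                    covered = true;
                    break;
                }
            }
            if (!covered)
                out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<RT> in = new ArrayList<>();
        in.add(new RT(0, 10, 5));
        in.add(new RT(2, 4, 3));   // inside [0,10] with older timestamp: skipped
        in.add(new RT(8, 12, 6));  // extends past 10: must be kept
        System.out.println(skipRedundant(in).size()); // prints 2
    }
}
```

The real tracker works on a streaming iterator of atoms during compaction rather than a materialized list, which is why the diff moves the update() call before serialization and adds writeUnwrittenTombstones().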

[1/2] cassandra git commit: Skip redundant tombstones on compaction.

2015-10-15 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.2 bee48ebe2 -> 3b7ccdfb1


Skip redundant tombstones on compaction.

Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f

Branch: refs/heads/cassandra-2.2
Commit: a61fc01f418426847e3aad133127da3615813236
Parents: 02f88e3
Author: Branimir Lambov 
Authored: Wed Oct 7 14:46:24 2015 +0300
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:28:42 2015 +0200

--
 CHANGES.txt |   1 +
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 4 files changed, 218 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b16acb5..68b44ed 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.12
+ * Merge range tombstones during compaction (CASSANDRA-7953)
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)
  * Don't allow startup if the node's rack has changed (CASSANDRA-10242)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java 
b/src/java/org/apache/cassandra/db/ColumnIndex.java
index d9d6a9c..0ea5c87 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -180,14 +180,24 @@ public class ColumnIndex
 firstColumn = column;
 startPosition = endPosition;
 // TODO: have that use the firstColumn as min + make sure we 
optimize that on read
-endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, 
output, atomSerializer);
+endPosition += 
tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer);
 blockSize = 0; // We don't count repeated tombstone marker in 
the block size, to avoid a situation
// where we wouldn't make any progress because 
a block is filled by said marker
+
+maybeWriteRowHeader();
 }
 
-long size = atomSerializer.serializedSizeForSSTable(column);
-endPosition += size;
-blockSize += size;
+if (tombstoneTracker.update(column, false))
+{
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+size += atomSerializer.serializedSizeForSSTable(column);
+endPosition += size;
+blockSize += size;
+
+atomSerializer.serializeForSSTable(column, output);
+}
+
+lastColumn = column;
 
 // if we hit the column index size that we have to index after, go 
ahead and index it.
 if (blockSize >= DatabaseDescriptor.getColumnIndexSize())
@@ -197,14 +207,6 @@ public class ColumnIndex
 firstColumn = null;
 lastBlockClosing = column;
 }
-
-maybeWriteRowHeader();
-atomSerializer.serializeForSSTable(column, output);
-
-// TODO: Should deal with removing unneeded tombstones
-tombstoneTracker.update(column, false);
-
-lastColumn = column;
 }
 
 private void maybeWriteRowHeader() throws IOException
@@ -216,12 +218,16 @@ public class ColumnIndex
 }
 }
 
-public ColumnIndex build()
+public ColumnIndex build() throws IOException
 {
 // all columns were GC'd after all
 if (lastColumn == null)
 return ColumnIndex.EMPTY;
 
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+endPosition += size;
+blockSize += size;
+
 // the last column may have fallen on an index boundary already.  
if not, index it explicitly.
 if (result.columnsIndex.isEmpty() || lastBlockClosing != 
lastColumn)
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java

[3/4] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2015-10-15 Thread marcuse
Merge branch 'cassandra-2.2' into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a1c1d90
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a1c1d90
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a1c1d90

Branch: refs/heads/trunk
Commit: 6a1c1d900925cb0532633c943e7c4325edc8f64c
Parents: b42a0cf 3b7ccdf
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:35:53 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:35:53 2015 +0200

--

--




[1/4] cassandra git commit: Skip redundant tombstones on compaction.

2015-10-15 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/trunk 29576a44d -> 0e3da95d6


Skip redundant tombstones on compaction.

Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f

Branch: refs/heads/trunk
Commit: a61fc01f418426847e3aad133127da3615813236
Parents: 02f88e3
Author: Branimir Lambov 
Authored: Wed Oct 7 14:46:24 2015 +0300
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:28:42 2015 +0200

--
 CHANGES.txt |   1 +
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 4 files changed, 218 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b16acb5..68b44ed 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.12
+ * Merge range tombstones during compaction (CASSANDRA-7953)
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)
  * Don't allow startup if the node's rack has changed (CASSANDRA-10242)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java 
b/src/java/org/apache/cassandra/db/ColumnIndex.java
index d9d6a9c..0ea5c87 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -180,14 +180,24 @@ public class ColumnIndex
 firstColumn = column;
 startPosition = endPosition;
 // TODO: have that use the firstColumn as min + make sure we 
optimize that on read
-endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, 
output, atomSerializer);
+endPosition += 
tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer);
 blockSize = 0; // We don't count repeated tombstone marker in 
the block size, to avoid a situation
// where we wouldn't make any progress because 
a block is filled by said marker
+
+maybeWriteRowHeader();
 }
 
-long size = atomSerializer.serializedSizeForSSTable(column);
-endPosition += size;
-blockSize += size;
+if (tombstoneTracker.update(column, false))
+{
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+size += atomSerializer.serializedSizeForSSTable(column);
+endPosition += size;
+blockSize += size;
+
+atomSerializer.serializeForSSTable(column, output);
+}
+
+lastColumn = column;
 
 // if we hit the column index size that we have to index after, go 
ahead and index it.
 if (blockSize >= DatabaseDescriptor.getColumnIndexSize())
@@ -197,14 +207,6 @@ public class ColumnIndex
 firstColumn = null;
 lastBlockClosing = column;
 }
-
-maybeWriteRowHeader();
-atomSerializer.serializeForSSTable(column, output);
-
-// TODO: Should deal with removing unneeded tombstones
-tombstoneTracker.update(column, false);
-
-lastColumn = column;
 }
 
 private void maybeWriteRowHeader() throws IOException
@@ -216,12 +218,16 @@ public class ColumnIndex
 }
 }
 
-public ColumnIndex build()
+public ColumnIndex build() throws IOException
 {
 // all columns were GC'd after all
 if (lastColumn == null)
 return ColumnIndex.EMPTY;
 
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+endPosition += size;
+blockSize += size;
+
 // the last column may have fallen on an index boundary already.  
if not, index it explicitly.
 if (result.columnsIndex.isEmpty() || lastBlockClosing != 
lastColumn)
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java

[3/3] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2015-10-15 Thread marcuse
Merge branch 'cassandra-2.2' into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a1c1d90
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a1c1d90
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a1c1d90

Branch: refs/heads/cassandra-3.0
Commit: 6a1c1d900925cb0532633c943e7c4325edc8f64c
Parents: b42a0cf 3b7ccdf
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:35:53 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:35:53 2015 +0200

--

--




[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2

2015-10-15 Thread marcuse
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb

Branch: refs/heads/cassandra-2.2
Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb
Parents: bee48eb a61fc01
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:33:29 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:33:29 2015 +0200

--
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 3 files changed, 217 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/src/java/org/apache/cassandra/db/RangeTombstone.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
--
diff --cc test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
index 000,0460a16..71634e9
mode 00,100644..100644
--- a/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
+++ b/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
@@@ -1,0 -1,125 +1,125 @@@
+ /*
+  * Licensed to the Apache Software Foundation (ASF) under one
+  * or more contributor license agreements.  See the NOTICE file
+  * distributed with this work for additional information
+  * regarding copyright ownership.  The ASF licenses this file
+  * to you under the Apache License, Version 2.0 (the
+  * "License"); you may not use this file except in compliance
+  * with the License.  You may obtain a copy of the License at
+  *
+  * http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+ 
+ package org.apache.cassandra.cql3;
+ 
+ import static org.junit.Assert.assertEquals;
+ import static org.junit.Assert.assertTrue;
+ 
+ import com.google.common.collect.Iterables;
+ 
+ import org.junit.Before;
+ import org.junit.Test;
+ 
+ import org.apache.cassandra.Util;
+ import org.apache.cassandra.db.*;
+ import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
+ import org.apache.cassandra.db.composites.*;
++import org.apache.cassandra.io.sstable.format.SSTableReader;
+ import org.apache.cassandra.io.sstable.ISSTableScanner;
 -import org.apache.cassandra.io.sstable.SSTableReader;
+ 
+ public class RangeTombstoneMergeTest extends CQLTester
+ {
+ @Before
+ public void before() throws Throwable
+ {
+ createTable("CREATE TABLE %s(" +
+ "  key text," +
+ "  column text," +
+ "  data text," +
+ "  extra text," +
+ "  PRIMARY KEY(key, column)" +
+ ");");
+ 
+ // If the sstable only contains tombstones during compaction it seems 
that the sstable either gets removed or isn't created (but that could probably 
be a separate JIRA issue).
+ execute("INSERT INTO %s (key, column, data) VALUES (?, ?, ?)", "1", 
"1", "1");
+ }
+ 
+ @Test
+ public void testEqualMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ for (int i=0; i<3; ++i)
+ {
+ addRemoveAndFlush();
+ compact();
+ }
+ 
+ assertOneTombstone();
+ }
+ 
+ @Test
+ public void testRangeMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, 
?)", "1", "2", "2", "2");
+ execute("DELETE extra FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ execute("DELETE FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ assertOneTombstone();
+ }
+ 
+ void assertOneTombstone() throws Throwable
+ {
+ assertRows(execute("SELECT column FROM %s"),
+row("1"));
+ assertAllRows(row("1", "1", "1", null));
+ 
+ ColumnFamilyStore cfs = 
Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable());
+ ColumnFamily cf = cfs.getColumnFamily(Util.dk("1"), 

[2/3] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2

2015-10-15 Thread marcuse
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb

Branch: refs/heads/cassandra-3.0
Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb
Parents: bee48eb a61fc01
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:33:29 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:33:29 2015 +0200

--
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 3 files changed, 217 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/src/java/org/apache/cassandra/db/RangeTombstone.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
--
diff --cc test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
index 000,0460a16..71634e9
mode 00,100644..100644
--- a/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
+++ b/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
@@@ -1,0 -1,125 +1,125 @@@
+ /*
+  * Licensed to the Apache Software Foundation (ASF) under one
+  * or more contributor license agreements.  See the NOTICE file
+  * distributed with this work for additional information
+  * regarding copyright ownership.  The ASF licenses this file
+  * to you under the Apache License, Version 2.0 (the
+  * "License"); you may not use this file except in compliance
+  * with the License.  You may obtain a copy of the License at
+  *
+  * http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+ 
+ package org.apache.cassandra.cql3;
+ 
+ import static org.junit.Assert.assertEquals;
+ import static org.junit.Assert.assertTrue;
+ 
+ import com.google.common.collect.Iterables;
+ 
+ import org.junit.Before;
+ import org.junit.Test;
+ 
+ import org.apache.cassandra.Util;
+ import org.apache.cassandra.db.*;
+ import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
+ import org.apache.cassandra.db.composites.*;
++import org.apache.cassandra.io.sstable.format.SSTableReader;
+ import org.apache.cassandra.io.sstable.ISSTableScanner;
 -import org.apache.cassandra.io.sstable.SSTableReader;
+ 
+ public class RangeTombstoneMergeTest extends CQLTester
+ {
+ @Before
+ public void before() throws Throwable
+ {
+ createTable("CREATE TABLE %s(" +
+ "  key text," +
+ "  column text," +
+ "  data text," +
+ "  extra text," +
+ "  PRIMARY KEY(key, column)" +
+ ");");
+ 
+ // If the sstable only contains tombstones during compaction it seems 
that the sstable either gets removed or isn't created (but that could probably 
be a separate JIRA issue).
+ execute("INSERT INTO %s (key, column, data) VALUES (?, ?, ?)", "1", 
"1", "1");
+ }
+ 
+ @Test
+ public void testEqualMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ for (int i=0; i<3; ++i)
+ {
+ addRemoveAndFlush();
+ compact();
+ }
+ 
+ assertOneTombstone();
+ }
+ 
+ @Test
+ public void testRangeMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, 
?)", "1", "2", "2", "2");
+ execute("DELETE extra FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ execute("DELETE FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ assertOneTombstone();
+ }
+ 
+ void assertOneTombstone() throws Throwable
+ {
+ assertRows(execute("SELECT column FROM %s"),
+row("1"));
+ assertAllRows(row("1", "1", "1", null));
+ 
+ ColumnFamilyStore cfs = 
Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable());
+ ColumnFamily cf = cfs.getColumnFamily(Util.dk("1"), 

[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:24 PM:
-

I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.


was (Author: rstrickland):
I discovered that an index on one of the tables has a wide row, and I'm 
assuming that to be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10534:
--
Description: 
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems these files are not being fsynced, which can 
potentially lead to data corruption. I am working with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. All but 
CompressionInfo seem tolerable. A quick look through the code also did not 
reveal any fsync calls. Moreover, I suspect commit 
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 caused the regression: it removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.
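The removed call maps directly onto the standard NIO API: FileChannel.force(true) flushes both data and file metadata to disk before the channel is closed. A minimal sketch of the pattern follows; the SyncedWriter class and file names are hypothetical, not Cassandra code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncedWriter {
    // Write a small metadata file and fsync it before close, so a hard
    // reboot cannot leave a zero-length CompressionInfo-style file behind.
    public static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining())
                ch.write(buf);
            ch.force(true); // flush data *and* metadata to disk (fsync)
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("compressioninfo", ".db");
        writeAndSync(p, "chunk offsets".getBytes());
        System.out.println(Files.size(p)); // prints 13
        Files.delete(p);
    }
}
```

Without the force(true), a successful close() only guarantees the data reached the page cache, not the disk, which matches the zero-length files observed after hard reboots.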

Following is the trace I saw in system.log

{noformat}
INFO  [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - 
Opening 
/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368
 (79 bytes)
ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting 
forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_80]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:589) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:564) 
~[na:1.7.0_80]
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
... 14 common frames omitted
{noformat}

  was:
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems like these file is not being fsynced, and that can 
potentially lead to data corruption. I am wroking with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. All seem 
tolerable but the  CompressionInfo seem tolerable. Also a quick look through 
the code and did not revealed any fsync calls. Moreover, I suspect the commit  
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 to have caused the regression. Which removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.


> CompressionInfo not being fsynced on close
> --
>
> 

[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Mikhail Stepura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959074#comment-14959074
 ] 

Mikhail Stepura commented on CASSANDRA-10515:
-

[~jeffery.griffith] could you please attach a thread dump as well?

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.





[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Mikhail Stepura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959148#comment-14959148
 ] 

Mikhail Stepura commented on CASSANDRA-10449:
-

I would love to get hold of a heapdump for that OOM. At least we could figure 
out what's consuming the heap.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10534:
--
Description: 
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems this file is not being fsynced, and that can 
potentially lead to data corruption. I am working with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. The 
missing syncs all seem tolerable except for CompressionInfo. A quick look 
through the code did not reveal any fsync calls either. Moreover, I suspect the commit 
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 has caused the regression, which removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.
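For readers patching their own builds, the durability pattern the removed line provided can be sketched with plain NIO. This is illustrative only, not Cassandra's actual writer code; the class and file names here are invented for the example:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncOnClose {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("compression-info", ".db");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(new byte[]{1, 2, 3}));
            // force(true) asks the OS to flush both file data and metadata
            // to the device; without it, a hard reboot can leave a truncated
            // or zero-length file behind even though close() succeeded.
            ch.force(true);
        }
        System.out.println("size after sync+close: " + Files.size(p));
        Files.delete(p);
    }
}
```

The key point is that close() only flushes JVM-side buffers; it does not guarantee the operating system has written the data to stable storage, which is why dropping `getChannel().force(true)` matters under hard reboots.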

Following is the trace I saw in system.log

{noformat}
INFO  [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - 
Opening 
/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368
 (79 bytes)
ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting 
forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_80]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:589) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:564) 
~[na:1.7.0_80]
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
... 14 common frames omitted
{noformat}

  was:
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems like these file is not being fsynced, and that can 
potentially lead to data corruption. I am working with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. All seem 
tolerable but the  CompressionInfo seem tolerable. Also a quick look through 
the code and did not revealed any fsync calls. Moreover, I suspect the commit  
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 to have caused the regression. Which removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.

Following is the trace I saw in system.log

{noformat}
INFO  [SSTableBatchOpen:1] 2015-09-29 

[jira] [Commented] (CASSANDRA-9484) Inconsistent select count

2015-10-15 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959124#comment-14959124
 ] 

Philip Thompson commented on CASSANDRA-9484:


I can't reproduce this at all anymore, even on base 2.2 and 3.0. We can 
probably close this ticket. I'm going to merge the test into CI, and we can 
re-open this ticket if it starts failing again.

> Inconsistent select count
> -
>
> Key: CASSANDRA-9484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9484
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Benjamin Lerer
> Fix For: 3.x, 2.2.x
>
>
> I am running the dtest simultaneous_bootstrap_test located at 
> https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding 
> that at the final data verification step, the query {{SELECT COUNT (*) FROM 
> keyspace1.standard1}} alternated between correctly returning 500,000 rows and 
> returning 500,001 rows. Running cleanup or compaction does not affect the 
> behavior. I have verified with sstable2json that there are exactly 500k rows 
> on disk between the two nodes in the cluster.
> I am reproducing this on trunk currently. It is not happening on 2.1-head.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava

2015-10-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959149#comment-14959149
 ] 

T Jake Luciani commented on CASSANDRA-10528:


You mean in the POC? Well, this benchmark was RF=1, so the driver was using TAP 
(token-aware routing) and no MS (MessagingService) traffic was involved.

In general terms though, we keep a thread per connection, so this relates to 
CASSANDRA-8457 linked above.  My thought was to combine our native netty epoll 
event loop with the messaging service event loop to avoid having many more 
threads.

> Proposal: Integrate RxJava
> --
>
> Key: CASSANDRA-10528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10528
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
> Fix For: 3.x
>
> Attachments: rxjava-stress.png
>
>
> The purpose of this ticket is to discuss the merits of integrating the 
> [RxJava|https://github.com/ReactiveX/RxJava] framework into C*.  Enabling us 
> to incrementally make the internals of C* async and move away from SEDA to a 
> more modern thread per core architecture. 
> Related tickets:
>* CASSANDRA-8520
>* CASSANDRA-8457
>* CASSANDRA-5239
>* CASSANDRA-7040
>* CASSANDRA-5863
>* CASSANDRA-6696
>* CASSANDRA-7392
> My *primary* goals in raising this issue are to provide a way of:
> *  *Incrementally* making the backend async
> *  Avoiding code complexity/readability issues
> *  Avoiding NIH where possible
> *  Building on an extendable library
> My *non*-goals in raising this issue are:
> 
>* Rewrite the entire database in one big bang
>* Write our own async api/framework
> 
> -
> I've attempted to integrate RxJava a while back and found it not ready mainly 
> due to our lack of lambda support.  Now with Java 8 I've found it very 
> enjoyable and have not hit any performance issues. A gentle introduction to 
> RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as 
> well as their 
> [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading].  The 
> primary concept of RX is the 
> [Observable|http://reactivex.io/documentation/observable.html] which is 
> essentially a stream of stuff you can subscribe to and act on, chain, etc. 
> This is quite similar to [Java 8 streams 
> api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html]
>  (or I should say streams api is similar to it).  The difference is java 8 
> streams can't be used for asynchronous events while RxJava can.
> Another improvement since I last tried integrating RxJava is the completion 
> of CASSANDRA-8099 which provides is a very iterable/incremental approach to 
> our storage engine.  *Iterators and Observables are well paired conceptually 
> so morphing our current Storage engine to be async is much simpler now.*
> In an effort to show how one can incrementally change our backend I've done a 
> quick POC with RxJava and replaced our non-paging read requests to become 
> non-blocking.
> https://github.com/apache/cassandra/compare/trunk...tjake:rxjava-3.0
> As you can probably see the code is straight-forward and sometimes quite nice!
> *Old*
> {code}
> private static PartitionIterator 
> fetchRows(List commands, ConsistencyLevel 
> consistencyLevel)
> throws UnavailableException, ReadFailureException, ReadTimeoutException
> {
> int cmdCount = commands.size();
> SinglePartitionReadLifecycle[] reads = new 
> SinglePartitionReadLifecycle[cmdCount];
> for (int i = 0; i < cmdCount; i++)
> reads[i] = new SinglePartitionReadLifecycle(commands.get(i), 
> consistencyLevel);
> for (int i = 0; i < cmdCount; i++)
> reads[i].doInitialQueries();
> for (int i = 0; i < cmdCount; i++)
> reads[i].maybeTryAdditionalReplicas();
> for (int i = 0; i < cmdCount; i++)
> reads[i].awaitResultsAndRetryOnDigestMismatch();
> for (int i = 0; i < cmdCount; i++)
> if (!reads[i].isDone())
> reads[i].maybeAwaitFullDataRead();
> List results = new ArrayList<>(cmdCount);
> for (int i = 0; i < cmdCount; i++)
> {
> assert reads[i].isDone();
> results.add(reads[i].getResult());
> }
> return PartitionIterators.concat(results);
> }
> {code}
>  *New*
> {code}
> private static Observable 
> fetchRows(List commands, ConsistencyLevel 
> consistencyLevel)
> throws UnavailableException, ReadFailureException, ReadTimeoutException
> {
> return Observable.from(commands)
>  .map(command -> new 
> SinglePartitionReadLifecycle(command, 

[jira] [Commented] (CASSANDRA-10524) Add ability to skip TIME_WAIT sockets on port check on Windows startup

2015-10-15 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959059#comment-14959059
 ] 

Andy Tolbert commented on CASSANDRA-10524:
--

Another one I've seen:  {{LAST_ACK}}

> Add ability to skip TIME_WAIT sockets on port check on Windows startup
> --
>
> Key: CASSANDRA-10524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10524
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>Priority: Trivial
>  Labels: Windows
> Fix For: 3.0.0 rc2, 2.2.4
>
> Attachments: win_aggressive_startup.txt
>
>
> C* sockets are often staying TIME_WAIT for up to 120 seconds (2x max segment 
> lifetime) for me in my dev environment on Windows. This is rather obnoxious 
> since it means I can't launch C* for up to 2 minutes after stopping it.
> Attaching a patch that adds a simple -a for aggressive startup to the launch 
> scripts to ignore duplicate port check from netstat if it's TIME_WAIT. Also 
> snuck in some more liberal interpretation of help strings in the .ps1.
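The filtering the patch describes can be approximated in a short sketch: treat a port as busy only when it is held by a socket in a state other than TIME_WAIT (or LAST_ACK, per the comment above). The netstat-style sample lines and port numbers below are illustrative; this is not the code from the attached .ps1 patch:

```java
import java.util.List;
import java.util.Set;

public class PortCheck {
    // Socket states that should not block startup under the "aggressive" check.
    static final Set<String> IGNORABLE = Set.of("TIME_WAIT", "LAST_ACK");

    // Returns true if any non-ignorable socket holds the given local port.
    static boolean portBusy(List<String> netstatLines, int port) {
        String needle = ":" + port;
        return netstatLines.stream()
            .map(String::trim)
            .map(line -> line.split("\\s+"))
            // netstat columns: proto, local address, foreign address, state, pid
            .filter(f -> f.length >= 4 && f[1].endsWith(needle))
            .anyMatch(f -> !IGNORABLE.contains(f[3]));
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "TCP  127.0.0.1:7199  0.0.0.0:0       LISTENING  1234",
            "TCP  127.0.0.1:9042  10.0.0.5:51234  TIME_WAIT  0");
        System.out.println("7199 busy: " + portBusy(sample, 7199));
        System.out.println("9042 busy: " + portBusy(sample, 9042));
    }
}
```

With this policy the TIME_WAIT socket on 9042 no longer blocks a restart, while the LISTENING socket on 7199 still does.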



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:25 PM:
-

I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example from one node:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.


was (Author: rstrickland):
I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I discovered that an index on one of the tables has a wide row, and I'm 
assuming that to be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10534:
--
Description: 
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems this file is not being fsynced, and that can 
potentially lead to data corruption. I am working with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. The 
missing syncs all seem tolerable except for CompressionInfo. A quick look 
through the code did not reveal any fsync calls either. Moreover, I suspect the commit 
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 to have caused the regression, as it removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.

Following is the trace I saw in system.log

{noformat}
INFO  [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - 
Opening 
/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368
 (79 bytes)
ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting 
forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) 
~[apache-cassandra-2.1.9.jar:2.1.9]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_80]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_80]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:589) 
~[na:1.7.0_80]
at java.io.DataInputStream.readUTF(DataInputStream.java:564) 
~[na:1.7.0_80]
at 
org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106)
 ~[apache-cassandra-2.1.9.jar:2.1.9]
... 14 common frames omitted
{noformat}

  was:
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems like these file is not being fsynced, and that can 
potentially lead to data corruption. I am wroking with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. All seem 
tolerable but the  CompressionInfo seem tolerable. Also a quick look through 
the code and did not revealed any fsync calls. Moreover, I suspect the commit  
4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344)
 to have caused the regression. Which removed the 
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.

Following is the trace I saw in system.log

{noformat}
INFO  [SSTableBatchOpen:1] 2015-09-29 

[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava

2015-10-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959135#comment-14959135
 ] 

Jonathan Ellis commented on CASSANDRA-10528:


What about MessagingService?

> Proposal: Integrate RxJava
> --
>
> Key: CASSANDRA-10528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10528
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
> Fix For: 3.x
>
> Attachments: rxjava-stress.png
>
>
> The purpose of this ticket is to discuss the merits of integrating the 
> [RxJava|https://github.com/ReactiveX/RxJava] framework into C*.  Enabling us 
> to incrementally make the internals of C* async and move away from SEDA to a 
> more modern thread per core architecture. 
> Related tickets:
>* CASSANDRA-8520
>* CASSANDRA-8457
>* CASSANDRA-5239
>* CASSANDRA-7040
>* CASSANDRA-5863
>* CASSANDRA-6696
>* CASSANDRA-7392
> My *primary* goals in raising this issue are to provide a way of:
> *  *Incrementally* making the backend async
> *  Avoiding code complexity/readability issues
> *  Avoiding NIH where possible
> *  Building on an extendable library
> My *non*-goals in raising this issue are:
> 
>* Rewrite the entire database in one big bang
>* Write our own async api/framework
> 
> -
> I've attempted to integrate RxJava a while back and found it not ready mainly 
> due to our lack of lambda support.  Now with Java 8 I've found it very 
> enjoyable and have not hit any performance issues. A gentle introduction to 
> RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as 
> well as their 
> [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading].  The 
> primary concept of RX is the 
> [Observable|http://reactivex.io/documentation/observable.html] which is 
> essentially a stream of stuff you can subscribe to and act on, chain, etc. 
> This is quite similar to [Java 8 streams 
> api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html]
>  (or I should say streams api is similar to it).  The difference is java 8 
> streams can't be used for asynchronous events while RxJava can.
> Another improvement since I last tried integrating RxJava is the completion 
> of CASSANDRA-8099 which provides is a very iterable/incremental approach to 
> our storage engine.  *Iterators and Observables are well paired conceptually 
> so morphing our current Storage engine to be async is much simpler now.*
> In an effort to show how one can incrementally change our backend I've done a 
> quick POC with RxJava and replaced our non-paging read requests to become 
> non-blocking.
> https://github.com/apache/cassandra/compare/trunk...tjake:rxjava-3.0
> As you can probably see the code is straight-forward and sometimes quite nice!
> *Old*
> {code}
> private static PartitionIterator 
> fetchRows(List commands, ConsistencyLevel 
> consistencyLevel)
> throws UnavailableException, ReadFailureException, ReadTimeoutException
> {
> int cmdCount = commands.size();
> SinglePartitionReadLifecycle[] reads = new 
> SinglePartitionReadLifecycle[cmdCount];
> for (int i = 0; i < cmdCount; i++)
> reads[i] = new SinglePartitionReadLifecycle(commands.get(i), 
> consistencyLevel);
> for (int i = 0; i < cmdCount; i++)
> reads[i].doInitialQueries();
> for (int i = 0; i < cmdCount; i++)
> reads[i].maybeTryAdditionalReplicas();
> for (int i = 0; i < cmdCount; i++)
> reads[i].awaitResultsAndRetryOnDigestMismatch();
> for (int i = 0; i < cmdCount; i++)
> if (!reads[i].isDone())
> reads[i].maybeAwaitFullDataRead();
> List results = new ArrayList<>(cmdCount);
> for (int i = 0; i < cmdCount; i++)
> {
> assert reads[i].isDone();
> results.add(reads[i].getResult());
> }
> return PartitionIterators.concat(results);
> }
> {code}
>  *New*
> {code}
> private static Observable 
> fetchRows(List commands, ConsistencyLevel 
> consistencyLevel)
> throws UnavailableException, ReadFailureException, ReadTimeoutException
> {
> return Observable.from(commands)
>  .map(command -> new 
> SinglePartitionReadLifecycle(command, consistencyLevel))
>  .flatMap(read -> read.getPartitionIterator())
>  .toList()
>  .map(results -> PartitionIterators.concat(results));
> }
> {code}
> Since the read call is now non blocking (no more future.get()) we can remove 
> one thread pool 
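The non-blocking composition shown with Observable above can be approximated with the JDK alone. This sketch uses CompletableFuture in place of RxJava's Observable, with invented names rather than Cassandra's real types; it only illustrates the pattern of issuing all reads asynchronously and concatenating results once everything completes, with no blocking get() per read:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncCompose {
    // Stand-in for an asynchronous single-partition read.
    static CompletableFuture<String> readPartition(String command) {
        return CompletableFuture.supplyAsync(() -> "rows(" + command + ")");
    }

    public static void main(String[] args) {
        List<String> commands = List.of("cmd1", "cmd2");

        // Kick off every read without blocking.
        List<CompletableFuture<String>> reads = commands.stream()
            .map(AsyncCompose::readPartition)
            .collect(Collectors.toList());

        // Concatenate results only once all reads have completed; join()
        // here cannot block because allOf has already finished.
        CompletableFuture<List<String>> all =
            CompletableFuture.allOf(reads.toArray(new CompletableFuture[0]))
                .thenApply(v -> reads.stream()
                                     .map(CompletableFuture::join)
                                     .collect(Collectors.toList()));

        System.out.println(all.join());
    }
}
```

The final join() is only there so main can print; in a fully async pipeline the continuation would instead be chained onward, which is the thread-pool saving the comment describes.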

[4/4] cassandra git commit: Merge branch 'cassandra-3.0' into trunk

2015-10-15 Thread marcuse
Merge branch 'cassandra-3.0' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0e3da95d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0e3da95d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0e3da95d

Branch: refs/heads/trunk
Commit: 0e3da95d6bbfcddc1bdb381e02499206aac56d7a
Parents: 29576a4 6a1c1d9
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:36:05 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:36:05 2015 +0200

--

--




[1/3] cassandra git commit: Skip redundant tombstones on compaction.

2015-10-15 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 b42a0cfe8 -> 6a1c1d900


Skip redundant tombstones on compaction.

Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f

Branch: refs/heads/cassandra-3.0
Commit: a61fc01f418426847e3aad133127da3615813236
Parents: 02f88e3
Author: Branimir Lambov 
Authored: Wed Oct 7 14:46:24 2015 +0300
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:28:42 2015 +0200

--
 CHANGES.txt |   1 +
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 4 files changed, 218 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b16acb5..68b44ed 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.12
+ * Merge range tombstones during compaction (CASSANDRA-7953)
  * (cqlsh) Distinguish negative and positive infinity in output 
(CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)
  * Don't allow startup if the node's rack has changed (CASSANDRA-10242)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java 
b/src/java/org/apache/cassandra/db/ColumnIndex.java
index d9d6a9c..0ea5c87 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -180,14 +180,24 @@ public class ColumnIndex
 firstColumn = column;
 startPosition = endPosition;
 // TODO: have that use the firstColumn as min + make sure we 
optimize that on read
-endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, 
output, atomSerializer);
+endPosition += 
tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer);
 blockSize = 0; // We don't count repeated tombstone marker in 
the block size, to avoid a situation
// where we wouldn't make any progress because 
a block is filled by said marker
+
+maybeWriteRowHeader();
 }
 
-long size = atomSerializer.serializedSizeForSSTable(column);
-endPosition += size;
-blockSize += size;
+if (tombstoneTracker.update(column, false))
+{
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+size += atomSerializer.serializedSizeForSSTable(column);
+endPosition += size;
+blockSize += size;
+
+atomSerializer.serializeForSSTable(column, output);
+}
+
+lastColumn = column;
 
 // if we hit the column index size that we have to index after, go 
ahead and index it.
 if (blockSize >= DatabaseDescriptor.getColumnIndexSize())
@@ -197,14 +207,6 @@ public class ColumnIndex
 firstColumn = null;
 lastBlockClosing = column;
 }
-
-maybeWriteRowHeader();
-atomSerializer.serializeForSSTable(column, output);
-
-// TODO: Should deal with removing unneeded tombstones
-tombstoneTracker.update(column, false);
-
-lastColumn = column;
 }
 
 private void maybeWriteRowHeader() throws IOException
@@ -216,12 +218,16 @@ public class ColumnIndex
 }
 }
 
-public ColumnIndex build()
+public ColumnIndex build() throws IOException
 {
 // all columns were GC'd after all
 if (lastColumn == null)
 return ColumnIndex.EMPTY;
 
+long size = tombstoneTracker.writeUnwrittenTombstones(output, 
atomSerializer);
+endPosition += size;
+blockSize += size;
+
 // the last column may have fallen on an index boundary already.  
if not, index it explicitly.
 if (result.columnsIndex.isEmpty() || lastBlockClosing != 
lastColumn)
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java

[2/4] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2

2015-10-15 Thread marcuse
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb

Branch: refs/heads/trunk
Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb
Parents: bee48eb a61fc01
Author: Marcus Eriksson 
Authored: Thu Oct 15 15:33:29 2015 +0200
Committer: Marcus Eriksson 
Committed: Thu Oct 15 15:33:29 2015 +0200

--
 .../org/apache/cassandra/db/ColumnIndex.java|  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 +
 3 files changed, 217 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/src/java/org/apache/cassandra/db/RangeTombstone.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
--
diff --cc test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
index 000,0460a16..71634e9
mode 00,100644..100644
--- a/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
+++ b/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
@@@ -1,0 -1,125 +1,125 @@@
+ /*
+  * Licensed to the Apache Software Foundation (ASF) under one
+  * or more contributor license agreements.  See the NOTICE file
+  * distributed with this work for additional information
+  * regarding copyright ownership.  The ASF licenses this file
+  * to you under the Apache License, Version 2.0 (the
+  * "License"); you may not use this file except in compliance
+  * with the License.  You may obtain a copy of the License at
+  *
+  * http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+ 
+ package org.apache.cassandra.cql3;
+ 
+ import static org.junit.Assert.assertEquals;
+ import static org.junit.Assert.assertTrue;
+ 
+ import com.google.common.collect.Iterables;
+ 
+ import org.junit.Before;
+ import org.junit.Test;
+ 
+ import org.apache.cassandra.Util;
+ import org.apache.cassandra.db.*;
+ import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
+ import org.apache.cassandra.db.composites.*;
++import org.apache.cassandra.io.sstable.format.SSTableReader;
+ import org.apache.cassandra.io.sstable.ISSTableScanner;
 -import org.apache.cassandra.io.sstable.SSTableReader;
+ 
+ public class RangeTombstoneMergeTest extends CQLTester
+ {
+ @Before
+ public void before() throws Throwable
+ {
+ createTable("CREATE TABLE %s(" +
+ "  key text," +
+ "  column text," +
+ "  data text," +
+ "  extra text," +
+ "  PRIMARY KEY(key, column)" +
+ ");");
+ 
+ // If the sstable only contains tombstones during compaction it seems that the sstable either gets removed or isn't created (but that could probably be a separate JIRA issue).
+ execute("INSERT INTO %s (key, column, data) VALUES (?, ?, ?)", "1", "1", "1");
+ }
+ 
+ @Test
+ public void testEqualMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ for (int i=0; i<3; ++i)
+ {
+ addRemoveAndFlush();
+ compact();
+ }
+ 
+ assertOneTombstone();
+ }
+ 
+ @Test
+ public void testRangeMerge() throws Throwable
+ {
+ addRemoveAndFlush();
+ 
+ execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, ?)", "1", "2", "2", "2");
+ execute("DELETE extra FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ execute("DELETE FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+ flush();
+ compact();
+ 
+ assertOneTombstone();
+ }
+ 
+ void assertOneTombstone() throws Throwable
+ {
+ assertRows(execute("SELECT column FROM %s"),
+row("1"));
+ assertAllRows(row("1", "1", "1", null));
+ 
+ ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable());
+ ColumnFamily cf = cfs.getColumnFamily(Util.dk("1"), Composites.EMPTY, 

[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10534:
--
Component/s: Core

> CompressionInfo not being fsynced on close
> --
>
> Key: CASSANDRA-10534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Sharvanath Pathak
>
> I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
> this happened multiple times in our testing with hard node reboots. After 
> some investigation it seems these files are not being fsynced, which can 
> potentially lead to data corruption.
> I checked for fsync calls using strace, and found them happening for all but 
> the following components: CompressionInfo, TOC.txt and digest.sha1. The 
> others seem tolerable, but the missing fsync for CompressionInfo is not. A 
> quick look through the code also did not reveal any fsync calls. Moreover, I 
> suspect the regression was caused by commit 
> 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
> (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
> which removed the
> {noformat}
>  getChannel().force(true);
> {noformat}
> from CompressionMetadata.Writer.close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)
Sharvanath Pathak created CASSANDRA-10534:
-

 Summary: CompressionInfo not being fsynced on close
 Key: CASSANDRA-10534
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10534
 Project: Cassandra
  Issue Type: Bug
Reporter: Sharvanath Pathak


I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems these files are not being fsynced, which can 
potentially lead to data corruption.
I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. The others 
seem tolerable, but the missing fsync for CompressionInfo is not. A quick look 
through the code also did not reveal any fsync calls. Moreover, I suspect the 
regression was caused by commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
which removed the
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.
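For reference, {{getChannel().force(true)}} is the standard java.nio fsync. The sketch below is an illustration only (the file name and helper are invented, not Cassandra code) of the write-then-force-then-close pattern whose removal is suspected above: without {{force(true)}}, {{close()}} can succeed while the OS still holds dirty pages, and a hard reboot can leave a zero-length file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal durable-write sketch: force contents and metadata to stable
// storage before the channel is closed.
public class FsyncOnClose {
    public static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            // Push file contents *and* metadata to disk; this is the
            // getChannel().force(true) call the report says went missing.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("compressioninfo-sketch", ".db");
        writeDurably(p, "chunk offsets".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(p)); // 13
        Files.delete(p);
    }
}
```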





[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10534:

Reproduced In: 2.1.9
Fix Version/s: 2.1.x

> CompressionInfo not being fsynced on close
> --
>
> Key: CASSANDRA-10534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Sharvanath Pathak
> Fix For: 2.1.x
>
>
> I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
> this happened multiple times in our testing with hard node reboots. After 
> some investigation it seems these files are not being fsynced, which can 
> potentially lead to data corruption. I am working with version 2.1.9.
> I checked for fsync calls using strace, and found them happening for all but 
> the following components: CompressionInfo, TOC.txt and digest.sha1. The 
> others seem tolerable, but the missing fsync for CompressionInfo is not. A 
> quick look through the code also did not reveal any fsync calls. Moreover, I 
> suspect the regression was caused by commit 
> 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
> (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
> which removed the
> {noformat}
>  getChannel().force(true);
> {noformat}
> from CompressionMetadata.Writer.close.





[jira] [Resolved] (CASSANDRA-10469) Fix collection indexing upgrade dtest

2015-10-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe resolved CASSANDRA-10469.
-
Resolution: Not A Problem

This seems to be related to CASSANDRA-10468. Although the ClassCastException 
isn't observed during the failures of this particular test, it hasn't failed on 
cassci since 10468 was committed. Locally I'm seeing the same thing; in my 
latest comparison, 0/10 runs resulted in a failure when the {{UPGRADE_TO}} 
target is set to [26c8892|https://github.com/apache/cassandra/commit/26c8892] 
(the commit with the 10468 fix), compared to 7/10 failures where the 
{{UPGRADE_TO}} target is the previous commit 
([48889d2|https://github.com/apache/cassandra/commit/48889d2]).

I'm going to resolve this as not a problem, and we can reopen it if we see the 
problem reoccur (CASSANDRA-10468 was recently reopened).

> Fix collection indexing upgrade dtest
> -
>
> Key: CASSANDRA-10469
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10469
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.0 rc2
>
>
> {{upgrade_tests/cql_tests.py:TestCQL.collection_indexing_test}} fails on the 
> upgrade path from 2.2 to 3.0. You can see the failure on CassCI here:
> http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/44/testReport/upgrade_tests.cql_tests/TestCQL/collection_indexing_test/
> Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is 
> merged, these tests should also run with this upgrade path on normal 3.0 
> jobs. Until then, you can run it with the following command:
> {code}
> SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 
> nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.collection_indexing_test
> {code}
> Note that this test fails most of the time, but does occasionally succeed:
> http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/44/testReport/upgrade_tests.cql_tests/TestCQL/collection_indexing_test/history/





[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close

2015-10-15 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10534:
--
Description: 
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems these files are not being fsynced, which can 
potentially lead to data corruption. I am working with version 2.1.9.

I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. The others 
seem tolerable, but the missing fsync for CompressionInfo is not. A quick look 
through the code also did not reveal any fsync calls. Moreover, I suspect the 
regression was caused by commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
which removed the
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.

  was:
I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
this happened multiple times in our testing with hard node reboots. After some 
investigation it seems these files are not being fsynced, which can 
potentially lead to data corruption. 
I checked for fsync calls using strace, and found them happening for all but 
the following components: CompressionInfo, TOC.txt and digest.sha1. The others 
seem tolerable, but the missing fsync for CompressionInfo is not. A quick look 
through the code also did not reveal any fsync calls. Moreover, I suspect the 
regression was caused by commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
(https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
which removed the
{noformat}
 getChannel().force(true);
{noformat}
from CompressionMetadata.Writer.close.


> CompressionInfo not being fsynced on close
> --
>
> Key: CASSANDRA-10534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Sharvanath Pathak
>
> I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; 
> this happened multiple times in our testing with hard node reboots. After 
> some investigation it seems these files are not being fsynced, which can 
> potentially lead to data corruption. I am working with version 2.1.9.
> I checked for fsync calls using strace, and found them happening for all but 
> the following components: CompressionInfo, TOC.txt and digest.sha1. The 
> others seem tolerable, but the missing fsync for CompressionInfo is not. A 
> quick look through the code also did not reveal any fsync calls. Moreover, I 
> suspect the regression was caused by commit 
> 4e95953f29d89a441dfe06d3f0393ed7dd8586df 
> (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344),
> which removed the
> {noformat}
>  getChannel().force(true);
> {noformat}
> from CompressionMetadata.Writer.close.





[jira] [Commented] (CASSANDRA-10471) fix flapping empty_in_test dtest

2015-10-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959541#comment-14959541
 ] 

Ariel Weisberg commented on CASSANDRA-10471:


OK, thanks for the explanation. The comments are good; it's just me not having 
enough context.

I can't tell if the dtests were harmed. There are 20 failures on the branch. 
The 3.0 branch hasn't had 20 failures in the past few builds. I compared the 
contents and it just looks sort of random and overlapping.

+1

> fix flapping empty_in_test dtest
> 
>
> Key: CASSANDRA-10471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10471
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> {{upgrade_tests/cql_tests.py:TestCQL.empty_in_test}} fails about half the 
> time on the upgrade path from 2.2 to 3.0:
> http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/42/testReport/upgrade_tests.cql_tests/TestCQL/empty_in_test/history/
> Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is 
> merged, these tests should also run with this upgrade path on normal 3.0 
> jobs. Until then, you can run it with the following command:
> {code}
> SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 
> nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.empty_in_test
> {code}





[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal

2015-10-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959869#comment-14959869
 ] 

Joel Knighton commented on CASSANDRA-10089:
---

I suppose Jim's suggestion is the sensible solution here. 

I've pushed branches of the same name to my repo to get CI results. I'll update 
once those are in, and hopefully they'll be good and I can +1.

> NullPointerException in Gossip handleStateNormal
> 
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related 
> to CASSANDRA-9970; if anything it could have been a side effect of 
> CASSANDRA-9871.





[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:00 AM:
--

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the lock holder 
never completed or was very, very slow. Eventually a second MemtableFlushWriter 
thread blocks. I believe if I let it continue to run, all or many of them 
will. 

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


was (Author: jeffery.griffith):
[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the lock holder 
never completed or was very, very slow:

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 

[jira] [Commented] (CASSANDRA-10538) Assertion failed in LogFile when disk is full

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960136#comment-14960136
 ] 

Stefania commented on CASSANDRA-10538:
--

I've created a patch to ensure we update the in-memory records after updating 
the disk state, to prevent the assertion in case we throw in 
{{LifecycleTransaction.doCommit}}. However, we still need to verify this is 
what actually happened in the logs.

I've also changed {{LogTransaction.doCommit}} and {{doAbort}} so that they 
catch and return runtime exceptions. [~benedict], is this something we missed?

> Assertion failed in LogFile when disk is full
> -
>
> Key: CASSANDRA-10538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
> Attachments: 
> ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
>
>
> [~carlyeks] was running a stress job which filled up the disk. At the end of 
> the system logs there are several assertion errors:
> {code}
> ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 
> - Exception in thread Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220)
>  ~[main/:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_40]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_40]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_40]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_40]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [IndexSummaryManager:1] 2015-10-14 21:10:40,099 
> IndexSummaryManager.java:257 - Redistributing index summaries
> ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 
> CassandraDaemon.java:195 - Exception in thread 
> Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: Already completed!
> at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158)
>  ~[main/:na]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242)
>  ~[main/:na]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[main/:na]
> at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
> {code}
> We should not have an assertion if it can happen 

[jira] [Comment Edited] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files

2015-10-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960123#comment-14960123
 ] 

Ariel Weisberg edited comment on CASSANDRA-10421 at 10/16/15 3:47 AM:
--

Syncing the directory won't sync the log file. You need to sync the log file 
itself for its data to be durable. Syncing the directory makes renames and 
file creation durable, but not the contents of the files in the directory.

bq. I also had to add several log.txnFile().close(); to the unit tests 
(whenever we test removeUnfinishedLeftovers) because on Windows we cannot 
delete files that are open. This is a bit ugly so maybe we should also go back 
to using FileUtils::appendLine, unless again you have performance concerns.
I don't mind opening the file every time. However, to sync it after every 
write you will need to keep it open long enough to do that, or open it with 
O_SYNC or something similar.
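The two durability steps being distinguished here can be sketched with plain NIO. This is an illustration with invented file names, not the actual LogTransaction code; note that opening a directory read-only to fsync it works on Linux but may throw on other platforms, so the directory sync is best effort.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: force() on the file makes its *contents* durable; force() on the
// parent directory makes its *directory entry* (creation/rename) durable.
// Both are needed to survive a power cut.
public class DurableAppend {
    static void appendRecord(Path logFile, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
            ch.force(false); // sync the record itself, not just the directory
        }
        try (FileChannel dir = FileChannel.open(logFile.getParent(),
                StandardOpenOption.READ)) {
            dir.force(true); // make the file's existence durable
        } catch (IOException e) {
            // directory fsync unsupported on this platform; best effort
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txn-sketch");
        Path log = dir.resolve("ma_txn_sketch.log");
        appendRecord(log, "add:aa-1-big-Data.db");
        appendRecord(log, "commit");
        System.out.println(Files.readAllLines(log).size()); // 2
    }
}
```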


was (Author: aweisberg):
Syncing the directory won't sync the log file. You need to sync the log file 
itself for its data to be durable. Syncing the directory makes renames and 
file creation durable, but not the contents of the files in the directory.

bq. I also had to add several log.txnFile().close(); to the unit tests 
(whenever we test removeUnfinishedLeftovers) because on Windows we cannot 
delete files that are open. This is a bit ugly so maybe we should also go back 
to using FileUtils::appendLine, unless again you have performance concerns.
I don't mind opening the file every time.

> Potential issue with LogTransaction as it only checks in a single directory 
> for files
> -
>
> Key: CASSANDRA-10421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10421
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Stefania
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> When creating a new LogTransaction we try to create the new logfile in the 
> same directory as the one we are writing to, but as we use 
> {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}}
>  this might end up in "any" of the configured data directories. If it does, 
> we will not be able to clean up leftovers as we check for files in the same 
> directory as the logfile was created: 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163
> cc [~Stefania]





[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959979#comment-14959979
 ] 

Stefania commented on CASSANDRA-10421:
--

bq. So what I think I see is that when the LogTransaction completes it first 
writes to the log the commit record, and then starts making permanent changes 
to the files on disk (deleting the old ones). But if it hasn't actually 
synced the log to disk then on a restart we could have a partial log and 
attempt to roll back, but it is too late because before the crash we had 
already deleted parts of the before state. At the end we should sync the log 
files before deleting the obsolete files right?

Yes, this is correct: that's why we flush after writing every record, but if we 
want to survive a power cut then we should call {{fsync}}. 

bq. Was the intent to sync the folder when creating the log file or when adding 
a record which indicates the addition of other data files?

The intent was to sync the folder when creating the file and to sync the 
content of the file with a flush when appending data to it. However flushing 
only passes the data to the operating system; it won't protect us from a power 
cut. This wasn't clear to me yesterday. At a minimum we should {{fsync}} after 
adding the final record and probably also when adding new records as you 
correctly pointed out. Therefore, I would strongly prefer to leave it as it was 
before and call {{fsync}} on the folder whenever we add a record, so in case of 
a power cut we have the most up-to-date data. This is what I did in the latest 
commit. Let me know if you have performance concerns.
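The flush-versus-fsync distinction above can be shown in plain java.io terms. This is a sketch with invented file names, not the actual LogTransaction code: {{flush()}} only hands buffered bytes to the OS page cache, while {{FileDescriptor.sync()}} (the {{fsync}} discussed here) asks the OS to push them to stable storage.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Sketch: flush() survives a process crash (bytes are in the page cache);
// sync() is required to survive a power cut (bytes are on disk).
public class FlushVsFsync {
    public static void appendLineDurably(File f, String line) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f, true)) {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            w.write(line + "\n");
            w.flush();          // JVM buffer -> OS page cache
            out.getFD().sync(); // OS page cache -> disk (fsync)
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("logrecord-sketch", ".log");
        appendLineDurably(f, "add:aa-1-big-Data.db");
        appendLineDurably(f, "commit");
        System.out.println(f.length() > 0); // true
        f.delete();
    }
}
```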

I also had to add several {{log.txnFile().close();}} to the unit tests 
(whenever we test {{removeUnfinishedLeftovers}}) because on Windows we cannot 
delete files that are open. This is a bit ugly so maybe we should also go back 
to using {{FileUtils::appendLine}}, unless again you have performance concerns.

I've rebased on 3.0 and started a new CI run on both Linux and Windows.

> Potential issue with LogTransaction as it only checks in a single directory 
> for files
> -
>
> Key: CASSANDRA-10421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10421
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Stefania
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> When creating a new LogTransaction we try to create the new logfile in the 
> same directory as the one we are writing to, but as we use 
> {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}}
>  this might end up in "any" of the configured data directories. If it does, 
> we will not be able to clean up leftovers as we check for files in the same 
> directory as the logfile was created: 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163
> cc [~Stefania]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 11:57 PM:
--

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the locker never 
completed or was very, very slow:

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


was (Author: jeffery.griffith):
[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked so assume (?) the locker never 
completed:

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
{code}

[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: RUN3tpstats.jpg

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the locker never 
completed:

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also 

[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:05 AM:
--

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the locker never 
completed or was very, very slow. Eventually a second MemtableFlushWriter 
thread blocks as well. I believe that if I let it continue to run, all or many 
of them will. 

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


I see one thread for MemtablePostFlush and this is it:

{code}
"MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 
nid=0x2d93 waiting on condition [0x7fda4b46c000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0005838ba468> (a 
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
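The stack above shows the post-flush task parked on a {{CountDownLatch}} that is never counted down. A minimal illustration of that pattern (hypothetical names; the real code is {{ColumnFamilyStore$PostFlush}}):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PostFlushSketch {
    // Post-flush work must wait until every flush writer has signalled. If a
    // writer stalls and never calls countDown(), a plain await() parks forever,
    // which is exactly the WAITING (parking) state in the thread dump above.
    // Using the timed variant here only so the sketch can terminate.
    public static boolean runPostFlush(CountDownLatch writesDone, long timeoutMs)
            throws InterruptedException {
        return writesDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        latch.countDown(); // one writer finished, the second never does
        System.out.println(runPostFlush(latch, 100)); // times out: prints false
    }
}
```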


was (Author: jeffery.griffith):
[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked so assume (?) the 

[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959922#comment-14959922
 ] 

Stefania commented on CASSANDRA-10461:
--

It's the warning introduced by CASSANDRA-10199 that causes an extra line in the 
output. I'll have to enhance the test to extract sstables from the output by 
matching the keyspace or table name rather than making assumptions about the 
output returned by {{sstableutil}}:

{code}
'WARN  14:58:01 Only 37512 MB free across all data volumes. Consider adding 
more capacity to your cluster or removing obsolete snapshots',
{code}
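The dtest itself is Python, but the filtering idea (match sstable paths by keyspace/table instead of relying on line positions, so stray WARN lines are ignored) can be sketched like this; names and path layout are illustrative:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SSTableFilter {
    // Keep only lines that look like sstable file paths for the given
    // keyspace/table; anything else (e.g. free-space WARN lines) is dropped.
    public static List<String> extract(List<String> output, String ks, String cf) {
        Pattern p = Pattern.compile(
                ".*/" + Pattern.quote(ks) + "/" + Pattern.quote(cf) + "[^/]*/[^/]+");
        return output.stream()
                     .filter(line -> p.matcher(line.trim()).matches())
                     .collect(Collectors.toList());
    }
}
```

Matching on the keyspace/table directory components keeps the test robust if the tool ever emits extra diagnostic lines before or after the file list.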

> Fix sstableverify_test dtest
> 
>
> Key: CASSANDRA-10461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10461
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Stefania
>  Labels: test
> Fix For: 3.0.0 rc2
>
>
> The dtest for sstableverify is failing:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/
> It fails in the same way when I run it on OpenStack, so I don't think it's 
> just a CassCI problem.
> [~slebresne] Looks like you made changes to this test recently:
> https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880
> Could you have a look at the failure? I'm assigning you for triage, but feel 
> free to reassign.





[jira] [Created] (CASSANDRA-10538) Assertion failed in LogFile when disk is full

2015-10-15 Thread Stefania (JIRA)
Stefania created CASSANDRA-10538:


 Summary: Assertion failed in LogFile when disk is full
 Key: CASSANDRA-10538
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
 Project: Cassandra
  Issue Type: Bug
Reporter: Stefania
Assignee: Stefania
 Fix For: 3.x
 Attachments: 
ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, 
ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, 
ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, 
ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, 
ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, 
ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log

[~carlyeks] was running a stress job which filled up the disk. At the end of 
the system logs there are several assertion errors:

{code}
ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - 
Exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
at 
org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156)
 ~[main/:na]
at 
org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77)
 ~[main/:na]
at 
org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110)
 ~[main/:na]
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182)
 ~[main/:na]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[main/:na]
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
 ~[main/:na]
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
 ~[main/:na]
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_40]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_40]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_40]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
INFO  [IndexSummaryManager:1] 2015-10-14 21:10:40,099 
IndexSummaryManager.java:257 - Redistributing index summaries
ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 
- Exception in thread Thread[IndexSummaryManager:1,1,main]
java.lang.AssertionError: Already completed!
at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) 
~[main/:na]
at 
org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376)
 ~[main/:na]
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144)
 ~[main/:na]
at 
org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259)
 ~[main/:na]
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144)
 ~[main/:na]
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193)
 ~[main/:na]
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158)
 ~[main/:na]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242)
 ~[main/:na]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134)
 ~[main/:na]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[main/:na]
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
{code}

We should not fail an assertion for a condition that can occur when the disk is 
full; we should throw a runtime exception instead.

I also would like to understand exactly what triggered the assertion. 
{{LifecycleTransaction}} can throw at the beginning of the commit method if it 
cannot write the record to disk, in which case all we have to do is ensure we 
update the records in memory after writing to disk (currently we update them 
before). However, I am not sure this is what happened here; it looks more like 
abort was called twice, which should never happen.
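The "abort called twice" hypothesis, and the assertion-versus-runtime-exception point, can be sketched with a simple state guard (hypothetical names, not the actual {{LogFile}} code):

```java
public class TxnStateSketch {
    enum State { IN_PROGRESS, COMMITTED, ABORTED }

    private State state = State.IN_PROGRESS;

    // Throwing a runtime exception, rather than relying on `assert`, makes a
    // double completion an explicit error even when -ea is disabled, and lets
    // callers distinguish it from an out-of-disk failure.
    public void abort() {
        if (state != State.IN_PROGRESS)
            throw new IllegalStateException("Already completed: " + state);
        state = State.ABORTED;
    }

    public static void main(String[] args) {
        TxnStateSketch txn = new TxnStateSketch();
        txn.abort();        // first abort succeeds
        // txn.abort();     // a second abort would throw IllegalStateException
    }
}
```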





[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:13 AM:
--

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the locker never 
completed or was very, very slow. Eventually a second MemtableFlushWriter 
thread blocks as well. I believe that if I let it continue to run, all or many 
of them will. 

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


I see one thread for MemtablePostFlush and this is it:

{code}
"MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 
nid=0x2d93 waiting on condition [0x7fda4b46c000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0005838ba468> (a 
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

I followed it for a while longer after this, and it really looks like the post 
flush stays blocked on that latch forever:

{code}
00:01
MemtableFlushWriter   2 2   2024 0  
   0
MemtablePostFlush 1 47159   4277 0  
   0
MemtableReclaimMemory 0 0   2024 0  
   0


00:03
MemtableFlushWriter   3 3   2075 0  
   0
MemtablePostFlush
{code}

[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:14 AM:
--

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked, so I assume (?) the locker never 
completed or was very, very slow. Eventually a second MemtableFlushWriter 
thread blocks as well. I believe that if I let it continue to run, all or many 
of them will. 

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


I see one thread for MemtablePostFlush and this is it:

{code}
"MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 
nid=0x2d93 waiting on condition [0x7fda4b46c000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0005838ba468> (a 
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

I followed it for a while longer after this, and it really looks like the post 
flush stays blocked on that latch forever:

{code}
00:01
MemtableFlushWriter   2 2   2024 0  
   0
MemtablePostFlush 1 47159   4277 0  
   0
MemtableReclaimMemory 0 0   2024 0  
   0


00:03
MemtableFlushWriter   3 3   2075 0  
   0
MemtablePostFlush
{code}

[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files

2015-10-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960123#comment-14960123
 ] 

Ariel Weisberg commented on CASSANDRA-10421:


Syncing the directory won't sync the log file. You need to sync the log file 
specifically to make that data durable. Syncing the directory makes renames 
and file creation durable, but not the contents of the files in the directory.
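The directory-versus-file distinction can be made concrete with a small sketch (assumed helper name; Cassandra routes this through its own sync utilities). On Linux a directory can be opened read-only and forced; this is platform-dependent and typically throws on Windows:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirSyncSketch {
    // Make recent file creations/renames inside `dir` durable. This does NOT
    // sync the contents of the files themselves; each file still needs its
    // own FileChannel.force(true). Works on Linux with common JDKs; expect
    // an IOException on Windows.
    public static void syncDirectory(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }
}
```

So a crash-safe append needs both operations: force the log file after writing the record, and force its parent directory after the file is first created or renamed.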

bq. I also had to add several log.txnFile().close(); to the unit tests 
(whenever we test removeUnfinishedLeftovers) because on Windows we cannot 
delete files that are open. This is a bit ugly so maybe we should also go back 
to using FileUtils::appendLine, unless again you have performance concerns.
I don't mind opening the file every time.

> Potential issue with LogTransaction as it only checks in a single directory 
> for files
> -
>
> Key: CASSANDRA-10421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10421
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Stefania
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> When creating a new LogTransaction we try to create the new logfile in the 
> same directory as the one we are writing to, but as we use 
> {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}}
>  this might end up in "any" of the configured data directories. If it does, 
> we will not be able to clean up leftovers as we check for files in the same 
> directory as the logfile was created: 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163
> cc [~Stefania]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959525#comment-14959525
 ] 

Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 8:06 PM:
-

Yeah, it doesn't look like the locking thread is deadlocked at all. I know this 
is a stretch, but considering we just migrated from 2.0.x, could there be 
something data-specific that is confusing the compaction? Not sure where to 
check for slow flushes. Should I just watch tpstats?


was (Author: jeffery.griffith):
Yeah doesn't look blocked. How can I check for the slow flushes?

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled

2015-10-15 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959567#comment-14959567
 ] 

Carl Yeksigian commented on CASSANDRA-10522:


+1

> counter upgrade dtest fails on 3.0 with JVM assertions disabled
> ---
>
> Key: CASSANDRA-10522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10522
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Andrew Hust
>Assignee: Yuki Morishita
> Fix For: 3.0.0 rc2
>
>
> {{counter_tests.TestCounters.upgrade_test}}
> will fail when run on a cluster with JVM assertions disabled.  The tests will 
> hang when cassandra throws the following exception:
> {code}
> java.lang.IllegalStateException: No match found
>   at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60]
>   at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) 
> [main/:na]
> {code}
> These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 
> when assertions are enabled.
> Ran against:
> apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}}
> apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}}
> Ran with cmd:
> {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv 
> counter_tests.TestCounters.upgrade_test}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Define cassandra_storagedir variable in debian/cassandra.in.sh

2015-10-15 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 6a1c1d900 -> c3b2aedfd


Define cassandra_storagedir variable in debian/cassandra.in.sh

patch by Paulo Motta; reviewed by Aleksey Yeschenko for CASSANDRA-10525


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3b2aedf
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3b2aedf
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3b2aedf

Branch: refs/heads/cassandra-3.0
Commit: c3b2aedfd8bfce193abc8ed3809a850e603361d5
Parents: 6a1c1d9
Author: Paulo Motta 
Authored: Wed Oct 14 10:12:51 2015 -0700
Committer: Aleksey Yeschenko 
Committed: Thu Oct 15 23:13:25 2015 +0100

--
 debian/cassandra.in.sh | 4 ++++
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3b2aedf/debian/cassandra.in.sh
--
diff --git a/debian/cassandra.in.sh b/debian/cassandra.in.sh
index 9f69ac9..8fcaf9c 100644
--- a/debian/cassandra.in.sh
+++ b/debian/cassandra.in.sh
@@ -4,6 +4,10 @@ CASSANDRA_CONF=/etc/cassandra
 
 CASSANDRA_HOME=/usr/share/cassandra
 
+# the default location for commitlogs, sstables, and saved caches
+# if not set in cassandra.yaml
+cassandra_storagedir=/var/lib/cassandra
+
 # The java classpath (required)
 if [ -n "$CLASSPATH" ]; then
 CLASSPATH=$CLASSPATH:$CASSANDRA_CONF



[jira] [Commented] (CASSANDRA-10525) Hints directory not created on debian packaged install

2015-10-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959731#comment-14959731
 ] 

Aleksey Yeschenko commented on CASSANDRA-10525:
---

Committed as 
[c3b2aedfd8bfce193abc8ed3809a850e603361d5|https://github.com/apache/cassandra/commit/c3b2aedfd8bfce193abc8ed3809a850e603361d5]
 to 3.0 and merged with trunk, thanks.

> Hints directory not created on debian packaged install
> --
>
> Key: CASSANDRA-10525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10525
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Paulo Motta
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reproduction steps:
> * Create debian package install with {{dpkg-buildpackage -uc -us}}
> * Install package with {{sudo dpkg -i ../cassandra_3.0.0\~rc1_all.deb}}
> * Start cassandra with {{sudo service cassandra start}}
> Cassandra does not start with the following message on 
> {{/var/log/cassandra/system.log}}:
> {noformat}
> DEBUG [main] 2015-10-14 09:28:36,083 StartupChecks.java:191 - Checking 
> directory /var/lib/cassandra/data
> DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking 
> directory /var/lib/cassandra/commitlog
> DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking 
> directory /var/lib/cassandra/saved_caches
> DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking 
> directory /hints
> WARN  [main] 2015-10-14 09:28:36,088 StartupChecks.java:197 - Directory 
> /hints doesn't exist
> ERROR [main] 2015-10-14 09:28:36,088 CassandraDaemon.java:702 - Has no 
> permission to create directory /hints
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files

2015-10-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959729#comment-14959729
 ] 

Ariel Weisberg commented on CASSANDRA-10421:


So what I think I see is that when the LogTransaction completes, it first writes 
the commit record to the log, and then starts making permanent changes to the 
files on disk (deleting the old ones). But if it hasn't actually synced the 
log to disk, then on a restart we could have a partial log and attempt to roll 
back, but it is too late because before the crash we had already deleted parts 
of the before-state. At the end we should sync the log files before deleting 
the obsolete files, right?
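The ordering being asked for can be sketched as follows. This is a hypothetical illustration of the write-ahead invariant, not Cassandra's actual LogTransaction code: the commit record must be forced to disk before any superseded file is deleted.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class CommitOrdering {
    static void commit(Path txnLog, List<Path> obsolete) throws IOException {
        // 1. append the commit record to the transaction log
        Files.write(txnLog, "COMMIT\n".getBytes(), StandardOpenOption.APPEND);
        // 2. fsync the log so the record survives a crash or power loss
        try (FileChannel ch = FileChannel.open(txnLog, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        // 3. only now is it safe to delete the superseded files; a crash
        //    after this point replays as a committed transaction, never as
        //    a rollback over already-deleted state
        for (Path p : obsolete)
            Files.deleteIfExists(p);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txn");
        Path log = dir.resolve("txn.log");
        Files.write(log, "ADD:old-Data.db\n".getBytes());
        Path old = Files.createFile(dir.resolve("old-Data.db"));
        commit(log, List.of(old));
        System.out.println(Files.exists(old)); // prints false
    }
}
```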

Before we add a new file that we want to have cleaned up, maybe we also want to 
make sure the record is on disk so that it will definitely be cleaned up? 
Maybe that's not necessary, since it is just additional data that will be 
compacted later.

Maybe optimizing for power failure isn't necessary, but then why are we syncing 
directories?

[Here it seems like you don't sync the folder when appending every 
record?|https://github.com/apache/cassandra/commit/8e02e47e1a4a86428bec61d8975a9706c544003b#diff-a7c36820cf8658b605948a23e3033f88R76].
 Was the intent to sync the folder when creating the log file or when adding a 
record which indicates the addition of other data files?

I am generally +1, other than my confusion over how syncing of the log file 
contents is handled.

The tests don't seem to match trunk. I gave them another spin on the 3.0 branch 
to get another sample. 

> Potential issue with LogTransaction as it only checks in a single directory 
> for files
> -
>
> Key: CASSANDRA-10421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10421
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Stefania
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> When creating a new LogTransaction we try to create the new logfile in the 
> same directory as the one we are writing to, but as we use 
> {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}}
>  this might end up in "any" of the configured data directories. If it does, 
> we will not be able to clean up leftovers as we check for files in the same 
> directory as the logfile was created: 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163
> cc [~Stefania]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10537) CONTAINS and CONTAINS KEY support for Lightweight Transactions

2015-10-15 Thread Nimi Wariboko Jr. (JIRA)
Nimi Wariboko Jr. created CASSANDRA-10537:
-

 Summary: CONTAINS and CONTAINS KEY support for Lightweight 
Transactions
 Key: CASSANDRA-10537
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10537
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nimi Wariboko Jr.
 Fix For: 2.1.x


Conditional updates currently do not support CONTAINS and CONTAINS KEY 
conditions. Queries such as 

{{UPDATE mytable SET somefield = 4 WHERE pk = 'pkv' IF set_column CONTAINS 5;}}

are not possible.

Would it also be possible to support the negation of these (ex. testing that a 
value does not exist inside a set)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal

2015-10-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959718#comment-14959718
 ] 

Joel Knighton commented on CASSANDRA-10089:
---

Unfortunately, it looks like whatever environmental issues affected the first 
run also got the most recent run. 

Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have 
been resolved.

Can you trigger one more run? If that doesn't work, I'll evaluate tests locally 
as part of review.

Sorry about this.

> NullPointerException in Gossip handleStateNormal
> 
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches  but it is clearly not related 
> to CASSANDRA-9970, if anything it could have been a side effect of 
> CASSANDRA-9871.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal

2015-10-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959718#comment-14959718
 ] 

Joel Knighton edited comment on CASSANDRA-10089 at 10/15/15 10:05 PM:
--

Unfortunately, it looks like whatever environmental issues affected the first 
run also hit the most recent run. 

Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have 
been resolved.

Can you trigger one more run? If that doesn't work, I'll evaluate tests locally 
as part of review.

Sorry about this.


was (Author: jkni):
Unfortunately, it looks like whatever environmental issues affected the first 
run also got the most recent run. 

Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have 
been resolved.

Can you trigger one more run? If that doesn't work, I'll evaluate tests locally 
as part of review.

Sorry about this.

> NullPointerException in Gossip handleStateNormal
> 
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches  but it is clearly not related 
> to CASSANDRA-9970, if anything it could have been a side effect of 
> CASSANDRA-9871.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[4/4] cassandra git commit: Merge branch 'cassandra-3.0' into trunk

2015-10-15 Thread yukim
Merge branch 'cassandra-3.0' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1cb9a02b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1cb9a02b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1cb9a02b

Branch: refs/heads/trunk
Commit: 1cb9a02bd951b424d960047297084d6ce4b18b6c
Parents: 0e3da95 a52597d
Author: Yuki Morishita 
Authored: Thu Oct 15 17:31:56 2015 -0500
Committer: Yuki Morishita 
Committed: Thu Oct 15 17:31:56 2015 -0500

--
 CHANGES.txt | 1 +
 debian/cassandra.in.sh  | 4 ++++
 src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++-
 3 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/1cb9a02b/CHANGES.txt
--
diff --cc CHANGES.txt
index e2d989c,dcacc69..5265215
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,9 -1,5 +1,10 @@@
 +3.2
 + * Abort in-progress queries that time out (CASSANDRA-7392)
 + * Add transparent data encryption core classes (CASSANDRA-9945)
 +
 +
  3.0-rc2
+  * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522)
   * Revert CASSANDRA-7486, make CMS default GC, move GC config to
 conf/jvm.options (CASSANDRA-10403)
   * Fix TeeingAppender causing some logs to be truncated/empty 
(CASSANDRA-10447)



[3/4] cassandra git commit: Fix LogFile throws Exception when assertion is disabled

2015-10-15 Thread yukim
Fix LogFile throws Exception when assertion is disabled

patch by yukim; reviewed by carlyeks for CASSANDRA-10522


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a52597d8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a52597d8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a52597d8

Branch: refs/heads/cassandra-3.0
Commit: a52597d81396e09274ecf6d05ebbf0e24c259fc6
Parents: c3b2aed
Author: Yuki Morishita 
Authored: Wed Oct 14 11:03:22 2015 -0500
Committer: Yuki Morishita 
Committed: Thu Oct 15 17:30:59 2015 -0500

--
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index fa74539..dcacc69 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0-rc2
+ * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522)
  * Revert CASSANDRA-7486, make CMS default GC, move GC config to
conf/jvm.options (CASSANDRA-10403)
  * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
--
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java 
b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
index c698722..bff3724 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
@@ -43,7 +43,8 @@ final class LogFile
 static LogFile make(File logFile, int folderDescriptor)
 {
 Matcher matcher = LogFile.FILE_REGEX.matcher(logFile.getName());
-assert matcher.matches() && matcher.groupCount() == 3;
+boolean matched = matcher.matches();
+assert matched && matcher.groupCount() == 3;
 
 // For now we don't need this but it is there in case we need to change
 // file format later on, the version is the sstable version as defined 
in BigFormat
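The one-line change above works around a classic Java pitfall: a side-effecting call placed inside an {{assert}} is skipped entirely when assertions are disabled ({{-da}}, the JVM default), so later code that depends on the side effect fails. A standalone illustration (the regex and input here are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssertSideEffect {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\w+)_(\\w+)").matcher("txn_log");
        // Broken pattern: with assertions disabled, matches() never runs,
        // so m.group(1) below would throw IllegalStateException:
        //     assert m.matches() && m.groupCount() == 2;
        // Fixed pattern (as in the CASSANDRA-10522 patch): evaluate the
        // side-effecting call unconditionally, assert only on the result.
        boolean matched = m.matches();
        assert matched && m.groupCount() == 2;
        System.out.println(m.group(1)); // prints txn
    }
}
```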



[2/4] cassandra git commit: Fix LogFile throws Exception when assertion is disabled

2015-10-15 Thread yukim
Fix LogFile throws Exception when assertion is disabled

patch by yukim; reviewed by carlyeks for CASSANDRA-10522


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a52597d8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a52597d8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a52597d8

Branch: refs/heads/trunk
Commit: a52597d81396e09274ecf6d05ebbf0e24c259fc6
Parents: c3b2aed
Author: Yuki Morishita 
Authored: Wed Oct 14 11:03:22 2015 -0500
Committer: Yuki Morishita 
Committed: Thu Oct 15 17:30:59 2015 -0500

--
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index fa74539..dcacc69 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0-rc2
+ * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522)
  * Revert CASSANDRA-7486, make CMS default GC, move GC config to
conf/jvm.options (CASSANDRA-10403)
  * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
--
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java 
b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
index c698722..bff3724 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
@@ -43,7 +43,8 @@ final class LogFile
 static LogFile make(File logFile, int folderDescriptor)
 {
 Matcher matcher = LogFile.FILE_REGEX.matcher(logFile.getName());
-assert matcher.matches() && matcher.groupCount() == 3;
+boolean matched = matcher.matches();
+assert matched && matcher.groupCount() == 3;
 
 // For now we don't need this but it is there in case we need to change
 // file format later on, the version is the sstable version as defined 
in BigFormat



[1/4] cassandra git commit: Define cassandra_storagedir variable in debian/cassandra.in.sh

2015-10-15 Thread yukim
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 c3b2aedfd -> a52597d81
  refs/heads/trunk 0e3da95d6 -> 1cb9a02bd


Define cassandra_storagedir variable in debian/cassandra.in.sh

patch by Paulo Motta; reviewed by Aleksey Yeschenko for CASSANDRA-10525


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3b2aedf
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3b2aedf
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3b2aedf

Branch: refs/heads/trunk
Commit: c3b2aedfd8bfce193abc8ed3809a850e603361d5
Parents: 6a1c1d9
Author: Paulo Motta 
Authored: Wed Oct 14 10:12:51 2015 -0700
Committer: Aleksey Yeschenko 
Committed: Thu Oct 15 23:13:25 2015 +0100

--
 debian/cassandra.in.sh | 4 ++++
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3b2aedf/debian/cassandra.in.sh
--
diff --git a/debian/cassandra.in.sh b/debian/cassandra.in.sh
index 9f69ac9..8fcaf9c 100644
--- a/debian/cassandra.in.sh
+++ b/debian/cassandra.in.sh
@@ -4,6 +4,10 @@ CASSANDRA_CONF=/etc/cassandra
 
 CASSANDRA_HOME=/usr/share/cassandra
 
+# the default location for commitlogs, sstables, and saved caches
+# if not set in cassandra.yaml
+cassandra_storagedir=/var/lib/cassandra
+
 # The java classpath (required)
 if [ -n "$CLASSPATH" ]; then
 CLASSPATH=$CLASSPATH:$CASSANDRA_CONF



[jira] [Assigned] (CASSANDRA-10517) Make sure all unit tests run on CassCI on Windows

2015-10-15 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-10517:
-

Assignee: Joel Knighton

> Make sure all unit tests run on CassCI on Windows
> -
>
> Key: CASSANDRA-10517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10517
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Joel Knighton
>  Labels: triage
> Fix For: 3.0.0 rc2
>
>
> It seems that some Windows unit tests aren't run sometimes on CassCI, and 
> there's no error reporting for this. For instance, this test was introduced 
> around the time build #38 would have happened, but has only run in builds 
> #50-3 and #64:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.cql3/ViewTest/testPrimaryKeyIsNotNull/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled

2015-10-15 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian updated CASSANDRA-10522:
---
Reviewer: Carl Yeksigian

> counter upgrade dtest fails on 3.0 with JVM assertions disabled
> ---
>
> Key: CASSANDRA-10522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10522
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Andrew Hust
>Assignee: Yuki Morishita
> Fix For: 3.0.0 rc2
>
>
> {{counter_tests.TestCounters.upgrade_test}}
> will fail when run on a cluster with JVM assertions disabled.  The tests will 
> hang when cassandra throws the following exception:
> {code}
> java.lang.IllegalStateException: No match found
>   at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60]
>   at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) 
> [main/:na]
> {code}
> These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 
> when assertions are enabled.
> Ran against:
> apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}}
> apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}}
> Ran with cmd:
> {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv 
> counter_tests.TestCounters.upgrade_test}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959508#comment-14959508
 ] 

T Jake Luciani commented on CASSANDRA-10515:


That's RUNNABLE though not BLOCKED.  Are you actually deadlocking or only 
seeing slow flushes?

[~krummas] any ideas?


> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959554#comment-14959554
 ] 

T Jake Luciani commented on CASSANDRA-10515:


Yeah, if the COMPLETED column for flushing is incrementing.

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.





[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959608#comment-14959608
 ] 

Jeff Griffith commented on CASSANDRA-10515:
---

I had restarted, but I'll watch it live on the next iteration. As you can see in the comments above, they do start piling up:
{code}
Pool Name                 Active   Pending   Completed   Blocked   All time blocked
MemtableFlushWriter            1         1        1574         0                  0
MemtablePostFlush              1     13755      134889         0                  0
MemtableReclaimMemory          0         0        1574         0                  0
{code}

In the previous iteration, there were four threads for MemtableFlushWriter all 
blocked behind the runnable 
LeveledManifest.getCandidatesFor(LeveledManifest.java:572)







[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959525#comment-14959525
 ] 

Jeff Griffith commented on CASSANDRA-10515:
---

Yeah, it doesn't look blocked. How can I check for the slow flushes?






[jira] [Reopened] (CASSANDRA-10524) Add ability to skip TIME_WAIT sockets on port check on Windows startup

2015-10-15 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie reopened CASSANDRA-10524:
-

> Add ability to skip TIME_WAIT sockets on port check on Windows startup
> --
>
> Key: CASSANDRA-10524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10524
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>Priority: Trivial
>  Labels: Windows
> Fix For: 3.0.0 rc2, 2.2.4
>
> Attachments: win_aggressive_startup.txt
>
>
> C* sockets are often staying TIME_WAIT for up to 120 seconds (2x max segment 
> lifetime) for me in my dev environment on Windows. This is rather obnoxious 
> since it means I can't launch C* for up to 2 minutes after stopping it.
> Attaching a patch that adds a simple -a for aggressive startup to the launch 
> scripts to ignore duplicate port check from netstat if it's TIME_WAIT. Also 
> snuck in some more liberal interpretation of help strings in the .ps1.





[jira] [Updated] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled

2015-10-15 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-10522:
---
Tester: Andrew Hust

> counter upgrade dtest fails on 3.0 with JVM assertions disabled
> ---
>
> Key: CASSANDRA-10522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10522
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Andrew Hust
>Assignee: Yuki Morishita
> Fix For: 3.0.0 rc2
>
>
> {{counter_tests.TestCounters.upgrade_test}}
> will fail when run on a cluster with JVM assertions disabled.  The tests will 
> hang when cassandra throws the following exception:
> {code}
> java.lang.IllegalStateException: No match found
>   at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60]
>   at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552)
>  ~[main/:na]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) 
> [main/:na]
> {code}
> These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 
> when assertions are enabled.
> Ran against:
> apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}}
> apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}}
> Ran with cmd:
> {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv 
> counter_tests.TestCounters.upgrade_test}}
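Given that the failure only appears with {{-da}}, one plausible reading is that the regex match in {{LogFile.make}} is performed inside an {{assert}}, so disabling assertions skips the match itself and a later {{group()}} call finds no match. A hypothetical Python analogue of that pattern (not Cassandra's actual code):

```python
import re

class Matcher:
    """Hypothetical stand-in for java.util.regex.Matcher, for illustration only."""
    def __init__(self, pattern, text):
        self._pattern, self._text, self._match = pattern, text, None

    def matches(self):
        # Performing the match is a side effect of calling matches()
        self._match = re.fullmatch(self._pattern, self._text)
        return self._match is not None

    def group(self, i):
        if self._match is None:
            raise RuntimeError("No match found")  # mirrors IllegalStateException
        return self._match.group(i)

def make_logfile(name, assertions_enabled):
    matcher = Matcher(r"(\w+)\.log", name)
    if assertions_enabled:          # models the JVM's -ea / -da switch
        assert matcher.matches()    # the match only ever runs inside the assert
    return matcher.group(1)         # with assertions off, the match never happened

print(make_logfile("txn.log", assertions_enabled=True))   # txn
try:
    make_logfile("txn.log", assertions_enabled=False)
except RuntimeError as err:
    print(err)   # No match found
```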





[jira] [Created] (CASSANDRA-10536) Batch statements with multiple updates to partition error when table is indexed

2015-10-15 Thread Tyler Hobbs (JIRA)
Tyler Hobbs created CASSANDRA-10536:
---

 Summary: Batch statements with multiple updates to partition error 
when table is indexed
 Key: CASSANDRA-10536
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10536
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Sylvain Lebresne
 Fix For: 3.0.0 rc2


If a {{BATCH}} statement contains multiple {{UPDATE}} statements that update 
the same partition, and a secondary index exists on that table, the batch 
statement will error:

{noformat}
ServerError: 
{noformat}

with the following traceback in the logs:

{noformat}
ERROR 20:53:46 Unexpected exception during request
java.lang.IllegalStateException: An update should not be written again once it 
has been read
at 
org.apache.cassandra.db.partitions.PartitionUpdate.assertNotBuilt(PartitionUpdate.java:504)
 ~[main/:na]
at 
org.apache.cassandra.db.partitions.PartitionUpdate.add(PartitionUpdate.java:535)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:96)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.ModificationStatement.addUpdates(ModificationStatement.java:667)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.BatchStatement.getMutations(BatchStatement.java:234)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:335)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:321)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:316)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:205)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:471)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:448)
 ~[main/:na]
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:130)
 ~[main/:na]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507)
 [main/:na]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401)
 [main/:na]
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_45]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
 [main/:na]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[main/:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
{noformat}
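The invariant in this traceback can be sketched with a toy Python model (hypothetical; Cassandra's real {{PartitionUpdate}} is Java and considerably more involved): writes are accepted only until the update is first iterated.

```python
class PartitionUpdate:
    """Toy model of the build-once invariant; not Cassandra's actual class."""
    def __init__(self):
        self._rows = []
        self._built = False

    def add(self, row):
        # Mirrors assertNotBuilt(): writes are rejected once the update was read
        if self._built:
            raise RuntimeError(
                "An update should not be written again once it has been read")
        self._rows.append(row)

    def __iter__(self):
        self._built = True  # reading the update "builds" (freezes) it
        return iter(self._rows)

update = PartitionUpdate()
update.add("UPDATE #1")      # first statement in the batch writes the partition
list(update)                 # index validation iterates, freezing the update
try:
    update.add("UPDATE #2")  # second UPDATE to the same partition
except RuntimeError as err:
    print(err)   # An update should not be written again once it has been read
```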

This is due to {{SecondaryIndexManager.validate()}} triggering a build of the 
{{PartitionUpdate}} (stacktrace from debugging the build() call):

{noformat}
at 
org.apache.cassandra.db.partitions.PartitionUpdate.build(PartitionUpdate.java:571)
 [main/:na]
at 
org.apache.cassandra.db.partitions.PartitionUpdate.maybeBuild(PartitionUpdate.java:561)
 [main/:na]
at 
org.apache.cassandra.db.partitions.PartitionUpdate.iterator(PartitionUpdate.java:418)
 [main/:na]
at 
org.apache.cassandra.index.internal.CassandraIndex.validateRows(CassandraIndex.java:560)
 [main/:na]
at 
org.apache.cassandra.index.internal.CassandraIndex.validate(CassandraIndex.java:314)
 [main/:na]
at 
org.apache.cassandra.index.SecondaryIndexManager.lambda$validate$75(SecondaryIndexManager.java:642)
 [main/:na]
at 
org.apache.cassandra.index.SecondaryIndexManager$$Lambda$166/1388080038.accept(Unknown
 Source) [main/:na]
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) 
[na:1.8.0_45]
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) 
[na:1.8.0_45]
at 
java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3566)
 [na:1.8.0_45]
at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512) 
[na:1.8.0_45]
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502) 
[na:1.8.0_45]
at 

[jira] [Commented] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled

2015-10-15 Thread Andrew Hust (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959736#comment-14959736
 ] 

Andrew Hust commented on CASSANDRA-10522:
-

Confirmed that these tests (and duplicate jira tests) now pass and no exception 
is thrown.

Ran on:
yukim/10522: {{93783039918f8662760195e0f33c4cab20b17c8d}}






[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal

2015-10-15 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959758#comment-14959758
 ] 

Jim Witschey commented on CASSANDRA-10089:
--

Sorry to insert myself, but: you should be able to trigger the builds you want 
by just pulling down Stefania's branch and pushing it to GitHub.

> NullPointerException in Gossip handleStateNormal
> 
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches  but it is clearly not related 
> to CASSANDRA-9970, if anything it could have been a side effect of 
> CASSANDRA-9871.





[jira] [Commented] (CASSANDRA-10529) Channel.size() is costly, mutually exclusive, and on the critical path

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960139#comment-14960139
 ] 

Stefania commented on CASSANDRA-10529:
--

I agree it's definitely worth removing the assertion. Is there anything else 
you require for this ticket?

> Channel.size() is costly, mutually exclusive, and on the critical path
> --
>
> Key: CASSANDRA-10529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Stefania
> Fix For: 3.0.0 rc2
>
>
> [~stefania_alborghetti] mentioned this already on another ticket, but I have 
> lost track of exactly where. While benchmarking it became apparent this was a 
> noticeable bottleneck for small in-memory workloads with few files, 
> especially with RF=1. We should probably fix this soon, since it is trivial 
> to do so, and the call is only to impose an assertion that our requested 
> length is less than the file size. It isn't possible to safely memoize a 
> value anywhere we can guarantee to be able to safely refer to it without some 
> refactoring, so I suggest simply removing the assertion for now.





[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest

2015-10-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960198#comment-14960198
 ] 

Stefania commented on CASSANDRA-10461:
--

The pull request for ignoring extra output lines is 
[here|https://github.com/riptano/cassandra-dtest/pull/613].

> Fix sstableverify_test dtest
> 
>
> Key: CASSANDRA-10461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10461
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>Assignee: Stefania
>  Labels: test
> Fix For: 3.0.0 rc2
>
>
> The dtest for sstableverify is failing:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/
> It fails in the same way when I run it on OpenStack, so I don't think it's 
> just a CassCI problem.
> [~slebresne] Looks like you made changes to this test recently:
> https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880
> Could you have a look at the failure? I'm assigning you for triage, but feel 
> free to reassign.





[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960192#comment-14960192
 ] 

Marcus Eriksson commented on CASSANDRA-10515:
-

Could you post {{nodetool cfstats}} and your node config? This looks like 
CASSANDRA-9882, but that problem was with DTCS and a very large number of 
sstables.

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, stacktrace.txt, system.log.clean
>
>





[jira] [Created] (CASSANDRA-10539) Different encodings used between nodes can cause inconsistently generated prepared statement ids

2015-10-15 Thread Andy Tolbert (JIRA)
Andy Tolbert created CASSANDRA-10539:


 Summary: Different encodings used between nodes can cause 
inconsistently generated prepared statement ids 
 Key: CASSANDRA-10539
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10539
 Project: Cassandra
  Issue Type: Bug
Reporter: Andy Tolbert
Priority: Minor


[From the java-driver mailing 
list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI]
 / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using a different default character 
set it's possible for nodes to generate different prepared statement ids for 
the same 'keyspace + query string' combination.  I imagine this is not a very 
typical or desired configuration (thus the low severity).

This is because 
[MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54]
 uses 
[String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()]
 which relies on the default charset.

In the general case this is fine, but if you use certain characters in your query 
string, such as 
[Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE]
 ('\u'), the byte representation may vary based on the encoding.

I was able to reproduce this configuring a 2-node cluster with node1 using 
file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}.   The 
java-driver test demonstrates this can be found 
[here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].
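The divergence is easy to reproduce outside Cassandra. A minimal Python sketch of what the two nodes effectively do (assuming, as with Java's {{String.getBytes()}}, that characters unmappable in the target charset are replaced):

```python
import hashlib

query = "SELECT * FROM ks.t WHERE v = '\uffff'"  # contains Character.MAX_VALUE

# Two nodes with different default charsets serialize the same query string
# differently; '\uffff' is unmappable in ISO-8859-1 and gets replaced.
utf8_bytes = query.encode("utf-8")
latin1_bytes = query.encode("iso-8859-1", errors="replace")

id_node1 = hashlib.md5(utf8_bytes).hexdigest()
id_node2 = hashlib.md5(latin1_bytes).hexdigest()
print(id_node1 == id_node2)  # False: the prepared statement ids diverge
```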





[jira] [Updated] (CASSANDRA-10539) Different encodings used between nodes can cause inconsistently generated prepared statement ids

2015-10-15 Thread Andy Tolbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Tolbert updated CASSANDRA-10539:
-
Description: 
[From the java-driver mailing 
list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI]
 / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using a different default character 
set it's possible for nodes to generate different prepared statement ids for 
the same 'keyspace + query string' combination.  I imagine this is not a very 
typical or desired configuration (thus the low severity).

This is because 
[MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54]
 uses 
[String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()]
 which relies on the default charset.

In the general case this is fine, but if you use certain characters in your query 
string, such as 
[Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE]
 ('\u'), the byte representation may vary based on the encoding.

I was able to reproduce this configuring a 2-node cluster with node1 using 
file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}.   The 
java-driver test that demonstrates this can be found 
[here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].

  was:
[From the java-driver mailing 
list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI]
 / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using a different default character 
set it's possible for nodes to generate different prepared statement ids for 
the same 'keyspace + query string' combination.  I imagine this is not a very 
typical or desired configuration (thus the low severity).

This is because 
[MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54]
 uses 
[String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()]
 which relies on the default charset.

In the general case this is fine, but if you use some characters in your query 
string such as 
[Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE]
 ('\u') the byte representation may vary based on the coding.

I was able to reproduce this configuring a 2-node cluster with node1 using 
file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}.   The 
java-driver test demonstrates this can be found 
[here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959218#comment-14959218
 ] 

Jeff Griffith commented on CASSANDRA-10515:
---

BTW, we tried commitlog_segment_recycling: false, but we realized afterwards that 
this should already be the default. We briefly thought it made a difference after 
restarting that node, but the problem returned after several hours. There is some 
mention in another JIRA about tuning the number of memtable flush writers; could 
this be an issue? It's still difficult to explain why we only see this on a few 
nodes across the ten clusters, all with the same config.

Will try to get the thread dump ASAP.


> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10519) RepairException: [repair #... on .../..., (...,...]] Validation failed in /w.x.y.z

2015-10-15 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959239#comment-14959239
 ] 

Yuki Morishita commented on CASSANDRA-10519:


{code}
Cannot start multiple repair sessions over the same sstables
{code}

There was a leftover incremental repair session on one of the nodes.
Restarting the node will solve the problem.

Recent versions of C* try to clear out such leftover sessions, so this should be 
less likely to happen.
(Not perfect though; we need something like CASSANDRA-10302 to keep the state 
clean.)

> RepairException: [repair #... on .../..., (...,...]] Validation failed in 
> /w.x.y.z
> --
>
> Key: CASSANDRA-10519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10519
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS 7, JDK 8u60, Cassandra 2.2.2 (upgraded from 2.1.5)
>Reporter: Gábor Auth
>
> Sometimes the repair fails:
> {code}
> ERROR [Repair#3:1] 2015-10-14 06:22:56,490 CassandraDaemon.java:185 - 
> Exception in thread Thread[Repair#3:1,5,RMI Runtime]
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z
> at 
> com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
>  ~[guava-16.0.jar:na]
> at 
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) 
> ~[guava-16.0.jar:na]
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) 
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z
> at 
> org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> ... 3 common frames omitted
> {code}
> And here is the w.y.x.z side:
> {code}
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 
> CompactionManager.java:1053 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 Validator.java:246 - 
> Failed creating a merkle tree for [repair 
> #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, 
> (2414492737393085601,2788053941340954029]], /a.b.c.d (see log for details)
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,488 CassandraDaemon.java:185 
> - Exception in thread Thread[ValidationExecutor:7,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1054)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:86)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:652)
>  ~[apache-cassandra-2.2.2.jar:2.2.2]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ...
> ERROR [Reference-Reaper:1] 2015-10-14 06:23:21,439 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@74fc054a) to class 
> 

[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959208#comment-14959208
 ] 

Jeff Griffith commented on CASSANDRA-10515:
---

working on it [~mishail]

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959217#comment-14959217
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Ok [~mishail] I will re-run with heap dump enabled (we had it turned off for 
some reason) and post it.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959498#comment-14959498
 ] 

Jeff Griffith commented on CASSANDRA-10515:
---

A second iteration: we ran into a second instance of metrics calls via RMI getting 
blocked, but caught it very early, when only a few threads were blocked behind the 
compaction. It still looks like the same general place:

{code}
"CompactionExecutor:16" #1502 daemon prio=1 os_prio=4 tid=0x7fb78c4f2000 
nid=0xf7ff runnable [0x7fb751941000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.putVal(HashMap.java:641)
at java.util.HashMap.put(HashMap.java:611)
at java.util.HashSet.add(HashSet.java:219)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:512)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004bcf24298> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004bcbec488> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004b98f1b00> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

{code}
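For context on why the dump lands inside {{LeveledManifest.overlapping}}: leveled compaction candidate selection has to collect every sstable whose token range intersects the range being compacted. A simplified sketch of that scan follows; the names and types are hypothetical, not Cassandra's actual implementation, but it shows why the cost grows with the number of sstables per level and why repeated candidate checks can keep a CompactionExecutor thread pinned here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverlapDemo {
    // Hypothetical stand-in for an sstable's token range (inclusive bounds).
    record Range(long first, long last) {}

    // Roughly the work overlapping() must do: linearly scan the level and
    // collect every range intersecting [first, last]. With thousands of
    // sstables, and this called per candidate check, it becomes a hot spot.
    static Set<Range> overlapping(long first, long last, List<Range> sstables) {
        Set<Range> overlapped = new HashSet<>();
        for (Range r : sstables)
            if (r.first() <= last && r.last() >= first)
                overlapped.add(r);
        return overlapped;
    }

    public static void main(String[] args) {
        List<Range> level = new ArrayList<>();
        for (long i = 0; i < 10; i++)
            level.add(new Range(i * 100, i * 100 + 99));
        // [150, 250] intersects [100,199] and [200,299].
        System.out.println(overlapping(150, 250, level).size()); // 2
    }
}
```

Under this model, a large backlog of sstables (as when flushes outpace compaction) directly inflates the time spent selecting candidates, which matches the RUNNABLE thread stuck building the HashSet.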

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

