[jira] [Comment Edited] (CASSANDRA-10363) NullPointerException returned with select ttl(value), IN, ORDER BY and paging off
[ https://issues.apache.org/jira/browse/CASSANDRA-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957009#comment-14957009 ] Sam Tunnicliffe edited comment on CASSANDRA-10363 at 10/15/15 10:44 AM: I've attached a patch backporting this to 2.0, not intended for commit, but so that those unable to upgrade just yet can patch their own systems if necessary. The test changes the expectations for a few scenarios from the 2.1+ version because CASSANDRA-4911 isn't in 2.0, and so {{ORDER BY}} can only contain columns in the selection. [branch|https://github.com/beobal/cassandra/tree/10363-2.0], [testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-testall/], [dtests|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-dtest/] (test runs pending) Edit: there are a few dtest failures in the run above, but checking these against 2.0, none of the failures are new. was (Author: beobal): I've attached a patch backporting this to 2.0, not intended for commit, but so that those unable to upgrade just yet can patch their own systems if necessary. The test changes the expectations for a few scenarios from the 2.1+ version because CASSANDRA-4911 isn't in 2.0, and so {{ORDER BY}} can only contain columns in the selection. 
[branch|https://github.com/beobal/cassandra/tree/10363-2.0], [testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-testall/], [dtests|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10363-2.0-dtest/] (test runs pending)

> NullPointerException returned with select ttl(value), IN, ORDER BY and paging off
> ---------------------------------------------------------------------------------
>
>         Key: CASSANDRA-10363
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-10363
>     Project: Cassandra
>  Issue Type: Bug
> Environment: Apache Cassandra 2.1.8.689
>    Reporter: Sucwinder Bassi
>    Assignee: Benjamin Lerer
>    Priority: Minor
>     Fix For: 2.1.x, 2.2.x, 3.0.x
>
> Attachments: 10363-2.0-c4de752.txt
>
> Running this query with paging off returns a NullPointerException:
> cqlsh:test> SELECT value, ttl(value), last_modified FROM test where useruid='userid1' AND direction IN ('out','in') ORDER BY last_modified;
> ServerError: message="java.lang.NullPointerException">
> Here's the stack trace from the system.log:
> ERROR [SharedPool-Worker-1] 2015-09-17 13:11:03,937 ErrorMessage.java:251 - Unexpected exception during request
> java.lang.NullPointerException: null
> at org.apache.cassandra.db.marshal.LongType.compareLongs(LongType.java:41) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.db.marshal.TimestampType.compare(TimestampType.java:48) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.db.marshal.TimestampType.compare(TimestampType.java:38) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement$SingleColumnComparator.compare(SelectStatement.java:2419) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement$SingleColumnComparator.compare(SelectStatement.java:2406) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at java.util.TimSort.countRunAndMakeAscending(TimSort.java:351) ~[na:1.8.0_40]
> at java.util.TimSort.sort(TimSort.java:216) ~[na:1.8.0_40]
> at java.util.Arrays.sort(Arrays.java:1512) ~[na:1.8.0_40]
> at java.util.ArrayList.sort(ArrayList.java:1454) ~[na:1.8.0_40]
> at java.util.Collections.sort(Collections.java:175) ~[na:1.8.0_40]
> at org.apache.cassandra.cql3.statements.SelectStatement.orderResults(SelectStatement.java:1400) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1255) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:276) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238) ~[cassandra-all-2.1.8.689.jar:2.1.8.689]
> at com.datastax.bdp.cassandra.cql3.DseQueryHandler$StatementExecution.execute(DseQueryHandler.java:291)
[jira] [Commented] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
[ https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958603#comment-14958603 ] Stefania commented on CASSANDRA-10509: -- CI looks reasonable. Regarding {{SELECT *}}, it's the driver's token-aware load balancing policy that prevents this from being reproducible: we always send the final page request to the node where the last key is local, and there {{ExcludingBounds}} does work. The problem can be reproduced with {{SELECT *}} provided we use an {{exclusive_cql_connection}}. > Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip > - > > Key: CASSANDRA-10509 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10509 > Project: Cassandra > Issue Type: Sub-task >Reporter: Paulo Motta >Assignee: Stefania > Fix For: 2.2.x > > > Test failing on 2.2 after fixing CASSANDRA-10507: > http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10532) Allow LWT operation on static column with only partition keys
DOAN DuyHai created CASSANDRA-10532: --- Summary: Allow LWT operation on static column with only partition keys Key: CASSANDRA-10532 URL: https://issues.apache.org/jira/browse/CASSANDRA-10532 Project: Cassandra Issue Type: Bug Components: API Environment: C* 2.2.0 Reporter: DOAN DuyHai Schema {code:sql} CREATE TABLE IF NOT EXISTS achilles_embedded.entity_with_static_column( id bigint, uuid uuid, static_col text static, value text, PRIMARY KEY(id, uuid)); {code} When trying to prepare the following query {code:sql} DELETE static_col FROM achilles_embedded.entity_with_static_column WHERE id=:id_Eq IF static_col=:static_col; {code} I got the error *DELETE statements must restrict all PRIMARY KEY columns with equality relations in order to use IF conditions, but column 'uuid' is not restricted*. Since the mutation only impacts the static column and the CAS check is on the static column, it makes sense to require only the partition key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10518) initialDirectories passed into ColumnFamilyStore constructor
[ https://issues.apache.org/jira/browse/CASSANDRA-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958812#comment-14958812 ] Marcus Eriksson commented on CASSANDRA-10518: - Ok, this lgtm (with a tiny nit), will commit once CI is done: http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-10518-dtest/ http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-10518-testall/ > initialDirectories passed into ColumnFamilyStore constructor > --- > > Key: CASSANDRA-10518 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10518 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 3.0.0 rc2 > > Attachments: 10518-v2.txt, initialDirectoriesFixV1.patch > > > One of the goals of CASSANDRA-8671 was to let compaction strategies write to > directories not used by normal tables, and the field > {{ColumnFamilyStore.initialDirectories}} was added to make sstables in those > directories discoverable on cfs instantiation. > Unfortunately, in my patch, I passed the full list of directories > {{initialDirectories}} into the ColumnFamilyStore constructor, effectively > making these directories usable by any table. The attached patch fixes this, > and elaborates on the correct usage of > {{ColumnFamilyStore.addInitialDirectories}} in its comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10468) Fix class-casting error in mixed clusters for 2.2->3.0 upgrades
[ https://issues.apache.org/jira/browse/CASSANDRA-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958673#comment-14958673 ] Sylvain Lebresne commented on CASSANDRA-10468: -- Correct, there are problems with the reversed case (on top of the one you mention, we also don't properly reverse the name comparator). Pushed a fix [here|https://github.com/pcmanus/cassandra/commits/10468-followup] that also includes a unit test for all this. > Fix class-casting error in mixed clusters for 2.2->3.0 upgrades > --- > > Key: CASSANDRA-10468 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10468 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > Three upgrade tests: > - {{upgrade_tests/cql_tests.py:TestCQL.cas_and_list_index_test}} > - {{upgrade_tests/cql_tests.py:TestCQL.collection_and_regular_test}} > - {{upgrade_tests/cql_tests.py:TestCQL.composite_index_collections_test}} > fail on the upgrade path from 2.2 to 3.0. 
The failures can be found on CassCI > here: > [cas_and_list_index_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/cas_and_list_index_test/] > [collection_and_regular_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/collection_and_regular_test/] > [composite_index_collections_test|http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/43/testReport/upgrade_tests.cql_tests/TestCQL/composite_index_collections_test/] > You can run these tests with the following command: > {code} > SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 > nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.cas_and_list_index_test > upgrade_tests/cql_tests.py:TestCQL.collection_and_regular_test > upgrade_tests/cql_tests.py:TestCQL.composite_index_collections_test > {code} > Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is > merged, these tests should also run with this upgrade path on normal 3.0 jobs. > EDIT: the following test seems to fail with the same error: > http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/41/testReport/upgrade_tests.cql_tests/TestCQL/null_support_test/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
[ https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958603#comment-14958603 ] Stefania edited comment on CASSANDRA-10509 at 10/15/15 10:04 AM: - CI looks reasonable. Regarding {{SELECT *}}, it's the driver token aware load balancing policy that prevents this from being reproducible. Because we always send the final page request to the node where the last key is local and where therefore {{ExcludingBounds}} does work. The problem can be reproduced with {{SELECT *}} provided we use an {{exclusive_cql_connection}}. was (Author: stefania): CI looks reasonable. Regarding {{SELECT *}}, it's the driver token aware load balancing policy that prevents this from being reproducible. Because we always send the final page request to the node where the last key is local and where therefore {{ExclusiveBounds}} does work. The problem can be reproduced with {{SELECT *}} provided we use an {{exclusive_cql_connection}}. > Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip > - > > Key: CASSANDRA-10509 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10509 > Project: Cassandra > Issue Type: Sub-task >Reporter: Paulo Motta >Assignee: Stefania > Fix For: 2.2.x > > > Test failing on 2.2 after fixing CASSANDRA-10507: > http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10533) Allowing to have static columns attached to clustering columns
DOAN DuyHai created CASSANDRA-10533: --- Summary: Allowing to have static columns attached to clustering columns Key: CASSANDRA-10533 URL: https://issues.apache.org/jira/browse/CASSANDRA-10533 Project: Cassandra Issue Type: Improvement Components: Core Reporter: DOAN DuyHai Now that [CASSANDRA-8099] is done, can we look again into the idea of having *static columns* relative to a clustering column? I have a very relevant use-case for a customer. They want to store a hierarchy of data for user expenses: {code:sql} CREATE TABLE user_expenses( user_id bigint, firstname text static, lastname text static, report_id uuid, report_title text, report_amount double, report_xxx ..., line_id uuid, line_item text, line_amount double, ... PRIMARY KEY((user_id), report_id, line_id) ) {code} So basically we have 2 levels of nesting: 1 user - N reports, 1 report - N lines. With the above data model, all report data are *duplicated* for each line, so any update on report_title or another report property requires the *anti-pattern read-before-write*: 1. Select all line_id for this report_id 2. For each line_id, perform the update One possible trick is to use a static map, but it's far from elegant, not to say dirty. So I believe there is definitely a need for static columns that are *relative* to a clustering column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7953) RangeTombstones not merging during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958871#comment-14958871 ] J.P. Eiti Kimura commented on CASSANDRA-7953: - Hello guys, when are you planning to release this patch? We are facing the same problem that [~fhsgoncalves] described above. > RangeTombstones not merging during compaction > - > > Key: CASSANDRA-7953 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7953 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 2.1 >Reporter: Marcus Olsson >Assignee: Branimir Lambov >Priority: Minor > Labels: compaction, deletes, tombstone > Fix For: 2.1.x, 2.2.x > > Attachments: 0001-7953-v2.patch, CASSANDRA-7953-1.patch, > CASSANDRA-7953.patch > > > When performing a compaction on two sstables that contain the same > RangeTombstone with different timestamps, the tombstones are not merged in > the new sstable. > This has been tested using cassandra 2.1 with the following table: > {code} > CREATE TABLE test( > key text, > column text, > data text, > PRIMARY KEY(key, column) > ); > {code} > And then doing the following: > {code} > INSERT INTO test (key, column, data) VALUES ("1", "1", "1"); // If the > sstable only contains tombstones during compaction it seems that the sstable > either gets removed or isn't created (but that could probably be a separate > JIRA issue). > INSERT INTO test (key, column, data) VALUES ("1", "2", "2"); // The inserts > are not actually needed, since the deletes create tombstones either way. > DELETE FROM test WHERE key="1" AND column="2"; > nodetool flush > INSERT INTO test (key, column, data) VALUES ("1", "2", "2"); > DELETE FROM test WHERE key="1" AND column="2"; > nodetool flush > nodetool compact > {code} > When checking with the SSTableExport tool two tombstones exists in the > compacted sstable. This can be repeated, resulting in more and more > tombstones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9484) Inconsistent select count
[ https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958663#comment-14958663 ] Stefania commented on CASSANDRA-9484: - [~philipthompson] can you see if you can still reproduce it with the 2.2 patch of 10509? > Inconsistent select count > - > > Key: CASSANDRA-9484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9484 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Benjamin Lerer > Fix For: 3.x, 2.2.x > > > I am running the dtest simultaneous_bootstrap_test located at > https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding > that at the final data verification step, the query {{SELECT COUNT (*) FROM > keyspace1.standard1}} alternated between correctly returning 500,000 rows and > returning 500,001 rows. Running cleanup or compaction does not affect the > behavior. I have verified with sstable2json that there are exactly 500k rows > on disk between the two nodes in the cluster. > I am reproducing this on trunk currently. It is not happening on 2.1-head. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7953) RangeTombstones not merging during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958849#comment-14958849 ] Fernando Gonçalves commented on CASSANDRA-7953: --- We have experienced the same behaviour described on ticket [https://issues.apache.org/jira/browse/CASSANDRA-10505]: "Once this happens in multiple sstables, compacting them causes the duplication to grow. The more this occurs, the worse the problem gets." Basically, when we run repair on a node, the compaction process starts and never ends, with many pending tasks, and the number of sstables of one table grows exponentially (reaching 34k sstables). We use one column of type map, LeveledCompactionStrategy, and many updates to this column. Memory consumption grows a lot too. We decided to stop the repair process and kill the node, because latency also grew a lot and impacted the whole cluster. But we needed to run repair again, because we had killed one node and put a new node in the cluster; another node was then hit by this bug, and we had to repeat the process: kill the repair, kill the node, start a new node. So we created another table without the collection map, using the blob type instead, and migrated all the data to it. We are fine now: the repair and compaction processes finished successfully without a big impact on performance. Please give this ticket some attention; I think it's a major issue! 
> RangeTombstones not merging during compaction > - > > Key: CASSANDRA-7953 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7953 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 2.1 >Reporter: Marcus Olsson >Assignee: Branimir Lambov >Priority: Minor > Labels: compaction, deletes, tombstone > Fix For: 2.1.x, 2.2.x > > Attachments: 0001-7953-v2.patch, CASSANDRA-7953-1.patch, > CASSANDRA-7953.patch > > > When performing a compaction on two sstables that contain the same > RangeTombstone with different timestamps, the tombstones are not merged in > the new sstable. > This has been tested using cassandra 2.1 with the following table: > {code} > CREATE TABLE test( > key text, > column text, > data text, > PRIMARY KEY(key, column) > ); > {code} > And then doing the following: > {code} > INSERT INTO test (key, column, data) VALUES ("1", "1", "1"); // If the > sstable only contains tombstones during compaction it seems that the sstable > either gets removed or isn't created (but that could probably be a separate > JIRA issue). > INSERT INTO test (key, column, data) VALUES ("1", "2", "2"); // The inserts > are not actually needed, since the deletes create tombstones either way. > DELETE FROM test WHERE key="1" AND column="2"; > nodetool flush > INSERT INTO test (key, column, data) VALUES ("1", "2", "2"); > DELETE FROM test WHERE key="1" AND column="2"; > nodetool flush > nodetool compact > {code} > When checking with the SSTableExport tool two tombstones exists in the > compacted sstable. This can be repeated, resulting in more and more > tombstones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
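The behaviour the reporter expects can be sketched with a toy model in plain Java (the class and merge rule below are illustrative stand-ins, not Cassandra's actual RangeTombstone code): when two tombstones cover an identical range, compaction should keep a single one carrying the newest deletion timestamp instead of carrying both forward.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of the expected merge: identical ranges collapse into one
// tombstone with the newest timestamp. Not Cassandra's real compaction code.
public class TombstoneMerge {
    public static final class Tombstone {
        public final String start, end;
        public final long timestamp;
        public Tombstone(String start, String end, long timestamp) {
            this.start = start; this.end = end; this.timestamp = timestamp;
        }
    }

    public static List<Tombstone> merge(List<Tombstone> tombstones) {
        List<Tombstone> merged = new ArrayList<>();
        tombstones.sort(Comparator.comparing((Tombstone t) -> t.start));
        for (Tombstone t : tombstones) {
            Tombstone last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.start.equals(t.start) && last.end.equals(t.end)) {
                // identical range: supersede rather than accumulate
                if (t.timestamp > last.timestamp)
                    merged.set(merged.size() - 1, t);
            } else {
                merged.add(t);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Tombstone> in = new ArrayList<>();
        in.add(new Tombstone("2", "2", 1000L)); // from the first flush
        in.add(new Tombstone("2", "2", 2000L)); // same range, newer delete
        System.out.println(merge(in).size()); // 1
    }
}
```

The bug report amounts to compaction behaving like the no-merge branch of this sketch: both tombstones survive, and repeated flush/compact cycles accumulate more of them.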
[jira] [Assigned] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.
[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Branimir Lambov reassigned CASSANDRA-10520: --- Assignee: Branimir Lambov > Compressed writer and reader should support non-compressed data. > > > Key: CASSANDRA-10520 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10520 > Project: Cassandra > Issue Type: Improvement >Reporter: Branimir Lambov >Assignee: Branimir Lambov > Fix For: 3.0.x > > > Compressing uncompressible data, as done, for instance, to write SSTables > during stress-tests, results in chunks larger than 64k which are a problem > for the buffer pooling mechanisms employed by the > {{CompressedRandomAccessReader}}. This results in non-negligible performance > issues due to excessive memory allocation. > To solve this problem and avoid decompression delays in the cases where it > does not provide benefits, I think we should allow compressed files to store > uncompressed chunks as an alternative to compressed data. Such a chunk could be > written after compression returns a buffer larger than, for example, 90% of > the input, and would not result in additional delays in writing. On reads it > could be recognized by size (using a single global threshold constant in the > compression metadata) and data could be directly transferred into the > decompressed buffer, skipping the decompression step and ensuring a 64k > buffer for compressed data always suffices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
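The proposal above boils down to a simple decision at chunk-write time. A minimal sketch, using {{java.util.zip.Deflater}} and a hypothetical 90% threshold (Cassandra's actual compressors and on-disk format differ): if compression does not save at least ~10%, store the raw bytes instead, so a reader can detect the pass-through by size and skip decompression entirely.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

// Sketch of the ticket's idea (threshold and buffer handling are illustrative,
// not Cassandra's real format): compress a chunk, but if the result exceeds
// ~90% of the input size, keep the uncompressed bytes so the reader's buffer
// for "compressed" data never needs to exceed the chunk size.
public class MaybeCompress {
    static final double MAX_COMPRESSED_RATIO = 0.9; // hypothetical global threshold

    public static byte[] compressOrPassThrough(byte[] chunk) {
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish();
        byte[] out = new byte[chunk.length + 64]; // zlib may slightly expand incompressible input
        int len = deflater.deflate(out);
        deflater.end();
        if (len > chunk.length * MAX_COMPRESSED_RATIO)
            return chunk;                 // store uncompressed; recognized by size on read
        return Arrays.copyOf(out, len);   // worthwhile compression: store the compressed form
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[65536];   // highly compressible
        byte[] random = new byte[65536];
        new Random(42).nextBytes(random); // effectively incompressible
        System.out.println(compressOrPassThrough(zeros).length < zeros.length);
        System.out.println(compressOrPassThrough(random).length == random.length);
    }
}
```

The key property is the last line: for incompressible data the stored chunk is exactly the input, so the reader can copy it straight into the destination buffer.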
[jira] [Commented] (CASSANDRA-10509) Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip
[ https://issues.apache.org/jira/browse/CASSANDRA-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958464#comment-14958464 ] Stefania commented on CASSANDRA-10509: -- It seems we get an extra row in the count if we cross the page boundary. For example, we can reproduce this problem about 50% of the time with as little as 1000 entries if we set the page size to 1000 using {{self.session.default_fetch_size = 1000}}. Using the current dtest value of 100K entries is slower but also more reliable in reproducing this problem. {{AbstractQueryPager.fetchPage}} retrieves an extra live row the last time it is called. It has a mechanism to exclude the first row if it is the same as the last row in the previous page by calling {{containsPreviousLast}}, which is implemented by the sub-classes. {{RangeNamesQueryPager.containsPreviousLast}} however always returns false because the corresponding {{queryNextPage}} uses {{ExcludingBounds}} to set the range in the read command; therefore, the last queried key should never be included. However, as far as I can see, {{ExcludingBounds}} is serialized as {{Bounds}}, which does include the endpoints. So, first we query (MIN, MIN) to get the entire range, then the next page will query (LAST_KEY, MIN), where LAST_KEY is the key of the last partition retrieved by the previous page; but if the last key is not local, we are actually querying \[LAST_KEY, MIN\] and we retrieve the partition for LAST_KEY again. It is not clear why it could not be reproduced with {{SELECT *}}. Tentative [patch|https://github.com/stef1927/cassandra/tree/10509-2.2] attached. 
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10509-2.2-dtest http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10509-2.2-testall > Fix dtest cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip > - > > Key: CASSANDRA-10509 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10509 > Project: Cassandra > Issue Type: Sub-task >Reporter: Paulo Motta >Assignee: Stefania > Fix For: 2.2.x > > > Test failing on 2.2 after fixing CASSANDRA-10507: > http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10507-2.2-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
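The off-by-one described above can be modelled in isolation. In this toy sketch (plain Java, not Cassandra's actual pager or bounds classes), a range that is intended to exclude its left endpoint degrades to an inclusive one, so each follow-up page re-fetches the previous page's last key unless the pager explicitly drops it — which is the duplicate that {{containsPreviousLast}} is supposed to detect:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the paging bug: the serialized range is inclusive even though
// the pager intended (afterKey, MIN], so afterKey comes back a second time.
public class PagingDedup {
    public static List<String> page(List<String> keys, String afterKey, int size,
                                    boolean dropPreviousLast) {
        List<String> fetched = new ArrayList<>();
        for (String k : keys) {
            // ">=" models the inclusive Bounds the exclusive range degraded to
            if (afterKey == null || k.compareTo(afterKey) >= 0)
                fetched.add(k);
            if (fetched.size() == size + 1) break; // pager fetches one extra live row
        }
        // The fix: drop the first row when it duplicates the previous page's
        // last key (what containsPreviousLast should have reported).
        if (dropPreviousLast && afterKey != null && !fetched.isEmpty()
                && fetched.get(0).equals(afterKey))
            fetched.remove(0);
        return fetched.size() > size ? fetched.subList(0, size) : fetched;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        List<String> p1 = page(keys, null, 3, false);    // first page: [a, b, c]
        List<String> buggy = page(keys, "c", 3, false);  // [c, d, e] -> "c" counted twice
        List<String> fixed = page(keys, "c", 3, true);   // [d, e]
        System.out.println(p1.size() + buggy.size());    // 6: one extra row across pages
        System.out.println(p1.size() + fixed.size());    // 5: correct total
    }
}
```

With 5 keys and page size 3, the buggy path counts 6 rows in total — the same one-row surplus the dtest sees at a page boundary.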
[1/2] cassandra git commit: Reduce contention getting instances of CompositeType
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 5f5e9602d -> b42a0cfe8

Reduce contention getting instances of CompositeType

patch by schlosna; reviewed by slebresne for CASSANDRA-10433

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe

Branch: refs/heads/cassandra-3.0
Commit: bee48ebe206bd02c231266858e9ae137a928689d
Parents: 7875326
Author: Sylvain Lebresne
Authored: Thu Oct 15 09:50:40 2015 +0200
Committer: Sylvain Lebresne
Committed: Thu Oct 15 09:50:40 2015 +0200

 CHANGES.txt                                 |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20
 2 files changed, 13 insertions(+), 8 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt

diff --git a/CHANGES.txt b/CHANGES.txt
index c02e2fa..9a0baaa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.4
+ * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
 Merged from 2.1:
  * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java

diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
index 0218411..9892118 100644
--- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
@@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
 import java.util.ArrayList;
-import java.util.HashMap;
+import java.util.Arrays;
 import java.util.List;
-import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 
 import com.google.common.collect.ImmutableList;
 
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
 import org.apache.cassandra.serializers.MarshalException;
@@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType
     public final List<AbstractType<?>> types;
 
     // interning instances
-    private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>();
+    private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>();
 
     public static CompositeType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException
     {
@@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType
         return true;
     }
 
-    public static synchronized CompositeType getInstance(List<AbstractType<?>> types)
+    public static CompositeType getInstance(List<AbstractType<?>> types)
     {
         assert types != null && !types.isEmpty();
@@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType
         if (ct == null)
         {
             ct = new CompositeType(types);
-            instances.put(types, ct);
+            CompositeType previous = instances.putIfAbsent(types, ct);
+            if (previous != null)
+            {
+                ct = previous;
+            }
         }
         return ct;
     }
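The pattern this commit applies, shown in isolation on a simplified stand-in class (a toy Interner, not the real CompositeType): replace a synchronized lookup on a HashMap with a lock-free get on a ConcurrentMap, using putIfAbsent to resolve the race when two threads build the same instance concurrently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Lock-free interning via ConcurrentMap.putIfAbsent: the common case (already
// interned) takes no lock, and a racing insert simply discards the loser's
// freshly built instance. Simplified stand-in for CompositeType's cache.
public class Interner {
    private static final ConcurrentMap<List<String>, Interner> instances = new ConcurrentHashMap<>();
    public final List<String> types;

    private Interner(List<String> types) { this.types = types; }

    public static Interner getInstance(List<String> types) {
        Interner ct = instances.get(types);    // common case: no locking at all
        if (ct == null) {
            ct = new Interner(types);
            Interner previous = instances.putIfAbsent(types, ct);
            if (previous != null)
                ct = previous;                 // another thread won the race; use its instance
        }
        return ct;
    }

    public static void main(String[] args) {
        Interner a = getInstance(Arrays.asList("Int32Type", "UTF8Type"));
        Interner b = getInstance(Arrays.asList("Int32Type", "UTF8Type"));
        System.out.println(a == b); // true: both calls yield the interned instance
    }
}
```

The trade-off versus the old synchronized method: duplicate instances may briefly be constructed under contention, but only one ever escapes, and every subsequent lookup avoids the monitor entirely.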
[3/3] cassandra git commit: Merge branch 'cassandra-3.0' into trunk
Merge branch 'cassandra-3.0' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/29576a44
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/29576a44
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/29576a44

Branch: refs/heads/trunk
Commit: 29576a44d073b6aa07655aa9f9fdbbaac5a5322a
Parents: d87aab9 b42a0cf
Author: Sylvain Lebresne
Authored: Thu Oct 15 09:54:39 2015 +0200
Committer: Sylvain Lebresne
Committed: Thu Oct 15 09:54:39 2015 +0200

 CHANGES.txt                                 |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20
 2 files changed, 13 insertions(+), 8 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/29576a44/CHANGES.txt
http://git-wip-us.apache.org/repos/asf/cassandra/blob/29576a44/src/java/org/apache/cassandra/db/marshal/CompositeType.java
[2/3] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0

Conflicts:
	CHANGES.txt

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b42a0cfe
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b42a0cfe
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b42a0cfe

Branch: refs/heads/trunk
Commit: b42a0cfe87d175b9d5a053bcb91a9fc70a0c241e
Parents: 5f5e960 bee48eb
Author: Sylvain Lebresne
Authored: Thu Oct 15 09:53:48 2015 +0200
Committer: Sylvain Lebresne
Committed: Thu Oct 15 09:53:48 2015 +0200

 CHANGES.txt                                 |  1 +
 .../cassandra/db/marshal/CompositeType.java | 20
 2 files changed, 13 insertions(+), 8 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/CHANGES.txt

diff --cc CHANGES.txt
index 66e34b6,9a0baaa..fa74539
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,22 -1,5 +1,23 @@@
-2.2.4
+3.0-rc2
+ * Revert CASSANDRA-7486, make CMS default GC, move GC config to
+   conf/jvm.options (CASSANDRA-10403)
+ * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447)
+ * Allow EACH_QUORUM for reads (CASSANDRA-9602)
+ * Fix potential ClassCastException while upgrading (CASSANDRA-10468)
+ * Fix NPE in MVs on update (CASSANDRA-10503)
+ * Only include modified cell data in indexing deltas (CASSANDRA-10438)
+ * Do not load keyspace when creating sstable writer (CASSANDRA-10443)
+ * If node is not yet gossiping write all MV updates to batchlog only (CASSANDRA-10413)
+ * Re-populate token metadata after commit log recovery (CASSANDRA-10293)
+ * Provide additional metrics for materialized views (CASSANDRA-10323)
+ * Flush system schema tables after local schema changes (CASSANDRA-10429)
+Merged from 2.2:
+ * Reduce contention getting instances of CompositeType (CASSANDRA-10433)
+ * Fix the regression when using LIMIT with aggregates (CASSANDRA-10487)
+ * Avoid NoClassDefFoundError during DataDescriptor initialization on windows (CASSANDRA-10412)
+ * Preserve case of quoted Role & User names (CASSANDRA-10394)
+ * cqlsh pg-style-strings broken (CASSANDRA-10484)
+ * cqlsh prompt includes name of keyspace after failed `use` statement (CASSANDRA-10369)
 Merged from 2.1:
  * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523)
  * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/src/java/org/apache/cassandra/db/marshal/CompositeType.java
[jira] [Commented] (CASSANDRA-10529) Channel.size() is costly, mutually exclusive, and on the critical path
[ https://issues.apache.org/jira/browse/CASSANDRA-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958523#comment-14958523 ] Benedict commented on CASSANDRA-10529: -- That is very strange, but given that standard is as high as the old mmap numbers, it's probably fine. We need to fix the variability in cstar. For future reference, it's worth at least disabling vnodes to ensure we have an identical cluster until cstar supports sets of predefined token rings. I did not examine this exhaustively, but I saw a meaningful uptick (>20%) when profiling a single-node cluster after making this change. However, that may have been down to interactions with the specific profiler I was using at the time (which did require safepoints), which may have worsened the problem of mutual exclusivity. Either way, the change is worth making. > Channel.size() is costly, mutually exclusive, and on the critical path > -- > > Key: CASSANDRA-10529 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10529 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Stefania > Fix For: 3.0.0 rc2 > > > [~stefania_alborghetti] mentioned this already on another ticket, but I have > lost track of exactly where. While benchmarking, it became apparent this was a > noticeable bottleneck for small in-memory workloads with few files, > especially with RF=1. We should probably fix this soon, since it is trivial > to do so, and the call is only to impose an assertion that our requested > length is less than the file size. It isn't possible to safely memoize the > value anywhere we can guarantee to refer to it safely without some > refactoring, so I suggest simply removing the assertion for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10531) ColumnFilter should have unit tests
Sylvain Lebresne created CASSANDRA-10531: Summary: ColumnFilter should have unit tests Key: CASSANDRA-10531 URL: https://issues.apache.org/jira/browse/CASSANDRA-10531 Project: Cassandra Issue Type: Test Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 3.x {{ColumnFilter}} should be decently tested indirectly, but there is no reason not to cover it more directly with simple unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10365) Consider storing types by their CQL names in schema tables instead of fully-qualified internal class names
[ https://issues.apache.org/jira/browse/CASSANDRA-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958593#comment-14958593 ] Olivier Michallat commented on CASSANDRA-10365: --- The Java driver already has this kind of indirection, for example for state and final functions in an aggregate's metadata. It will make things more brittle in the face of potential event propagation bugs, but if it's a deliberate choice I'm fine with it. > Consider storing types by their CQL names in schema tables instead of > fully-qualified internal class names > -- > > Key: CASSANDRA-10365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10365 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Labels: client-impacting > Fix For: 3.0.0 rc2 > > > Consider saving CQL type names for column, UDF/UDA arguments and return > types, and UDT components. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[1/3] cassandra git commit: Reduce contention getting instances of CompositeType
Repository: cassandra Updated Branches: refs/heads/trunk d87aab987 -> 29576a44d Reduce contention getting instances of CompositeType patch by schlosna; reviewed by slebresne for CASSANDRA-10433 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe Branch: refs/heads/trunk Commit: bee48ebe206bd02c231266858e9ae137a928689d Parents: 7875326 Author: Sylvain LebresneAuthored: Thu Oct 15 09:50:40 2015 +0200 Committer: Sylvain Lebresne Committed: Thu Oct 15 09:50:40 2015 +0200 -- CHANGES.txt | 1 + .../cassandra/db/marshal/CompositeType.java | 20 2 files changed, 13 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index c02e2fa..9a0baaa 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.2.4 + * Reduce contention getting instances of CompositeType (CASSANDRA-10433) Merged from 2.1: * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523) * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java -- diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java b/src/java/org/apache/cassandra/db/marshal/CompositeType.java index 0218411..9892118 100644 --- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java +++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java @@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal; import java.io.IOException; import java.nio.ByteBuffer; -import java.util.Arrays; import java.util.ArrayList; -import java.util.HashMap; +import java.util.Arrays; import java.util.List; -import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; 
+import java.util.concurrent.ConcurrentMap; import com.google.common.collect.ImmutableList; -import org.apache.cassandra.exceptions.ConfigurationException; -import org.apache.cassandra.exceptions.SyntaxException; import org.apache.cassandra.cql3.ColumnIdentifier; import org.apache.cassandra.cql3.Operator; +import org.apache.cassandra.exceptions.ConfigurationException; +import org.apache.cassandra.exceptions.SyntaxException; import org.apache.cassandra.io.util.DataOutputBuffer; import org.apache.cassandra.io.util.DataOutputBufferFixed; import org.apache.cassandra.serializers.MarshalException; @@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType public final List<AbstractType<?>> types; // interning instances -private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>(); +private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>(); public static CompositeType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException { @@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType return true; } -public static synchronized CompositeType getInstance(List<AbstractType<?>> types) +public static CompositeType getInstance(List<AbstractType<?>> types) { assert types != null && !types.isEmpty(); @@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType CompositeType ct = instances.get(types); if (ct == null) { ct = new CompositeType(types); -instances.put(types, ct); +CompositeType previous = instances.putIfAbsent(types, ct); +if (previous != null) +{ +ct = previous; +} } return ct; }
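The patch above replaces a globally synchronized interning method with optimistic creation plus `putIfAbsent`. A minimal, self-contained sketch of that pattern (illustrative only, not Cassandra code; the `Interner` class and its `String` keys are placeholders for `CompositeType` and its `List<AbstractType<?>>` key):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Lock-free interning: create the candidate optimistically, then let
// putIfAbsent pick a single winner so every caller ends up sharing one
// canonical instance for equal keys.
public class Interner
{
    private static final ConcurrentMap<List<String>, Interner> instances = new ConcurrentHashMap<>();

    public final List<String> types;

    private Interner(List<String> types)
    {
        this.types = types;
    }

    public static Interner getInstance(List<String> types)
    {
        Interner ct = instances.get(types);
        if (ct == null)
        {
            ct = new Interner(types);
            // If another thread interned the same key first, discard our
            // candidate and return the established instance instead.
            Interner previous = instances.putIfAbsent(types, ct);
            if (previous != null)
                ct = previous;
        }
        return ct;
    }

    public static void main(String[] args)
    {
        Interner a = getInstance(Arrays.asList("x", "y"));
        Interner b = getInstance(Arrays.asList("x", "y"));
        System.out.println(a == b); // equal keys intern to the same object: true
    }
}
```

The behavioral difference from the `synchronized` version is only that two racing threads may both construct a candidate, with one being discarded; on Java 8+, `ConcurrentHashMap.computeIfAbsent` would express the same idea, at the cost of briefly locking the map bin during construction.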
[2/2] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b42a0cfe Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b42a0cfe Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b42a0cfe Branch: refs/heads/cassandra-3.0 Commit: b42a0cfe87d175b9d5a053bcb91a9fc70a0c241e Parents: 5f5e960 bee48eb Author: Sylvain LebresneAuthored: Thu Oct 15 09:53:48 2015 +0200 Committer: Sylvain Lebresne Committed: Thu Oct 15 09:53:48 2015 +0200 -- CHANGES.txt | 1 + .../cassandra/db/marshal/CompositeType.java | 20 2 files changed, 13 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/CHANGES.txt -- diff --cc CHANGES.txt index 66e34b6,9a0baaa..fa74539 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,22 -1,5 +1,23 @@@ -2.2.4 +3.0-rc2 + * Revert CASSANDRA-7486, make CMS default GC, move GC config to + conf/jvm.options (CASSANDRA-10403) + * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447) + * Allow EACH_QUORUM for reads (CASSANDRA-9602) + * Fix potential ClassCastException while upgrading (CASSANDRA-10468) + * Fix NPE in MVs on update (CASSANDRA-10503) + * Only include modified cell data in indexing deltas (CASSANDRA-10438) + * Do not load keyspace when creating sstable writer (CASSANDRA-10443) + * If node is not yet gossiping write all MV updates to batchlog only (CASSANDRA-10413) + * Re-populate token metadata after commit log recovery (CASSANDRA-10293) + * Provide additional metrics for materialized views (CASSANDRA-10323) + * Flush system schema tables after local schema changes (CASSANDRA-10429) +Merged from 2.2: + * Reduce contention getting instances of CompositeType (CASSANDRA-10433) + * Fix the regression when using LIMIT with aggregates (CASSANDRA-10487) + * Avoid NoClassDefFoundError during DataDescriptor initialization on windows (CASSANDRA-10412) + * 
Preserve case of quoted Role & User names (CASSANDRA-10394) + * cqlsh pg-style-strings broken (CASSANDRA-10484) + * cqlsh prompt includes name of keyspace after failed `use` statement (CASSANDRA-10369) Merged from 2.1: * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523) * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970) http://git-wip-us.apache.org/repos/asf/cassandra/blob/b42a0cfe/src/java/org/apache/cassandra/db/marshal/CompositeType.java --
[jira] [Commented] (CASSANDRA-9484) Inconsistent select count
[ https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958502#comment-14958502 ] Stefania commented on CASSANDRA-9484: - This looks similar to CASSANDRA-10509. > Inconsistent select count > - > > Key: CASSANDRA-9484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9484 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Benjamin Lerer > Fix For: 3.x, 2.2.x > > > I am running the dtest simultaneous_bootstrap_test located at > https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding > that at the final data verification step, the query {{SELECT COUNT (*) FROM > keyspace1.standard1}} alternated between correctly returning 500,000 rows and > returning 500,001 rows. Running cleanup or compaction does not affect the > behavior. I have verified with sstable2json that there are exactly 500k rows > on disk between the two nodes in the cluster. > I am reproducing this on trunk currently. It is not happening on 2.1-head. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10471) fix flapping empty_in_test dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958561#comment-14958561 ] Sylvain Lebresne commented on CASSANDRA-10471: -- bq. As a reviewer I didn't figure out how to verify that this statement is true or why. I mucked about with StatementRestrictions and family and found where IN restrictions are expressed, but it's all pretty big in scope. Do you have any pointers? The query that triggers the problem is {noformat} SELECT v FROM test_compact WHERE k1 = 0 AND k2 IN () {noformat} That query explicitly requests no rows (it's a query by name with an empty list of names). One would expect such a valid but somewhat uninteresting query to be dealt with at the CQL layer (by returning nothing), yielding no internal query, and that is what happens on 3.0 (see {{SelectStatement.makeClusteringIndexFilter}}: in the case of a query by names, the code checks if {{getRequestedRows}} returns an empty list and returns {{null}} if so, which is shorthand for "we know the query returns nothing"). That's the reason why I initially made {{ColumnFilter.Builder}} reject the case where nothing was selected, but that was misguided, especially since the code is perfectly fine dealing with an empty {{ColumnFilter}}. And it happens that this optimization isn't done in 2.2. Or rather, it used to be done but was "broken" by CASSANDRA-7981. If you look at the equivalent code on 2.2, in {{SelectStatement.makeFilter()}}, it assumes an {{IN ()}} would result in {{getRequestedColumns}} returning {{null}}, but it's easy to see that's not the case anymore (and it's equally easy to see that CASSANDRA-7981 is the culprit). So on the 2.2 node, the query simply generates an empty list in {{SelectStatement.getRequestedColumns()}} (because {{SingleColumnRestriction.InWithValues.getValues()}} does nothing special if its list of terms is empty; it just returns an empty list) and queries with that.
That's where, during an upgrade, the 3.0 node receives a query by name with an empty list of names, and that triggers my misguided assertion. I'll note that I'm fine "fixing" that broken optimization in 2.2 and I've pushed a very trivial fix to do so [here|https://github.com/pcmanus/cassandra/commits/10471-2.2], but I don't really care whether we do it or not since 1) it's pretty inconsequential for 2.2 users and 2) even if we do commit that 2.2 patch, we still need to fix 3.0 for users who might upgrade from a 2.2 version that doesn't have that fix. bq. As far as I can tell null isn't used as a signal anywhere Well, it is. It signals that we don't want to skip any value when {{isFetchAll}} is set (paraphrasing the comment on the declaration of {{selection}} here). If {{isFetchAll == true}}, {{selection}} is the subset of columns for which we want to include the values (the ones whose values are not skipped). As the case where we don't skip any value is common, we use {{null}} to signal it. The equivalent if we were to not use {{null}} would be to have {{selection}} be all the columns, but that would mean a slightly less efficient {{canSkipValue}} in that common case. I'll note that I feel all this is reasonably well explained in the class javadoc and the comments around the class field declarations, but I'm open to improvement suggestions. bq. canSkipValue might have depended on it before this change Why only before this change? {{selection == null}} only matters in {{canSkipValue}} if {{isFetchAll == true}}, and {{ColumnFilter.Builder.build()}} explicitly only forces {{PartitionColumns.NONE}} if {{!isFetchAll}}. bq. There is no unit test for ColumnFilter There isn't, and I've created CASSANDRA-10531 to change that.
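The short-circuit described above, where {{IN ()}} means "no names requested" so no internal query is ever issued, can be modelled in a few lines. This is a hypothetical sketch: {{makeNamesFilter}} and the plain {{String}} names are stand-ins, not the real {{SelectStatement}} API, where {{null}} plays the same "the query returns nothing" role.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the CQL-layer short-circuit: a null filter is
// the agreed signal for "we already know this query returns no rows",
// so no internal read needs to be issued for IN ().
public class EmptyInSketch
{
    static List<String> makeNamesFilter(List<String> requestedNames)
    {
        if (requestedNames.isEmpty())
            return null; // IN () selects nothing: short-circuit here
        return requestedNames;
    }

    public static void main(String[] args)
    {
        // Empty IN list: null signals "no rows", nothing is queried.
        System.out.println(makeNamesFilter(Collections.<String>emptyList()) == null); // true
        // Non-empty IN list: the names pass through to the internal query.
        System.out.println(makeNamesFilter(Collections.singletonList("a"))); // [a]
    }
}
```

The 2.2 regression amounts to the first branch silently disappearing: an empty list flows through as-is and reaches lower layers that never expected a by-names query with zero names.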
> fix flapping empty_in_test dtest > > > Key: CASSANDRA-10471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10471 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > {{upgrade_tests/cql_tests.py:TestCQL.empty_in_test}} fails about half the > time on the upgrade path from 2.2 to 3.0: > http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/42/testReport/upgrade_tests.cql_tests/TestCQL/empty_in_test/history/ > Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is > merged, these tests should also run with this upgrade path on normal 3.0 > jobs. Until then, you can run it with the following command: > {code} > SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 > nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.empty_in_test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958534#comment-14958534 ] Stefania commented on CASSANDRA-10461: -- It seems the second pull request fixing the line separator is ineffective. We need to see what the tool is outputting on Jenkins, pull request [here|https://github.com/riptano/cassandra-dtest/pull/609]. > Fix sstableverify_test dtest > > > Key: CASSANDRA-10461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10461 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Stefania > Labels: test > Fix For: 3.0.0 rc2 > > > The dtest for sstableverify is failing: > http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/ > It fails in the same way when I run it on OpenStack, so I don't think it's > just a CassCI problem. > [~slebresne] Looks like you made changes to this test recently: > https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880 > Could you have a look at the failure? I'm assigning you for triage, but feel > free to reassign. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10433) Reduce contention in CompositeType instance interning
[ https://issues.apache.org/jira/browse/CASSANDRA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-10433: - Assignee: David Schlosnagle > Reduce contention in CompositeType instance interning > - > > Key: CASSANDRA-10433 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10433 > Project: Cassandra > Issue Type: Improvement > Environment: Cassandra 2.2.1 running on 6 AWS c3.4xlarge nodes, > CentOS 6.6 >Reporter: David Schlosnagle >Assignee: David Schlosnagle >Priority: Minor > Fix For: 2.2.x > > Attachments: > 0001-Avoid-contention-in-CompositeType-instance-interning.patch > > > While running some workload tests on Cassandra 2.2.1 and profiling with > flight recorder in a test environment, we have noticed significant contention > on the static synchronized > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) method. > We are seeing threads blocked for 22.828 seconds from a 60 second snapshot > while under a mix of reads and writes from a Thrift based client. > I would propose to reduce contention in > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) by using a > ConcurrentHashMap for the instances cache. 
> {code} > Contention Back Trace > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) > > org.apache.cassandra.db.composites.AbstractCompoundCellNameType.asAbstractType() > org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, boolean) > org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, > ByteBuffer) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnNames(CFMetaData, > ByteBuffer, Iterable) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnPath(CFMetaData, > ColumnPath) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnOrSuperColumn(CFMetaData, > ByteBuffer, ColumnOrSuperColumn) > > org.apache.cassandra.thrift.ThriftValidation.validateMutation(CFMetaData, > ByteBuffer, Mutation) > > org.apache.cassandra.thrift.CassandraServer.createMutationList(ConsistencyLevel, > Map, boolean) > > org.apache.cassandra.thrift.CassandraServer.batch_mutate(Map, > ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra$Iface, > Cassandra$batch_mutate_args) > > org.apache.cassandra.thrift.ThriftValidation.validateRange(CFMetaData, > ColumnParent, SliceRange) > > org.apache.cassandra.thrift.ThriftValidation.validatePredicate(CFMetaData, > ColumnParent, SlicePredicate) > > org.apache.cassandra.thrift.CassandraServer.get_range_slices(ColumnParent, > SlicePredicate, KeyRange, ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra$Iface, > Cassandra$get_range_slices_args) > > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Object, > TBase) > org.apache.thrift.ProcessFunction.process(int, TProtocol, > TProtocol, Object) > org.apache.thrift.TBaseProcessor.process(TProtocol, > TProtocol) > > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run() > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) > 
java.util.concurrent.ThreadPoolExecutor$Worker.run() > > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(String, > List, ColumnParent, long, SlicePredicate, ConsistencyLevel, ClientState) > > org.apache.cassandra.thrift.CassandraServer.multiget_slice(List, > ColumnParent, SlicePredicate, ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Cassandra$Iface, > Cassandra$multiget_slice_args) > > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Object, > TBase) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: Reduce contention getting instances of CompositeType
Repository: cassandra Updated Branches: refs/heads/cassandra-2.2 78753263e -> bee48ebe2 Reduce contention getting instances of CompositeType patch by schlosna; reviewed by slebresne for CASSANDRA-10433 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bee48ebe Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bee48ebe Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bee48ebe Branch: refs/heads/cassandra-2.2 Commit: bee48ebe206bd02c231266858e9ae137a928689d Parents: 7875326 Author: Sylvain LebresneAuthored: Thu Oct 15 09:50:40 2015 +0200 Committer: Sylvain Lebresne Committed: Thu Oct 15 09:50:40 2015 +0200 -- CHANGES.txt | 1 + .../cassandra/db/marshal/CompositeType.java | 20 2 files changed, 13 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index c02e2fa..9a0baaa 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.2.4 + * Reduce contention getting instances of CompositeType (CASSANDRA-10433) Merged from 2.1: * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523) * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bee48ebe/src/java/org/apache/cassandra/db/marshal/CompositeType.java -- diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java b/src/java/org/apache/cassandra/db/marshal/CompositeType.java index 0218411..9892118 100644 --- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java +++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java @@ -19,18 +19,18 @@ package org.apache.cassandra.db.marshal; import java.io.IOException; import java.nio.ByteBuffer; -import java.util.Arrays; import java.util.ArrayList; -import java.util.HashMap; +import java.util.Arrays; import java.util.List; -import java.util.Map; +import 
java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentMap; import com.google.common.collect.ImmutableList; -import org.apache.cassandra.exceptions.ConfigurationException; -import org.apache.cassandra.exceptions.SyntaxException; import org.apache.cassandra.cql3.ColumnIdentifier; import org.apache.cassandra.cql3.Operator; +import org.apache.cassandra.exceptions.ConfigurationException; +import org.apache.cassandra.exceptions.SyntaxException; import org.apache.cassandra.io.util.DataOutputBuffer; import org.apache.cassandra.io.util.DataOutputBufferFixed; import org.apache.cassandra.serializers.MarshalException; @@ -68,7 +68,7 @@ public class CompositeType extends AbstractCompositeType public final List<AbstractType<?>> types; // interning instances -private static final Map<List<AbstractType<?>>, CompositeType> instances = new HashMap<List<AbstractType<?>>, CompositeType>(); +private static final ConcurrentMap<List<AbstractType<?>>, CompositeType> instances = new ConcurrentHashMap<List<AbstractType<?>>, CompositeType>(); public static CompositeType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException { @@ -98,7 +98,7 @@ public class CompositeType extends AbstractCompositeType return true; } -public static synchronized CompositeType getInstance(List<AbstractType<?>> types) +public static CompositeType getInstance(List<AbstractType<?>> types) { assert types != null && !types.isEmpty(); @@ -106,7 +106,11 @@ public class CompositeType extends AbstractCompositeType CompositeType ct = instances.get(types); if (ct == null) { ct = new CompositeType(types); -instances.put(types, ct); +CompositeType previous = instances.putIfAbsent(types, ct); +if (previous != null) +{ +ct = previous; +} } return ct; }
[jira] [Commented] (CASSANDRA-10519) RepairException: [repair #... on .../..., (...,...]] Validation failed in /w.x.y.z
[ https://issues.apache.org/jira/browse/CASSANDRA-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958499#comment-14958499 ] Gábor Auth commented on CASSANDRA-10519: At the moment it works correctly. I upgraded from 2.1.5 to 2.2.2 a few days ago and after the full upgrade ran the repair on each node (-full -pr on each node one by one, not simultaneously). I've started a daily repair on my test cluster; if it comes up again, I will comment on this issue. > RepairException: [repair #... on .../..., (...,...]] Validation failed in > /w.x.y.z > -- > > Key: CASSANDRA-10519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10519 > Project: Cassandra > Issue Type: Bug > Environment: CentOS 7, JDK 8u60, Cassandra 2.2.2 (upgraded from 2.1.5) >Reporter: Gábor Auth > > Sometimes the repair fails: > {code} > ERROR [Repair#3:1] 2015-10-14 06:22:56,490 CassandraDaemon.java:185 - > Exception in thread Thread[Repair#3:1,5,RMI Runtime] > com.google.common.util.concurrent.UncheckedExecutionException: > org.apache.cassandra.exceptions.RepairException: [repair > #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, > (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z > at > com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387) > ~[guava-16.0.jar:na] > at > com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) > ~[guava-16.0.jar:na] > at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60] > Caused by: org.apache.cassandra.exceptions.RepairException: [repair > #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, > (2414492737393085601,2788053941340954029]] Validation 
failed in /w.y.x.z > at > org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[apache-cassandra-2.2.2.jar:2.2.2] > ... 3 common frames omitted > {code} > And here is the w.y.x.z side: > {code} > ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 > CompactionManager.java:1053 - Cannot start multiple repair sessions over the > same sstables > ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 Validator.java:246 - > Failed creating a merkle tree for [repair > #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, > (2414492737393085601,2788053941340954029]], /a.b.c.d (see log for details) > ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,488 CassandraDaemon.java:185 > - Exception in thread Thread[ValidationExecutor:7,1,main] > java.lang.RuntimeException: Cannot start multiple repair sessions over the > same sstables > at > org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1054) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:86) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:652) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_60] > 
at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > ... > ERROR [Reference-Reaper:1] 2015-10-14 06:23:21,439 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@74fc054a) to class > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1949471967:/home/cassandra/dsc-cassandra-2.2.2/bin/../data/data/keyspace/table-b15521b062e4bbedcdee5e027297/la-1195-big > was not released before the reference was garbage collected >
cassandra git commit: Skip redundant tombstones on compaction.
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 02f88e38e -> a61fc01f4 Skip redundant tombstones on compaction. Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f Branch: refs/heads/cassandra-2.1 Commit: a61fc01f418426847e3aad133127da3615813236 Parents: 02f88e3 Author: Branimir LambovAuthored: Wed Oct 7 14:46:24 2015 +0300 Committer: Marcus Eriksson Committed: Thu Oct 15 15:28:42 2015 +0200 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnIndex.java| 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 4 files changed, 218 insertions(+), 75 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b16acb5..68b44ed 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.12 + * Merge range tombstones during compaction (CASSANDRA-7953) * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523) * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970) * Don't allow startup if the node's rack has changed (CASSANDRA-10242) http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java b/src/java/org/apache/cassandra/db/ColumnIndex.java index d9d6a9c..0ea5c87 100644 --- a/src/java/org/apache/cassandra/db/ColumnIndex.java +++ b/src/java/org/apache/cassandra/db/ColumnIndex.java @@ -180,14 +180,24 @@ public class ColumnIndex firstColumn = column; startPosition = endPosition; // TODO: have that use the firstColumn as min + make sure we optimize that on read -endPosition += 
tombstoneTracker.writeOpenedMarker(firstColumn, output, atomSerializer); +endPosition += tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer); blockSize = 0; // We don't count repeated tombstone marker in the block size, to avoid a situation // where we wouldn't make any progress because a block is filled by said marker + +maybeWriteRowHeader(); } -long size = atomSerializer.serializedSizeForSSTable(column); -endPosition += size; -blockSize += size; +if (tombstoneTracker.update(column, false)) +{ +long size = tombstoneTracker.writeUnwrittenTombstones(output, atomSerializer); +size += atomSerializer.serializedSizeForSSTable(column); +endPosition += size; +blockSize += size; + +atomSerializer.serializeForSSTable(column, output); +} + +lastColumn = column; // if we hit the column index size that we have to index after, go ahead and index it. if (blockSize >= DatabaseDescriptor.getColumnIndexSize()) @@ -197,14 +207,6 @@ public class ColumnIndex firstColumn = null; lastBlockClosing = column; } - -maybeWriteRowHeader(); -atomSerializer.serializeForSSTable(column, output); - -// TODO: Should deal with removing unneeded tombstones -tombstoneTracker.update(column, false); - -lastColumn = column; } private void maybeWriteRowHeader() throws IOException @@ -216,12 +218,16 @@ public class ColumnIndex } } -public ColumnIndex build() +public ColumnIndex build() throws IOException { // all columns were GC'd after all if (lastColumn == null) return ColumnIndex.EMPTY; +long size = tombstoneTracker.writeUnwrittenTombstones(output, atomSerializer); +endPosition += size; +blockSize += size; + // the last column may have fallen on an index boundary already. if not, index it explicitly. if (result.columnsIndex.isEmpty() || lastBlockClosing != lastColumn) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java
[1/2] cassandra git commit: Skip redundant tombstones on compaction.
Repository: cassandra Updated Branches: refs/heads/cassandra-2.2 bee48ebe2 -> 3b7ccdfb1 Skip redundant tombstones on compaction. Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f Branch: refs/heads/cassandra-2.2 Commit: a61fc01f418426847e3aad133127da3615813236 Parents: 02f88e3 Author: Branimir Lambov Authored: Wed Oct 7 14:46:24 2015 +0300 Committer: Marcus Eriksson Committed: Thu Oct 15 15:28:42 2015 +0200 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnIndex.java | 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 4 files changed, 218 insertions(+), 75 deletions(-)
[3/4] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a1c1d90 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a1c1d90 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a1c1d90 Branch: refs/heads/trunk Commit: 6a1c1d900925cb0532633c943e7c4325edc8f64c Parents: b42a0cf 3b7ccdf Author: Marcus ErikssonAuthored: Thu Oct 15 15:35:53 2015 +0200 Committer: Marcus Eriksson Committed: Thu Oct 15 15:35:53 2015 +0200 -- --
[1/4] cassandra git commit: Skip redundant tombstones on compaction.
Repository: cassandra Updated Branches: refs/heads/trunk 29576a44d -> 0e3da95d6 Skip redundant tombstones on compaction. Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f Branch: refs/heads/trunk Commit: a61fc01f418426847e3aad133127da3615813236 Parents: 02f88e3 Author: Branimir Lambov Authored: Wed Oct 7 14:46:24 2015 +0300 Committer: Marcus Eriksson Committed: Thu Oct 15 15:28:42 2015 +0200 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnIndex.java | 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 4 files changed, 218 insertions(+), 75 deletions(-)
[3/3] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a1c1d90 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a1c1d90 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a1c1d90 Branch: refs/heads/cassandra-3.0 Commit: 6a1c1d900925cb0532633c943e7c4325edc8f64c Parents: b42a0cf 3b7ccdf Author: Marcus ErikssonAuthored: Thu Oct 15 15:35:53 2015 +0200 Committer: Marcus Eriksson Committed: Thu Oct 15 15:35:53 2015 +0200 -- --
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2

Conflicts:
	CHANGES.txt

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb

Branch: refs/heads/cassandra-2.2
Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb
Parents: bee48eb a61fc01
Author: Marcus Eriksson
Authored: Thu Oct 15 15:33:29 2015 +0200
Committer: Marcus Eriksson
Committed: Thu Oct 15 15:33:29 2015 +0200

----------------------------------------------------------------------
 .../org/apache/cassandra/db/ColumnIndex.java     |  32 +++--
 .../org/apache/cassandra/db/RangeTombstone.java  | 135 ++-
 .../cassandra/cql3/RangeTombstoneMergeTest.java  | 125 +
 3 files changed, 217 insertions(+), 75 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/src/java/org/apache/cassandra/db/RangeTombstone.java
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
----------------------------------------------------------------------
diff --cc test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
index 0000000,0460a16..71634e9
mode 000000,100644..100644
--- a/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
+++ b/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java
@@@ -1,0 -1,125 +1,125 @@@
+ /*
+  * Licensed to the Apache Software Foundation (ASF) under one
+  * or more contributor license agreements.  See the NOTICE file
+  * distributed with this work for additional information
+  * regarding copyright ownership.  The ASF licenses this file
+  * to you under the Apache License, Version 2.0 (the
+  * "License"); you may not use this file except in compliance
+  * with the License.  You may obtain a copy of the License at
+  *
+  *     http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+ 
+ package org.apache.cassandra.cql3;
+ 
+ import static org.junit.Assert.assertEquals;
+ import static org.junit.Assert.assertTrue;
+ 
+ import com.google.common.collect.Iterables;
+ 
+ import org.junit.Before;
+ import org.junit.Test;
+ 
+ import org.apache.cassandra.Util;
+ import org.apache.cassandra.db.*;
+ import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
+ import org.apache.cassandra.db.composites.*;
++import org.apache.cassandra.io.sstable.format.SSTableReader;
+ import org.apache.cassandra.io.sstable.ISSTableScanner;
 -import org.apache.cassandra.io.sstable.SSTableReader;
+ 
+ public class RangeTombstoneMergeTest extends CQLTester
+ {
+     @Before
+     public void before() throws Throwable
+     {
+         createTable("CREATE TABLE %s(" +
+                     " key text," +
+                     " column text," +
+                     " data text," +
+                     " extra text," +
+                     " PRIMARY KEY(key, column)" +
+                     ");");
+ 
+         // If the sstable only contains tombstones during compaction it seems that the sstable
+         // either gets removed or isn't created (but that could probably be a separate JIRA issue).
+         execute("INSERT INTO %s (key, column, data) VALUES (?, ?, ?)", "1", "1", "1");
+     }
+ 
+     @Test
+     public void testEqualMerge() throws Throwable
+     {
+         addRemoveAndFlush();
+ 
+         for (int i=0; i<3; ++i)
+         {
+             addRemoveAndFlush();
+             compact();
+         }
+ 
+         assertOneTombstone();
+     }
+ 
+     @Test
+     public void testRangeMerge() throws Throwable
+     {
+         addRemoveAndFlush();
+ 
+         execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, ?)", "1", "2", "2", "2");
+         execute("DELETE extra FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+         flush();
+         compact();
+ 
+         execute("DELETE FROM %s WHERE key=? AND column=?", "1", "2");
+ 
+         flush();
+         compact();
+ 
+         assertOneTombstone();
+     }
+ 
+     void assertOneTombstone() throws Throwable
+     {
+         assertRows(execute("SELECT column FROM %s"),
+                    row("1"));
+         assertAllRows(row("1", "1", "1", null));
+ 
+         ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable());
+         ColumnFamily cf = cfs.getColumnFamily(Util.dk("1"),
[2/3] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb Branch: refs/heads/cassandra-3.0 Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb Parents: bee48eb a61fc01 Author: Marcus Eriksson Authored: Thu Oct 15 15:33:29 2015 +0200 Committer: Marcus Eriksson Committed: Thu Oct 15 15:33:29 2015 +0200 -- .../org/apache/cassandra/db/ColumnIndex.java | 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 3 files changed, 217 insertions(+), 75 deletions(-)
[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
[ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057 ] Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:24 PM: - I discovered that an index on one of the tables has a wide row, and I'm wondering if that could be the root of the issue: Example: {noformat} Compacted partition minimum bytes: 125 Compacted partition maximum bytes: 10299432635 Compacted partition mean bytes: 253692309 {noformat} This seems like a problem in general for indexes, where the original data model may be well distributed but the index may have unpredictable distribution. was (Author: rstrickland): I discovered that an index on one of the tables has a wide row, and I'm assuming that to be the root of the issue: Example: {noformat} Compacted partition minimum bytes: 125 Compacted partition maximum bytes: 10299432635 Compacted partition mean bytes: 253692309 {noformat} This seems like a problem in general for indexes, where the original data model may be well distributed but the index may have unpredictable distribution. > OOM on bootstrap due to long GC pause > - > > Key: CASSANDRA-10449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10449 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04, AWS >Reporter: Robbie Strickland > Labels: gc > Fix For: 2.1.x > > Attachments: system.log.10-05, thread_dump.log > > > I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and > 500-700GB per node. SSTable counts are <10 per table. I am attempting to > provision additional nodes, but bootstrapping OOMs every time after about 10 > hours with a sudden long GC pause: > {noformat} > INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old > Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176; > ... 
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 > CassandraDaemon.java:223 - Exception in thread > Thread[MemtableFlushWriter:454,5,main] > java.lang.OutOfMemoryError: Java heap space > {noformat} > I have tried increasing max heap to 48G just to get through the bootstrap, to > no avail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharvanath Pathak updated CASSANDRA-10534: -- Description: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems that these files are not being fsynced, and that can potentially lead to data corruption. I am working with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable except for CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) to have caused the regression, which removed the {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. 
Following is the trace I saw in system.log {noformat} INFO [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes) ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop" org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: java.io.EOFException: null at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80] at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9] ... 14 common frames omitted {noformat} was: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, this happened multiple times in our testing with hard node reboots. After some investigation it seems like these file is not being fsynced, and that can potentially lead to data corruption. I am wroking with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable but the CompressionInfo seem tolerable. Also a quick look through the code and did not revealed any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) to have caused the regression. Which removed the {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. > CompressionInfo not being fsynced on close > -- > >
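The missing call the description pinpoints is {{FileChannel.force(true)}}, which flushes both file data and metadata to disk before close; without it, a hard reboot can leave a zero-length CompressionInfo.db behind. A minimal stand-alone sketch of the durable-write pattern (plain java.nio; the {{DurableWriter}} class is illustrative, not Cassandra's CompressionMetadata.Writer):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableWriter {
    /**
     * Writes data and fsyncs before close. Without force(true) the bytes may
     * sit only in the OS page cache, and a power loss can truncate the file.
     */
    static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining())
                ch.write(buf);
            // Flush data AND metadata to the device; this is the kind of call
            // the reporter says commit 4e95953f removed from the close path.
            ch.force(true);
        }
    }
}
```

The key ordering is write, then force, then close: forcing after close is impossible, and closing without forcing gives the kernel no obligation to have persisted anything.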
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959074#comment-14959074 ] Mikhail Stepura commented on CASSANDRA-10515: - [~jeffery.griffith] could you please attach a thread dump as well? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
[ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959148#comment-14959148 ] Mikhail Stepura commented on CASSANDRA-10449: - I would love to get hold of a heapdump for that OOM. At least we could figure out what's consuming the heap. > OOM on bootstrap due to long GC pause > - > > Key: CASSANDRA-10449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10449 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04, AWS >Reporter: Robbie Strickland > Labels: gc > Fix For: 2.1.x > > Attachments: system.log.10-05, thread_dump.log > > > I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and > 500-700GB per node. SSTable counts are <10 per table. I am attempting to > provision additional nodes, but bootstrapping OOMs every time after about 10 > hours with a sudden long GC pause: > {noformat} > INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old > Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176; > ... > ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 > CassandraDaemon.java:223 - Exception in thread > Thread[MemtableFlushWriter:454,5,main] > java.lang.OutOfMemoryError: Java heap space > {noformat} > I have tried increasing max heap to 48G just to get through the bootstrap, to > no avail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharvanath Pathak updated CASSANDRA-10534: -- Description: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems that these files are not being fsynced, and that can potentially lead to data corruption. I am working with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable except for CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) has caused the regression, which removed the {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. 
Following is the trace I saw in system.log {noformat} INFO [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes) ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop" org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: java.io.EOFException: null at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80] at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9] ... 14 common frames omitted {noformat} was: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, this happened multiple times in our testing with hard node reboots. After some investigation it seems like these file is not being fsynced, and that can potentially lead to data corruption. I am working with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable but the CompressionInfo seem tolerable. Also a quick look through the code and did not revealed any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) to have caused the regression. Which removed the {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. Following is the trace I saw in system.log {noformat} INFO [SSTableBatchOpen:1] 2015-09-29
[jira] [Commented] (CASSANDRA-9484) Inconsistent select count
[ https://issues.apache.org/jira/browse/CASSANDRA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959124#comment-14959124 ] Philip Thompson commented on CASSANDRA-9484: I can't reproduce this at all anymore, even on base 2.2 and 3.0. We can probably close this ticket. I'm going to merge the test into CI, and we can re-open this ticket if it starts failing again. > Inconsistent select count > - > > Key: CASSANDRA-9484 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9484 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Benjamin Lerer > Fix For: 3.x, 2.2.x > > > I am running the dtest simultaneous_bootstrap_test located at > https://github.com/riptano/cassandra-dtest/compare/cassandra-7069 and finding > that at the final data verification step, the query {{SELECT COUNT (*) FROM > keyspace1.standard1}} alternated between correctly returning 500,000 rows and > returning 500,001 rows. Running cleanup or compaction does not affect the > behavior. I have verified with sstable2json that there are exactly 500k rows > on disk between the two nodes in the cluster. > I am reproducing this on trunk currently. It is not happening on 2.1-head. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava
[ https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959149#comment-14959149 ] T Jake Luciani commented on CASSANDRA-10528: You mean in the POC? Well, this benchmark was RF=1, so the driver was using TAP and no MS was used. In general terms, though, we keep a thread per connection, so this relates to CASSANDRA-8457 linked above. My thought was to combine our native netty epoll event loop with the messaging service event loop to avoid having many more threads. > Proposal: Integrate RxJava > -- > > Key: CASSANDRA-10528 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10528 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani > Fix For: 3.x > > Attachments: rxjava-stress.png > > > The purpose of this ticket is to discuss the merits of integrating the > [RxJava|https://github.com/ReactiveX/RxJava] framework into C*, enabling us > to incrementally make the internals of C* async and move away from SEDA to a > more modern thread-per-core architecture. > Related tickets: >* CASSANDRA-8520 >* CASSANDRA-8457 >* CASSANDRA-5239 >* CASSANDRA-7040 >* CASSANDRA-5863 >* CASSANDRA-6696 >* CASSANDRA-7392 > My *primary* goals in raising this issue are to provide a way of: > * *Incrementally* making the backend async > * Avoiding code complexity/readability issues > * Avoiding NIH where possible > * Building on an extendable library > My *non*-goals in raising this issue are: > >* Rewrite the entire database in one big bang >* Write our own async api/framework > > - > I attempted to integrate RxJava a while back and found it not ready, mainly > due to our lack of lambda support. Now with Java 8 I've found it very > enjoyable and have not hit any performance issues. A gentle introduction to > RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as > well as their > [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading].
The > primary concept of RX is the > [Observable|http://reactivex.io/documentation/observable.html] which is > essentially a stream of stuff you can subscribe to and act on, chain, etc. > This is quite similar to the [Java 8 streams > api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html] > (or I should say the streams api is similar to it). The difference is Java 8 > streams can't be used for asynchronous events while RxJava can. > Another improvement since I last tried integrating RxJava is the completion > of CASSANDRA-8099, which provides a very iterable/incremental approach to > our storage engine. *Iterators and Observables are well paired conceptually, > so morphing our current storage engine to be async is much simpler now.* > In an effort to show how one can incrementally change our backend, I've done a > quick POC with RxJava and made our non-paging read requests > non-blocking. > https://github.com/apache/cassandra/compare/trunk...tjake:rxjava-3.0 > As you can probably see the code is straightforward and sometimes quite nice!
> *Old* > {code} > private static PartitionIterator > fetchRows(List<SinglePartitionReadCommand> commands, ConsistencyLevel > consistencyLevel) > throws UnavailableException, ReadFailureException, ReadTimeoutException > { > int cmdCount = commands.size(); > SinglePartitionReadLifecycle[] reads = new > SinglePartitionReadLifecycle[cmdCount]; > for (int i = 0; i < cmdCount; i++) > reads[i] = new SinglePartitionReadLifecycle(commands.get(i), > consistencyLevel); > for (int i = 0; i < cmdCount; i++) > reads[i].doInitialQueries(); > for (int i = 0; i < cmdCount; i++) > reads[i].maybeTryAdditionalReplicas(); > for (int i = 0; i < cmdCount; i++) > reads[i].awaitResultsAndRetryOnDigestMismatch(); > for (int i = 0; i < cmdCount; i++) > if (!reads[i].isDone()) > reads[i].maybeAwaitFullDataRead(); > List<PartitionIterator> results = new ArrayList<>(cmdCount); > for (int i = 0; i < cmdCount; i++) > { > assert reads[i].isDone(); > results.add(reads[i].getResult()); > } > return PartitionIterators.concat(results); > } > {code} > *New* > {code} > private static Observable<PartitionIterator> > fetchRows(List<SinglePartitionReadCommand> commands, ConsistencyLevel > consistencyLevel) > throws UnavailableException, ReadFailureException, ReadTimeoutException > { > return Observable.from(commands) > .map(command -> new > SinglePartitionReadLifecycle(command,
[jira] [Commented] (CASSANDRA-10524) Add ability to skip TIME_WAIT sockets on port check on Windows startup
[ https://issues.apache.org/jira/browse/CASSANDRA-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959059#comment-14959059 ] Andy Tolbert commented on CASSANDRA-10524: -- Another one I've seen: {{LAST_ACK}} > Add ability to skip TIME_WAIT sockets on port check on Windows startup > -- > > Key: CASSANDRA-10524 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10524 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Labels: Windows > Fix For: 3.0.0 rc2, 2.2.4 > > Attachments: win_aggressive_startup.txt > > > C* sockets are often staying TIME_WAIT for up to 120 seconds (2x max segment > lifetime) for me in my dev environment on Windows. This is rather obnoxious > since it means I can't launch C* for up to 2 minutes after stopping it. > Attaching a patch that adds a simple -a for aggressive startup to the launch > scripts to ignore duplicate port check from netstat if it's TIME_WAIT. Also > snuck in some more liberal interpretation of help strings in the .ps1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
[ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057 ] Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:25 PM: - I discovered that an index on one of the tables has a wide row, and I'm wondering if that could be the root of the issue: Example from one node: {noformat} Compacted partition minimum bytes: 125 Compacted partition maximum bytes: 10299432635 Compacted partition mean bytes: 253692309 {noformat} This seems like a problem in general for indexes, where the original data model may be well distributed but the index may have unpredictable distribution. was (Author: rstrickland): I discovered that an index on one of the tables has a wide row, and I'm wondering if that could be the root of the issue: Example: {noformat} Compacted partition minimum bytes: 125 Compacted partition maximum bytes: 10299432635 Compacted partition mean bytes: 253692309 {noformat} This seems like a problem in general for indexes, where the original data model may be well distributed but the index may have unpredictable distribution. > OOM on bootstrap due to long GC pause > - > > Key: CASSANDRA-10449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10449 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04, AWS >Reporter: Robbie Strickland > Labels: gc > Fix For: 2.1.x > > Attachments: system.log.10-05, thread_dump.log > > > I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and > 500-700GB per node. SSTable counts are <10 per table. I am attempting to > provision additional nodes, but bootstrapping OOMs every time after about 10 > hours with a sudden long GC pause: > {noformat} > INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old > Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176; > ... 
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 > CassandraDaemon.java:223 - Exception in thread > Thread[MemtableFlushWriter:454,5,main] > java.lang.OutOfMemoryError: Java heap space > {noformat} > I have tried increasing max heap to 48G just to get through the bootstrap, to > no avail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
[ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057 ] Robbie Strickland commented on CASSANDRA-10449: --- I discovered that an index on one of the tables has a wide row, and I'm assuming that to be the root of the issue: Example: {noformat} Compacted partition minimum bytes: 125 Compacted partition maximum bytes: 10299432635 Compacted partition mean bytes: 253692309 {noformat} This seems like a problem in general for indexes, where the original data model may be well distributed but the index may have unpredictable distribution. > OOM on bootstrap due to long GC pause > - > > Key: CASSANDRA-10449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10449 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04, AWS >Reporter: Robbie Strickland > Labels: gc > Fix For: 2.1.x > > Attachments: system.log.10-05, thread_dump.log > > > I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and > 500-700GB per node. SSTable counts are <10 per table. I am attempting to > provision additional nodes, but bootstrapping OOMs every time after about 10 > hours with a sudden long GC pause: > {noformat} > INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old > Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176; > ... > ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 > CassandraDaemon.java:223 - Exception in thread > Thread[MemtableFlushWriter:454,5,main] > java.lang.OutOfMemoryError: Java heap space > {noformat} > I have tried increasing max heap to 48G just to get through the bootstrap, to > no avail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharvanath Pathak updated CASSANDRA-10534: -- Description: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems this file is not being fsynced, which can potentially lead to data corruption. I am working with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable except CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression; it removed the {noformat} getChannel().force(true); {noformat} call from CompressionMetadata.Writer.close.
Following is the trace I saw in system.log {noformat} INFO [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes) ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop" org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: java.io.EOFException: null at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80] at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80] at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9] ... 14 common frames omitted {noformat} was: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems this file is not being fsynced, which can potentially lead to data corruption. I am wroking with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable except CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression; it removed the {noformat} getChannel().force(true); {noformat} call from CompressionMetadata.Writer.close. Following is the trace I saw in system.log {noformat} INFO [SSTableBatchOpen:1] 2015-09-29
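The durability pattern at issue above is small enough to sketch. The following is an illustrative, self-contained example (class and method names are made up; this is not Cassandra's actual CompressionMetadata.Writer) of the write-then-force pattern that the removed {{getChannel().force(true)}} call provided. Without the force, the bytes may sit only in the OS page cache when close() returns, and a hard reboot can leave a zero-length file on disk.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncedMetadataWriter {
    // Illustrative stand-in for a metadata writer's close path: write the
    // bytes, then fsync (forcing both data and file metadata) before the
    // stream is closed. force(true) is the call whose removal is suspected
    // of causing the zero-length CompressionInfo.db files described above.
    public static void writeAndSync(File file, byte[] contents) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(contents);
            out.getChannel().force(true); // flush to the storage device
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("CompressionInfo", ".db");
        f.deleteOnExit();
        writeAndSync(f, new byte[]{1, 2, 3, 4});
        System.out.println(f.length()); // prints 4
    }
}
```

Note that fsyncing the file alone is not the whole story for crash safety of newly created files; the containing directory generally needs an fsync as well for the directory entry itself to be durable.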
[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava
[ https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959135#comment-14959135 ] Jonathan Ellis commented on CASSANDRA-10528: What about MessagingService? > Proposal: Integrate RxJava > -- > > Key: CASSANDRA-10528 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10528 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani > Fix For: 3.x > > Attachments: rxjava-stress.png > > > The purpose of this ticket is to discuss the merits of integrating the > [RxJava|https://github.com/ReactiveX/RxJava] framework into C*, enabling us > to incrementally make the internals of C* async and move away from SEDA to a > more modern thread-per-core architecture. > Related tickets: >* CASSANDRA-8520 >* CASSANDRA-8457 >* CASSANDRA-5239 >* CASSANDRA-7040 >* CASSANDRA-5863 >* CASSANDRA-6696 >* CASSANDRA-7392 > My *primary* goals in raising this issue are to provide a way of: > * *Incrementally* making the backend async > * Avoiding code complexity/readability issues > * Avoiding NIH where possible > * Building on an extendable library > My *non*-goals in raising this issue are: > >* Rewrite the entire database in one big bang >* Write our own async api/framework > > - > I attempted to integrate RxJava a while back and found it not ready, mainly > due to our lack of lambda support. Now with Java 8 I've found it very > enjoyable and have not hit any performance issues. A gentle introduction to > RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as > well as their > [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading]. The > primary concept of RX is the > [Observable|http://reactivex.io/documentation/observable.html] which is > essentially a stream of stuff you can subscribe to and act on, chain, etc.
> This is quite similar to the [Java 8 streams > api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html] > (or I should say the streams api is similar to it). The difference is Java 8 > streams can't be used for asynchronous events while RxJava can. > Another improvement since I last tried integrating RxJava is the completion > of CASSANDRA-8099, which provides a very iterable/incremental approach to > our storage engine. *Iterators and Observables are well paired conceptually, > so morphing our current storage engine to be async is much simpler now.* > In an effort to show how one can incrementally change our backend, I've done a > quick POC with RxJava and made our non-paging read requests > non-blocking. > https://github.com/apache/cassandra/compare/trunk...tjake:rxjava-3.0 > As you can probably see the code is straightforward and sometimes quite nice! > *Old* > {code} > private static PartitionIterator > fetchRows(List<SinglePartitionReadCommand> commands, ConsistencyLevel > consistencyLevel) > throws UnavailableException, ReadFailureException, ReadTimeoutException > { > int cmdCount = commands.size(); > SinglePartitionReadLifecycle[] reads = new > SinglePartitionReadLifecycle[cmdCount]; > for (int i = 0; i < cmdCount; i++) > reads[i] = new SinglePartitionReadLifecycle(commands.get(i), > consistencyLevel); > for (int i = 0; i < cmdCount; i++) > reads[i].doInitialQueries(); > for (int i = 0; i < cmdCount; i++) > reads[i].maybeTryAdditionalReplicas(); > for (int i = 0; i < cmdCount; i++) > reads[i].awaitResultsAndRetryOnDigestMismatch(); > for (int i = 0; i < cmdCount; i++) > if (!reads[i].isDone()) > reads[i].maybeAwaitFullDataRead(); > List<PartitionIterator> results = new ArrayList<>(cmdCount); > for (int i = 0; i < cmdCount; i++) > { > assert reads[i].isDone(); > results.add(reads[i].getResult()); > } > return PartitionIterators.concat(results); > } > {code} > *New* > {code} > private static Observable<PartitionIterator> > fetchRows(List<SinglePartitionReadCommand> commands, ConsistencyLevel > 
consistencyLevel) > throws UnavailableException, ReadFailureException, ReadTimeoutException > { > return Observable.from(commands) > .map(command -> new > SinglePartitionReadLifecycle(command, consistencyLevel)) > .flatMap(read -> read.getPartitionIterator()) > .toList() > .map(results -> PartitionIterators.concat(results)); > } > {code} > Since the read call is now non-blocking (no more future.get()) we can remove > one thread pool
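The shape of the "New" version, fanning out one asynchronous read per command and then combining the results in order, can be sketched with JDK-only primitives. This is an illustrative analogy using CompletableFuture (readPartition is a made-up stand-in for the per-partition read); RxJava's Observable adds stream semantics, a richer operator set, and scheduling control on top of this basic pattern:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanOut {
    // Hypothetical stand-in for a single-partition read that completes
    // asynchronously on another thread.
    static CompletableFuture<String> readPartition(String command) {
        return CompletableFuture.supplyAsync(() -> "result:" + command);
    }

    public static void main(String[] args) {
        List<String> commands = Arrays.asList("a", "b", "c");

        // Fan out one async read per command, analogous to
        // Observable.from(commands).flatMap(...) in the POC.
        List<CompletableFuture<String>> reads = commands.stream()
                .map(FanOut::readPartition)
                .collect(Collectors.toList());

        // Combine once all reads complete, preserving command order,
        // analogous to .toList().map(PartitionIterators::concat).
        List<String> results = CompletableFuture
                .allOf(reads.toArray(new CompletableFuture[0]))
                .thenApply(v -> reads.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()))
                .join();

        System.out.println(results); // prints [result:a, result:b, result:c]
    }
}
```

The key property in both versions is that no caller thread blocks per read; composition happens when the results arrive, which is what allows removing a thread pool as the comment notes.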
[4/4] cassandra git commit: Merge branch 'cassandra-3.0' into trunk
Merge branch 'cassandra-3.0' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0e3da95d Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0e3da95d Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0e3da95d Branch: refs/heads/trunk Commit: 0e3da95d6bbfcddc1bdb381e02499206aac56d7a Parents: 29576a4 6a1c1d9 Author: Marcus Eriksson Authored: Thu Oct 15 15:36:05 2015 +0200 Committer: Marcus Eriksson Committed: Thu Oct 15 15:36:05 2015 +0200 -- --
[1/3] cassandra git commit: Skip redundant tombstones on compaction.
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 b42a0cfe8 -> 6a1c1d900 Skip redundant tombstones on compaction. Patch by Branimir Lambov; reviewed by marcuse for CASSANDRA-7953 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a61fc01f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a61fc01f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a61fc01f Branch: refs/heads/cassandra-3.0 Commit: a61fc01f418426847e3aad133127da3615813236 Parents: 02f88e3 Author: Branimir LambovAuthored: Wed Oct 7 14:46:24 2015 +0300 Committer: Marcus Eriksson Committed: Thu Oct 15 15:28:42 2015 +0200 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnIndex.java| 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 4 files changed, 218 insertions(+), 75 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b16acb5..68b44ed 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.12 + * Merge range tombstones during compaction (CASSANDRA-7953) * (cqlsh) Distinguish negative and positive infinity in output (CASSANDRA-10523) * (cqlsh) allow custom time_format for COPY TO (CASSANDRA-8970) * Don't allow startup if the node's rack has changed (CASSANDRA-10242) http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/ColumnIndex.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java b/src/java/org/apache/cassandra/db/ColumnIndex.java index d9d6a9c..0ea5c87 100644 --- a/src/java/org/apache/cassandra/db/ColumnIndex.java +++ b/src/java/org/apache/cassandra/db/ColumnIndex.java @@ -180,14 +180,24 @@ public class ColumnIndex firstColumn = column; startPosition = endPosition; // TODO: have that use the firstColumn as min + make sure we optimize that on read -endPosition += 
tombstoneTracker.writeOpenedMarker(firstColumn, output, atomSerializer); +endPosition += tombstoneTracker.writeOpenedMarkers(firstColumn.name(), output, atomSerializer); blockSize = 0; // We don't count repeated tombstone marker in the block size, to avoid a situation // where we wouldn't make any progress because a block is filled by said marker + +maybeWriteRowHeader(); } -long size = atomSerializer.serializedSizeForSSTable(column); -endPosition += size; -blockSize += size; +if (tombstoneTracker.update(column, false)) +{ +long size = tombstoneTracker.writeUnwrittenTombstones(output, atomSerializer); +size += atomSerializer.serializedSizeForSSTable(column); +endPosition += size; +blockSize += size; + +atomSerializer.serializeForSSTable(column, output); +} + +lastColumn = column; // if we hit the column index size that we have to index after, go ahead and index it. if (blockSize >= DatabaseDescriptor.getColumnIndexSize()) @@ -197,14 +207,6 @@ public class ColumnIndex firstColumn = null; lastBlockClosing = column; } - -maybeWriteRowHeader(); -atomSerializer.serializeForSSTable(column, output); - -// TODO: Should deal with removing unneeded tombstones -tombstoneTracker.update(column, false); - -lastColumn = column; } private void maybeWriteRowHeader() throws IOException @@ -216,12 +218,16 @@ public class ColumnIndex } } -public ColumnIndex build() +public ColumnIndex build() throws IOException { // all columns were GC'd after all if (lastColumn == null) return ColumnIndex.EMPTY; +long size = tombstoneTracker.writeUnwrittenTombstones(output, atomSerializer); +endPosition += size; +blockSize += size; + // the last column may have fallen on an index boundary already. if not, index it explicitly. if (result.columnsIndex.isEmpty() || lastBlockClosing != lastColumn) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/a61fc01f/src/java/org/apache/cassandra/db/RangeTombstone.java
[2/4] cassandra git commit: Merge branch 'cassandra-2.1' into cassandra-2.2
Merge branch 'cassandra-2.1' into cassandra-2.2 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b7ccdfb Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b7ccdfb Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b7ccdfb Branch: refs/heads/trunk Commit: 3b7ccdfb15b43880804d61a5e7d62c82b3b664eb Parents: bee48eb a61fc01 Author: Marcus ErikssonAuthored: Thu Oct 15 15:33:29 2015 +0200 Committer: Marcus Eriksson Committed: Thu Oct 15 15:33:29 2015 +0200 -- .../org/apache/cassandra/db/ColumnIndex.java| 32 +++-- .../org/apache/cassandra/db/RangeTombstone.java | 135 ++- .../cassandra/cql3/RangeTombstoneMergeTest.java | 125 + 3 files changed, 217 insertions(+), 75 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/src/java/org/apache/cassandra/db/RangeTombstone.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b7ccdfb/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java -- diff --cc test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java index 000,0460a16..71634e9 mode 00,100644..100644 --- a/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java +++ b/test/unit/org/apache/cassandra/cql3/RangeTombstoneMergeTest.java @@@ -1,0 -1,125 +1,125 @@@ + /* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + package org.apache.cassandra.cql3; + + import static org.junit.Assert.assertEquals; + import static org.junit.Assert.assertTrue; + + import com.google.common.collect.Iterables; + + import org.junit.Before; + import org.junit.Test; + + import org.apache.cassandra.Util; + import org.apache.cassandra.db.*; + import org.apache.cassandra.db.columniterator.OnDiskAtomIterator; + import org.apache.cassandra.db.composites.*; ++import org.apache.cassandra.io.sstable.format.SSTableReader; + import org.apache.cassandra.io.sstable.ISSTableScanner; -import org.apache.cassandra.io.sstable.SSTableReader; + + public class RangeTombstoneMergeTest extends CQLTester + { + @Before + public void before() throws Throwable + { + createTable("CREATE TABLE %s(" + + " key text," + + " column text," + + " data text," + + " extra text," + + " PRIMARY KEY(key, column)" + + ");"); + + // If the sstable only contains tombstones during compaction it seems that the sstable either gets removed or isn't created (but that could probably be a separate JIRA issue). + execute("INSERT INTO %s (key, column, data) VALUES (?, ?, ?)", "1", "1", "1"); + } + + @Test + public void testEqualMerge() throws Throwable + { + addRemoveAndFlush(); + + for (int i=0; i<3; ++i) + { + addRemoveAndFlush(); + compact(); + } + + assertOneTombstone(); + } + + @Test + public void testRangeMerge() throws Throwable + { + addRemoveAndFlush(); + + execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, ?)", "1", "2", "2", "2"); + execute("DELETE extra FROM %s WHERE key=? 
AND column=?", "1", "2"); + + flush(); + compact(); + + execute("DELETE FROM %s WHERE key=? AND column=?", "1", "2"); + + flush(); + compact(); + + assertOneTombstone(); + } + + void assertOneTombstone() throws Throwable + { + assertRows(execute("SELECT column FROM %s"), +row("1")); + assertAllRows(row("1", "1", "1", null)); + + ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable()); + ColumnFamily cf = cfs.getColumnFamily(Util.dk("1"), Composites.EMPTY,
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharvanath Pathak updated CASSANDRA-10534: -- Component/s: Core > CompressionInfo not being fsynced on close > -- > > Key: CASSANDRA-10534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10534 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Sharvanath Pathak > > I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; > this happened multiple times in our testing with hard node reboots. After > some investigation it seems this file is not being fsynced, which > can potentially lead to data corruption. > I checked for fsync calls using strace, and found them happening for all but > the following components: CompressionInfo, TOC.txt and digest.sha1. All seem > tolerable except CompressionInfo. Also, a quick look through > the code did not reveal any fsync calls. Moreover, I suspect commit > 4e95953f29d89a441dfe06d3f0393ed7dd8586df > (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) > caused the regression; it removed the > {noformat} > getChannel().force(true); > {noformat} > call from CompressionMetadata.Writer.close. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10534) CompressionInfo not being fsynced on close
Sharvanath Pathak created CASSANDRA-10534: - Summary: CompressionInfo not being fsynced on close Key: CASSANDRA-10534 URL: https://issues.apache.org/jira/browse/CASSANDRA-10534 Project: Cassandra Issue Type: Bug Reporter: Sharvanath Pathak I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems this file is not being fsynced, which can potentially lead to data corruption. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable except CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression; it removed the {noformat} getChannel().force(true); {noformat} call from CompressionMetadata.Writer.close. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-10534: Reproduced In: 2.1.9 Fix Version/s: 2.1.x > CompressionInfo not being fsynced on close > -- > > Key: CASSANDRA-10534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10534 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Sharvanath Pathak > Fix For: 2.1.x > > > I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; > this happened multiple times in our testing with hard node reboots. After > some investigation it seems this file is not being fsynced, which > can potentially lead to data corruption. I am working with version 2.1.9. > I checked for fsync calls using strace, and found them happening for all but > the following components: CompressionInfo, TOC.txt and digest.sha1. All seem > tolerable except CompressionInfo. Also, a quick look through > the code did not reveal any fsync calls. Moreover, I suspect commit > 4e95953f29d89a441dfe06d3f0393ed7dd8586df > (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) > caused the regression; it removed the > {noformat} > getChannel().force(true); > {noformat} > call from CompressionMetadata.Writer.close. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-10469) Fix collection indexing upgrade dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe resolved CASSANDRA-10469. - Resolution: Not A Problem This seems to be related to CASSANDRA-10468. Although the ClassCastException isn't observed during the failures of this particular test, it hasn't failed on cassci since 10468 was committed. Locally I'm seeing the same thing; in my latest comparison, 0/10 runs resulted in a failure when the {{UPGRADE_TO}} target is set to [26c8892|https://github.com/apache/cassandra/commit/26c8892] (the commit with the 10468 fix), compared to 7/10 failures where the {{UPGRADE_TO}} target is the previous commit ([48889d2|https://github.com/apache/cassandra/commit/48889d2]). I'm going to resolve this as not a problem, and we can reopen it if we see the problem reoccur (CASSANDRA-10468 was recently reopened). > Fix collection indexing upgrade dtest > - > > Key: CASSANDRA-10469 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10469 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Sam Tunnicliffe > Fix For: 3.0.0 rc2 > > > {{upgrade_tests/cql_tests.py:TestCQL.collection_indexing_test}} fails on the > upgrade path from 2.2 to 3.0. You can see the failure on CassCI here: > http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/44/testReport/upgrade_tests.cql_tests/TestCQL/collection_indexing_test/ > Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is > merged, these tests should also run with this upgrade path on normal 3.0 > jobs. 
Until then, you can run it with the following command: > {code} > SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 > nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.collection_indexing_test > {code} > Note that this test fails most of the time, but does occasionally succeed: > http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/44/testReport/upgrade_tests.cql_tests/TestCQL/collection_indexing_test/history/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10534) CompressionInfo not being fsynced on close
[ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharvanath Pathak updated CASSANDRA-10534: -- Description: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems this file is not being fsynced, which can potentially lead to data corruption. I am working with version 2.1.9. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable but the CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression, as it removed {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. was: I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation it seems this file is not being fsynced, which can potentially lead to data corruption. I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All seem tolerable but the CompressionInfo. Also, a quick look through the code did not reveal any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression, as it removed {noformat} getChannel().force(true); {noformat} from CompressionMetadata.Writer.close. 
> CompressionInfo not being fsynced on close > -- > > Key: CASSANDRA-10534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10534 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Sharvanath Pathak > > I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; > this happened multiple times in our testing with hard node reboots. After > some investigation it seems this file is not being fsynced, which can > potentially lead to data corruption. I am working with version 2.1.9. > I checked for fsync calls using strace, and found them happening for all but > the following components: CompressionInfo, TOC.txt and digest.sha1. All seem > tolerable but the CompressionInfo. Also, a quick look through > the code did not reveal any fsync calls. Moreover, I suspect the commit > 4e95953f29d89a441dfe06d3f0393ed7dd8586df > (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) > caused the regression, as it removed > {noformat} > getChannel().force(true); > {noformat} > from CompressionMetadata.Writer.close. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10471) fix flapping empty_in_test dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959541#comment-14959541 ] Ariel Weisberg commented on CASSANDRA-10471: OK thanks for the explanation. The comments are good it's just me not having enough context. I can't tell if the dtests were harmed. There are 20 failures on the branch. The 3.0 branch hasn't had 20 failures in the past few builds. I compared the contents and it just looks sort of random and overlapping. +1 > fix flapping empty_in_test dtest > > > Key: CASSANDRA-10471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10471 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > {{upgrade_tests/cql_tests.py:TestCQL.empty_in_test}} fails about half the > time on the upgrade path from 2.2 to 3.0: > http://cassci.datastax.com/view/Upgrades/job/storage_engine_upgrade_dtest-22_tarball-30_HEAD/42/testReport/upgrade_tests.cql_tests/TestCQL/empty_in_test/history/ > Once [this dtest PR|https://github.com/riptano/cassandra-dtest/pull/586] is > merged, these tests should also run with this upgrade path on normal 3.0 > jobs. Until then, you can run it with the following command: > {code} > SKIP=false CASSANDRA_VERSION=binary:2.2.0 UPGRADE_TO=git:cassandra-3.0 > nosetests 2>&1 upgrade_tests/cql_tests.py:TestCQL.empty_in_test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959869#comment-14959869 ] Joel Knighton commented on CASSANDRA-10089: --- I suppose Jim's suggestion is the sensible solution here. I've pushed branches of the same name to my repo to get CI results. I'll update once those are in, and hopefully they'll be good and I can +1. > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:00 AM: -- [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the locker never completed or was very, very slow. Eventually a second MemtableFlushWriter thread blocks. I believe if I left it continue to run, all or many of them will. {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was (Author: jeffery.griffith): [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) 
the locker never completed or was very, very slow: {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at
[jira] [Commented] (CASSANDRA-10538) Assertion failed in LogFile when disk is full
[ https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960136#comment-14960136 ] Stefania commented on CASSANDRA-10538: -- I've created a patch to ensure we update the in-memory records after updating the disk state, to prevent the assertion in case we throw in {{LifecycleTransaction.doCommit}}. However we still need to verify this is what actually happened in the logs. I've also changed {{LogTransaction.doCommit}} and {{doAbort}} so that they catch and return runtime exceptions. [~benedict] is this something we missed, right? > Assertion failed in LogFile when disk is full > - > > Key: CASSANDRA-10538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10538 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > Attachments: > ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, > ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, > ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, > ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, > ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, > ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log > > > [~carlyeks] was running a stress job which filled up the disk. 
At the end of > the system logs there are several assertion errors: > {code} > ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 > - Exception in thread Thread[CompactionExecutor:1,1,main] > java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) > ~[main/:na] > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) > ~[main/:na] > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) > ~[main/:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) > ~[main/:na] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_40] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_40] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_40] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_40] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] > INFO [IndexSummaryManager:1] 2015-10-14 21:10:40,099 > IndexSummaryManager.java:257 - Redistributing index summaries > ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 > CassandraDaemon.java:195 - Exception in thread > Thread[IndexSummaryManager:1,1,main] > java.lang.AssertionError: Already completed! 
> at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) > ~[main/:na] > at > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) > ~[main/:na] > at > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) > ~[main/:na] > at > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) > ~[main/:na] > at > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) > ~[main/:na] > at > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) > ~[main/:na] > at > org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) > ~[main/:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[main/:na] > at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE > {code} > We should not have an assertion if it can happen
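The "catch and return runtime exceptions" change Stefania describes follows an accumulate-and-return error pattern: the abort path records failures rather than throwing mid-cleanup. A simplified, hypothetical sketch (the names are illustrative, not Cassandra's actual {{LogTransaction}} code):

```java
public class SafeAbort {
    interface AbortAction { void run(); }

    // Run the abort action; merge any RuntimeException into 'accumulate'
    // and return it instead of throwing, so later cleanup steps still run
    // and the caller decides how to surface the failure.
    static Throwable doAbort(AbortAction action, Throwable accumulate) {
        try {
            action.run();
        } catch (RuntimeException t) {
            if (accumulate == null)
                accumulate = t;
            else
                accumulate.addSuppressed(t);
        }
        return accumulate;
    }

    public static void main(String[] args) {
        Throwable failure = doAbort(
            () -> { throw new IllegalStateException("Already completed!"); },
            null);
        // The assertion-style failure is captured, not propagated mid-abort.
        System.out.println(failure.getMessage()); // prints Already completed!
    }
}
```

With this shape, a full disk during abort degrades to a reported error instead of an {{AssertionError}} escaping from a background thread.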
[jira] [Comment Edited] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files
[ https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960123#comment-14960123 ] Ariel Weisberg edited comment on CASSANDRA-10421 at 10/16/15 3:47 AM: -- Syncing the directory won't sync the log file. You need to sync the log file specifically to have that data be available. Syncing the directory makes rename and file creation durable, but not the files contained in the directory. bq. I also had to add several log.txnFile().close(); to the unit tests (whenever we test removeUnfinishedLeftovers) because on Windows we cannot delete files that are open. This is a bit ugly so maybe we should also go back to using FileUtils::appendLine, unless again you have performance concerns. I don't mind opening the file every time. However to sync it after every write you will need to keep it open long enough to do that. Or open it O_SYNC or something. was (Author: aweisberg): Syncing the directory won't sync the log file. You need to sync the log file specifically to have that data be available. Syncing the directory makes rename and file creation durable, but not the files contained in the directory. bq. I also had to add several log.txnFile().close(); to the unit tests (whenever we test removeUnfinishedLeftovers) because on Windows we cannot delete files that are open. This is a bit ugly so maybe we should also go back to using FileUtils::appendLine, unless again you have performance concerns. I don't mind opening the file every time. 
> Potential issue with LogTransaction as it only checks in a single directory > for files > - > > Key: CASSANDRA-10421 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10421 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.0 rc2 > > > When creating a new LogTransaction we try to create the new logfile in the > same directory as the one we are writing to, but as we use > {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}} > this might end up in "any" of the configured data directories. If it does, > we will not be able to clean up leftovers as we check for files in the same > directory as the logfile was created: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163 > cc [~Stefania] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
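The distinction Ariel draws, that fsyncing a directory makes renames and file creation durable while the bytes inside a file need their own sync, can be sketched as follows. This is a hypothetical illustration, not Cassandra's actual {{LogTransaction}} code; the file name and record format are invented:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DurableLog {
    // Append a record and sync the file's own contents. Syncing only the
    // parent directory would make the file *entry* durable, not these bytes.
    static void appendDurably(Path logFile, String line) throws IOException {
        try (FileChannel ch = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes());
            while (buf.hasRemaining())
                ch.write(buf);
            ch.force(false); // sync the file data itself
        }
    }

    // Separately, fsyncing the directory makes the creation/rename of the
    // log file durable across a power cut.
    static void syncDirectory(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        } catch (IOException e) {
            // Directory fsync is unsupported on some platforms (e.g. Windows).
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get("txn-example.log"); // hypothetical file name
        Files.deleteIfExists(log);
        appendDurably(log, "ADD:ma-1-big-Data.db");
        syncDirectory(log.toAbsolutePath().getParent());
        System.out.println(Files.size(log)); // prints 21
        Files.delete(log);
    }
}
```

Both syncs are needed for the transaction log to survive a power cut: the directory sync so the file exists after restart, and the file sync so its records are complete.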
[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files
[ https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959979#comment-14959979 ] Stefania commented on CASSANDRA-10421: -- bq. So what I think I see is that when the LogTransaction completes it first writes to the log the commit record, and then starts making permanent changes to the files on disk (deleting the old ones). But if it hasn't actually synced the log to disk then on a restart we could have a partial log and attempt to roll back, but it is too late because before the crash we had already deleted parts of the before state. At the end we should sync the log files before deleting the obsolete files right? Yes this is correct, that's why we flush after writing every record, but if we want to survive a power cut then we should call {{fsync}}. bq. Was the intent to sync the folder when creating the log file or when adding a record which indicates the addition of other data files? The intent was to sync the folder when creating the file and to sync the content of the file with a flush when appending data to it. However flushing only passes the data to the operating system; it won't protect us from a power cut. This wasn't clear to me yesterday. At a minimum we should {{fsync}} after adding the final record and probably also when adding new records as you correctly pointed out. Therefore, I would strongly prefer to leave it as it was before and call {{fsync}} on the folder whenever we add a record, so in case of a power cut we have the most up-to-date data. This is what I did in the latest commit. Let me know if you have performance concerns. I also had to add several {{log.txnFile().close();}} to the unit tests (whenever we test {{removeUnfinishedLeftovers}}) because on Windows we cannot delete files that are open. This is a bit ugly so maybe we should also go back to using {{FileUtils::appendLine}}, unless again you have performance concerns. 
I've rebased on 3.0 and started a new CI run on both Linux and Windows. > Potential issue with LogTransaction as it only checks in a single directory > for files > - > > Key: CASSANDRA-10421 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10421 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.0 rc2 > > > When creating a new LogTransaction we try to create the new logfile in the > same directory as the one we are writing to, but as we use > {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}} > this might end up in "any" of the configured data directories. If it does, > we will not be able to clean up leftovers as we check for files in the same > directory as the logfile was created: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163 > cc [~Stefania] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 11:57 PM: -- [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the locker never completed or was very, very slow: {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was (Author: jeffery.griffith): [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) 
the locker never completed: {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: RUN3tpstats.jpg [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the locker never completed: {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, > RUN3tpstats.jpg, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:05 AM: -- [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the locker never completed or was very, very slow. Eventually a second MemtableFlushWriter thread blocks. I believe if I left it continue to run, all or many of them will. {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I see one thread for MemtablePostFlush and this is it: {code} "MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 nid=0x2d93 waiting on condition [0x7fda4b46c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0005838ba468> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was (Author: jeffery.griffith): [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the
[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959922#comment-14959922 ] Stefania commented on CASSANDRA-10461: -- It's the warning introduced by CASSANDRA-10199 that causes an extra line in the output, I'll have to enhance the test to extract sstables from the output by matching the keyspace or table name rather than making assumptions on the output returned by {{sstableutil}}: {code} 'WARN 14:58:01 Only 37512 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots', {code} > Fix sstableverify_test dtest > > > Key: CASSANDRA-10461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10461 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Stefania > Labels: test > Fix For: 3.0.0 rc2 > > > The dtest for sstableverify is failing: > http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/ > It fails in the same way when I run it on OpenStack, so I don't think it's > just a CassCI problem. > [~slebresne] Looks like you made changes to this test recently: > https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880 > Could you have a look at the failure? I'm assigning you for triage, but feel > free to reassign. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
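The extraction approach Stefania describes — selecting sstable paths by matching the keyspace/table name instead of assuming every line of {{sstableutil}} output is a path — could be sketched roughly like this (the helper, class name, and sample lines are illustrative, not the actual dtest code):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SSTableOutputFilter {
    // Hypothetical sample of sstableutil output: a WARN line plus two sstable paths.
    static final List<String> OUTPUT = List.of(
        "WARN 14:58:01 Only 37512 MB free across all data volumes. Consider adding more capacity",
        "/var/lib/cassandra/data/ks1/tbl1-0665ae80b2d711e4a2f4e9a66e4a0a65/ma-1-big-Data.db",
        "/var/lib/cassandra/data/ks1/tbl1-0665ae80b2d711e4a2f4e9a66e4a0a65/ma-1-big-Index.db");

    // Keep only lines that look like sstable files for the given keyspace/table,
    // so stray warnings or log lines in the output are ignored.
    static List<String> sstablesFor(List<String> lines, String keyspace, String table) {
        Pattern p = Pattern.compile(".*/" + Pattern.quote(keyspace) + "/"
                                    + Pattern.quote(table) + "-[^/]+/.*");
        return lines.stream().filter(l -> p.matcher(l).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sstables = sstablesFor(OUTPUT, "ks1", "tbl1");
        System.out.println(sstables.size() + " sstables");  // prints "2 sstables"
    }
}
```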
[jira] [Created] (CASSANDRA-10538) Assertion failed in LogFile when disk is full
Stefania created CASSANDRA-10538: Summary: Assertion failed in LogFile when disk is full Key: CASSANDRA-10538 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538 Project: Cassandra Issue Type: Bug Reporter: Stefania Assignee: Stefania Fix For: 3.x Attachments: ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log [~carlyeks] was running a stress job which filled up the disk. At the end of the system logs there are several assertion errors: {code} ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main] java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na] at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40] at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] INFO [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main] java.lang.AssertionError: Already completed! at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na] at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na] at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na] at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na] at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE {code} We should not have an assertion if it can happen when the disk is full, we should rather have a runtime exception. I also would like to understand exactly what triggered the assertion. 
{{LifecycleTransaction}} can throw at the beginning of the commit method if it cannot write the record to disk, in which case all we have to do is ensure we update the records in memory after writing to disk (currently we update them before). However, I am not sure this is what happened here; it looks more like abort was called twice, which should never happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
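The two points above — update in-memory state only after the on-disk write succeeds, and tolerate a repeated abort instead of asserting "Already completed!" — can be sketched with a toy transaction class (hypothetical; this is not the real {{LogTransaction}}/{{LogFile}} code):

```java
import java.io.IOException;

public class TxnSketch {
    enum State { IN_PROGRESS, COMMITTED, ABORTED }
    private State state = State.IN_PROGRESS;
    private boolean inMemoryUpdated = false;   // stands in for the in-memory tracker

    void commit() throws IOException {
        if (state != State.IN_PROGRESS) throw new IllegalStateException("Already completed!");
        writeRecordToDisk();       // may throw when the disk is full...
        inMemoryUpdated = true;    // ...so touch the in-memory records only after it succeeds
        state = State.COMMITTED;
    }

    void abort() {
        if (state == State.ABORTED) return;  // idempotent: a second abort is a no-op, not an assertion failure
        if (state == State.COMMITTED) throw new IllegalStateException("Already completed!");
        state = State.ABORTED;
    }

    private void writeRecordToDisk() throws IOException {
        // placeholder: a real implementation appends a record and can fail on ENOSPC
    }

    public static void main(String[] args) {
        TxnSketch t = new TxnSketch();
        t.abort();
        t.abort();  // second abort is tolerated rather than tripping an assert
        System.out.println("aborted twice safely");  // prints "aborted twice safely"
    }
}
```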
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:13 AM: -- [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments: Overview is: Monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly and then post flushing operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked so assume (?) the locker never completed or was very, very slow. Eventually a second MemtableFlushWriter thread blocks. I believe if I left it continue to run, all or many of them will. {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I see one thread for MemtablePostFlush and this is it: {code} "MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 nid=0x2d93 waiting on condition [0x7fda4b46c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0005838ba468> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I followed it for a while longer after this and it really looks like the post flush stays blocked on that latch forever: {code} 00:01 MemtableFlushWriter 2 2 2024 0 0 MemtablePostFlush 1 47159 4277 0 0 MemtableReclaimMemory 0 0 2024 0 0 00:03 MemtableFlushWriter 3 3 2075 0 0 MemtablePostFlush
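The MemtablePostFlush stack above is parked in {{CountDownLatch.await}} inside {{ColumnFamilyStore$PostFlush.run}}: if any party that should call {{countDown()}} never does, the post-flush task blocks indefinitely, which would match the backlog in the tpstats output. A minimal, self-contained illustration of that failure mode (a timed {{await}} is used here only so the demo does not hang):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PostFlushLatchDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two hypothetical "flush writers" must finish before the post-flush task may run.
        CountDownLatch writeBarrier = new CountDownLatch(2);

        // Only one writer completes; the other is stuck, as in the thread dump.
        writeBarrier.countDown();

        // A bare await() would park this thread forever, exactly like MemtablePostFlush:8;
        // the timed variant lets us observe the stall and move on.
        boolean done = writeBarrier.await(200, TimeUnit.MILLISECONDS);
        System.out.println(done ? "post-flush ran" : "post-flush still blocked");
        // prints "post-flush still blocked"
    }
}
```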
[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files
[ https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960123#comment-14960123 ] Ariel Weisberg commented on CASSANDRA-10421: Syncing the directory won't sync the log file. You need to sync the log file specifically to have that data be available. Syncing the directory makes rename and file creation durable, but not the files contained in the directory. bq. I also had to add several log.txnFile().close(); to the unit tests (whenever we test removeUnfinishedLeftovers) because on Windows we cannot delete files that are open. This is a bit ugly so maybe we should also go back to using FileUtils::appendLine, unless again you have performance concerns. I don't mind opening the file every time. > Potential issue with LogTransaction as it only checks in a single directory > for files > - > > Key: CASSANDRA-10421 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10421 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.0 rc2 > > > When creating a new LogTransaction we try to create the new logfile in the > same directory as the one we are writing to, but as we use > {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}} > this might end up in "any" of the configured data directories. If it does, > we will not be able to clean up leftovers as we check for files in the same > directory as the logfile was created: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163 > cc [~Stefania] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
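The distinction Ariel draws — forcing the file makes its *contents* durable, forcing the containing directory makes its *creation/rename* durable — looks roughly like this in NIO (a generic sketch, not the Cassandra implementation; directory fsync via {{FileChannel}} works on Linux but may not be supported everywhere, hence the guard):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncFileAndDir {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txnlog");
        Path log = dir.resolve("ma_txn_demo.log");

        // 1. Write the record and force the FILE: this is what makes the bytes durable.
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("commit-record\n".getBytes()));
            ch.force(true);
        }

        // 2. Force the DIRECTORY: this only makes the file's creation/rename durable,
        //    not the file's contents (Linux-specific idiom; guarded for other platforms).
        try (FileChannel dirCh = FileChannel.open(dir, StandardOpenOption.READ)) {
            dirCh.force(true);
        } catch (IOException ignored) {
            // some filesystems/OSes do not support fsync on a directory handle
        }

        System.out.println(Files.size(log) + " bytes synced");
    }
}
```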
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959525#comment-14959525 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 8:06 PM: - Yeah doesn't look like the locking thread is deadlocked at all. I know this is a stretch, but considering we just migrated from 2.0.x, could there be something data specific that is confusing the compaction? Not sure where to check for slow flushes. Should i just watch tpstats? was (Author: jeffery.griffith): Yeah doesn't look blocked. How can i check for the slow flushes? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959567#comment-14959567 ] Carl Yeksigian commented on CASSANDRA-10522: +1 > counter upgrade dtest fails on 3.0 with JVM assertions disabled > --- > > Key: CASSANDRA-10522 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10522 > Project: Cassandra > Issue Type: Sub-task >Reporter: Andrew Hust >Assignee: Yuki Morishita > Fix For: 3.0.0 rc2 > > > {{counter_tests.TestCounters.upgrade_test}} > will fail when run on a cluster with JVM assertions disabled. The tests will > hang when cassandra throws the following exception: > {code} > java.lang.IllegalStateException: No match found > at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60] > at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552) > ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) > [main/:na] > {code} > These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 > when assertions are enabled. 
> Ran against: > apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}} > apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}} > Ran with cmd: > {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv > counter_tests.TestCounters.upgrade_test}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
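The {{IllegalStateException: No match found}} from {{Matcher.group}} in the trace above is the signature of a match attempt that only ever ran inside an {{assert}}: with {{-da}} the assert (and its side effect) is skipped entirely, so {{group()}} is called on a matcher that never matched. A hedged illustration (the file name and pattern are made up, not the real {{LogFile}} regex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssertedMatchDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("(ma)_txn_(.+)\\.log")
                           .matcher("ma_txn_compaction_123.log");

        // Buggy shape: a side-effecting match inside an assert vanishes under -da.
        //   assert m.matches();   // with -da this line never runs...
        //   m.group(1);           // ...so this throws IllegalStateException: No match found

        // Safe shape: perform the match unconditionally, then check the result.
        if (!m.matches()) throw new IllegalStateException("unparseable txn log name");
        System.out.println(m.group(1));  // prints "ma"
    }
}
```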
cassandra git commit: Define cassandra_storagedir variable in debian/cassandra.in.sh
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 6a1c1d900 -> c3b2aedfd Define cassandra_storagedir variable in debian/cassandra.in.sh patch by Paulo Motta; reviewed by Aleksey Yeschenko for CASSANDRA-10525 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3b2aedf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3b2aedf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3b2aedf Branch: refs/heads/cassandra-3.0 Commit: c3b2aedfd8bfce193abc8ed3809a850e603361d5 Parents: 6a1c1d9 Author: Paulo Motta Authored: Wed Oct 14 10:12:51 2015 -0700 Committer: Aleksey Yeschenko Committed: Thu Oct 15 23:13:25 2015 +0100 -- debian/cassandra.in.sh | 4 ++++ 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3b2aedf/debian/cassandra.in.sh -- diff --git a/debian/cassandra.in.sh b/debian/cassandra.in.sh index 9f69ac9..8fcaf9c 100644 --- a/debian/cassandra.in.sh +++ b/debian/cassandra.in.sh @@ -4,6 +4,10 @@ CASSANDRA_CONF=/etc/cassandra CASSANDRA_HOME=/usr/share/cassandra +# the default location for commitlogs, sstables, and saved caches +# if not set in cassandra.yaml +cassandra_storagedir=/var/lib/cassandra + # The java classpath (required) if [ -n "$CLASSPATH" ]; then CLASSPATH=$CLASSPATH:$CASSANDRA_CONF
[jira] [Commented] (CASSANDRA-10525) Hints directory not created on debian packaged install
[ https://issues.apache.org/jira/browse/CASSANDRA-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959731#comment-14959731 ] Aleksey Yeschenko commented on CASSANDRA-10525: --- Committed as [c3b2aedfd8bfce193abc8ed3809a850e603361d5|https://github.com/apache/cassandra/commit/c3b2aedfd8bfce193abc8ed3809a850e603361d5] to 3.0 and merged with trunk, thanks. > Hints directory not created on debian packaged install > -- > > Key: CASSANDRA-10525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10525 > Project: Cassandra > Issue Type: Bug >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 3.0.0 rc2 > > > Reproduction steps: > * Create debian package install with {{dpkg-buildpackage -uc -us}} > * Install package with {{sudo dpkg -i ../cassandra_3.0.0\~rc1_all.deb}} > * Start cassandra with {{sudo service cassandra start}} > Cassandra does not start with the following message on > {{/var/log/cassandra/system.log}}: > {noformat} > DEBUG [main] 2015-10-14 09:28:36,083 StartupChecks.java:191 - Checking > directory /var/lib/cassandra/data > DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking > directory /var/lib/cassandra/commitlog > DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking > directory /var/lib/cassandra/saved_caches > DEBUG [main] 2015-10-14 09:28:36,087 StartupChecks.java:191 - Checking > directory /hints > WARN [main] 2015-10-14 09:28:36,088 StartupChecks.java:197 - Directory > /hints doesn't exist > ERROR [main] 2015-10-14 09:28:36,088 CassandraDaemon.java:702 - Has no > permission to create directory /hints > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
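The {{/hints}} path in the log is what you get when a path is built from an unset {{cassandra_storagedir}}: the shell expands the unset variable to the empty string, so a template like {{${cassandra_storagedir}/hints}} collapses to {{/hints}} (this mechanism is inferred from the fix in c3b2aedf, which defines the variable). A small Java analogue of that collapse:

```java
import java.util.Objects;

public class StorageDirDefault {
    // Hypothetical helper mimicking shell expansion: an unset variable becomes "",
    // so the derived hints path degenerates to "/hints".
    static String hintsDir(String storageDir) {
        return Objects.toString(storageDir, "") + "/hints";
    }

    public static void main(String[] args) {
        System.out.println(hintsDir(null));                  // prints "/hints"  <- the reported bug
        System.out.println(hintsDir("/var/lib/cassandra"));  // prints "/var/lib/cassandra/hints"
    }
}
```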
[jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files
[ https://issues.apache.org/jira/browse/CASSANDRA-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959729#comment-14959729 ] Ariel Weisberg commented on CASSANDRA-10421: So what I think I see is that when the LogTransaction completes it first writes the commit record to the log, and then starts making permanent changes to the files on disk (deleting the old ones). But if it hasn't actually synced the log to disk then on a restart we could have a partial log and attempt to roll back, but it is too late because before the crash we had already deleted parts of the before state. At the end we should sync the log files before deleting the obsolete files, right? Before we add a new file that we want to have cleaned up maybe we also want to make sure the record is on disk so that it will definitely be cleaned up? Maybe not necessary since it is just additional data that will be compacted later. Maybe optimizing for power failure isn't necessary, but then why are we syncing directories? [Here it seems like you don't sync the folder when appending every record?|https://github.com/apache/cassandra/commit/8e02e47e1a4a86428bec61d8975a9706c544003b#diff-a7c36820cf8658b605948a23e3033f88R76]. Was the intent to sync the folder when creating the log file or when adding a record which indicates the addition of other data files? I am generally +1 other than my confusion over how syncing of the log file contents is handled. The tests don't seem to match trunk. I gave them another spin on the 3.0 branch to get another sample.
> Potential issue with LogTransaction as it only checks in a single directory > for files > - > > Key: CASSANDRA-10421 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10421 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.0 rc2 > > > When creating a new LogTransaction we try to create the new logfile in the > same directory as the one we are writing to, but as we use > {{[directories.getDirectoryForNewSSTables()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java#L125]}} > this might end up in "any" of the configured data directories. If it does, > we will not be able to clean up leftovers as we check for files in the same > directory as the logfile was created: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L163 > cc [~Stefania] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
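The ordering Ariel argues for — make the commit record durable *before* destroying the before-state, so a crash can never leave a partial log with the old files already gone — can be sketched as follows (generic write-ahead-style sketch; file names and the record format are made up, not the real LogTransaction code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommitThenDelete {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lifecycle");
        Path obsolete = Files.write(dir.resolve("old-Data.db"), new byte[]{1});
        Path txnLog = dir.resolve("txn.log");

        // 1. Append the commit record AND force it to disk first...
        try (FileChannel ch = FileChannel.open(txnLog,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("commit\n".getBytes()));
            ch.force(true);  // without this, a crash could replay an incomplete log
        }

        // 2. ...and only then make the irreversible change (deleting the old files).
        Files.delete(obsolete);
        System.out.println("committed then deleted: " + Files.notExists(obsolete));
        // prints "committed then deleted: true"
    }
}
```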
[jira] [Created] (CASSANDRA-10537) CONTAINS and CONTAINS KEY support for Lightweight Transactions
Nimi Wariboko Jr. created CASSANDRA-10537: - Summary: CONTAINS and CONTAINS KEY support for Lightweight Transactions Key: CASSANDRA-10537 URL: https://issues.apache.org/jira/browse/CASSANDRA-10537 Project: Cassandra Issue Type: Improvement Reporter: Nimi Wariboko Jr. Fix For: 2.1.x Conditional updates currently do not support CONTAINS and CONTAINS KEY conditions. Queries such as {{UPDATE mytable SET somefield = 4 WHERE pk = 'pkv' IF set_column CONTAINS 5;}} are not possible. Would it also be possible to support the negation of these (ex. testing that a value does not exist inside a set)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
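The requested semantics can be modeled with a toy in-memory row (this is illustrative only, not Cassandra code): apply the update only when a set column CONTAINS — or, negated, does NOT CONTAIN — the given value, and report the same applied/not-applied outcome a lightweight transaction returns in its {{[applied]}} column.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ContainsCondition {
    @SuppressWarnings("unchecked")
    static boolean conditionalUpdate(Map<String, Object> row, String setColumn,
                                     Object needle, boolean negate,
                                     String field, Object newValue) {
        Set<Object> values = (Set<Object>) row.getOrDefault(setColumn, Collections.emptySet());
        // CONTAINS when negate == false; NOT CONTAINS when negate == true.
        boolean met = values.contains(needle) != negate;
        if (met)
            row.put(field, newValue); // condition met: the write is applied
        return met;                   // mirrors the [applied] column of an LWT result
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("set_column", Set.of(5));
        // Analogue of: UPDATE ... SET somefield = 4 ... IF set_column CONTAINS 5
        boolean applied = conditionalUpdate(row, "set_column", 5, false, "somefield", 4);
        System.out.println(applied + " somefield=" + row.get("somefield")); // true somefield=4
    }
}
```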
[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959718#comment-14959718 ] Joel Knighton commented on CASSANDRA-10089: --- Unfortunately, it looks like whatever environmental issues affected the first run also got the most recent run. Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have been resolved. Can you trigger one more run? If that doesn't work, I'll evaluate tests locally as part of review. Sorry about this. > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > 
org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959718#comment-14959718 ] Joel Knighton edited comment on CASSANDRA-10089 at 10/15/15 10:05 PM: -- Unfortunately, it looks like whatever environmental issues affected the first run also hit the most recent run. Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have been resolved. Can you trigger one more run? If that doesn't work, I'll evaluate tests locally as part of review. Sorry about this. was (Author: jkni): Unfortunately, it looks like whatever environmental issues affected the first run also got the most recent run. Fortunately, looking at 2.1/2.2 dtest/testall runs recently, it seems to have been resolved. Can you trigger one more run? If that doesn't work, I'll evaluate tests locally as part of review. Sorry about this. > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > 
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[4/4] cassandra git commit: Merge branch 'cassandra-3.0' into trunk
Merge branch 'cassandra-3.0' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1cb9a02b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1cb9a02b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1cb9a02b Branch: refs/heads/trunk Commit: 1cb9a02bd951b424d960047297084d6ce4b18b6c Parents: 0e3da95 a52597d Author: Yuki MorishitaAuthored: Thu Oct 15 17:31:56 2015 -0500 Committer: Yuki Morishita Committed: Thu Oct 15 17:31:56 2015 -0500 -- CHANGES.txt | 1 + debian/cassandra.in.sh | 4 src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++- 3 files changed, 7 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1cb9a02b/CHANGES.txt -- diff --cc CHANGES.txt index e2d989c,dcacc69..5265215 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,5 +1,10 @@@ +3.2 + * Abort in-progress queries that time out (CASSANDRA-7392) + * Add transparent data encryption core classes (CASSANDRA-9945) + + 3.0-rc2 + * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522) * Revert CASSANDRA-7486, make CMS default GC, move GC config to conf/jvm.options (CASSANDRA-10403) * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447)
[3/4] cassandra git commit: Fix LogFile throws Exception when assertion is disabled
Fix LogFile throws Exception when assertion is disabled patch by yukim; reviewed by carlyeks for CASSANDRA-10522 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a52597d8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a52597d8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a52597d8 Branch: refs/heads/cassandra-3.0 Commit: a52597d81396e09274ecf6d05ebbf0e24c259fc6 Parents: c3b2aed Author: Yuki MorishitaAuthored: Wed Oct 14 11:03:22 2015 -0500 Committer: Yuki Morishita Committed: Thu Oct 15 17:30:59 2015 -0500 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index fa74539..dcacc69 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0-rc2 + * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522) * Revert CASSANDRA-7486, make CMS default GC, move GC config to conf/jvm.options (CASSANDRA-10403) * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447) http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/src/java/org/apache/cassandra/db/lifecycle/LogFile.java -- diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java index c698722..bff3724 100644 --- a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java @@ -43,7 +43,8 @@ final class LogFile static LogFile make(File logFile, int folderDescriptor) { Matcher matcher = LogFile.FILE_REGEX.matcher(logFile.getName()); -assert matcher.matches() && matcher.groupCount() == 3; +boolean matched = matcher.matches(); +assert matched && matcher.groupCount() == 3; // For now we don't need this but it is there in case we need to change // file 
format later on, the version is the sstable version as defined in BigFormat
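The diff above is a classic assertion pitfall: {{matcher.matches()}} both performs the match and returns the boolean, so putting it inside the {{assert}} means that with assertions disabled ({{java -da}}, the JVM default) the match never runs at all, and a later {{group()}} call throws {{IllegalStateException: No match found}} — exactly the exception in CASSANDRA-10522. A minimal reproduction (the regex here is illustrative, not Cassandra's actual FILE_REGEX):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssertSideEffect {
    // Illustrative three-group pattern; not the real LogFile.FILE_REGEX.
    static final Pattern FILE_REGEX = Pattern.compile("(.{2})_(txn)_(.*)\\.log");

    static String parseBuggy(String name) {
        Matcher m = FILE_REGEX.matcher(name);
        // BUG: matches() is the side-effectful call *and* the asserted
        // expression. With -da the whole expression is skipped, so m was
        // never matched and group() below throws IllegalStateException.
        assert m.matches() && m.groupCount() == 3;
        return m.group(3);
    }

    static String parseFixed(String name) {
        Matcher m = FILE_REGEX.matcher(name);
        boolean matched = m.matches(); // side effect happens unconditionally
        assert matched && m.groupCount() == 3;
        return matched ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Works identically with -ea and -da:
        System.out.println(parseFixed("ma_txn_compaction_1234.log")); // prints compaction_1234
    }
}
```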
[2/4] cassandra git commit: Fix LogFile throws Exception when assertion is disabled
Fix LogFile throws Exception when assertion is disabled patch by yukim; reviewed by carlyeks for CASSANDRA-10522 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a52597d8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a52597d8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a52597d8 Branch: refs/heads/trunk Commit: a52597d81396e09274ecf6d05ebbf0e24c259fc6 Parents: c3b2aed Author: Yuki MorishitaAuthored: Wed Oct 14 11:03:22 2015 -0500 Committer: Yuki Morishita Committed: Thu Oct 15 17:30:59 2015 -0500 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/lifecycle/LogFile.java | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index fa74539..dcacc69 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0-rc2 + * Fix LogFile throws Exception when assertion is disabled (CASSANDRA-10522) * Revert CASSANDRA-7486, make CMS default GC, move GC config to conf/jvm.options (CASSANDRA-10403) * Fix TeeingAppender causing some logs to be truncated/empty (CASSANDRA-10447) http://git-wip-us.apache.org/repos/asf/cassandra/blob/a52597d8/src/java/org/apache/cassandra/db/lifecycle/LogFile.java -- diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java index c698722..bff3724 100644 --- a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java @@ -43,7 +43,8 @@ final class LogFile static LogFile make(File logFile, int folderDescriptor) { Matcher matcher = LogFile.FILE_REGEX.matcher(logFile.getName()); -assert matcher.matches() && matcher.groupCount() == 3; +boolean matched = matcher.matches(); +assert matched && matcher.groupCount() == 3; // For now we don't need this but it is there in case we need to change // file format 
later on, the version is the sstable version as defined in BigFormat
[1/4] cassandra git commit: Define cassandra_storagedir variable in debian/cassandra.in.sh
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 c3b2aedfd -> a52597d81 refs/heads/trunk 0e3da95d6 -> 1cb9a02bd Define cassandra_storagedir variable in debian/cassandra.in.sh patch by Paulo Motta; reviewed by Aleksey Yeschenko for CASSANDRA-10525 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3b2aedf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3b2aedf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3b2aedf Branch: refs/heads/trunk Commit: c3b2aedfd8bfce193abc8ed3809a850e603361d5 Parents: 6a1c1d9 Author: Paulo MottaAuthored: Wed Oct 14 10:12:51 2015 -0700 Committer: Aleksey Yeschenko Committed: Thu Oct 15 23:13:25 2015 +0100 -- debian/cassandra.in.sh | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3b2aedf/debian/cassandra.in.sh -- diff --git a/debian/cassandra.in.sh b/debian/cassandra.in.sh index 9f69ac9..8fcaf9c 100644 --- a/debian/cassandra.in.sh +++ b/debian/cassandra.in.sh @@ -4,6 +4,10 @@ CASSANDRA_CONF=/etc/cassandra CASSANDRA_HOME=/usr/share/cassandra +# the default location for commitlogs, sstables, and saved caches +# if not set in cassandra.yaml +cassandra_storagedir=/var/lib/cassandra + # The java classpath (required) if [ -n "$CLASSPATH" ]; then CLASSPATH=$CLASSPATH:$CASSANDRA_CONF
[jira] [Assigned] (CASSANDRA-10517) Make sure all unit tests run on CassCI on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton reassigned CASSANDRA-10517: - Assignee: Joel Knighton > Make sure all unit tests run on CassCI on Windows > - > > Key: CASSANDRA-10517 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10517 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: Joel Knighton > Labels: triage > Fix For: 3.0.0 rc2 > > > It seems that some Windows unit tests aren't run sometimes on CassCI, and > there's no error reporting for this. For instance, this test was introduced > around the time build #38 would have happened, but has only run in builds > #50-3 and #64: > http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.cql3/ViewTest/testPrimaryKeyIsNotNull/history/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian updated CASSANDRA-10522: --- Reviewer: Carl Yeksigian > counter upgrade dtest fails on 3.0 with JVM assertions disabled > --- > > Key: CASSANDRA-10522 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10522 > Project: Cassandra > Issue Type: Sub-task >Reporter: Andrew Hust >Assignee: Yuki Morishita > Fix For: 3.0.0 rc2 > > > {{counter_tests.TestCounters.upgrade_test}} > will fail when run on a cluster with JVM assertions disabled. The tests will > hang when cassandra throws the following exception: > {code} > java.lang.IllegalStateException: No match found > at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60] > at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552) > ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) > [main/:na] > {code} > These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 > when assertions are enabled. 
> Ran against: > apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}} > apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}} > Ran with cmd: > {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv > counter_tests.TestCounters.upgrade_test}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959508#comment-14959508 ] T Jake Luciani commented on CASSANDRA-10515: That's RUNNABLE though not BLOCKED. Are you actually deadlocking or only seeing slow flushes? [~krummas] any ideas? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959554#comment-14959554 ] T Jake Luciani commented on CASSANDRA-10515: Yeah if the COMPLETED column for flushing is incrementing > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959608#comment-14959608 ] Jeff Griffith commented on CASSANDRA-10515: --- I had restarted but I'll watch live the next iteration. As you see upwards in the comments though, they do start piling up: MemtableFlushWriter 1 1 1574 0 0 MemtablePostFlush 1 13755 134889 0 0 MemtableReclaimMemory 0 0 1574 0 0 In the previous iteration, there were four threads for MemtableFlushWriter all blocked behind the runnable LeveledManifest.getCandidatesFor(LeveledManifest.java:572) > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959525#comment-14959525 ] Jeff Griffith commented on CASSANDRA-10515: --- Yeah, doesn't look blocked. How can I check for the slow flushes? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CASSANDRA-10524) Add ability to skip TIME_WAIT sockets on port check on Windows startup
[ https://issues.apache.org/jira/browse/CASSANDRA-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie reopened CASSANDRA-10524: - > Add ability to skip TIME_WAIT sockets on port check on Windows startup > -- > > Key: CASSANDRA-10524 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10524 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Labels: Windows > Fix For: 3.0.0 rc2, 2.2.4 > > Attachments: win_aggressive_startup.txt > > > C* sockets are often staying TIME_WAIT for up to 120 seconds (2x max segment > lifetime) for me in my dev environment on Windows. This is rather obnoxious > since it means I can't launch C* for up to 2 minutes after stopping it. > Attaching a patch that adds a simple -a for aggressive startup to the launch > scripts to ignore duplicate port check from netstat if it's TIME_WAIT. Also > snuck in some more liberal interpretation of help strings in the .ps1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-10522: --- Tester: Andrew Hust > counter upgrade dtest fails on 3.0 with JVM assertions disabled > --- > > Key: CASSANDRA-10522 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10522 > Project: Cassandra > Issue Type: Sub-task >Reporter: Andrew Hust >Assignee: Yuki Morishita > Fix For: 3.0.0 rc2 > > > {{counter_tests.TestCounters.upgrade_test}} > will fail when run on a cluster with JVM assertions disabled. The tests will > hang when cassandra throws the following exception: > {code} > java.lang.IllegalStateException: No match found > at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60] > at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399) > ~[main/:na] > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552) > ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) > ~[main/:na] > at > org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) > [main/:na] > {code} > These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 > when assertions are enabled. 
> Ran against: > apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}} > apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}} > Ran with cmd: > {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv > counter_tests.TestCounters.upgrade_test}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10536) Batch statements with multiple updates to partition error when table is indexed
Tyler Hobbs created CASSANDRA-10536:
---------------------------------------

             Summary: Batch statements with multiple updates to partition error when table is indexed
                 Key: CASSANDRA-10536
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10536
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Tyler Hobbs
            Assignee: Sylvain Lebresne
             Fix For: 3.0.0 rc2

If a {{BATCH}} statement contains multiple {{UPDATE}} statements that update the same partition, and a secondary index exists on that table, the batch statement will error:

{noformat}
ServerError: 
{noformat}

with the following traceback in the logs:

{noformat}
ERROR 20:53:46 Unexpected exception during request
java.lang.IllegalStateException: An update should not be written again once it has been read
        at org.apache.cassandra.db.partitions.PartitionUpdate.assertNotBuilt(PartitionUpdate.java:504) ~[main/:na]
        at org.apache.cassandra.db.partitions.PartitionUpdate.add(PartitionUpdate.java:535) ~[main/:na]
        at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:96) ~[main/:na]
        at org.apache.cassandra.cql3.statements.ModificationStatement.addUpdates(ModificationStatement.java:667) ~[main/:na]
        at org.apache.cassandra.cql3.statements.BatchStatement.getMutations(BatchStatement.java:234) ~[main/:na]
        at org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:335) ~[main/:na]
        at org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:321) ~[main/:na]
        at org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:316) ~[main/:na]
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:205) ~[main/:na]
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:471) ~[main/:na]
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:448) ~[main/:na]
        at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:130) ~[main/:na]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507) [main/:na]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401) [main/:na]
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [main/:na]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
{noformat}

This is due to {{SecondaryIndexManager.validate()}} triggering a build of the {{PartitionUpdate}} (stacktrace from debugging the build() call):

{noformat}
        at org.apache.cassandra.db.partitions.PartitionUpdate.build(PartitionUpdate.java:571) [main/:na]
        at org.apache.cassandra.db.partitions.PartitionUpdate.maybeBuild(PartitionUpdate.java:561) [main/:na]
        at org.apache.cassandra.db.partitions.PartitionUpdate.iterator(PartitionUpdate.java:418) [main/:na]
        at org.apache.cassandra.index.internal.CassandraIndex.validateRows(CassandraIndex.java:560) [main/:na]
        at org.apache.cassandra.index.internal.CassandraIndex.validate(CassandraIndex.java:314) [main/:na]
        at org.apache.cassandra.index.SecondaryIndexManager.lambda$validate$75(SecondaryIndexManager.java:642) [main/:na]
        at org.apache.cassandra.index.SecondaryIndexManager$$Lambda$166/1388080038.accept(Unknown Source) [main/:na]
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) [na:1.8.0_45]
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) [na:1.8.0_45]
        at java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3566) [na:1.8.0_45]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512) [na:1.8.0_45]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502) [na:1.8.0_45]
{noformat}
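The failure mode above can be reduced to a small sketch. This is not Cassandra code: the class and method names are invented for illustration, but they mirror the contract that {{PartitionUpdate}} enforces, where the first read (iteration) builds and freezes the update, and any later add() throws.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only (not Cassandra code): a collection that is "built"
// the first time it is read, after which further writes are rejected -- the
// same contract PartitionUpdate.assertNotBuilt() enforces.
public class BuildOnReadList implements Iterable<String> {
    private final List<String> rows = new ArrayList<>();
    private boolean built = false;

    public void add(String row) {
        if (built)
            throw new IllegalStateException("An update should not be written again once it has been read");
        rows.add(row);
    }

    @Override
    public Iterator<String> iterator() {
        built = true; // first read freezes the contents, like PartitionUpdate.maybeBuild()
        return rows.iterator();
    }

    public static void main(String[] args) {
        BuildOnReadList update = new BuildOnReadList();
        update.add("row1");    // first UPDATE in the batch
        update.iterator();     // index validation reads (and thereby builds) the update
        try {
            update.add("row2"); // second UPDATE for the same partition
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the bug, the index's validate() plays the role of the iterator() call: it reads the partially assembled update while the batch still has another {{UPDATE}} for the same partition to add.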
[jira] [Commented] (CASSANDRA-10522) counter upgrade dtest fails on 3.0 with JVM assertions disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959736#comment-14959736 ]

Andrew Hust commented on CASSANDRA-10522:
-----------------------------------------

Confirmed that these tests (and the duplicate JIRA's tests) now pass and no exception is thrown. Ran on:
yukim/10522: {{93783039918f8662760195e0f33c4cab20b17c8d}}

> counter upgrade dtest fails on 3.0 with JVM assertions disabled
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-10522
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10522
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Andrew Hust
>            Assignee: Yuki Morishita
>             Fix For: 3.0.0 rc2
>
> {{counter_tests.TestCounters.upgrade_test}} will fail when run on a cluster with JVM assertions disabled. The tests will hang when cassandra throws the following exception:
> {code}
> java.lang.IllegalStateException: No match found
>         at java.util.regex.Matcher.group(Matcher.java:536) ~[na:1.8.0_60]
>         at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:52) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:399) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:552) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:571) ~[main/:na]
>         at org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) ~[main/:na]
>         at org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) ~[main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:169) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:548) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:676) [main/:na]
> {code}
> These tests both pass with/without JVM assertions on C* 2.2 and pass on 3.0 when assertions are enabled.
> Ran against:
> apache/cassandra-2.2: {{7cab3272455bdd16b639c510416ae339a8613414}}
> apache/cassandra-3.0: {{f21c888510b0dbbea1a63459476f2dc54093de63}}
> Ran with cmd:
> {{JVM_EXTRA_OPTS=-da PRINT_DEBUG=true nosetests -xsv counter_tests.TestCounters.upgrade_test}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
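The "No match found" {{IllegalStateException}} at {{Matcher.group}} is the classic symptom of a match attempt that lives inside an {{assert}}: with {{-da}} the {{matches()}} call is skipped entirely, so a later {{group()}} finds no match state. A small sketch (the pattern and input below are invented, not the ones in {{LogFile}}):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: Matcher.group() is only legal after a successful match
// attempt. If matches() sits inside an `assert`, running with -da skips it,
// and group() throws IllegalStateException("No match found").
public class GroupBeforeMatch {
    // True if calling group() on a fresh, never-matched Matcher throws.
    static boolean groupBeforeMatchThrows() {
        Matcher m = Pattern.compile("(\\d+)_txn").matcher("42_txn");
        try {
            m.group(1); // no matches()/find() has run yet
            return false;
        } catch (IllegalStateException e) {
            return true; // message is "No match found"
        }
    }

    public static void main(String[] args) {
        System.out.println(groupBeforeMatchThrows()); // true
        Matcher m = Pattern.compile("(\\d+)_txn").matcher("42_txn");
        m.matches();                    // the call that must NOT live inside an assert
        System.out.println(m.group(1)); // 42
    }
}
```

The fix pattern is to perform the match unconditionally and assert only on its boolean result, so behavior does not change under {{-da}}.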
[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959758#comment-14959758 ]

Jim Witschey commented on CASSANDRA-10089:
------------------------------------------

Sorry to insert myself, but: you should be able to trigger the builds you want by just pulling down Stefania's branch and pushing it to GitHub.

> NullPointerException in Gossip handleStateNormal
> ------------------------------------------------
>
>                 Key: CASSANDRA-10089
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 15:39:57,873 CassandraDaemon.java:183 - Exception in thread Thread[GossipStage:1,5,main] java.lang.NullPointerException: null
>         at org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) ~[main/:na]
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) ~[main/:na]
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) ~[main/:na]
>         at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[main/:na]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[main/:na]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related to CASSANDRA-9970; if anything, it could have been a side effect of CASSANDRA-9871.
[jira] [Commented] (CASSANDRA-10529) Channel.size() is costly, mutually exclusive, and on the critical path
[ https://issues.apache.org/jira/browse/CASSANDRA-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960139#comment-14960139 ]

Stefania commented on CASSANDRA-10529:
--------------------------------------

I agree it's definitely worth removing the assertion. Is there anything else you require for this ticket?

> Channel.size() is costly, mutually exclusive, and on the critical path
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10529
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10529
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
> [~stefania_alborghetti] mentioned this already on another ticket, but I have lost track of exactly where. While benchmarking, it became apparent that this was a noticeable bottleneck for small in-memory workloads with few files, especially with RF=1. We should probably fix this soon, since it is trivial to do so, and the call exists only to impose an assertion that our requested length is less than the file size. It isn't possible to safely memoize a value anywhere we can guarantee to be able to safely refer to it without some refactoring, so I suggest simply removing the assertion for now.
[jira] [Commented] (CASSANDRA-10461) Fix sstableverify_test dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960198#comment-14960198 ]

Stefania commented on CASSANDRA-10461:
--------------------------------------

The pull request for ignoring extra output lines is [here|https://github.com/riptano/cassandra-dtest/pull/613].

> Fix sstableverify_test dtest
> ----------------------------
>
>                 Key: CASSANDRA-10461
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10461
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Jim Witschey
>            Assignee: Stefania
>              Labels: test
>             Fix For: 3.0.0 rc2
>
> The dtest for sstableverify is failing:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/
> It fails in the same way when I run it on OpenStack, so I don't think it's just a CassCI problem.
> [~slebresne] Looks like you made changes to this test recently:
> https://github.com/riptano/cassandra-dtest/commit/51ab085f21e01cc8e5ad88a277cb4a43abd3f880
> Could you have a look at the failure? I'm assigning you for triage, but feel free to reassign.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960192#comment-14960192 ]

Marcus Eriksson commented on CASSANDRA-10515:
---------------------------------------------

Could you post nodetool cfstats and your node config? This looks like CASSANDRA-9882, but that problem was with DTCS and very many sstables.

> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, RUN3tpstats.jpg, stacktrace.txt, system.log.clean
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more before a restart. Once a node reaches the state of more than 12G of commit log files, "nodetool compactionstats" hangs. Eventually C* restarts without errors (not sure yet whether it is crashing, but I'm checking into it), the cleanup occurs, and the commit logs shrink back down again. Here is the nodetool compactionstats immediately after restart:
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558  bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916  bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682  bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557  bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621  bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196  bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438  bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582  bytes    100.00%
> Active compaction remaining time :        n/a
> {code}
> I was also doing periodic "nodetool tpstats", which was working but not being logged in system.log on the StatusLogger thread until after the compaction started working again.
[jira] [Created] (CASSANDRA-10539) Different encodings used between nodes can cause inconsistently generated prepared statement ids
Andy Tolbert created CASSANDRA-10539:
----------------------------------------

             Summary: Different encodings used between nodes can cause inconsistently generated prepared statement ids
                 Key: CASSANDRA-10539
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10539
             Project: Cassandra
          Issue Type: Bug
            Reporter: Andy Tolbert
            Priority: Minor

[From the java-driver mailing list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI] / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using different default character sets, it's possible for nodes to generate different prepared statement ids for the same 'keyspace + query string' combination. I imagine this is not a very typical or desired configuration (thus the low severity).

This is because [MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54] uses [String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()], which relies on the default charset. In the general case this is fine, but if your query string contains certain characters such as [Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE] ('\uFFFF'), the byte representation may vary based on the encoding.

I was able to reproduce this by configuring a 2-node cluster with node1 using file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. The java-driver test that demonstrates this can be found [here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].
[jira] [Updated] (CASSANDRA-10539) Different encodings used between nodes can cause inconsistently generated prepared statement ids
[ https://issues.apache.org/jira/browse/CASSANDRA-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Tolbert updated CASSANDRA-10539:
-------------------------------------
    Description: 
[From the java-driver mailing list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI] / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using a different default character set it's possible for nodes to generate different prepared statement ids for the same 'keyspace + query string' combination. I imagine this is not a very typical or desired configuration (thus the low severity).

This is because [MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54] uses [String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()] which relies on the default charset. In the general case this is fine, but if you use some characters in your query string such as [Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE] ('\uFFFF') the byte representation may vary based on the encoding.

I was able to reproduce this configuring a 2-node cluster with node1 using file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. The java-driver test that demonstrates this can be found [here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].

  was:
[From the java-driver mailing list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI] / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]

If you have nodes in your cluster that are using a different default character set it's possible for nodes to generate different prepared statement ids for the same 'keyspace + query string' combination. I imagine this is not a very typical or desired configuration (thus the low severity).

This is because [MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54] uses [String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()] which relies on the default charset. In the general case this is fine, but if you use some characters in your query string such as [Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE] ('\uFFFF') the byte representation may vary based on the encoding.

I was able to reproduce this configuring a 2-node cluster with node1 using file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. The java-driver test demonstrates this can be found [here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].

> Different encodings used between nodes can cause inconsistently generated prepared statement ids
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10539
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10539
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andy Tolbert
>            Priority: Minor
>
> [From the java-driver mailing list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI] / [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]
> If you have nodes in your cluster that are using a different default character set it's possible for nodes to generate different prepared statement ids for the same 'keyspace + query string' combination. I imagine this is not a very typical or desired configuration (thus the low severity).
> This is because [MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54] uses [String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()] which relies on the default charset.
> In the general case this is fine, but if you use some characters in your query string such as [Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE] ('\uFFFF') the byte representation may vary based on the encoding.
> I was able to reproduce this configuring a 2-node cluster with node1 using file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. The java-driver test that demonstrates this can be found [here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].
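The charset sensitivity described above can be demonstrated in isolation. This is not Cassandra code and the query string is invented; the {{md5}} helper simply stands in for what {{MD5Digest.compute(String)}} would see on a node whose {{file.encoding}} matched the given charset. {{Character.MAX_VALUE}} ('\uFFFF') encodes to three bytes under UTF-8 but is unmappable in ISO-8859-1, where {{String.getBytes()}} substitutes a replacement byte, so the two digests differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative sketch (not Cassandra code) of why a default-charset
// getBytes() makes a digest of the query string node-dependent.
public class CharsetDigest {
    static byte[] md5(String query, Charset cs) {
        try {
            // Stands in for MD5Digest.compute(String) on a node whose
            // file.encoding is `cs`
            return MessageDigest.getInstance("MD5").digest(query.getBytes(cs));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is mandatory on all JVMs
        }
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM ks.tbl WHERE k = '" + Character.MAX_VALUE + "'";
        byte[] utf8Id   = md5(query, StandardCharsets.UTF_8);
        byte[] latin1Id = md5(query, StandardCharsets.ISO_8859_1);
        // The two "nodes" disagree on the id for the same query string
        System.out.println(Arrays.equals(utf8Id, latin1Id)); // false
    }
}
```

For ASCII-only query strings the two encodings produce identical bytes, which is why the problem only surfaces with characters outside the common subset.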
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959218#comment-14959218 ]

Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------

BTW, we tried commitlog_segment_recycling: false, but realized afterwards that this should already be the default. We briefly thought it made a difference after restarting that node, but the problem returned after several hours. There is some mention in another JIRA about tuning the number of memtable flush writers; could this be an issue? It's still difficult to explain why we only see this on a few nodes across the ten clusters, all with the same config. Will try to get the thread dump ASAP.
[jira] [Commented] (CASSANDRA-10519) RepairException: [repair #... on .../..., (...,...]] Validation failed in /w.x.y.z
[ https://issues.apache.org/jira/browse/CASSANDRA-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959239#comment-14959239 ]

Yuki Morishita commented on CASSANDRA-10519:
--------------------------------------------

{code}
Cannot start multiple repair sessions over the same sstables
{code}

There was a leftover incremental repair session on one of the nodes. Restarting the node will solve the problem. Recent versions of C* try to clear out leftovers, so this should be less likely to happen. (Not perfect though; we need something like CASSANDRA-10302 to keep state clean.)

> RepairException: [repair #... on .../..., (...,...]] Validation failed in /w.x.y.z
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10519
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 7, JDK 8u60, Cassandra 2.2.2 (upgraded from 2.1.5)
>            Reporter: Gábor Auth
>
> Sometimes the repair fails:
> {code}
> ERROR [Repair#3:1] 2015-10-14 06:22:56,490 CassandraDaemon.java:185 - Exception in thread Thread[Repair#3:1,5,RMI Runtime]
> com.google.common.util.concurrent.UncheckedExecutionException: org.apache.cassandra.exceptions.RepairException: [repair #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z
>         at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387) ~[guava-16.0.jar:na]
>         at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) ~[guava-16.0.jar:na]
>         at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, (2414492737393085601,2788053941340954029]] Validation failed in /w.y.x.z
>         at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         ... 3 common frames omitted
> {code}
> And here is the w.y.x.z side:
> {code}
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 CompactionManager.java:1053 - Cannot start multiple repair sessions over the same sstables
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,487 Validator.java:246 - Failed creating a merkle tree for [repair #018adc70-723c-11e5-b0d8-6b2151e4d388 on keyspace/table, (2414492737393085601,2788053941340954029]], /a.b.c.d (see log for details)
> ERROR [ValidationExecutor:7] 2015-10-14 06:22:56,488 CassandraDaemon.java:185 - Exception in thread Thread[ValidationExecutor:7,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>         at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1054) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:86) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:652) ~[apache-cassandra-2.2.2.jar:2.2.2]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ...
> ERROR [Reference-Reaper:1] 2015-10-14 06:23:21,439 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@74fc054a) to class
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959208#comment-14959208 ]

Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------

working on it [~mishail]
[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
[ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959217#comment-14959217 ]

Robbie Strickland commented on CASSANDRA-10449:
-----------------------------------------------

Ok [~mishail], I will re-run with heap dump enabled (we had it turned off for some reason) and post it.

> OOM on bootstrap due to long GC pause
> -------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: system.log.10-05, thread_dump.log
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 500-700GB per node. SSTable counts are <10 per table. I am attempting to provision additional nodes, but bootstrapping OOMs every time after about 10 hours with a sudden long GC pause:
> {noformat}
> INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to no avail.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959498#comment-14959498 ]

Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------

A second iteration. Ran into a second instance of the metrics-via-RMI case, but caught it very early, when only a few threads were blocked behind the compaction. Still looks like the same general place:

{code}
"CompactionExecutor:16" #1502 daemon prio=1 os_prio=4 tid=0x7fb78c4f2000 nid=0xf7ff runnable [0x7fb751941000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.putVal(HashMap.java:641)
        at java.util.HashMap.put(HashMap.java:611)
        at java.util.HashSet.add(HashSet.java:219)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:512)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
        - locked <0x0004bcf24298> (a org.apache.cassandra.db.compaction.LeveledManifest)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
        - locked <0x0004bcbec488> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
        - locked <0x0004b98f1b00> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}