[jira] [Commented] (CASSANDRA-9870) Improve cassandra-stress graphing
[ https://issues.apache.org/jira/browse/CASSANDRA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697322#comment-14697322 ] Shawn Kumar commented on CASSANDRA-9870: Just a quick update: the code for this lives [here|https://github.com/shawnkumar/cstargraph]. I have built on what Ryan had already written, but the changes were quite significant, since the code was previously limited to displaying raw metrics and organized for that purpose. Things implemented so far: support for multiple datasets per revision (see the lat_all graph), support for graphs that require a baseline (see throughput % improvement), and fixing/rebuilding the existing functions for these graphs (e.g. scaling, legends, colouring). Remaining work includes: box plot support (currently working on this using the d3plus library), logarithmic scaling, fleshing out data processing for the remaining graphs, adding legend entries/changing line styles for different datasets under the same revision, and finally the aesthetic/UI changes - namely the 'aggregating' screen showing all graphs. Improve cassandra-stress graphing - Key: CASSANDRA-9870 URL: https://issues.apache.org/jira/browse/CASSANDRA-9870 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Benedict Assignee: Shawn Kumar Attachments: reads.svg CASSANDRA-7918 introduces graph output from a stress run, but these graphs are a little limited. Attached to the ticket is an example of some improved graphs which can serve as the *basis* for some improvements, which I will briefly describe. They should not be taken as the exact end goal, but we should aim for at least their functionality, preferably with some Javascript advantages thrown in, such as the hiding of datasets/graphs for clarity. Any ideas for improvements are *definitely* encouraged. 
Some overarching design principles:
* Display _on *one* screen_ all of the information necessary to get a good idea of how two or more branches compare to each other. Ideally we will reintroduce this, painting multiple graphs onto one screen, stretched to fit.
* Axes must be truncated to only the interesting dimensions, to ensure there is no wasted space.
* Each graph displaying multiple kinds of data should use colour _and shape_ to help easily distinguish the different datasets.
* Each graph should be tailored to the data it is representing, and we should have multiple views of each kind of data.

The data can roughly be partitioned into three kinds:
* throughput
* latency
* gc

These can each be viewed in different ways:
* as a continuous plot of:
** raw data
** scaled/compared to a base branch, or other metric
** cumulatively
* as box plots
** ideally, these will plot median, outer quartiles, outer deciles and absolute limits of the distribution, so the shape of the data can be best understood

Each compresses the information differently, losing different information, so that collectively they help to understand the data. Some basic rules for presentation that work well:
* Latency information should be plotted to a logarithmic scale, to avoid high latencies drowning out low ones
* GC information should be plotted cumulatively, to avoid differing throughputs giving the impression of worse GC. It should also have a line that is rescaled by the amount of work (number of operations) completed
* Throughput should be plotted as the actual numbers

To walk the graphs top-left to bottom-right, we have:
* Spot throughput comparison of branches to the baseline branch, as an improvement ratio (which can of course be negative, but is not in this example)
* Raw throughput of all branches (no baseline)
* Raw throughput as a box plot
* Latency percentiles, compared to baseline. 
The percentage improvement at any point in time vs baseline is calculated, and then multiplied by the overall median for the entire run. This simply permits the non-baseline branches to scatter their wins/losses around a relatively clustered line for each percentile. It's probably the most dishonest graph, but comparing something like latency, where each data point can have very high variance, is difficult, and this gives you an idea of the clustering of improvements/losses.
* Latency percentiles, raw, each with a different shape; lowest percentiles plotted as a solid line as they vary least, with higher percentiles each getting their own subtly different shape to scatter.
* Latency box plots
* GC time, plotted cumulatively and also scaled by work done
* GC Mb, plotted cumulatively and also scaled by work done
* GC time, raw
* GC time as a box plot

These do mostly introduce the concept of a baseline branch. It may be that, ideally, this
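The summary statistics and baseline-relative scaling described above are straightforward to compute. Here is a minimal Python sketch; the function names and the exact anchoring of the scaled latency line (`median * (1 + ratio)`) are my reading of the description, not the actual cstargraph code, which draws these client-side in d3:

```python
import statistics

def box_stats(samples):
    # Box plot summary described above: median, outer quartiles,
    # outer deciles and absolute limits of the distribution.
    xs = sorted(samples)
    q = statistics.quantiles(xs, n=20, method="inclusive")  # 5% steps
    return {"min": xs[0], "p10": q[1], "p25": q[4], "median": q[9],
            "p75": q[14], "p90": q[17], "max": xs[-1]}

def improvement_ratio(branch, baseline):
    # Pointwise improvement of a branch over the baseline branch,
    # as a ratio (which can of course be negative).
    return [(b - base) / base for b, base in zip(branch, baseline)]

def baseline_scaled_latency(branch, baseline):
    # Latency percentile vs baseline: the pointwise improvement ratio,
    # re-anchored to the branch's overall median so wins/losses scatter
    # around a clustered line per percentile. The (1 + ratio) anchoring
    # is an assumption about "multiplied by the overall median".
    med = statistics.median(branch)
    return [med * (1 + r) for r in improvement_ratio(branch, baseline)]
```

A branch whose latency sample equals the baseline at some instant plots at its own run-wide median there, so deviations from that clustered line read directly as wins or losses.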
[jira] [Created] (CASSANDRA-9868) Archive commitlogs tests failing
Shawn Kumar created CASSANDRA-9868: -- Summary: Archive commitlogs tests failing Key: CASSANDRA-9868 URL: https://issues.apache.org/jira/browse/CASSANDRA-9868 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Attachments: commitlog_archiving.properties A number of archive commitlog dtests (snapshot_tests.py) are failing on trunk at the point in the tests where the node is asked to restore data from archived commitlogs. It appears that the snapshot functionality works, but the [assertion|https://github.com/riptano/cassandra-dtest/blob/master/snapshot_test.py#L312] regarding data that should have been restored from archived commitlogs fails. I also tested this manually on trunk and could not restore the data either, so it appears not to be just a test issue. I should note that archiving the commitlogs seems to work (in that they are actually copied); restoring them is the issue. Attached is the commitlog properties file (to show the commands used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
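The attached properties file itself is not reproduced in the ticket, but the standard keys in conf/commitlog_archiving.properties that it would set look like the following (the /backup paths here are illustrative, not from the attachment):

```properties
# archive_command runs when a commitlog segment is closed;
# %path and %name expand to the segment's full path and file name.
archive_command=/bin/cp %path /backup/commitlogs/%name

# restore_command runs for each archived segment at startup;
# %from and %to expand to the archived file and its destination.
restore_command=/bin/cp -f %from %to

# Directory scanned for archived segments to restore at startup.
restore_directories=/backup/commitlogs

# Optional point-in-time limit for replay (format: yyyy:MM:dd HH:mm:ss).
restore_point_in_time=
```

The ticket's symptom is that the archive_command side takes effect (segments are copied out) while the restore path fails to replay them on startup.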
[jira] [Updated] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart
[ https://issues.apache.org/jira/browse/CASSANDRA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9840: --- Description: This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test [neither are used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15] and we still hit failure. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. (was: This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test [neither are used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15]. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand.) 
global_row_key_cache_test.py fails; loses mutations on cluster restart -- Key: CASSANDRA-9840 URL: https://issues.apache.org/jira/browse/CASSANDRA-9840 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.0.x Attachments: node1.log, node2.log, node3.log, noseout.txt This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test [neither are used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15] and we still hit failure. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart
[ https://issues.apache.org/jira/browse/CASSANDRA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9840: --- Description: This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test [neither are used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15]. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. (was: This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test neither are used. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand.) 
global_row_key_cache_test.py fails; loses mutations on cluster restart -- Key: CASSANDRA-9840 URL: https://issues.apache.org/jira/browse/CASSANDRA-9840 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.0.x Attachments: node1.log, node2.log, node3.log, noseout.txt This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test [neither are used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15]. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart
[ https://issues.apache.org/jira/browse/CASSANDRA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9840: --- Description: This test is currently failing on trunk. I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test neither are used. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. (was: This test is currently failing on trunk. I've attached the test output and logs. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand.) global_row_key_cache_test.py fails; loses mutations on cluster restart -- Key: CASSANDRA-9840 URL: https://issues.apache.org/jira/browse/CASSANDRA-9840 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.0.x Attachments: node1.log, node2.log, node3.log, noseout.txt This test is currently failing on trunk. 
I've attached the test output and logs. It seems that the failure of the test doesn't necessarily have anything to do with global row/key caches - as on the initial loop of the test neither are used. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart
Shawn Kumar created CASSANDRA-9840: -- Summary: global_row_key_cache_test.py fails; loses mutations on cluster restart Key: CASSANDRA-9840 URL: https://issues.apache.org/jira/browse/CASSANDRA-9840 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.0.x Attachments: node1.log, node2.log, node3.log, noseout.txt This test is currently failing on trunk. I've attached the test output and logs. The test itself fails when a second validation of values after a cluster restart fails to capture deletes issued prior to the restart and first successful validation. However, if I add flushes prior to restarting the cluster the test completes successfully, implying an issue with loss of in-memory mutations due to the cluster restart. Initially I had thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the fact that this test has been succeeding consistently on both the 2.1 and 2.2 branches indicates there may be another issue at hand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9815) Tracing doesn't seem to display tombstones read accurately
[ https://issues.apache.org/jira/browse/CASSANDRA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629954#comment-14629954 ] Shawn Kumar commented on CASSANDRA-9815: Spoke with Aleksey regarding this and it looks like this is actually the correct behaviour and a consequence of CASSANDRA-9299. That being said, we agreed the Read 0 live and 0 tombstone cells message should be modified for clarity. Tracing doesn't seem to display tombstones read accurately -- Key: CASSANDRA-9815 URL: https://issues.apache.org/jira/browse/CASSANDRA-9815 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Shawn Kumar Priority: Minor Labels: tracing Fix For: 2.2.x Attachments: tracescreen.png It seems that where tracing [once|http://stackoverflow.com/questions/27063508/how-to-get-tombstone-count-for-a-cql-query] tracked how many tombstones were read in a query, it no longer does. Can reproduce with the following:
1. Create a simple key, val table.
2. Insert a couple of rows.
3. Flush.
4. Delete a row, add an additional couple of rows.
5. Flush.
6. Try to query the deleted row, or select *.
In the trace it never mentions reading a tombstoned cell, no matter what. Instead you get a line like the following: Read 0 live and 0 tombstone cells [SharedPool-Worker-3]. Attached is a screenshot of the trace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9815) Tracing doesn't seem to display tombstones read accurately
[ https://issues.apache.org/jira/browse/CASSANDRA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-9815. Resolution: Not A Problem Reproduced In: 2.2.0 rc2, 3.x (was: 3.x, 2.2.0 rc2) Tracing doesn't seem to display tombstones read accurately -- Key: CASSANDRA-9815 URL: https://issues.apache.org/jira/browse/CASSANDRA-9815 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Shawn Kumar Priority: Minor Labels: tracing Fix For: 2.2.x Attachments: tracescreen.png It seems that where tracing [once|http://stackoverflow.com/questions/27063508/how-to-get-tombstone-count-for-a-cql-query] tracked how many tombstones were read in a query, it no longer does. Can reproduce with the following:
1. Create a simple key, val table.
2. Insert a couple of rows.
3. Flush.
4. Delete a row, add an additional couple of rows.
5. Flush.
6. Try to query the deleted row, or select *.
In the trace it never mentions reading a tombstoned cell, no matter what. Instead you get a line like the following: Read 0 live and 0 tombstone cells [SharedPool-Worker-3]. Attached is a screenshot of the trace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9814) test_scrub_collections_table in scrub_test.py fails; removes sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9814: --- Summary: test_scrub_collections_table in scrub_test.py fails; removes sstables (was: test_scrub_collections_table in scrub_test.py fails - possible data loss?) test_scrub_collections_table in scrub_test.py fails; removes sstables - Key: CASSANDRA-9814 URL: https://issues.apache.org/jira/browse/CASSANDRA-9814 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.x Attachments: node1.log, out.txt The test creates an index on a table with collections and attempts to scrub. After the scrub, somehow all relevant sstables are removed, and an assertion in get_sstables fails (since there are no sstables left). Logs indicate a set of errors under CompactionExecutor related to not being able to read rows. Attached are the test output (out.txt) and the relevant log. I should note that my attempts to replicate this manually weren't successful, so it's possible this is a test issue (though I don't see why). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9814) test_scrub_collections_table in scrub_test.py fails - possible data loss?
[ https://issues.apache.org/jira/browse/CASSANDRA-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9814: --- Summary: test_scrub_collections_table in scrub_test.py fails - possible data loss? (was: test_scrub_collections_table in scrub_test.py fails) test_scrub_collections_table in scrub_test.py fails - possible data loss? - Key: CASSANDRA-9814 URL: https://issues.apache.org/jira/browse/CASSANDRA-9814 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.x Attachments: node1.log, out.txt The test creates an index on a table with collections and attempts to scrub. After the scrub, somehow all relevant sstables are removed, and an assertion in get_sstables fails (since there are no sstables left). Logs indicate a set of errors under CompactionExecutor related to not being able to read rows. Attached are the test output (out.txt) and the relevant log. I should note that my attempts to replicate this manually weren't successful, so it's possible this is a test issue (though I don't see why). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9814) test_scrub_collections_table in scrub_test.py fails
Shawn Kumar created CASSANDRA-9814: -- Summary: test_scrub_collections_table in scrub_test.py fails Key: CASSANDRA-9814 URL: https://issues.apache.org/jira/browse/CASSANDRA-9814 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Priority: Blocker Fix For: 3.x Attachments: node1.log, out.txt The test creates an index on a table with collections and attempts to scrub. After the scrub, somehow all relevant sstables are removed, and an assertion in get_sstables fails (since there are no sstables left). Logs indicate a set of errors under CompactionExecutor related to not being able to read rows. Attached are the test output (out.txt) and the relevant log. I should note that my attempts to replicate this manually weren't successful, so it's possible this is a test issue (though I don't see why). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9815) Tracing doesn't seem to display tombstones read accurately
Shawn Kumar created CASSANDRA-9815: -- Summary: Tracing doesn't seem to display tombstones read accurately Key: CASSANDRA-9815 URL: https://issues.apache.org/jira/browse/CASSANDRA-9815 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Shawn Kumar Priority: Minor Attachments: tracescreen.png It seems that where tracing [once|http://stackoverflow.com/questions/27063508/how-to-get-tombstone-count-for-a-cql-query] tracked how many tombstones were read in a query, it no longer does. Can reproduce with the following:
1. Create a simple key, val table.
2. Insert a couple of rows.
3. Flush.
4. Delete a row, add an additional couple of rows.
5. Flush.
6. Try to query the deleted row, or select *.
In the trace it never mentions reading a tombstoned cell, no matter what. Instead you get a line like the following: Read 0 live and 0 tombstone cells [SharedPool-Worker-3]. Attached is a screenshot of the trace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9152) Offline tools tests
[ https://issues.apache.org/jira/browse/CASSANDRA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574885#comment-14574885 ] Shawn Kumar edited comment on CASSANDRA-9152 at 6/5/15 5:30 PM: This is complete; dtest coverage is as follows:
sstablescrubber - scrub_test.py
sstablelevelreset, sstableofflinerelevel, sstableverify - offline_tools_test.py
sstablerepairedset - incremental_repair_test.py
sstableloader - sstable_generation_loading_test.py
was (Author: shawn.kumar): This is complete, dtest coverage is as follows: sstablescrubber - scrub_test.py sstablelevelreset, sstableofflinerelevel, sstableverify - offline_tools_test.py sstablerepairedset - incremental_repair_test.py Offline tools tests --- Key: CASSANDRA-9152 URL: https://issues.apache.org/jira/browse/CASSANDRA-9152 Project: Cassandra Issue Type: Test Reporter: Marcus Eriksson Assignee: Shawn Kumar Labels: retrospective_generated Fix For: 2.2.x we need more tests of our offline tools: sstablescrubber, sstablelevelreset, sstableofflinerelevel, sstablerepairedset, sstableloader -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9152) Offline tools tests
[ https://issues.apache.org/jira/browse/CASSANDRA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-9152. Resolution: Implemented This is complete; dtest coverage is as follows:
sstablescrubber - scrub_test.py
sstablelevelreset, sstableofflinerelevel, sstableverify - offline_tools_test.py
sstablerepairedset - incremental_repair_test.py
Offline tools tests --- Key: CASSANDRA-9152 URL: https://issues.apache.org/jira/browse/CASSANDRA-9152 Project: Cassandra Issue Type: Test Reporter: Marcus Eriksson Assignee: Shawn Kumar Labels: retrospective_generated Fix For: 2.2.x we need more tests of our offline tools: sstablescrubber, sstablelevelreset, sstableofflinerelevel, sstablerepairedset, sstableloader -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-5791) A nodetool command to validate all sstables in a node
[ https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-5791: --- Tester: Shawn Kumar A nodetool command to validate all sstables in a node - Key: CASSANDRA-5791 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791 Project: Cassandra Issue Type: New Feature Components: Core Reporter: sankalp kohli Assignee: Jeff Jirsa Priority: Minor Fix For: 2.2.0 beta 1 Attachments: cassandra-5791-20150319.diff, cassandra-5791-patch-3.diff, cassandra-5791.patch-2 Currently there is no nodetool command to validate all sstables on disk. The only way to do this is to run a repair and see if it succeeds, but we cannot repair the system keyspace. We can also run upgradesstables, but that rewrites all the sstables. This command should check the hash of all sstables and return whether all data is readable or not. This should NOT care about consistency. Compressed sstables do not have a hash, so I am not sure how it will work there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
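The hash check the ticket proposes can be sketched as a standalone digest verification. This is a hypothetical helper, not the attached patch: Cassandra writes a Digest component beside each -Data.db file containing a plain-text checksum (SHA-1 in older versions, CRC32 in newer ones), and recomputing it tells you whether the bytes on disk are readable, with no consistency semantics involved:

```python
import hashlib
import zlib
from pathlib import Path

def sstable_data_is_readable(data_path):
    # Hypothetical standalone checker: recompute the checksum of the
    # -Data.db component and compare it to the recorded Digest value.
    data = Path(data_path).read_bytes()
    crc = Path(str(data_path).replace("Data.db", "Digest.crc32"))
    sha = Path(str(data_path).replace("Data.db", "Digest.sha1"))
    if crc.exists():
        # Digest.crc32 holds the CRC as a decimal string.
        return zlib.crc32(data) & 0xFFFFFFFF == int(crc.read_text().split()[0])
    if sha.exists():
        # Digest.sha1 holds the hex digest (possibly followed by a name).
        return hashlib.sha1(data).hexdigest() == sha.read_text().split()[0]
    raise FileNotFoundError("no Digest component next to %s" % data_path)
```

As the reporter notes, compressed sstables are the open question, since their integrity data lives in the CompressionInfo/checksum machinery rather than a simple whole-file digest.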
[jira] [Assigned] (CASSANDRA-8590) Test repairing large dataset after upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar reassigned CASSANDRA-8590: -- Assignee: Shawn Kumar (was: Ryan McGuire) Test repairing large dataset after upgrade -- Key: CASSANDRA-8590 URL: https://issues.apache.org/jira/browse/CASSANDRA-8590 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Shawn Kumar
* Write large dataset in multiple tables
* upgrade
* replace a few nodes
* repair in round-robin fashion
* ensure exit codes of cmd line tools are expected
* verify data.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8252) dtests that involve topology changes should verify system.peers on all nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505497#comment-14505497 ] Shawn Kumar commented on CASSANDRA-8252: There seem to be a number of failures when I tried incorporating this into existing code. To try to clarify/separate it from other tests, I wrote up a separate [dtest|https://github.com/riptano/cassandra-dtest/blob/peerstest/peers_test.py] to test some basic actions (bootstrapping, removing nodes) and am still running into failures on all of these. dtests that involve topology changes should verify system.peers on all nodes Key: CASSANDRA-8252 URL: https://issues.apache.org/jira/browse/CASSANDRA-8252 Project: Cassandra Issue Type: Test Components: Tests Reporter: Brandon Williams Assignee: Shawn Kumar Fix For: 2.0.15, 2.1.5 This is especially true for replace, where I've discovered it's wrong in 1.2.19, which is sad because now it's too late to fix. We've had a lot of problems with incorrect/null system.peers, so after any topology change we should verify it on every live node when everything is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
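The verification the ticket asks for boils down to a simple invariant: after any topology change, every live node's system.peers should list exactly the other members of the cluster. A sketch of that check (a hypothetical helper; the real dtest would read system.peers over a driver connection to each node):

```python
def check_system_peers(cluster_addresses, peers_by_node):
    # cluster_addresses: addresses of all live nodes after the topology
    # change. peers_by_node: address -> set of peer addresses read from
    # that node's system.peers table. Each node should list everyone
    # except itself - no stale, null, or missing entries.
    for node, peers in peers_by_node.items():
        expected = set(cluster_addresses) - {node}
        assert peers == expected, "%s has peers %s, expected %s" % (
            node, sorted(peers), sorted(expected))
```

Running this against every live node after bootstrap, decommission, removenode, and replace is what catches the incorrect/null system.peers entries the ticket describes.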
[jira] [Assigned] (CASSANDRA-9178) Test exposed JMX methods
[ https://issues.apache.org/jira/browse/CASSANDRA-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar reassigned CASSANDRA-9178: -- Assignee: Shawn Kumar Test exposed JMX methods Key: CASSANDRA-9178 URL: https://issues.apache.org/jira/browse/CASSANDRA-9178 Project: Cassandra Issue Type: Test Reporter: Carl Yeksigian Assignee: Shawn Kumar [~thobbs] added support for JMX testing in dtests, and we have seen issues related to nodetool testing in various different stages of execution. Tests which exercise the different methods which nodetool calls should be added to catch those issues early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-6335) Hints broken for nodes that change broadcast address
[ https://issues.apache.org/jira/browse/CASSANDRA-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-6335. Resolution: Cannot Reproduce Hints broken for nodes that change broadcast address Key: CASSANDRA-6335 URL: https://issues.apache.org/jira/browse/CASSANDRA-6335 Project: Cassandra Issue Type: Bug Components: Core Reporter: Rick Branson Assignee: Shawn Kumar When a node changes its broadcast address, the transition process works properly, but hints that are destined for it can't be delivered because of the address change. It produces an exception:
java.lang.AssertionError: Missing host ID for 10.1.60.22
    at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:598)
    at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:567)
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1679)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
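The stack trace comes down to a single failed lookup. As a toy model (illustrative Python, not the Java code): hint delivery resolves the target's broadcast address to a host ID, and after the address change the stale address no longer has an entry, so the assertion fires.

```python
def host_id_for(endpoint_to_host_id, target_endpoint):
    # Toy model of the assertion in StorageProxy.writeHintForMutation:
    # hints are keyed by host ID, resolved from the target's address.
    # A node that re-registered under a new broadcast address leaves
    # the old address unmapped, tripping the assertion.
    host_id = endpoint_to_host_id.get(target_endpoint)
    assert host_id is not None, "Missing host ID for %s" % target_endpoint
    return host_id
```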
[jira] [Updated] (CASSANDRA-6335) Hints broken for nodes that change broadcast address
[ https://issues.apache.org/jira/browse/CASSANDRA-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-6335: --- Tester: Shawn Kumar Hints broken for nodes that change broadcast address Key: CASSANDRA-6335 URL: https://issues.apache.org/jira/browse/CASSANDRA-6335 Project: Cassandra Issue Type: Bug Components: Core Reporter: Rick Branson Assignee: Ryan McGuire When a node changes its broadcast address, the transition process works properly, but hints that are destined for it can't be delivered because of the address change. It produces an exception:
java.lang.AssertionError: Missing host ID for 10.1.60.22
    at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:598)
    at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:567)
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1679)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-9056. Resolution: Fixed Fix Version/s: 2.1.4 2.0.14 3.0 Reproduced In: 2.0.13, 2.1.3, 3.0 (was: 3.0, 2.1.3, 2.0.13) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Assignee: Marcus Eriksson Labels: compaction, dtcs Fix For: 3.0, 2.0.14, 2.1.4 When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: [dtcs_deletion_test|https://github.com/riptano/cassandra-dtest/blob/master/compaction_test.py#L115] in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391165#comment-14391165 ] Shawn Kumar edited comment on CASSANDRA-9056 at 4/1/15 6:36 PM: It looks like this has been fixed by CASSANDRA-8359, and dtcs_deletion_test is now passing on relevant branches with Marcus' changes. I've committed Marcus' changes to dtests as well. [~jimplush], I suggest testing with the fix provided in CASSANDRA-8359. I will resolve this ticket, but please feel free to post a follow-up comment if you continue to have this problem. was (Author: shawn.kumar): It looks like this has been fixed by CASSANDRA-8359, and dtcs_deletion_test is now passing on relevant branches with Marcus' changes. I've committed Marcus' changes to dtests as well. Jim, I suggest testing with the fix provided in CASSANDRA-8359. I will resolve this ticket, but please feel free to post a follow-up comment if you continue to have this problem. Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Assignee: Marcus Eriksson Labels: compaction, dtcs Fix For: 3.0, 2.1.4, 2.0.14 When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: [dtcs_deletion_test|https://github.com/riptano/cassandra-dtest/blob/master/compaction_test.py#L115] in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391165#comment-14391165 ] Shawn Kumar commented on CASSANDRA-9056: It looks like this has been fixed by CASSANDRA-8359, and dtcs_deletion_test is now passing on relevant branches with Marcus' changes. I've committed Marcus' changes to dtests as well. Jim, I suggest testing with the fix provided in CASSANDRA-8359. I will resolve this ticket, but please feel free to post a follow-up comment if you continue to have this problem. Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Assignee: Marcus Eriksson Labels: compaction, dtcs When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: [dtcs_deletion_test|https://github.com/riptano/cassandra-dtest/blob/master/compaction_test.py#L115] in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
Shawn Kumar created CASSANDRA-9056: -- Summary: Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
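The reported behaviour can be illustrated with a simplified, assumed model of how an age cutoff interacts with compaction candidate selection (this is a sketch, not DTCS's actual implementation): if sstables whose newest data is older than max_sstable_age_days are excluded from candidates entirely, a fully tombstoned old sstable is never revisited and its tombstones are never dropped.

```python
# Minimal sketch under the stated assumption: candidate selection filters
# out sstables whose newest timestamp is past the age cutoff, so an old,
# fully tombstoned sstable never becomes a candidate for minor compaction.
import time

DAY = 86400

def compaction_candidates(sstables, now, max_sstable_age_days):
    cutoff = now - max_sstable_age_days * DAY
    return [s for s in sstables if s["max_timestamp"] >= cutoff]

now = time.time()
sstables = [
    {"name": "recent", "max_timestamp": now - 1 * DAY, "tombstoned": False},
    {"name": "old_tombstoned", "max_timestamp": now - 10 * DAY, "tombstoned": True},
]

candidates = compaction_candidates(sstables, now, max_sstable_age_days=3)
# The tombstoned sstable never shows up as a candidate, hence is never removed.
assert [s["name"] for s in candidates] == ["recent"]
```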
[jira] [Updated] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9056: --- Description: When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. (was: When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. ) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Labels: compaction, dtcs When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9056: --- Description: When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: [dtcs_deletion_test|https://github.com/riptano/cassandra-dtest/blob/master/compaction_test.py#L115] in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. (was: When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue.) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Assignee: Marcus Eriksson Labels: compaction, dtcs When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: [dtcs_deletion_test|https://github.com/riptano/cassandra-dtest/blob/master/compaction_test.py#L115] in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9056) Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-9056: --- Assignee: Marcus Eriksson Tombstoned SSTables are not removed past max_sstable_age_days when using DTCS - Key: CASSANDRA-9056 URL: https://issues.apache.org/jira/browse/CASSANDRA-9056 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Assignee: Marcus Eriksson Labels: compaction, dtcs When using DTCS, tombstoned sstables past max_sstable_age_days are not removed by minor compactions. I was able to reproduce this manually and also wrote a dtest (currently failing) which reproduces this issue: dtcs_deletion_test in compaction_test.py. I tried applying the patch in CASSANDRA-8359 but found that the test still fails with the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-8870) Tombstone overwhelming issue aborts client queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-8870. Resolution: Cannot Reproduce Tombstone overwhelming issue aborts client queries -- Key: CASSANDRA-8870 URL: https://issues.apache.org/jira/browse/CASSANDRA-8870 Project: Cassandra Issue Type: Bug Environment: cassandra 2.1.2 ubunbtu 12.04 Reporter: Jeff Liu We are getting client queries timeout issues on the clients who are trying to query data from cassandra cluster. Nodetool status shows that all nodes are still up regardless. Logs from client side: {noformat} com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cass-chisel01.abc01.abc02.abc.abc.com/10.66.182.113:9042 (com.datastax.driver.core.TransportException: [cass-chisel01.tgr01.iad02.testd.nestlabs.com/10.66.182.113:9042] Connection has been closed)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} Logs from cassandra/system.log {noformat} ERROR [HintedHandoff:2] 2015-02-23 23:46:28,410 SliceQueryFilter.java:212 - Scanned over 10 tombstones in system.hints; query aborted (see tombstone_failure_threshold) ERROR [HintedHandoff:2] 2015-02-23 23:46:28,417 CassandraDaemon.java:153 - Exception in thread Thread[HintedHandoff:2,1,main] org.apache.cassandra.db.filter.TombstoneOverwhelmingException: null at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:214) 
~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:310) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:60) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1858) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1666) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:385) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:94) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:555) ~[apache-cassandra-2.1.2.jar:2.1.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8870) Tombstone overwhelming issue aborts client queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353202#comment-14353202 ] Shawn Kumar commented on CASSANDRA-8870: I created a few tests to try and reproduce this problem, and check more specifically whether (1) HintedHandoff exhibited any abnormal tombstone behaviour, and (2) a TombstoneOverwhelmingException in system.hints would cause any other issues (e.g. a NoHostAvailableException). I was not able to reproduce problems in either aspect. For aspect 2, I was able to artificially cause the TombstoneOverwhelmingException by having more hints than the tombstone_failure_threshold (and flushing), but this would seem to be expected behaviour and I was still able to connect to the cluster. Jeff, if you have any other information about the context of the error that would be useful (i.e. queries, schemas, usage, node status), please feel free to share it and I can give it another shot. Tombstone overwhelming issue aborts client queries -- Key: CASSANDRA-8870 URL: https://issues.apache.org/jira/browse/CASSANDRA-8870 Project: Cassandra Issue Type: Bug Environment: cassandra 2.1.2 ubuntu 12.04 Reporter: Jeff Liu We are getting client queries timeout issues on the clients who are trying to query data from cassandra cluster. Nodetool status shows that all nodes are still up regardless. 
Logs from client side: {noformat} com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cass-chisel01.abc01.abc02.abc.abc.com/10.66.182.113:9042 (com.datastax.driver.core.TransportException: [cass-chisel01.tgr01.iad02.testd.nestlabs.com/10.66.182.113:9042] Connection has been closed)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} Logs from cassandra/system.log {noformat} ERROR [HintedHandoff:2] 2015-02-23 23:46:28,410 SliceQueryFilter.java:212 - Scanned over 10 tombstones in system.hints; query aborted (see tombstone_failure_threshold) ERROR [HintedHandoff:2] 2015-02-23 23:46:28,417 CassandraDaemon.java:153 - Exception in thread Thread[HintedHandoff:2,1,main] org.apache.cassandra.db.filter.TombstoneOverwhelmingException: null at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:214) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:310) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:60) 
~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1858) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1666) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:385) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:94) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:555) ~[apache-cassandra-2.1.2.jar:2.1.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8870) Tombstone overwhelming issue aborts client queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353236#comment-14353236 ] Shawn Kumar commented on CASSANDRA-8870: I believe the tombstone errors occur due to hints being deleted from system.hints table after a successful handoff. See http://www.datastax.com/dev/blog/modern-hinted-handoff for more info on HintedHandoff. Tombstone overwhelming issue aborts client queries -- Key: CASSANDRA-8870 URL: https://issues.apache.org/jira/browse/CASSANDRA-8870 Project: Cassandra Issue Type: Bug Environment: cassandra 2.1.2 ubunbtu 12.04 Reporter: Jeff Liu We are getting client queries timeout issues on the clients who are trying to query data from cassandra cluster. Nodetool status shows that all nodes are still up regardless. Logs from client side: {noformat} com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cass-chisel01.abc01.abc02.abc.abc.com/10.66.182.113:9042 (com.datastax.driver.core.TransportException: [cass-chisel01.tgr01.iad02.testd.nestlabs.com/10.66.182.113:9042] Connection has been closed)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} Logs from cassandra/system.log {noformat} ERROR [HintedHandoff:2] 2015-02-23 23:46:28,410 SliceQueryFilter.java:212 - Scanned over 10 tombstones in system.hints; query aborted (see tombstone_failure_threshold) ERROR [HintedHandoff:2] 2015-02-23 23:46:28,417 CassandraDaemon.java:153 - Exception in 
thread Thread[HintedHandoff:2,1,main] org.apache.cassandra.db.filter.TombstoneOverwhelmingException: null at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:214) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:310) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:60) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1858) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1666) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:385) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:94) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:555) ~[apache-cassandra-2.1.2.jar:2.1.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
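The mechanism suggested in the comment above can be modelled in a few lines. This is an illustrative sketch only (hypothetical structures, not Cassandra internals): each delivered hint leaves behind a deletion marker in system.hints, and a later scan aborts once the tombstone count passes tombstone_failure_threshold, which matches the "Scanned over 10 tombstones ... query aborted" log line.

```python
# Illustrative model: delivered hints are deleted (leaving tombstones), and a
# subsequent scan aborts once it passes tombstone_failure_threshold.

class TombstoneOverwhelmingException(Exception):
    pass

def scan(rows, tombstone_failure_threshold):
    live, tombstones = [], 0
    for row in rows:
        if row["deleted"]:
            tombstones += 1
            if tombstones > tombstone_failure_threshold:
                raise TombstoneOverwhelmingException(
                    "Scanned over %d tombstones" % tombstones)
        else:
            live.append(row)
    return live

# 10 hints written and then delivered (i.e. deleted), with a deliberately
# tiny threshold so the abort is visible; the real default is far larger.
hints = [{"id": i, "deleted": True} for i in range(10)]

try:
    scan(hints, tombstone_failure_threshold=9)
    raised = False
except TombstoneOverwhelmingException:
    raised = True
assert raised
```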
[jira] [Commented] (CASSANDRA-8252) dtests that involve topology changes should verify system.peers on all nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328118#comment-14328118 ] Shawn Kumar commented on CASSANDRA-8252: Wasn't able to reproduce this manually - will figure out and finish up dtest changes. dtests that involve topology changes should verify system.peers on all nodes Key: CASSANDRA-8252 URL: https://issues.apache.org/jira/browse/CASSANDRA-8252 Project: Cassandra Issue Type: Test Components: Tests Reporter: Brandon Williams Assignee: Shawn Kumar Fix For: 2.1.4 This is especially true for replace where I've discovered it's wrong in 1.2.19, which is sad because now it's too late to fix. We've had a lot of problems with incorrect/null system.peers, so after any topology change we should verify it on every live node when everything is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8252) dtests that involve topology changes should verify system.peers on all nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314492#comment-14314492 ] Shawn Kumar commented on CASSANDRA-8252: While trying to modify replace_address_test.py to check system.peers, I had some difficulty due to unexpected results. I committed a modified version of replace_address_test with some debug statements [here|https://github.com/riptano/cassandra-dtest/blob/topology/replace_address_test.py] to illustrate the changes in the table through the process. The issues I am having: the contents of system.peers vary across runs after the replace (although the replaced address does not appear, it seems rare that the replacing node is noticed by nodes 1 and 2), and upon truncating the table and restarting nodes I can occasionally see the replaced address (in this case 127.0.0.3). dtests that involve topology changes should verify system.peers on all nodes Key: CASSANDRA-8252 URL: https://issues.apache.org/jira/browse/CASSANDRA-8252 Project: Cassandra Issue Type: Test Components: Tests Reporter: Brandon Williams Assignee: Shawn Kumar Fix For: 2.1.3, 2.0.13 This is especially true for replace where I've discovered it's wrong in 1.2.19, which is sad because now it's too late to fix. We've had a lot of problems with incorrect/null system.peers, so after any topology change we should verify it on every live node when everything is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
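The kind of check the ticket asks for can be sketched as a small helper. This is a hypothetical helper (`verify_peers` and its inputs are invented for illustration, not part of the actual dtest API): after a topology change, every live node's system.peers should list exactly the other live nodes, and never a replaced address.

```python
# Sketch of a post-topology-change verification: each live node's
# system.peers must equal the set of all *other* live nodes, and must not
# contain any replaced/removed address.

def verify_peers(peers_by_node, live_nodes, removed_nodes):
    for node, peers in peers_by_node.items():
        expected = set(live_nodes) - {node}
        assert set(peers) == expected, "%s has wrong peers: %s" % (node, peers)
        for gone in removed_nodes:
            assert gone not in peers, "%s still lists removed node %s" % (node, gone)

live = ["127.0.0.1", "127.0.0.2", "127.0.0.4"]  # 127.0.0.4 replaced 127.0.0.3
peers = {
    "127.0.0.1": ["127.0.0.2", "127.0.0.4"],
    "127.0.0.2": ["127.0.0.1", "127.0.0.4"],
    "127.0.0.4": ["127.0.0.1", "127.0.0.2"],
}
verify_peers(peers, live, removed_nodes=["127.0.0.3"])
```

In a dtest this would run against the real system.peers rows fetched from every live node once the replace has settled.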
[jira] [Commented] (CASSANDRA-8613) Regression in mixed single and multi-column relation support
[ https://issues.apache.org/jira/browse/CASSANDRA-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287983#comment-14287983 ] Shawn Kumar commented on CASSANDRA-8613: Managed to reproduce pretty easily on both branches. It seems this particular scenario isn't covered in dtests; I will add a case to cql_tests to address this. Regression in mixed single and multi-column relation support Key: CASSANDRA-8613 URL: https://issues.apache.org/jira/browse/CASSANDRA-8613 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Benjamin Lerer Fix For: 2.1.3, 2.0.13 In 2.0.6 through 2.0.8, a query like the following was supported: {noformat} SELECT * FROM mytable WHERE clustering_0 = ? AND (clustering_1, clustering_2) > (?, ?) {noformat} However, after CASSANDRA-6875, you'll get the following error: {noformat} Clustering columns may not be skipped in multi-column relations. They should appear in the PRIMARY KEY order. Got (c, d) > (0, 0) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8248) Possible memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214961#comment-14214961 ] Shawn Kumar commented on CASSANDRA-8248: Alex, have you determined the issue with resident being larger, and are you still seeing this problem? If there are any further details you can provide (are you carrying out incremental repairs? do compactions have any effect?), they would be much appreciated. Possible memory leak - Key: CASSANDRA-8248 URL: https://issues.apache.org/jira/browse/CASSANDRA-8248 Project: Cassandra Issue Type: Bug Reporter: Alexander Sterligov Assignee: Shawn Kumar Attachments: thread_dump Sometimes during repair cassandra starts to consume more memory than expected. Total amount of data on node is about 20GB. Size of the data directory is 66GB because of snapshots. Top reports: {noformat} PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15724 loadbase 20 0 493g 55g 44g S 28 44.2 4043:24 java {noformat} In /proc/15724/maps there are a lot of deleted file maps {quote} 7f63a6102000-7f63a6332000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a6332000-7f63a6562000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a6562000-7f63a6792000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a6792000-7f63a69c2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a69c2000-7f63a6bf2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a6bf2000-7f63a6e22000 r--s 08:21 
9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a6e22000-7f63a7052000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7052000-7f63a7282000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7282000-7f63a74b2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a74b2000-7f63a76e2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a76e2000-7f63a7912000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7912000-7f63a7b42000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7b42000-7f63a7d72000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7d72000-7f63a7fa2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a7fa2000-7f63a81d2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a81d2000-7f63a8402000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a8402000-7f63a8622000 r--s 08:21 9442763 
/ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted) 7f63a8622000-7f63a8842000 r--s 08:21 9442763
[jira] [Assigned] (CASSANDRA-8248) Possible memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar reassigned CASSANDRA-8248: -- Assignee: Shawn Kumar Possible memory leak - Key: CASSANDRA-8248 URL: https://issues.apache.org/jira/browse/CASSANDRA-8248 Project: Cassandra Issue Type: Bug Reporter: Alexander Sterligov Assignee: Shawn Kumar Attachments: thread_dump Sometimes during repair Cassandra starts to consume more memory than expected. The total amount of data on the node is about 20GB; the data directory is 66GB because of snapshots. Top reports:
{quote}
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15724 loadbase 20 0 493g 55g 44g S 28 44.2 4043:24 java
{quote}
In /proc/15724/maps there are a lot of mappings of deleted files:
{quote}
7f63a6102000-7f63a6332000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a6332000-7f63a6562000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a6562000-7f63a6792000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a6792000-7f63a69c2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a69c2000-7f63a6bf2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a6bf2000-7f63a6e22000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a6e22000-7f63a7052000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7052000-7f63a7282000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7282000-7f63a74b2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a74b2000-7f63a76e2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a76e2000-7f63a7912000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7912000-7f63a7b42000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7b42000-7f63a7d72000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7d72000-7f63a7fa2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a7fa2000-7f63a81d2000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a81d2000-7f63a8402000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a8402000-7f63a8622000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a8622000-7f63a8842000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
7f63a8842000-7f63a8a62000 r--s 08:21 9442763 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db (deleted)
{quote}
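To put a number on mappings like the ones quoted above, /proc/&lt;pid&gt;/maps can be parsed directly. The following Python sketch is illustrative only - the helper name deleted_mapping_bytes is mine, not part of Cassandra's tooling - and simply sums the address-range sizes of mappings whose backing file has been deleted:

```python
import re

def deleted_mapping_bytes(maps_text):
    """Sum the address-range sizes of mappings whose backing file is deleted.

    Expects lines in /proc/<pid>/maps style, e.g.
    '7f63a8622000-7f63a8842000 r--s 08:21 9442763 /path/file.db (deleted)'
    """
    total = 0
    for line in maps_text.splitlines():
        line = line.strip()
        if not line.endswith("(deleted)"):
            continue  # only count mappings of already-deleted files
        m = re.match(r"([0-9a-f]+)-([0-9a-f]+)", line)
        if m:
            start, end = (int(x, 16) for x in m.groups())
            total += end - start
    return total
```

Run against a live process (e.g. open('/proc/15724/maps').read()), a total that keeps growing across snapshots would corroborate the leak described here.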
[jira] [Commented] (CASSANDRA-7217) Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads
[ https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200604#comment-14200604 ] Shawn Kumar commented on CASSANDRA-7217: I'll be continuing testing on a more CPU-performant instance but thought I would briefly try cstar_perf on bdplab. [Here|http://cstar.datastax.com/graph?stats=dd73c4a6-65d9-11e4-9413-bc764e04482c&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=279.07&ymin=0&ymax=120665.6] are the results - I increased the thread count from 500 to 1500 in 250-thread increments from the first operation to the last, and it seems like there is a noticeable drop. Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads - Key: CASSANDRA-7217 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Shawn Kumar Labels: performance, triaged Fix For: 2.1.2 This is obviously bad. Let's figure out why it's happening and put a stop to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7217) Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads
[ https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200604#comment-14200604 ] Shawn Kumar edited comment on CASSANDRA-7217 at 11/6/14 6:32 PM: - I'll be continuing testing on a more CPU-performant instance but thought I would briefly try cstar_perf on bdplab. [Here|http://cstar.datastax.com/graph?stats=dd73c4a6-65d9-11e4-9413-bc764e04482c&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=279.07&ymin=0&ymax=120665.6] are the results - I increased the thread count from 500 to 1500 in 250-thread increments from the first operation to the last (i.e. 1_write to 5_write) and it seems like there is a noticeable drop in performance, especially around 1000 threads. was (Author: shawn.kumar): I'll be continuing testing on a more CPU-performant instance but thought I would briefly try cstar_perf on bdplab. [Here|http://cstar.datastax.com/graph?stats=dd73c4a6-65d9-11e4-9413-bc764e04482c&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=279.07&ymin=0&ymax=120665.6] are the results - I increased the thread count from 500 to 1500 in 250-thread increments from the first operation to the last and it seems like there is a noticeable drop. Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads - Key: CASSANDRA-7217 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Shawn Kumar Labels: performance, triaged Fix For: 2.1.2 This is obviously bad. Let's figure out why it's happening and put a stop to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7217) Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads
[ https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200604#comment-14200604 ] Shawn Kumar edited comment on CASSANDRA-7217 at 11/6/14 6:46 PM: - I'll be continuing testing on a more CPU-performant instance but thought I would briefly try cstar_perf on bdplab. [Here|http://cstar.datastax.com/graph?stats=dd73c4a6-65d9-11e4-9413-bc764e04482c&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=279.07&ymin=0&ymax=120665.6] are the results - I increased the thread count from 500 to 1500 in 250-thread increments from the first operation to the last (i.e. 1_write to 5_write) and it seems like there is a noticeable drop in performance, especially around 1250 threads. was (Author: shawn.kumar): I'll be continuing testing on a more CPU-performant instance but thought I would briefly try cstar_perf on bdplab. [Here|http://cstar.datastax.com/graph?stats=dd73c4a6-65d9-11e4-9413-bc764e04482c&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=279.07&ymin=0&ymax=120665.6] are the results - I increased the thread count from 500 to 1500 in 250-thread increments from the first operation to the last (i.e. 1_write to 5_write) and it seems like there is a noticeable drop in performance, especially around 1000 threads. Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads - Key: CASSANDRA-7217 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Shawn Kumar Labels: performance, triaged Fix For: 2.1.2 This is obviously bad. Let's figure out why it's happening and put a stop to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8061) tmplink files are not removed
[ https://issues.apache.org/jira/browse/CASSANDRA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196340#comment-14196340 ] Shawn Kumar commented on CASSANDRA-8061: Hi Catalin, thanks for commenting. I've been trying to reproduce this issue, to no avail, and would greatly appreciate it if you could share any details of your setup that could be helpful - especially the column family details (as above). Thanks! tmplink files are not removed - Key: CASSANDRA-8061 URL: https://issues.apache.org/jira/browse/CASSANDRA-8061 Project: Cassandra Issue Type: Bug Components: Core Environment: Linux Reporter: Gianluca Borello Assignee: Shawn Kumar After installing 2.1.0, I'm experiencing a bunch of tmplink files that are filling my disk. I found https://issues.apache.org/jira/browse/CASSANDRA-7803, which is very similar, and I confirm it happens both on 2.1.0 and on the latest commit on the cassandra-2.1 branch (https://github.com/apache/cassandra/commit/aca80da38c3d86a40cc63d9a122f7d45258e4685). Even starting with a clean keyspace, after a few hours I get:
$ sudo find /raid0 | grep tmplink | xargs du -hs
2.7G /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Data.db
13M /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Index.db
1.8G /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Data.db
12M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Index.db
5.2M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Index.db
822M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Data.db
7.3M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Index.db
1.2G /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Data.db
6.7M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Index.db
1.1G /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Data.db
11M /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Index.db
1.7G /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Data.db
812K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Index.db
122M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-208-Data.db
744K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-739-Index.db
660K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Index.db
796K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Index.db
137M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Data.db
161M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Data.db
139M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Data.db
940K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Index.db
936K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Index.db
161M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Data.db
672K /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-197-Index.db
113M /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Data.db
116M
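The find/grep/du pipeline quoted in that report can also be replicated portably. Here is a hedged Python sketch - the helper name tmplink_usage is mine, purely for illustration - that totals the on-disk size of leftover tmplink files under a Cassandra data directory:

```python
import os

def tmplink_usage(data_dir):
    """Total the size in bytes of leftover -tmplink- files under data_dir,
    roughly equivalent to `find data_dir | grep tmplink | xargs du`."""
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if "tmplink" in name:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # a tmplink file may be removed while we scan
    return total
```

Polling this total over a few hours would show whether the orphaned files keep accumulating, as the reporter observed.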
[jira] [Resolved] (CASSANDRA-8129) Increase max heap for sstablesplit
[ https://issues.apache.org/jira/browse/CASSANDRA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-8129. Resolution: Cannot Reproduce Increase max heap for sstablesplit -- Key: CASSANDRA-8129 URL: https://issues.apache.org/jira/browse/CASSANDRA-8129 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Stump Assignee: Shawn Kumar Priority: Minor The max heap for sstablesplit is 256m. For large files that's too small and it will OOM. We should increase the max heap to something like 2-4G, with the understanding that sstablesplit will most likely only be invoked to split large files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192027#comment-14192027 ] Shawn Kumar commented on CASSANDRA-8008: As you recommended, increasing Xmx fixed the issue for me. I think this ticket is resolved, though I'm not sure if you want to increase the default. Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core, Tools Reporter: Shawn Kumar Assignee: T Jake Luciani Priority: Minor Attachments: file.log, node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Attachment: (was: perftest.log) Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Attachment: perftest.log Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7766) Secondary index not working after a while
[ https://issues.apache.org/jira/browse/CASSANDRA-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7766: --- Description: Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request) : {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). was: cSince 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). 
Here is a use-case example (in order to clarify my request) : {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). Secondary index not working after a while - Key: CASSANDRA-7766 URL: https://issues.apache.org/jira/browse/CASSANDRA-7766 Project: Cassandra Issue Type: Bug Environment: C* 2.1.0-rc5 with small clusters (one or two nodes) Reporter: Fabrice Larcher Attachments: result-failure.txt, result-success.txt Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request) : {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. 
Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7766) Secondary index not working after a while
[ https://issues.apache.org/jira/browse/CASSANDRA-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151761#comment-14151761 ] Shawn Kumar commented on CASSANDRA-7766: Was unable to reproduce this over a period of a few days. Please feel free to reopen the ticket if you come across any further information that could help us reproduce this. Secondary index not working after a while - Key: CASSANDRA-7766 URL: https://issues.apache.org/jira/browse/CASSANDRA-7766 Project: Cassandra Issue Type: Bug Environment: C* 2.1.0-rc5 with small clusters (one or two nodes) Reporter: Fabrice Larcher Attachments: result-failure.txt, result-success.txt Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request) : {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-7766) Secondary index not working after a while
[ https://issues.apache.org/jira/browse/CASSANDRA-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-7766. Resolution: Cannot Reproduce Secondary index not working after a while - Key: CASSANDRA-7766 URL: https://issues.apache.org/jira/browse/CASSANDRA-7766 Project: Cassandra Issue Type: Bug Environment: C* 2.1.0-rc5 with small clusters (one or two nodes) Reporter: Fabrice Larcher Attachments: result-failure.txt, result-success.txt Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request) : {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-6898) describing a table with compression should expand the compression options
[ https://issues.apache.org/jira/browse/CASSANDRA-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-6898. Resolution: Fixed describing a table with compression should expand the compression options - Key: CASSANDRA-6898 URL: https://issues.apache.org/jira/browse/CASSANDRA-6898 Project: Cassandra Issue Type: Bug Components: API Reporter: Brandon Williams Priority: Minor Fix For: 2.0.11
{noformat}
cqlsh:foo> CREATE TABLE baz ( foo text, bar text, primary KEY (foo)) WITH compression = {};
cqlsh:foo> DESCRIBE TABLE baz;

CREATE TABLE baz (
  foo text,
  bar text,
  PRIMARY KEY (foo)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={};

cqlsh:foo>
{noformat}
From this, you can't tell that LZ4 compression is enabled, even though it is. It would be more friendly to expand the option to show the defaults. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151959#comment-14151959 ] Shawn Kumar commented on CASSANDRA-8008: Was able to reproduce locally on a single node running stock 2.1.0 as well, attached an additional thread dump. Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Attachment: file.log Locally on single node of 2.1.0 Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: file.log, node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Comment: was deleted (was: Locally on single node of 2.1.0) Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: file.log, node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
 at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
 at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
 at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
 at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
Shawn Kumar created CASSANDRA-8008: -- Summary: Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1.log, node2.log I've been using cstar_perf to test cassandra with different gc's and came across this error on one run which effectively stopped the test: java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83) It looks similar to CASSANDRA-6943, but that should have fixed it, and I haven't been able to consistently replicate this with other runs. This particular run was stress writing/reading about 300M keys, and is an early attempt at carrying out a test of this size so perhaps it only manifests with larger tests. The modifications from stock 2.1.0 were changes to heap size and usage of g1gc, as well as using offheap_objects. I have attached thread dumps from the nodes in question, hopefully they capture the broken state. I am continuing to test this, and will see if I can reproduce this again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Description: I've been using cstar_perf to test a performance scenario and was able to reproduce this error on stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks similar to CASSANDRA-6943, but that should have fixed it, and I haven't been able to consistently replicate this with other runs. This particular run was stress writing/reading about 300M keys, and is an early attempt at carrying out a test of this size, so perhaps it only manifests with larger tests. was: I've been using cstar_perf to test Cassandra with different GCs and came across this error on one run, which effectively stopped the test:
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
{noformat}
It looks similar to CASSANDRA-6943, but that should have fixed it, and I haven't been able to consistently replicate this with other runs. This particular run was stress writing/reading about 300M keys, and is an early attempt at carrying out a test of this size, so perhaps it only manifests with larger tests. The modifications from stock 2.1.0 were changes to heap size and usage of G1GC, as well as using offheap_objects. I have attached thread dumps from the nodes in question; hopefully they capture the broken state.
I am continuing to test this, and will see if I can reproduce this again. Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1.log, node2.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks similar to CASSANDRA-6943, but that should have fixed it, and I haven't been able to consistently replicate this with other runs. This particular run was stress writing/reading about 300M keys, and is an early attempt at carrying out a test of this size, so perhaps it only manifests with larger tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
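The failure mode reported above - a coordinator thread waiting a bounded time for per-worker timer threads to report an interval, and giving up when one stalls - can be modelled in a few lines. The sketch below is illustrative only, not Cassandra's actual `Timing.snap()` code; the class and method names merely mirror the ones in the stack trace.

```python
import threading

class TimerThreadStuck(RuntimeError):
    pass

class Timing:
    """Toy model of stress's snapshot rendezvous: every worker timer
    thread must report its latency interval before the coordinator can
    take a snapshot; the coordinator only waits a bounded time."""

    def __init__(self, n_workers, timeout=0.5):
        self.timeout = timeout
        # workers + the coordinating metrics thread all meet here
        self.barrier = threading.Barrier(n_workers + 1)

    def worker_report(self):
        self.barrier.wait()

    def snap(self):
        try:
            self.barrier.wait(self.timeout)
        except threading.BrokenBarrierError:
            # mirrors the message from the ticket's stack trace
            raise TimerThreadStuck(
                "Timed out waiting for a timer thread - seems one got stuck")

def run(n_workers, stuck=0):
    """Start workers; 'stuck' of them never report, modelling a hung timer."""
    timing = Timing(n_workers)
    for _ in range(n_workers - stuck):
        threading.Thread(target=timing.worker_report, daemon=True).start()
    timing.snap()

run(4)                    # all workers report: snap() completes
try:
    run(4, stuck=1)       # one worker hangs: snap() times out
    timed_out = False
except TimerThreadStuck:
    timed_out = True
```

Under this model, the error in the ticket means one timer thread never reached its report point within the timeout, which matches the thread dumps being the interesting artifact to inspect.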
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Description: I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. was: I've been using cstar_perf to test a performance scenario and was able to reproduce this error on stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks similar to CASSANDRA-6943, but that should have fixed it, and I haven't been able to consistently replicate this with other runs. This particular run was stress writing/reading about 300M keys, and is an early attempt at carrying out a test of this size, so perhaps it only manifests with larger tests.
Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1.log, node2.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Attachment: perftest.log Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8008) Timed out waiting for timer thread on large stress command
[ https://issues.apache.org/jira/browse/CASSANDRA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-8008: --- Attachment: node2-2.log node1-2.log Timed out waiting for timer thread on large stress command Key: CASSANDRA-8008 URL: https://issues.apache.org/jira/browse/CASSANDRA-8008 Project: Cassandra Issue Type: Bug Components: Core Reporter: Shawn Kumar Attachments: node1-2.log, node1.log, node2-2.log, node2.log, perftest.log I've been using cstar_perf to test a performance scenario and was able to reproduce this error on a two node cluster with stock 2.1.0 while carrying out large stress writes (50M keys):
{noformat}
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:83)
    at org.apache.cassandra.stress.util.Timing.snap(Timing.java:118)
    at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
    at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:42)
    at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like a similar error to that found in CASSANDRA-6943. I've also attached the test log and thread dumps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7406) Reset version when closing incoming socket in IncomingTcpConnection should be done atomically
[ https://issues.apache.org/jira/browse/CASSANDRA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7406: --- Reproduced In: 2.0.6 Reset version when closing incoming socket in IncomingTcpConnection should be done atomically - Key: CASSANDRA-7406 URL: https://issues.apache.org/jira/browse/CASSANDRA-7406 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS release 5.5 (Tikanga) Reporter: Ray Chen When closing an incoming socket, the close() method will call MessagingService.resetVersion(); this may clear a version that was set concurrently by another thread. This could cause MessagingService.knowsVersion(endpoint) to return false where true is expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
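The race described here - close() unconditionally clearing a version that another thread has just re-set - is a classic lost-update problem, and the usual fix is to make the check-and-clear a single atomic step. A minimal Python sketch of that idea follows; the names are illustrative and do not reproduce Cassandra's actual MessagingService API.

```python
import threading

class MessagingVersions:
    """Toy model of per-endpoint protocol versions. Instead of an
    unconditional reset, the closing connection clears an entry only if
    it still holds the value that connection set - a compare-and-remove,
    so a version concurrently refreshed by another thread survives."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}

    def set_version(self, endpoint, version):
        with self._lock:
            self._versions[endpoint] = version

    def reset_version_if_equals(self, endpoint, expected):
        # Atomic check-then-clear: only remove if nobody raced us.
        with self._lock:
            if self._versions.get(endpoint) == expected:
                del self._versions[endpoint]
                return True
            return False

    def knows_version(self, endpoint):
        with self._lock:
            return endpoint in self._versions

mv = MessagingVersions()
mv.set_version("10.0.0.1", 7)
mv.set_version("10.0.0.1", 8)              # another thread refreshed it
mv.reset_version_if_equals("10.0.0.1", 7)  # stale close: no-op
assert mv.knows_version("10.0.0.1")        # version survives, as expected
```

With an unconditional reset in place of `reset_version_if_equals`, the final `knows_version` check would return false, which is exactly the symptom the reporter describes.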
[jira] [Resolved] (CASSANDRA-7303) OutOfMemoryError during prolonged batch processing
[ https://issues.apache.org/jira/browse/CASSANDRA-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-7303. Resolution: Won't Fix OutOfMemoryError during prolonged batch processing -- Key: CASSANDRA-7303 URL: https://issues.apache.org/jira/browse/CASSANDRA-7303 Project: Cassandra Issue Type: Bug Components: Core Environment: Server: RedHat 6, 64-bit, Oracle JDK 7, Cassandra 2.0.6 Client: Java 7, Astyanax Reporter: Jacek Furmankiewicz Labels: crash, outofmemory, qa-resolved We have a prolonged batch processing job. It writes a lot of records; every batch mutation creates probably on average 300-500 columns per row key (with many disparate row keys). It works fine, but within a few hours we get an error like this:
{noformat}
ERROR [Thrift:15] 2014-05-24 14:16:20,192 CassandraDaemon.java (line 196) Exception in thread Thread[Thrift:15,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
    at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:183)
    at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:678)
    at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:611)
    at org.apache.cassandra.thrift.Column.write(Column.java:538)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:673)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:607)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn.write(ColumnOrSuperColumn.java:517)
    at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11682)
    at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11603)
    at org.apache.cassandra.thrift.Cassandra
{noformat}
The server already has a 16 GB heap, which we hear is the max Cassandra can run with. The writes are heavily multi-threaded from a single server. The gist of the issue is that Cassandra should not crash with OOM when under heavy load. It is OK to slow down, even maybe start throwing operation timeout exceptions, etc. But to just crash in the middle of the processing should not be allowed. Is there any internal monitoring of heap usage in Cassandra where it could detect that it is getting close to the heap limit and start throttling the incoming requests to avoid this type of error? Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7303) OutOfMemoryError during prolonged batch processing
[ https://issues.apache.org/jira/browse/CASSANDRA-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7303: --- Labels: crash outofmemory qa-resolved (was: crash outofmemory) OutOfMemoryError during prolonged batch processing -- Key: CASSANDRA-7303 URL: https://issues.apache.org/jira/browse/CASSANDRA-7303 Project: Cassandra Issue Type: Bug Components: Core Environment: Server: RedHat 6, 64-bit, Oracle JDK 7, Cassandra 2.0.6 Client: Java 7, Astyanax Reporter: Jacek Furmankiewicz Labels: crash, outofmemory, qa-resolved We have a prolonged batch processing job. It writes a lot of records; every batch mutation creates probably on average 300-500 columns per row key (with many disparate row keys). It works fine, but within a few hours we get an error like this:
{noformat}
ERROR [Thrift:15] 2014-05-24 14:16:20,192 CassandraDaemon.java (line 196) Exception in thread Thread[Thrift:15,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
    at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:183)
    at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:678)
    at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:611)
    at org.apache.cassandra.thrift.Column.write(Column.java:538)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:673)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:607)
    at org.apache.cassandra.thrift.ColumnOrSuperColumn.write(ColumnOrSuperColumn.java:517)
    at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11682)
    at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11603)
    at org.apache.cassandra.thrift.Cassandra
{noformat}
The server already has a 16 GB heap, which we hear is the max Cassandra can run with. The writes are heavily multi-threaded from a single server. The gist of the issue is that Cassandra should not crash with OOM when under heavy load. It is OK to slow down, even maybe start throwing operation timeout exceptions, etc. But to just crash in the middle of the processing should not be allowed. Is there any internal monitoring of heap usage in Cassandra where it could detect that it is getting close to the heap limit and start throttling the incoming requests to avoid this type of error? Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
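The throttling the reporter asks for - detecting approaching memory pressure and pushing back on clients instead of dying with an OutOfMemoryError - is essentially admission control. The sketch below models the idea with a hypothetical watermark-based throttle; it is not an existing Cassandra mechanism, and the byte accounting is a stand-in for real heap monitoring.

```python
import threading

class HeapThrottle:
    """Hypothetical admission control: track an estimate of in-flight
    request bytes and block new requests once a high-water mark is
    reached. Overloaded clients get a timeout instead of the server
    running into an OutOfMemoryError."""

    def __init__(self, high_watermark_bytes):
        self.high = high_watermark_bytes
        self.in_flight = 0
        self.cond = threading.Condition()

    def acquire(self, request_bytes, timeout=None):
        with self.cond:
            admitted = self.cond.wait_for(
                lambda: self.in_flight + request_bytes <= self.high,
                timeout=timeout)
            if not admitted:
                # push back instead of letting the heap overflow
                raise TimeoutError("overloaded: request rejected, not OOM")
            self.in_flight += request_bytes

    def release(self, request_bytes):
        with self.cond:
            self.in_flight -= request_bytes
            self.cond.notify_all()

t = HeapThrottle(high_watermark_bytes=100)
t.acquire(60)
t.acquire(40)                    # exactly at the watermark
try:
    t.acquire(1, timeout=0.05)   # would exceed it: times out, no crash
except TimeoutError:
    pass
t.release(60)
t.acquire(1)                     # capacity freed: admitted again
```

This is the trade the reporter asks for explicitly: slow down or throw operation timeouts under load, but never crash mid-processing.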
[jira] [Updated] (CASSANDRA-7861) Node is not able to gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7861: --- Labels: qa-resolved (was: ) Node is not able to gossip -- Key: CASSANDRA-7861 URL: https://issues.apache.org/jira/browse/CASSANDRA-7861 Project: Cassandra Issue Type: Bug Reporter: Ananthkumar K S Labels: qa-resolved Fix For: 2.0.3 The node is running on xxx.xxx.xxx.xxx. All of a sudden, it was not able to gossip and find the other nodes between data centres. We had two nodes indicated as down in DC1, but those two nodes were up and running in DC2. When we check the two nodes' status from DC2, all the nodes in DC1 are denoted as DN and the other node in DC2 is denoted as down. There seems to be a disconnect between the nodes. I have attached the thread dump of the node that was down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-7861) Node is not able to gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar resolved CASSANDRA-7861. Resolution: Cannot Reproduce Node is not able to gossip -- Key: CASSANDRA-7861 URL: https://issues.apache.org/jira/browse/CASSANDRA-7861 Project: Cassandra Issue Type: Bug Reporter: Ananthkumar K S Labels: qa-resolved Fix For: 2.0.3 The node is running on xxx.xxx.xxx.xxx. All of a sudden, it was not able to gossip and find the other nodes between data centres. We had two nodes indicated as down in DC1, but those two nodes were up and running in DC2. When we check the two nodes' status from DC2, all the nodes in DC1 are denoted as DN and the other node in DC2 is denoted as down. There seems to be a disconnect between the nodes. I have attached the thread dump of the node that was down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7766) Secondary index not working after a while
[ https://issues.apache.org/jira/browse/CASSANDRA-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7766: --- Description: Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request):
{code}
CREATE TABLE IF NOT EXISTS ks.cf (k int PRIMARY KEY, ind ascii, value text);
CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind);
INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello');
SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while
{code}
The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (it appears to be linked with some scheduled job inside C*). was: Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart).
Here is a use-case example (in order to clarify my request):
{code}
CREATE TABLE IF NOT EXISTS ks.cf (k int PRIMARY KEY, ind ascii, value text);
CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind);
INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello');
SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while
{code}
The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (it appears to be linked with some scheduled job inside C*). Secondary index not working after a while - Key: CASSANDRA-7766 URL: https://issues.apache.org/jira/browse/CASSANDRA-7766 Project: Cassandra Issue Type: Bug Environment: C* 2.1.0-rc5 with small clusters (one or two nodes) Reporter: Fabrice Larcher Attachments: result-failure.txt, result-success.txt Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index do not return the corresponding row(s) anymore. I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request):
{code}
CREATE TABLE IF NOT EXISTS ks.cf (k int PRIMARY KEY, ind ascii, value text);
CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind);
INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello');
SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while
{code}
The last SELECT statement may or may not return a row depending on the instant of the request. I experienced that with 2.1.0-rc5 through CQLSH with clusters of one and two nodes.
Since it depends on the instant of the request, I am not able to deliver any way to reproduce that systematically (It appears to be linked with some scheduled job inside C*). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7406) Reset version when closing incoming socket in IncomingTcpConnection should be done atomically
[ https://issues.apache.org/jira/browse/CASSANDRA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143972#comment-14143972 ] Shawn Kumar commented on CASSANDRA-7406: Looks like this particular issue was also brought up and is being looked at in 7734. It would be greatly appreciated if you could note the version you noticed this in. Reset version when closing incoming socket in IncomingTcpConnection should be done atomically - Key: CASSANDRA-7406 URL: https://issues.apache.org/jira/browse/CASSANDRA-7406 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS release 5.5 (Tikanga) Reporter: Ray Chen When closing incoming socket, the close() method will call MessagingService.resetVersion(), this behavior may clear version which is set by another thread. This could cause MessagingService.knowsVersion(endpoint) test results as false (expect true here). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7599) Dtest on low cardinality secondary indexes failing in 2.1
[ https://issues.apache.org/jira/browse/CASSANDRA-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7599: --- Labels: qa-resolved (was: ) Dtest on low cardinality secondary indexes failing in 2.1 - Key: CASSANDRA-7599 URL: https://issues.apache.org/jira/browse/CASSANDRA-7599 Project: Cassandra Issue Type: Bug Components: Core, Tests Reporter: Shawn Kumar Assignee: Tyler Hobbs Labels: qa-resolved Fix For: 2.1.0 Attachments: 7599-follow-up.txt, 7599-followup-bikeshed.txt, 7599.txt test_low_cardinality_indexes in secondary_indexes_test.py is failing when tested on the cassandra-2.1 branch. This test has been failing on cassci for a while (at least the last 10 builds) and can easily be reproduced locally as well. It appears to still work on 2.0.
{code}
==
FAIL: test_low_cardinality_indexes (secondary_indexes_test.TestSecondaryIndexes)
--
Traceback (most recent call last):
  File "/home/shawn/git/cstar5/cassandra-dtest/tools.py", line 213, in wrapped
    f(obj)
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 89, in test_low_cardinality_indexes
    check_request_order()
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 84, in check_request_order
    self.assertTrue('Executing indexed scan' in relevant_events[-1][0], str(relevant_events[-1]))
AssertionError: (u'Enqueuing request to /127.0.0.2', '127.0.0.1')
{code}
The test checks that a series of messages are found in the trace after a select query against an index is carried out. It fails to find an 'Executing indexed scan' from node 1 (which takes the query; note both node2 and node3 produced this message). Brief investigation seemed to show that whichever node you create the patient_cql_connection on will not produce this message, indicating perhaps it does not carry out the scan. Should also note that changing 'numrows' (rows initially added) or 'b' (value on index column we query for) does not appear to make a difference.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7486: --- Description: See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) was: See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-5202) CFs should have globally and temporally unique CF IDs to prevent reusing data from earlier incarnation of same CF name
[ https://issues.apache.org/jira/browse/CASSANDRA-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-5202: --- Labels: qa-resolved test (was: test) CFs should have globally and temporally unique CF IDs to prevent reusing data from earlier incarnation of same CF name Key: CASSANDRA-5202 URL: https://issues.apache.org/jira/browse/CASSANDRA-5202 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.9 Environment: OS: Windows 7, Server: Cassandra 1.1.9 release drop Client: astyanax 1.56.21, JVM: Sun/Oracle JVM 64 bit (jdk1.6.0_27) Reporter: Marat Bedretdinov Assignee: Yuki Morishita Labels: qa-resolved, test Fix For: 2.1 beta1 Attachments: 0001-make-2i-CFMetaData-have-parent-s-CF-ID.patch, 0002-Don-t-scrub-2i-CF-if-index-type-is-CUSTOM.patch, 0003-Fix-user-defined-compaction.patch, 0004-Fix-serialization-test.patch, 0005-Create-system_auth-tables-with-fixed-CFID.patch, 0005-auth-v2.txt, 5202.txt, astyanax-stress-driver.zip Attached is a driver that sequentially:
1. Drops keyspace
2. Creates keyspace
3. Creates 2 column families
4. Seeds 1M rows with 100 columns
5. Queries these 2 column families
The above steps are repeated 1000 times.
The following exception is observed at random (race - SEDA?):
{noformat}
ERROR [ReadStage:55] 2013-01-29 19:24:52,676 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:55,5,main]
java.lang.AssertionError: DecoratedKey(-1, ) != DecoratedKey(62819832764241410631599989027761269388, 313a31) in C:\var\lib\cassandra\data\user_role_reverse_index\business_entity_role\user_role_reverse_index-business_entity_role-hf-1-Data.db
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:60)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1367)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1229)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1164)
    at org.apache.cassandra.db.Table.getRow(Table.java:378)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:822)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1271)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
This exception appears in the server at the time of the client submitting a query request (row slice) and not at the time data is seeded. The client times out, and this data can no longer be queried, as the same exception would always occur from there on.
Also, on iteration 201 it appears that dropping column families failed, and as a result their recreation failed with a unique column family name violation (see exception below). Note that the data files are actually gone, so it appears that the server runtime responsible for creating column families was out of sync with the piece that dropped them:
{noformat}
Starting dropping column families
Dropped column families
Starting dropping keyspace
Dropped keyspace
Starting creating column families
Created column families
Starting seeding data
Total rows inserted: 100 in 5105 ms
Iteration: 200; Total running time for 1000 queries is 232; Average running time of 1000 queries is 0 ms
Starting dropping column families
Dropped column families
Starting dropping keyspace
Dropped keyspace
Starting creating column families
Created column families
Starting seeding data
Total rows inserted: 100 in 5361 ms
Iteration: 201; Total running time for 1000 queries is 222; Average running time of 1000 queries is 0 ms
Starting dropping column families
Starting creating column families
Exception in thread main com.netflix.astyanax.connectionpool.exceptions.BadRequestException: BadRequestException: [host=127.0.0.1(127.0.0.1):9160, latency=2468(2469),
{noformat}
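The fix direction named in this ticket's title - keying a column family's on-disk state by a globally and temporally unique ID rather than by its name - can be sketched briefly. The model below is purely illustrative (it is not Cassandra's schema code): each incarnation of a CF gets a fresh time-based UUID, so a re-created CF of the same name can never pick up files left behind by its earlier incarnation.

```python
import uuid

class Keyspace:
    """Toy schema model: CF names map to a per-incarnation UUID, and all
    on-disk state is keyed by that UUID. Dropping and re-creating a CF
    of the same name yields a different id, so stale files belonging to
    the old incarnation cannot be confused with the new one's data."""

    def __init__(self):
        self.cf_ids = {}      # name -> current incarnation's CF id
        self.data_dirs = {}   # CF id -> data directory

    def create_cf(self, name):
        # uuid1 is time-based: unique across nodes and across time,
        # matching the "globally and temporally unique" requirement
        cf_id = uuid.uuid1()
        self.cf_ids[name] = cf_id
        self.data_dirs[cf_id] = f"data/{name}-{cf_id.hex}"
        return cf_id

    def drop_cf(self, name):
        # even if this cleanup lagged (as in the reported race), the
        # old directory would never match a future incarnation's id
        self.data_dirs.pop(self.cf_ids.pop(name), None)

ks = Keyspace()
first = ks.create_cf("business_entity_role")
ks.drop_cf("business_entity_role")
second = ks.create_cf("business_entity_role")
assert first != second   # lingering files from 'first' can't be reused
```

Under name-keyed directories, by contrast, the second incarnation would read whatever SSTables the first left behind, which is the class of corruption the reported AssertionError points at.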
[jira] [Resolved] (CASSANDRA-7501) Test prepared marker for collections inside UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar resolved CASSANDRA-7501.
------------------------------------
Resolution: Implemented

Test prepared marker for collections inside UDT
-----------------------------------------------
Key: CASSANDRA-7501
URL: https://issues.apache.org/jira/browse/CASSANDRA-7501
Project: Cassandra
Issue Type: Test
Components: API
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Priority: Minor
Fix For: 2.1.1

Test for CASSANDRA-7472.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7501) Test prepared marker for collections inside UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7501:
-----------------------------------
Labels: qa-resolved (was: )

Test prepared marker for collections inside UDT
-----------------------------------------------
Key: CASSANDRA-7501
URL: https://issues.apache.org/jira/browse/CASSANDRA-7501
Project: Cassandra
Issue Type: Test
Components: API
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved
Fix For: 2.1.1

Test for CASSANDRA-7472.
[jira] [Updated] (CASSANDRA-7568) Replacing a dead node using replace_address fails
[ https://issues.apache.org/jira/browse/CASSANDRA-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7568:
-----------------------------------
Labels: qa-resolved (was: )

Replacing a dead node using replace_address fails
-------------------------------------------------
Key: CASSANDRA-7568
URL: https://issues.apache.org/jira/browse/CASSANDRA-7568
Project: Cassandra
Issue Type: Test
Components: Tests
Reporter: Ala' Alkhaldi
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved

Failed assertion:

{code}
ERROR [main] 2014-07-17 10:24:21,171 CassandraDaemon.java:474 - Exception encountered during startup
java.lang.AssertionError: Expected 1 endpoint but found 0
	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:222) ~[main/:na]
	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:131) ~[main/:na]
	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72) ~[main/:na]
	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1049) ~[main/:na]
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:811) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:626) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:511) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [main/:na]
{code}

To replicate the bug run the replace_address_test.replace_stopped_node_test dtest.
[jira] [Resolved] (CASSANDRA-7568) Replacing a dead node using replace_address fails
[ https://issues.apache.org/jira/browse/CASSANDRA-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar resolved CASSANDRA-7568.
------------------------------------
Resolution: Fixed

Ended up having to pass in -Dconsistent.rangemovement=false to make the test work.

Replacing a dead node using replace_address fails
-------------------------------------------------
Key: CASSANDRA-7568
URL: https://issues.apache.org/jira/browse/CASSANDRA-7568
Project: Cassandra
Issue Type: Test
Components: Tests
Reporter: Ala' Alkhaldi
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved

Failed assertion:

{code}
ERROR [main] 2014-07-17 10:24:21,171 CassandraDaemon.java:474 - Exception encountered during startup
java.lang.AssertionError: Expected 1 endpoint but found 0
	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:222) ~[main/:na]
	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:131) ~[main/:na]
	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72) ~[main/:na]
	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1049) ~[main/:na]
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:811) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:626) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:511) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [main/:na]
{code}

To replicate the bug run the replace_address_test.replace_stopped_node_test dtest.
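The resolution above hinges on two JVM flags. A minimal sketch of how a test harness might assemble them (the helper name is hypothetical; `-Dconsistent.rangemovement=false` is taken verbatim from the resolution, and `-Dcassandra.replace_address` is the standard replacement flag):

```python
def replacement_jvm_args(dead_node_ip, consistent_rangemovement=False):
    """Build the JVM flags for starting a replacement node (illustrative helper).

    -Dcassandra.replace_address tells the new node which dead node to replace;
    -Dconsistent.rangemovement=false is the flag the resolution above needed
    to make the dtest pass.
    """
    args = ['-Dcassandra.replace_address=%s' % dead_node_ip]
    if not consistent_rangemovement:
        args.append('-Dconsistent.rangemovement=false')
    return args

print(replacement_jvm_args('127.0.0.3'))
```

In a dtest these flags would be handed to the node's start call; the helper above only shows how they fit together.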
[jira] [Created] (CASSANDRA-7599) Dtest on low cardinality secondary indexes failing in 2.1
Shawn Kumar created CASSANDRA-7599:
-----------------------------------
Summary: Dtest on low cardinality secondary indexes failing in 2.1
Key: CASSANDRA-7599
URL: https://issues.apache.org/jira/browse/CASSANDRA-7599
Project: Cassandra
Issue Type: Bug
Components: Tests
Reporter: Shawn Kumar
Fix For: 2.1.0

test_low_cardinality_indexes in secondary_indexes_test.py is failing when tested on the cassandra-2.1 branch. This test has been failing on cassci for a while (at least the last 10 builds) and can easily be reproduced locally as well. It appears to still work on 2.0.

{code}
======================================================================
FAIL: test_low_cardinality_indexes (secondary_indexes_test.TestSecondaryIndexes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shawn/git/cstar5/cassandra-dtest/tools.py", line 213, in wrapped
    f(obj)
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 89, in test_low_cardinality_indexes
    check_request_order()
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 84, in check_request_order
    self.assertTrue('Executing indexed scan' in relevant_events[-1][0], str(relevant_events[-1]))
AssertionError: (u'Enqueuing request to /127.0.0.2', '127.0.0.1')
{code}

The test checks that a series of messages are found in the trace after a select query against an index is carried out. It fails to find an 'Executing indexed scan' from node 1 (which takes the query; note both node2 and node3 produced this message). Brief investigation seemed to show that whichever node you create the patient_cql_connection on will not produce this message, indicating perhaps it does not carry out the scan. Should also note that changing 'numrows' (rows initially added) or 'b' (value on index column we query for) does not appear to make a difference.
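The failing assertion can be modeled in isolation. A sketch (the function name is hypothetical; the event tuples mirror the (description, source) pairs the dtest pulls from the query trace) of the check the test performs on the last relevant trace event:

```python
def last_event_is_indexed_scan(relevant_events):
    """relevant_events: list of (description, source) pairs from a query trace.

    The dtest expects the final relevant event to be the coordinator's
    'Executing indexed scan' message.
    """
    if not relevant_events:
        return False
    description, _source = relevant_events[-1]
    return 'Executing indexed scan' in description

# The passing case: the coordinator reports the indexed scan last.
ok = last_event_is_indexed_scan([
    ('Enqueuing request to /127.0.0.2', '127.0.0.1'),
    ('Executing indexed scan for ...', '127.0.0.1'),
])

# The failure seen above: the coordinator never reports the scan,
# so the last relevant event is the enqueue message.
bad = last_event_is_indexed_scan([
    ('Enqueuing request to /127.0.0.2', '127.0.0.1'),
])
```

This mirrors why the AssertionError above prints the enqueue tuple: it is the last relevant event seen when node 1 skips the scan.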
[jira] [Updated] (CASSANDRA-7599) Dtest on low cardinality secondary indexes failing in 2.1
[ https://issues.apache.org/jira/browse/CASSANDRA-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7599:
-----------------------------------
Component/s: Core

Dtest on low cardinality secondary indexes failing in 2.1
---------------------------------------------------------
Key: CASSANDRA-7599
URL: https://issues.apache.org/jira/browse/CASSANDRA-7599
Project: Cassandra
Issue Type: Bug
Components: Core, Tests
Reporter: Shawn Kumar
Fix For: 2.1.0

test_low_cardinality_indexes in secondary_indexes_test.py is failing when tested on the cassandra-2.1 branch. This test has been failing on cassci for a while (at least the last 10 builds) and can easily be reproduced locally as well. It appears to still work on 2.0.

{code}
======================================================================
FAIL: test_low_cardinality_indexes (secondary_indexes_test.TestSecondaryIndexes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shawn/git/cstar5/cassandra-dtest/tools.py", line 213, in wrapped
    f(obj)
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 89, in test_low_cardinality_indexes
    check_request_order()
  File "/home/shawn/git/cstar5/cassandra-dtest/secondary_indexes_test.py", line 84, in check_request_order
    self.assertTrue('Executing indexed scan' in relevant_events[-1][0], str(relevant_events[-1]))
AssertionError: (u'Enqueuing request to /127.0.0.2', '127.0.0.1')
{code}

The test checks that a series of messages are found in the trace after a select query against an index is carried out. It fails to find an 'Executing indexed scan' from node 1 (which takes the query; note both node2 and node3 produced this message). Brief investigation seemed to show that whichever node you create the patient_cql_connection on will not produce this message, indicating perhaps it does not carry out the scan. Should also note that changing 'numrows' (rows initially added) or 'b' (value on index column we query for) does not appear to make a difference.
[jira] [Commented] (CASSANDRA-7140) dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042661#comment-14042661 ]

Shawn Kumar commented on CASSANDRA-7140:
----------------------------------------
Unable to reproduce, test passes on 2.0.7-2.1.

dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
----------------------------------------------------------------------------------------------
Key: CASSANDRA-7140
URL: https://issues.apache.org/jira/browse/CASSANDRA-7140
Project: Cassandra
Issue Type: Bug
Reporter: Russ Hatch
Assignee: Shawn Kumar

This can be triggered by running the dtest with:

{noformat}
nosetests -vs upgrade_through_versions_test:TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD.upgrade_test_mixed
{noformat}

The dtest upgrade test code is a bit more obtuse now, so it takes some more work to see what's happening. It's entirely possible that the dtest is doing the upgrade improperly triggering the exception in cassandra. Here's the complete (and very long) stacktrace:

{noformat}
upgrade_test_mixed (upgrade_through_versions_test.TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD) ...
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy0.forceKeyspaceFlush(Unknown Source)
	at org.apache.cassandra.tools.NodeProbe.forceKeyspaceFlush(NodeProbe.java:210)
	at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1673)
	at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1365)
Caused by: javax.management.ReflectionException: No such operation: forceKeyspaceFlush
	at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:170)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:112)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:275)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:252)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1029)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:292)
	... 4 more
Caused by: java.lang.NoSuchMethodException: forceKeyspaceFlush(java.lang.String, [Ljava.lang.String;)
	at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:112)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
{noformat}
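The trace points at a JMX operation-name mismatch across versions: a 2.0 NodeCmd invokes forceKeyspaceFlush, but the older node's MBean never registered an operation by that name (the pre-rename name and the toy classes below are illustrative assumptions, not the actual MBean code). A minimal model of the failure mode:

```python
class MBeanStub:
    """Toy stand-in for a remote MBean: it only accepts operations it registered."""

    def __init__(self, operations):
        self.operations = set(operations)

    def invoke(self, name):
        if name not in self.operations:
            # Mirrors javax.management.ReflectionException: No such operation
            raise LookupError('No such operation: %s' % name)
        return 'OK'

# Hypothetical operation sets: the old server only knows the pre-rename name.
node_1_2 = MBeanStub({'forceTableFlush'})
node_2_0 = MBeanStub({'forceTableFlush', 'forceKeyspaceFlush'})

assert node_2_0.invoke('forceKeyspaceFlush') == 'OK'
try:
    node_1_2.invoke('forceKeyspaceFlush')  # what a 2.0 nodetool asks of a 1.2 node
except LookupError as e:
    print(e)  # No such operation: forceKeyspaceFlush
```

In a mixed-version upgrade either the dtest must run the matching nodetool against each node, or the tool must fall back to the operation name the old node exposes.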
[jira] [Resolved] (CASSANDRA-7140) dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar resolved CASSANDRA-7140.
------------------------------------
Resolution: Cannot Reproduce

dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
----------------------------------------------------------------------------------------------
Key: CASSANDRA-7140
URL: https://issues.apache.org/jira/browse/CASSANDRA-7140
Project: Cassandra
Issue Type: Bug
Reporter: Russ Hatch
Assignee: Shawn Kumar
Labels: qa-resolved

This can be triggered by running the dtest with:

{noformat}
nosetests -vs upgrade_through_versions_test:TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD.upgrade_test_mixed
{noformat}

The dtest upgrade test code is a bit more obtuse now, so it takes some more work to see what's happening. It's entirely possible that the dtest is doing the upgrade improperly triggering the exception in cassandra. Here's the complete (and very long) stacktrace:

{noformat}
upgrade_test_mixed (upgrade_through_versions_test.TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD) ...
{noformat}
[jira] [Updated] (CASSANDRA-7140) dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7140:
-----------------------------------
Labels: qa-resolved (was: )

dtest triggers java.lang.reflect.UndeclaredThrowableException on mixed upgrade from 1.2 to 2.0
----------------------------------------------------------------------------------------------
Key: CASSANDRA-7140
URL: https://issues.apache.org/jira/browse/CASSANDRA-7140
Project: Cassandra
Issue Type: Bug
Reporter: Russ Hatch
Assignee: Shawn Kumar
Labels: qa-resolved

This can be triggered by running the dtest with:

{noformat}
nosetests -vs upgrade_through_versions_test:TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD.upgrade_test_mixed
{noformat}

The dtest upgrade test code is a bit more obtuse now, so it takes some more work to see what's happening. It's entirely possible that the dtest is doing the upgrade improperly triggering the exception in cassandra. Here's the complete (and very long) stacktrace:

{noformat}
upgrade_test_mixed (upgrade_through_versions_test.TestUpgrade_from_cassandra_1_2_latest_tag_to_cassandra_2_0_HEAD) ...
{noformat}
[jira] [Commented] (CASSANDRA-7350) Decommissioning nodes borks the seed node - can't add additional nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038001#comment-14038001 ]

Shawn Kumar commented on CASSANDRA-7350:
----------------------------------------
Was unable to reproduce this; looks like it's been fixed.

Decommissioning nodes borks the seed node - can't add additional nodes
----------------------------------------------------------------------
Key: CASSANDRA-7350
URL: https://issues.apache.org/jira/browse/CASSANDRA-7350
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Ubuntu using the auto-clustering AMI
Reporter: Steven Lowenthal
Assignee: Shawn Kumar
Priority: Minor
Fix For: 2.0.9

1) Launch a 4 node cluster - I used the auto-clustering AMI (you get nodes 0-3)
2) Decommission the last 2 nodes, leaving a 2 node cluster
3) Wipe the data directories from node 2
4) Bootstrap node2 - it won't join: "unable to gossip with any seeds"

If you bootstrap the node a second time, it will join. However, if you try to bootstrap node 3, it will also fail. I discovered that bouncing the seed node fixes the problem. I think it cropped up in 2.0.7.

Error:

{noformat}
ERROR [main] 2014-06-03 21:52:46,649 CassandraDaemon.java (line 497) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
	at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:447)
	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:656)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:505)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569)
ERROR [StorageServiceShutdownHook] 2014-06-03 21:52:46,741 CassandraDaemon.java (line 198) Exception in thread Thread[StorageServi
{noformat}
[jira] [Updated] (CASSANDRA-7350) Decommissioning nodes borks the seed node - can't add additional nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7350:
-----------------------------------
Labels: qa-resolved (was: )

Decommissioning nodes borks the seed node - can't add additional nodes
----------------------------------------------------------------------
Key: CASSANDRA-7350
URL: https://issues.apache.org/jira/browse/CASSANDRA-7350
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Ubuntu using the auto-clustering AMI
Reporter: Steven Lowenthal
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved
Fix For: 2.0.9

1) Launch a 4 node cluster - I used the auto-clustering AMI (you get nodes 0-3)
2) Decommission the last 2 nodes, leaving a 2 node cluster
3) Wipe the data directories from node 2
4) Bootstrap node2 - it won't join: "unable to gossip with any seeds"

If you bootstrap the node a second time, it will join. However, if you try to bootstrap node 3, it will also fail. I discovered that bouncing the seed node fixes the problem. I think it cropped up in 2.0.7.

Error:

{noformat}
ERROR [main] 2014-06-03 21:52:46,649 CassandraDaemon.java (line 497) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
	at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:447)
	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:656)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:505)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569)
ERROR [StorageServiceShutdownHook] 2014-06-03 21:52:46,741 CassandraDaemon.java (line 198) Exception in thread Thread[StorageServi
{noformat}
[jira] [Updated] (CASSANDRA-5202) CFs should have globally and temporally unique CF IDs to prevent reusing data from earlier incarnation of same CF name
[ https://issues.apache.org/jira/browse/CASSANDRA-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-5202:
-----------------------------------
Tester: Shawn Kumar

CFs should have globally and temporally unique CF IDs to prevent reusing data from earlier incarnation of same CF name
----------------------------------------------------------------------------------------------------------------------
Key: CASSANDRA-5202
URL: https://issues.apache.org/jira/browse/CASSANDRA-5202
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.9
Environment: OS: Windows 7, Server: Cassandra 1.1.9 release drop, Client: astyanax 1.56.21, JVM: Sun/Oracle JVM 64 bit (jdk1.6.0_27)
Reporter: Marat Bedretdinov
Assignee: Yuki Morishita
Labels: test
Fix For: 2.1 beta1
Attachments: 0001-make-2i-CFMetaData-have-parent-s-CF-ID.patch, 0002-Don-t-scrub-2i-CF-if-index-type-is-CUSTOM.patch, 0003-Fix-user-defined-compaction.patch, 0004-Fix-serialization-test.patch, 0005-Create-system_auth-tables-with-fixed-CFID.patch, 0005-auth-v2.txt, 5202.txt, astyanax-stress-driver.zip

Attached is a driver that sequentially:
1. Drops keyspace
2. Creates keyspace
4. Creates 2 column families
5. Seeds 1M rows with 100 columns
6. Queries these 2 column families

The above steps are repeated 1000 times.

The following exception is observed at random (race - SEDA?):

{noformat}
ERROR [ReadStage:55] 2013-01-29 19:24:52,676 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:55,5,main]
java.lang.AssertionError: DecoratedKey(-1, ) != DecoratedKey(62819832764241410631599989027761269388, 313a31) in C:\var\lib\cassandra\data\user_role_reverse_index\business_entity_role\user_role_reverse_index-business_entity_role-hf-1-Data.db
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:60)
	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1367)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1229)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1164)
	at org.apache.cassandra.db.Table.getRow(Table.java:378)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:822)
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1271)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

This exception appears in the server at the time the client submits a query request (row slice) and not at the time data is seeded. The client times out, and this data can no longer be queried, as the same exception would always occur from there on.

Also on iteration 201, it appears that dropping column families failed and as a result their recreation failed with a unique column family name violation (see exception below). Note that the data files are actually gone, so it appears that the server runtime responsible for creating column families was out of sync with the piece that dropped them:

{noformat}
Starting dropping column families
Dropped column families
Starting dropping keyspace
Dropped keyspace
Starting creating column families
Created column families
Starting seeding data
Total rows inserted: 100 in 5105 ms
Iteration: 200; Total running time for 1000 queries is 232; Average running time of 1000 queries is 0 ms
Starting dropping column families
Dropped column families
Starting dropping keyspace
Dropped keyspace
Starting creating column families
Created column families
Starting seeding data
Total rows inserted: 100 in 5361 ms
Iteration: 201; Total running time for 1000 queries is 222; Average running time of 1000 queries is 0 ms
Starting dropping column families
Starting creating column families
Exception in thread "main" com.netflix.astyanax.connectionpool.exceptions.BadRequestException: BadRequestException: [host=127.0.0.1(127.0.0.1):9160, latency=2468(2469), attempts=1]InvalidRequestException(why:Keyspace names must be
{noformat}
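The fix direction named in the title (globally and temporally unique CF IDs) maps naturally onto time-based UUIDs, which encode both a timestamp and a node identifier. A sketch of why a recreated CF then can never pick up its dropped predecessor's data files (illustrative only, not Cassandra's actual implementation):

```python
import uuid

def new_cf_id():
    # uuid1 embeds a 60-bit timestamp plus the generating node's ID, so
    # every incarnation of a CF -- even one reusing the same name -- gets
    # a distinct, time-ordered identifier.
    return uuid.uuid1()

first_incarnation = new_cf_id()
recreated = new_cf_id()  # same CF name, new incarnation

# Data files keyed by CF ID instead of CF name can no longer be
# mistaken for a later incarnation's data.
assert first_incarnation != recreated
```

Keying on-disk state by such an ID removes the drop/recreate race entirely: a stale file simply belongs to an ID no live CF references.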
[jira] [Updated] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-5351:
-----------------------------------
Labels: qa-resolved repair (was: repair)

Avoid repairing already-repaired data by default
------------------------------------------------
Key: CASSANDRA-5351
URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
Project: Cassandra
Issue Type: Task
Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
Labels: qa-resolved, repair
Fix For: 2.1 beta1
Attachments: 0001-Incremental-repair-wip.patch, 0001-keep-repairedAt-time-when-scrubbing-and-no-bad-rows-.patch, 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log

Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.
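The bookkeeping the ticket describes (remember which sstables were already repaired, feed only new ones into the next session, and keep the two populations apart) can be sketched as follows; the field and function names are assumptions, loosely modeled on the repairedAt marker the attached patches refer to:

```python
class SSTable:
    """Minimal model of an sstable with a repairedAt-style marker."""
    def __init__(self, name, repaired_at=0):
        self.name = name
        self.repaired_at = repaired_at  # 0 means never repaired

def sstables_to_repair(sstables):
    # Incremental repair: only build merkle trees over unrepaired data.
    return [s for s in sstables if s.repaired_at == 0]

def mark_repaired(sstables, timestamp):
    # After a successful session, stamp the participants so the next repair
    # skips them and compaction can keep repaired/unrepaired sets segregated.
    for s in sstables:
        s.repaired_at = timestamp

tables = [SSTable('a'), SSTable('b'), SSTable('c')]
mark_repaired(sstables_to_repair(tables)[:1], timestamp=1400000000)
assert [s.name for s in sstables_to_repair(tables)] == ['b', 'c']
```

The "tricky part" in the description is the segregation step: compaction must never merge a stamped sstable with an unstamped one, or the marker loses its meaning.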
[jira] [Updated] (CASSANDRA-7141) Expand secondary_indexes_test for secondary indexes on sets and maps
[ https://issues.apache.org/jira/browse/CASSANDRA-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-7141:
---
Labels: qa-resolved (was: )

Expand secondary_indexes_test for secondary indexes on sets and maps

Key: CASSANDRA-7141
URL: https://issues.apache.org/jira/browse/CASSANDRA-7141
Project: Cassandra
Issue Type: Test
Reporter: Shawn Kumar
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved
Fix For: 2.1 rc1

secondary_indexes_test.py currently only checks the functionality of secondary indexes on lists. This should be expanded to all collections, including maps and sets.
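One way to expand the coverage is to drive the same create/index checks from a small matrix of collection kinds. A minimal sketch (the helper, table, and column names are made up for illustration; the generated CQL is standard):

```python
# Hypothetical helper: emit the DDL a broadened secondary_indexes_test
# would exercise for each collection kind (list, set, map).
COLLECTION_COLUMNS = {
    "list": "scores list<int>",
    "set": "tags set<text>",
    "map": "attrs map<text, text>",
}

def index_ddl(kind):
    """Return the CREATE TABLE / CREATE INDEX pair for one collection kind."""
    col_def = COLLECTION_COLUMNS[kind]
    col_name = col_def.split()[0]
    return [
        f"CREATE TABLE t_{kind} (id int PRIMARY KEY, {col_def})",
        f"CREATE INDEX idx_{kind} ON t_{kind} ({col_name})",
    ]

for kind in COLLECTION_COLUMNS:
    for stmt in index_ddl(kind):
        print(stmt)
```

Parameterizing the test this way means any new query-shape check (CONTAINS lookups, deletions, updates) is automatically run against all three collection kinds rather than just lists.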
[jira] [Resolved] (CASSANDRA-7141) Expand secondary_indexes_test for secondary indexes on sets and maps
[ https://issues.apache.org/jira/browse/CASSANDRA-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar resolved CASSANDRA-7141.
---
Resolution: Fixed

Expand secondary_indexes_test for secondary indexes on sets and maps

Key: CASSANDRA-7141
URL: https://issues.apache.org/jira/browse/CASSANDRA-7141
Project: Cassandra
Issue Type: Test
Reporter: Shawn Kumar
Assignee: Shawn Kumar
Priority: Minor
Labels: qa-resolved
Fix For: 2.1 rc1

secondary_indexes_test.py currently only checks the functionality of secondary indexes on lists. This should be expanded to all collections, including maps and sets.
[jira] [Updated] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar updated CASSANDRA-5351:
---
Tester: Shawn Kumar

Avoid repairing already-repaired data by default

Key: CASSANDRA-5351
URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
Project: Cassandra
Issue Type: Task
Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
Labels: repair
Fix For: 2.1 beta1
Attachments: 0001-Incremental-repair-wip.patch, 0001-keep-repairedAt-time-when-scrubbing-and-no-bad-rows-.patch, 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log

Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is that compaction will (if not taught otherwise) mix repaired data together with non-repaired data, so we should segregate unrepaired sstables from the repaired ones.
[jira] [Resolved] (CASSANDRA-7109) Create replace_address dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Kumar resolved CASSANDRA-7109.
---
Resolution: Fixed

Create replace_address dtest

Key: CASSANDRA-7109
URL: https://issues.apache.org/jira/browse/CASSANDRA-7109
Project: Cassandra
Issue Type: Test
Reporter: Ryan McGuire
Assignee: Shawn Kumar
Fix For: 2.1 beta2

{noformat}
16:03 driftx well, this just bothers me because either it's been broken for almost ever, or something broke in cassandra.
16:43 thobbs driftx: I'm testing your patch on #6622, but I'm seeing a bit of a weird error:
16:43 CassBotJr https://issues.apache.org/jira/browse/CASSANDRA-6622 (Unresolved; 1.2.16, 2.0.6): Streaming session failures during node replace of same address
16:43 thobbs java.lang.UnsupportedOperationException: Cannot replace token -1017822742317066613 which does not exist!
16:44 thobbs this is on 2.0 with the patch applied
16:44 driftx O_o
16:44 thobbs I'm just stopping a ccm node, clearing it, then starting with replace_address (auto_bootstrap = true, not a seed, initial_tokens is null)
16:45 driftx oh, I'm stupid, hang on
16:47 rcoli is the sum of that that replace_* is still broken in 2.0 ?
16:47 rcoli err, 1.2?
16:48 driftx thobbs: updated the patch
16:48 thobbs rcoli: only for replacing the same address
16:48 rcoli is there another case?
16:49 driftx replacing with a different address.
16:49 rcoli oh, right, _address_
16:49 rcoli I'm still modeling this as replace _token_
16:49 rcoli in my brain
16:49 driftx same address never broke for me though, so you can probably just retry
16:55 thobbs can we add a dtest for replace_address coverage? It's kind of annoying to test manually and we've managed to break it a few times
16:56 thobbs I have a PR against ccm open to add replace_address support: https://github.com/pcmanus/ccm/pull/85
16:57 driftx I could have sworn we had one
16:58 driftx we do but it's using replace_token so probably not even running now
16:58 thobbs yeah
16:58 thobbs it would be nice to cover replacing the same address, another address, and expected failures like replacing a still-live node
16:59 driftx +1
{noformat}
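The procedure described in the transcript (stop a node, wipe it, restart it as a replacement) can be sketched as follows. The `-Dcassandra.replace_address` system property is real; the ccm calls in the comments follow the transcript but are only an outline, not a runnable dtest:

```python
def replace_address_jvm_args(dead_node_ip):
    """Build the JVM flag that tells a fresh node to take over a dead
    node's tokens via Cassandra's replace_address mechanism."""
    return [f"-Dcassandra.replace_address={dead_node_ip}"]

# Outline of the dtest the transcript asks for (ccm-style pseudocode):
#   node3.stop(gently=False)     # kill the node being replaced
#   node3.clear()                # wipe its data directories
#   node3.start(jvm_args=replace_address_jvm_args("127.0.0.3"))
#
# Cases to cover, per the transcript: replacing the same address,
# replacing with a different address, and expected failures such as
# replacing a node that is still alive.

print(replace_address_jvm_args("127.0.0.3"))
```

Keeping the flag construction in one helper means the same-address, different-address, and still-live-node cases only differ in which IP is passed and what outcome is asserted.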
[jira] [Created] (CASSANDRA-7141) Expand secondary_indexes_test for secondary indexes on sets and maps
Shawn Kumar created CASSANDRA-7141:
---
Summary: Expand secondary_indexes_test for secondary indexes on sets and maps
Key: CASSANDRA-7141
URL: https://issues.apache.org/jira/browse/CASSANDRA-7141
Project: Cassandra
Issue Type: Test
Reporter: Shawn Kumar
Assignee: Shawn Kumar
Priority: Minor
Fix For: 2.1 rc1

secondary_indexes_test.py currently only checks the functionality of secondary indexes on lists. This should be expanded to all collections, including maps and sets.