[jira] [Assigned] (CASSANDRA-12362) dtest failure in upgrade_tests.paging_test.TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x.test_row_TTL_expiry_during_paging
[ https://issues.apache.org/jira/browse/CASSANDRA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-12362: --- Assignee: (was: Jason Brown) > dtest failure in > upgrade_tests.paging_test.TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x.test_row_TTL_expiry_during_paging > --- > > Key: CASSANDRA-12362 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12362 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Sean McCarthy >Priority: Normal > Labels: dtest > Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, > node2_debug.log, node2_gc.log > > > example failure: > http://cassci.datastax.com/job/trunk_dtest_upgrade/5/testReport/upgrade_tests.paging_test/TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x/test_row_TTL_expiry_during_paging > {code} > Stacktrace > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/automaton/cassandra-dtest/upgrade_tests/paging_test.py", line > 1217, in test_row_TTL_expiry_during_paging > self.assertEqual(pf.pagecount(), 3) > File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual > assertion_func(first, second, msg=msg) > File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual > raise self.failureException(msg) > "2 != 3 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-13507) dtest failure in paging_test.TestPagingWithDeletions.test_ttl_deletions
[ https://issues.apache.org/jira/browse/CASSANDRA-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13507: --- Assignee: (was: Jason Brown) > dtest failure in paging_test.TestPagingWithDeletions.test_ttl_deletions > > > Key: CASSANDRA-13507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13507 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Ariel Weisberg >Priority: Normal > Labels: dtest, test-failure, test-failure-fresh > Attachments: test_ttl_deletions_fail.txt > > > {noformat} > Failed 7 times in the last 30 runs. Flakiness: 34%, Stability: 76% > Error Message > 4 != 8 > >> begin captured logging << > dtest: DEBUG: cluster ccm directory: /tmp/dtest-z1xodw > dtest: DEBUG: Done setting configuration options: > { 'initial_token': None, > 'num_tokens': '32', > 'phi_convict_threshold': 5, > 'range_request_timeout_in_ms': 1, > 'read_request_timeout_in_ms': 1, > 'request_timeout_in_ms': 1, > 'truncate_request_timeout_in_ms': 1, > 'write_request_timeout_in_ms': 1} > cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.5, > scheduling retry in 600.0 seconds: [Errno 111] Tried connecting to > [('127.0.0.5', 9042)]. Last error: Connection refused > cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.3, > scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to > [('127.0.0.3', 9042)]. Last error: Connection refused > cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.3, > scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to > [('127.0.0.3', 9042)]. Last error: Connection refused > {noformat} > Most output omitted. It's attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-12347) Gossip 2.0 - broadcast tree for data dissemination
[ https://issues.apache.org/jira/browse/CASSANDRA-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-12347: Resolution: Won't Fix Status: Resolved (was: Open) > Gossip 2.0 - broadcast tree for data dissemination > -- > > Key: CASSANDRA-12347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12347 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Distributed Metadata >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > > Description: A broadcast tree (spanning tree) allows an originating node to > efficiently send out updates to all of the peers in the cluster by > constructing a balanced, self-healing tree based upon the view it gets from > the peer sampling service (CASSANDRA-12346). > I propose we use an algorithm based on the [Thicket > paper|http://www.gsd.inesc-id.pt/%7Ejleitao/pdf/srds10-mario.pdf], which > describes a dynamic, self-healing broadcast tree. When a given node needs to > send out a message, it dynamically builds a tree for each node in the > cluster; thus giving us a unique tree for every node in the cluster (a tree > rooted at every cluster node). The trees, of course, would be reusable until > the cluster configuration changes or failures are detected (by the mechanism > described in the paper). Additionally, Thicket includes a mechanism for > load-balancing the trees such that nodes spread out the work amongst > themselves. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
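The core idea in the ticket above — one spanning tree per originating node, built from each peer's partial view — can be sketched minimally as a breadth-first expansion. This is an illustration of the concept only, not the Thicket algorithm or any actual Cassandra code: Thicket additionally load-balances interior positions across trees and self-heals on failure. All names here are hypothetical.

```java
import java.util.*;

public class BroadcastTreeSketch {
    // partialViews: node -> peers it knows about (as provided by a peer sampling service)
    static Map<String, String> buildTree(Map<String, List<String>> partialViews, String root) {
        Map<String, String> parent = new HashMap<>();   // child -> parent edges of the tree
        Deque<String> frontier = new ArrayDeque<>();
        parent.put(root, null);
        frontier.add(root);
        while (!frontier.isEmpty()) {                   // breadth-first expansion from the root
            String node = frontier.poll();
            for (String peer : partialViews.getOrDefault(node, List.of())) {
                if (!parent.containsKey(peer)) {        // first edge to reach a peer wins
                    parent.put(peer, node);
                    frontier.add(peer);
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a", "d"),
            "c", List.of("a", "d"),
            "d", List.of("b", "c"));
        Map<String, String> tree = buildTree(views, "a");
        System.out.println(tree.size());        // every node is reached exactly once
        System.out.println(tree.get("d"));      // d joins via the first peer that found it
    }
}
```

A tree like this delivers each update once per node (no gossip-style redundancy), which is why the partial views must keep the mesh connected for it to work.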
[jira] [Updated] (CASSANDRA-12346) Gossip 2.0 - introduce a Peer Sampling Service for partial cluster views
[ https://issues.apache.org/jira/browse/CASSANDRA-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-12346: Resolution: Won't Fix Status: Resolved (was: Open) > Gossip 2.0 - introduce a Peer Sampling Service for partial cluster views > > > Key: CASSANDRA-12346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12346 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > Labels: gossip > > A [Peer Sampling > Service|http://infoscience.epfl.ch/record/83409/files/neg--1184036295all.pdf] > is a module that provides a partial view of a cluster to dependent modules. A > node's partial view, combined with all other nodes' partial views, combine to > create a fully-connected mesh over the cluster. This way, a given node does > not need to have direct connections to every other node in the cluster, and > can be much more efficient in terms of resource management as well as > information dissemination. Peer Sampling Services by their nature must be > self-healing and self-balancing to maintain the fully-connected mesh. > I propose we use an algorithm based on > [HyParView|http://asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf], which is > a concrete algorithm for a Peer Sampling Service. HyParView has a clearly > defined protocol, and is reasonably simple to implement. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
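The active/passive split at the heart of HyParView-style partial views can be sketched as follows. This is a toy illustration of the self-healing property described above — the sizes, names, and promotion policy are placeholders, not the paper's actual protocol or Cassandra code.

```java
import java.util.*;

public class PartialViewSketch {
    static final int ACTIVE_SIZE = 3;                 // illustrative; real sizes derive from cluster size
    final Deque<String> activeView = new ArrayDeque<>();   // live connections
    final Deque<String> passiveView = new ArrayDeque<>();  // backup peers, no open connection

    // Add to the active view, demoting the oldest member to passive if full.
    void addToActive(String peer) {
        if (activeView.size() >= ACTIVE_SIZE)
            passiveView.add(activeView.poll());
        activeView.add(peer);
    }

    // On failure, drop the peer and promote a backup: this replacement is the
    // "self-healing" behaviour that keeps the overall mesh connected.
    void onPeerFailed(String peer) {
        activeView.remove(peer);
        if (!passiveView.isEmpty())
            activeView.add(passiveView.poll());
    }

    public static void main(String[] args) {
        PartialViewSketch view = new PartialViewSketch();
        for (String p : List.of("n1", "n2", "n3", "n4"))
            view.addToActive(p);
        System.out.println(view.activeView);   // n1 was demoted to passive
        System.out.println(view.passiveView);
        view.onPeerFailed("n3");
        System.out.println(view.activeView);   // n1 promoted back to replace n3
    }
}
```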
[jira] [Updated] (CASSANDRA-12345) Gossip 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-12345: Resolution: Won't Fix Status: Resolved (was: Open) > Gossip 2.0 > -- > > Key: CASSANDRA-12345 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12345 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > Labels: gossip > > This is a parent ticket covering changes to the dissemination aspects of the > current gossip subsystem. (Changes to the actual data being exchanged by the > current gossip (the cluster metadata) will be handled elsewhere, but the > current primary ticket covering that work is CASSANDRA-9667.) > This work requires several components, which largely need to be completed in > this order: > - a peer sampling service to create partial cluster views (CASSANDRA-12346). > This forms the basis of the next two components > - a broadcast tree, which creates dynamic spanning trees given the partial > views provided by the peer sampling service (CASSANDRA-12347) > - an anti-entropy component, which is similar to the pair-wise exchange and > reconciliation of the existing gossip implementation (CASSANDRA-???) > These base components (primarily the broadcast and anti-entropy) can allow > for generic consumers to simply and effectively share a body of data across > an entire cluster. The most obvious consumer will be a cluster metadata > component, which can replace the existing gossip system, but also other > components like CASSANDRA-12106. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13628) switch peer-to-peer networking to non-blocking I/O via netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13628: Resolution: Fixed Status: Resolved (was: Open) > switch peer-to-peer networking to non-blocking I/O via netty > > > Key: CASSANDRA-13628 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13628 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Legacy/Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > Fix For: 4.0 > > > This is a parent ticket for linking all the work to be done for switching > peer-to-peer networking to use non-blocking I/O via netty -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13630) support large internode messages with netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13630: Resolution: Won't Fix Status: Resolved (was: Open) > support large internode messages with netty > --- > > Key: CASSANDRA-13630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13630 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of CASSANDRA-8457, we decided to punt on large messages to reduce the > scope of that ticket. However, we still need that functionality to ship a > correctly operating internode messaging subsystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13630) support large internode messages with netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13630: Status: Open (was: Patch Available) > support large internode messages with netty > --- > > Key: CASSANDRA-13630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13630 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of CASSANDRA-8457, we decided to punt on large messages to reduce the > scope of that ticket. However, we still need that functionality to ship a > correctly operating internode messaging subsystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-13989) Update security docs for 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13989: --- Assignee: (was: Jason Brown) > Update security docs for 4.0 > > > Key: CASSANDRA-13989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13989 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Documentation and Website >Reporter: Jason Brown >Priority: Low > Fix For: 4.x > > > CASSANDRA-8457 and CASSANDRA-10404 have brought changes to the way SSL works > for both internode messaging and the native protocol. Update the docs to > reflect information that is important to users/operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14754) Add verification of state machine in StreamSession
[ https://issues.apache.org/jira/browse/CASSANDRA-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-14754: --- Assignee: (was: Jason Brown) > Add verification of state machine in StreamSession > -- > > Key: CASSANDRA-14754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14754 > Project: Cassandra > Issue Type: Task > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Priority: Normal > Fix For: 4.0 > > > {{StreamSession}} contains an implicit state machine, but we have no > verification of the safety of the transitions between states. For example, we > have no checks to ensure we cannot leave the final states (COMPLETED, FAILED). > I propose we add some program logic in {{StreamSession}}, tests, and > documentation to ensure the correctness of the state transitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
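The kind of verification this ticket proposes — an explicit transition table that rejects illegal moves, including any move out of a final state — can be sketched as below. The states and edges here are hypothetical placeholders, not {{StreamSession}}'s actual states.

```java
import java.util.*;

public class StreamStateMachineSketch {
    enum State { INITIALIZED, PREPARING, STREAMING, COMPLETED, FAILED }

    // Explicit transition table: final states (COMPLETED, FAILED) have no outgoing edges.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.INITIALIZED, EnumSet.of(State.PREPARING, State.FAILED));
        ALLOWED.put(State.PREPARING, EnumSet.of(State.STREAMING, State.FAILED));
        ALLOWED.put(State.STREAMING, EnumSet.of(State.COMPLETED, State.FAILED));
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.FAILED, EnumSet.noneOf(State.class));
    }

    private State current = State.INITIALIZED;

    // Reject any transition not in the table instead of silently corrupting state.
    void transition(State next) {
        if (!ALLOWED.get(current).contains(next))
            throw new IllegalStateException(current + " -> " + next + " is not a legal transition");
        current = next;
    }

    public static void main(String[] args) {
        StreamStateMachineSketch sm = new StreamStateMachineSketch();
        sm.transition(State.PREPARING);
        sm.transition(State.STREAMING);
        sm.transition(State.COMPLETED);
        boolean rejected = false;
        try { sm.transition(State.STREAMING); }       // attempting to leave a final state
        catch (IllegalStateException e) { rejected = true; }
        System.out.println(rejected);
    }
}
```

Making the table explicit also gives tests a single structure to exhaustively check, which is the documentation benefit the ticket asks for.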
[jira] [Assigned] (CASSANDRA-14575) Reevaluate when to drop an internode connection on message error
[ https://issues.apache.org/jira/browse/CASSANDRA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-14575: --- Assignee: (was: Jason Brown) > Reevaluate when to drop an internode connection on message error > > > Key: CASSANDRA-14575 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14575 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Priority: Low > Fix For: 4.0 > > > As mentioned in CASSANDRA-14574, explore if and when we can safely ignore an > incoming internode message on certain classes of failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14575) Reevaluate when to drop an internode connection on message error
[ https://issues.apache.org/jira/browse/CASSANDRA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14575: Status: Open (was: Patch Available) > Reevaluate when to drop an internode connection on message error > > > Key: CASSANDRA-14575 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14575 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Low > Fix For: 4.0 > > > As mentioned in CASSANDRA-14574, explore if and when we can safely ignore an > incoming internode message on certain classes of failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3
[ https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-14760: --- Assignee: (was: Jason Brown) > CVE-2018-10237 Security vulnerability in 3.11.3 > --- > > Key: CASSANDRA-14760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14760 > Project: Cassandra > Issue Type: Bug > Components: Dependencies >Reporter: John F. Gbruoski >Priority: Normal > > As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a > security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be > patched to support Guava 24.1.1 or later? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14503: Resolution: Won't Fix Status: Resolved (was: Open) > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14503: Status: Open (was: Patch Available) > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
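The single-thread model suggested at the end of the report above can be sketched as follows: every state-changing action is enqueued onto one single-threaded executor, so {{#close()}} can never race with {{#finishHandshake()}}. This illustrates the pattern only; the state names and methods are hypothetical, not {{OutboundMessagingConnection}}'s actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerializedConnectionState {
    enum State { CONNECTING, READY, CLOSED }

    // All mutations of 'state' run on this one thread, so transitions are totally ordered.
    private final ExecutorService stateExecutor = Executors.newSingleThreadExecutor();
    private State state = State.CONNECTING;

    Future<?> finishHandshake() {
        return stateExecutor.submit(() -> {
            if (state == State.CONNECTING)     // a handshake arriving after close() is
                state = State.READY;           // simply ignored, rather than NPE-ing on
        });                                    // fields the close already tore down
    }

    Future<?> close() {
        return stateExecutor.submit(() -> state = State.CLOSED);
    }

    State state() throws Exception {
        return stateExecutor.submit(() -> state).get();  // reads go through the same thread
    }

    public static void main(String[] args) throws Exception {
        SerializedConnectionState conn = new SerializedConnectionState();
        conn.close();
        conn.finishHandshake();                 // arrives "late": safely ignored
        System.out.println(conn.state());
        conn.stateExecutor.shutdown();
    }
}
```

The appeal of this design is exactly what the reporter notes: transitions become sequential and therefore checkable, at the cost of routing every action through the queue.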
[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807254#comment-16807254 ] Jason Brown commented on CASSANDRA-15066: - I believe many of these changes indeed improve the quality of the code and long-term strengthen the system, but seem best targeted at 4.NEXT. I and others would like to collaborate on this effort going forward. The introduction of a slew of new features (checksumming, reimplementing parts of netty (AsyncPromise, LZ4 compression, replacing netty’s ByteBufAllocator with c*’s)) and major reimplementations (droppable verbs/verb priority, semantic changes to connection types) seven months after the community declared a feature freeze for 4.0 seems ill-advised, at best. The size, scope, and depth of this patch, which touches many vital components, invalidates most 4.0 testing hitherto. In my estimation, a fair and thorough review of the current patch alone, by myself and others, would take at least 2 solid months, as there is a lot of new complexity introduced. Significant additional time would be required for integration testing. At the barest minimum, this patch should be broken up into separate tickets, reviewed individually, and merged incrementally. Additionally, I think having a discussion on dev@, as you proposed, would be highly beneficial. Further, CASSANDRA-14503 was posted for REVIEW, in the hopes that we could have a discussion around the current state of trunk and the patch I submitted. I appreciate the reporting of the bugs you found and the work you invested. Beyond that, however, there has not been any meaningful discussion or engagement. I would have appreciated the opportunity to collaborate on this effort, especially as I have personally invested much time and effort into this work.
To sum up, I am -1 on this work in its current form *for 4.0*, as the new features violate the freeze, and many of the new implementations violate the principle of reducing risk and increasing stability as we run up to 4.0. > Improvements to Internode Messaging > --- > > Key: CASSANDRA-15066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15066 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Benedict >Assignee: Benedict >Priority: Normal > Fix For: 4.0 > > > CASSANDRA-8457 introduced asynchronous networking to internode messaging, but > there have been several follow-up endeavours to improve some semantic issues. > CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were > combined some months ago into a single overarching refactor of the original > work, to address some of the issues that have been discovered. Given the > criticality of this work to the project, we wanted to bring some more eyes to > bear to ensure the release goes ahead smoothly. In doing so, we uncovered a > number of issues with messaging, some of which long standing, that we felt > needed to be addressed. This patch widens the scope of CASSANDRA-14503 and > CASSANDRA-13630 in an effort to close the book on the messaging service, at > least for the foreseeable future. > The patch includes a number of clarifying refactors that touch outside of the > {{net.async}} package, and a number of semantic changes to the {{net.async}} > packages itself. We believe it clarifies the intent and behaviour of the > code while improving system stability, which we will outline in comments > below. > https://github.com/belliottsmith/cassandra/tree/messaging-improvements -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15030) Add support for SSL and bindable address to sidecar
[ https://issues.apache.org/jira/browse/CASSANDRA-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-15030: Reviewers: Chris Lohfink, Vinay Chella Reviewer: (was: Chris Lohfink) > Add support for SSL and bindable address to sidecar > --- > > Key: CASSANDRA-15030 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15030 > Project: Cassandra > Issue Type: New Feature > Components: Sidecar >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Minor > > We need to support SSL for the sidecar's REST interface. We should also have > the ability to bind the sidecar's API to a specific network interface. This > patch adds support for both. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14395) C* Management process
[ https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14395: Resolution: Fixed Status: Resolved (was: Patch Available) +1 I made two minor changes on commit: * made {{logger}} instances {{static final}} * removed the jolokia license file. [~djoshi3] sorry if I confused you, as what I meant to say is that if we ship the jar in-tree, we should have the license file, as well (like we do in cassandra proper). However, this raises the question of how to correctly address transitive dependencies that we don't ship in-tree. Admittedly, I've been doing it "the cassandra way" for a long time (with jars in-tree), so I'm not sure how to properly include licenses with a maven-like system. I'll create a followup ticket to figure it out. Otherwise, this is a good first step toward shipping a working sidecar. Committed as sha {{a15ed267d1977e38ba36d061139839fad7b865f2}}. Thanks, [~djoshi3]! > C* Management process > - > > Key: CASSANDRA-14395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14395 > Project: Cassandra > Issue Type: New Feature >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Attachments: Looking towards an Official Cassandra Sidecar - > Netflix.pdf > > > I would like to propose amending Cassandra's architecture to include a > management process. The detailed description is here: > https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit > I'd like to propose seeding this with a few simple use-cases such as Health > Checks, Bulk Commands with a simple REST API interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769340#comment-16769340 ] Jason Brown commented on CASSANDRA-15013: - Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are an open map, basically. I thought they were a fixed listing (primarily because we only support a fixed set of compression types). OK, so any version works for me :). > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769340#comment-16769340 ] Jason Brown edited comment on CASSANDRA-15013 at 2/15/19 2:05 PM: -- Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are a semi-defined map, basically. I thought they were a fixed listing (primarily because we only support a fixed set of compression types). OK, so any version works for me :). was (Author: jasobrown): Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are an open map, basically. I thought they were a fixed listing (primarily because we only support a fixed set of compression types). OK, so any version works for me :). > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769331#comment-16769331 ] Jason Brown commented on CASSANDRA-15013: - Yup, I agree the harder part, programming wise, is {{requestExecutor}} stuffs, and let's plow through that first. The {{OptionsMessage}}/client protocol work is significantly easier, as I think we agree, but would that qualify as a change to the native protocol, for which we need to wait for a major rev (as in, 4.0)? Or are additive additions acceptable for previous native protocol versions? We might have a policy or general advice around this, but I don't know. > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769331#comment-16769331 ] Jason Brown edited comment on CASSANDRA-15013 at 2/15/19 1:54 PM: -- Yup, I agree the harder part, programming-wise, is {{requestExecutor}} stuffs, and let's plow through that first. The {{OptionsMessage}}/client protocol work is significantly easier, as I think we agree, but would that qualify as a change to the native protocol, for which we need to wait for a major rev (as in, 4.0)? Or are additive changes acceptable for previous native protocol versions? We might have a policy or general advice around this, but I don't know. Either way, [~sumanth.pasupuleti] has enough to work with for now, and we can figure out the native protocol-impacting stuffs in parallel. was (Author: jasobrown): Yup, I agree the harder part, programming-wise, is {{requestExecutor}} stuffs, and let's plow through that first. The {{OptionsMessage}}/client protocol work is significantly easier, as I think we agree, but would that qualify as a change to the native protocol, for which we need to wait for a major rev (as in, 4.0)? Or are additive changes acceptable for previous native protocol versions? We might have a policy or general advice around this, but I don't know. 
> Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14395) C* Management process
[ https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769326#comment-16769326 ] Jason Brown commented on CASSANDRA-14395: - - inspecting the tarball produced via {{gradlew distTar}}, I don't see the jolokia jar packaged in it. Admittedly, I didn't check on the last version of this patch either. - need a license file when including the jolokia jar. I propose we start simple for now, and since there's only one jar for now (which is hopefully being removed in an upcoming patch) - just add the license in a subfolder like we do in cassandra. - in {{HealthCheck::check}}, when we get a {{NoHostAvailableException}} from the driver (which is thrown when we cannot connect), it would be preferable to not litter the logs with the stack trace. Or maybe log the exception at {{DEBUG}} or {{TRACE}}. I discovered this by running c* locally, then terminating it, and watching the logs from the sidecar. - petty nit: sometimes a {{catch}} keyword is on the same line as the closing brace of a {{try}} block. See {{HealthCheck::createCluster}} for an example. > C* Management process > - > > Key: CASSANDRA-14395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14395 > Project: Cassandra > Issue Type: New Feature >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Attachments: Looking towards an Official Cassandra Sidecar - > Netflix.pdf > > > I would like to propose amending Cassandra's architecture to include a > management process. The detailed description is here: > https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit > I'd like to propose seeding this with a few simple use-cases such as Health > Checks, Bulk Commands with a simple REST API interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769292#comment-16769292 ] Jason Brown commented on CASSANDRA-15013: - [~benedict] Ah, I see now that's what you intended by {{connection-configurable option}}. I'm fine with that. I'm not sure if specifying the 'backpressure type' would require a change to the native protocol. I think it would be most appropriate in the OPTIONS section (and thus {{OptionsMessage}}), but I might be mistaken. However, I wonder if we should break that work out into a separate ticket to unblock the other work here, so that it can be backported and fixed in production. wdyt? > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
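As a sketch of why an additive option might not force a protocol rev: the native-protocol STARTUP/OPTIONS body is a string map, so a server that predates a new key can simply ignore it. The {{BACKPRESSURE_STRATEGY}} key below is hypothetical, purely for illustration, and not part of any actual protocol spec.

```java
import java.util.HashMap;
import java.util.Map;

public class StartupOptionsSketch {
    public static void main(String[] args) {
        // The STARTUP body is a [string map]; CQL_VERSION is a real key,
        // BACKPRESSURE_STRATEGY is a hypothetical additive one.
        Map<String, String> options = new HashMap<>();
        options.put("CQL_VERSION", "3.0.0");
        options.put("BACKPRESSURE_STRATEGY", "block"); // hypothetical key

        // An older server falls back to a default for keys it does not know,
        // which is the sense in which an additive map entry is backward-safe.
        String strategy = options.getOrDefault("BACKPRESSURE_STRATEGY", "none");
        System.out.println(strategy); // prints "block"
    }
}
```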
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768913#comment-16768913 ] Jason Brown commented on CASSANDRA-15013: - I agree with upping the max queue depth (or unbounded plus size monitoring) as well as stopping reading from the socket (by setting netty's {{autoRead}} to false). I'm not, however, convinced about adding yet another configuration option; adding more config options only complicates the lives of operators. How will an operator know how to set it most appropriately to their use case(s)? We should choose the best solution, *document it*, and go with that as a built-in behavior. (Note: I'm amenable to throwing the OverloadedException, as well.) > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
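The bounded-queue idea discussed above can be sketched as follows. The class and method names are hypothetical, not the actual Flusher code: the point is only that a bounded {{offer()}} fails fast when the limit is hit, at which point the server could disable the channel's autoRead or answer with an OverloadedException-style error instead of buffering without limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueSketch {
    // Hypothetical stand-in for the per-event-loop flush queue.
    private final BlockingQueue<Object> queued;

    BoundedQueueSketch(int bound) {
        queued = new ArrayBlockingQueue<>(bound);
    }

    /** @return false when the queue is full, i.e. backpressure should kick in. */
    boolean enqueue(Object item) {
        return queued.offer(item); // non-blocking; never grows past the bound
    }

    public static void main(String[] args) {
        BoundedQueueSketch q = new BoundedQueueSketch(2);
        System.out.println(q.enqueue("a")); // true
        System.out.println(q.enqueue("b")); // true
        System.out.println(q.enqueue("c")); // false: stop reading / shed load
    }
}
```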
[jira] [Commented] (CASSANDRA-14395) C* Management process
[ https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765375#comment-16765375 ] Jason Brown commented on CASSANDRA-14395: - I've taken a decent look through [~djoshi3]'s first patch, and on the whole, I think this is a good first step for this project. I'm not digging too far into nit-picking at this early stage as I feel it's more important to make forward progress overall rather than get tripped up over minor points. general comments: - I'm passing over the gradle scripts for now, in lieu of reviewing everything else first - do we need a lib directory with checked-in jars, esp. with gradle/mvn-style build files that pull jars from mavenCentral? - need a script to run the app from the command line :) I was able to use {{gradlew run}} to see it work. code comments - Configuration - let's add comments to make it easier to distinguish between {{getCassandraPort}} and {{getPort}}; maybe update the method names, as well. I needed to read {{MainModule::configuration()}} to figure out what each method actually represented. - in general, I think we want to execute scheduled tasks with {{scheduleWithFixedDelay()}} rather than {{scheduleAtFixedRate()}}. This way tasks don't end up piling up on top of each other if one takes a looong time to execute. - there are a couple of nit-picky things where an instance's final fields are in caps, like a constant. trivial at this point. The only thing I'm not entirely thrilled with is how each URL/handler will need to be explicitly wired into the {{router}} in {{CassandraSidecarDaemon::start()}}. I'm not sure if there's further guice magick that can mitigate this. However, I don't feel this is a huge problem for the usefulness of the project as a whole, nor do I think we need to tackle it in the early stages or anytime soon. Looking forward to more activity on this ticket. 
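The {{scheduleWithFixedDelay()}} point can be demonstrated with a small standalone sketch (class name hypothetical, not sidecar code): with a fixed *delay*, the 10ms gap is measured from the end of each run, so a 50ms task cannot cause executions to pile up behind a fixed-rate clock the way {{scheduleAtFixedRate()}}'s fixed-period schedule can.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelaySketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Task takes ~50ms; delay of 10ms starts counting when the task ends.
        ses.scheduleWithFixedDelay(() -> {
            runs.incrementAndGet();
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        }, 0, 10, TimeUnit.MILLISECONDS);

        Thread.sleep(250);
        ses.shutdownNow();

        // Roughly one run per ~60ms (50ms work + 10ms delay), far fewer than
        // the 25 runs a naive 10ms period would suggest, and no backlog forms.
        System.out.println("runs: " + runs.get());
    }
}
```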
> C* Management process > - > > Key: CASSANDRA-14395 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14395 > Project: Cassandra > Issue Type: New Feature >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Attachments: Looking towards an Official Cassandra Sidecar - > Netflix.pdf > > > I would like to propose amending Cassandra's architecture to include a > management process. The detailed description is here: > https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit > I'd like to propose seeding this with a few simple use-cases such as Health > Checks, Bulk Commands with a simple REST API interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749356#comment-16749356 ] Jason Brown commented on CASSANDRA-14503: - [~benedict] / [~djoshi3] Any update on reviewing this latest patch? This seems to be a blocker for 4.0. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate
[ https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14829: Fix Version/s: (was: 4.0.x) (was: 3.11.x) 4.0 3.11.4 > Make stop-server.bat wait for Cassandra to terminate > > > Key: CASSANDRA-14829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14829 > Project: Cassandra > Issue Type: Improvement > Components: Packaging > Environment: Windows 10 >Reporter: Georg Dietrich >Assignee: Georg Dietrich >Priority: Minor > Labels: easyfix, windows > Fix For: 3.11.4, 4.0 > > > While administering a single node Cassandra on Windows, I noticed that the > stop-server.bat script returns before the cassandra process has actually > terminated. For use cases like creating a script "shut down & create backup > of data directory without having to worry about open files, then restart", it > would be good to make stop-server.bat wait for Cassandra to terminate. > All that is needed for that is to change in > apache-cassandra-3.11.3\bin\stop-server.bat "start /B powershell /file ..." > to "start /WAIT /B powershell /file ..." (additional /WAIT parameter). > Does this sound reasonable? > Here is the pull request: https://github.com/apache/cassandra/pull/287 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate
[ https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14829: Resolution: Fixed Reviewer: Dinesh Joshi Fix Version/s: (was: 4.x) Status: Resolved (was: Ready to Commit) committed as sha {{85e402a7fda59110aeea181924035d69db693240}}. Thanks! > Make stop-server.bat wait for Cassandra to terminate > > > Key: CASSANDRA-14829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14829 > Project: Cassandra > Issue Type: Improvement > Components: Packaging > Environment: Windows 10 >Reporter: Georg Dietrich >Assignee: Georg Dietrich >Priority: Minor > Labels: easyfix, windows > Fix For: 3.11.x, 4.0.x > > > While administering a single node Cassandra on Windows, I noticed that the > stop-server.bat script returns before the cassandra process has actually > terminated. For use cases like creating a script "shut down & create backup > of data directory without having to worry about open files, then restart", it > would be good to make stop-server.bat wait for Cassandra to terminate. > All that is needed for that is to change in > apache-cassandra-3.11.3\bin\stop-server.bat "start /B powershell /file ..." > to "start /WAIT /B powershell /file ..." (additional /WAIT parameter). > Does this sound reasonable? > Here is the pull request: https://github.com/apache/cassandra/pull/287 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate
[ https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707176#comment-16707176 ] Jason Brown commented on CASSANDRA-14829: - [~djoshi3] i'll commit. > Make stop-server.bat wait for Cassandra to terminate > > > Key: CASSANDRA-14829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14829 > Project: Cassandra > Issue Type: Improvement > Components: Packaging > Environment: Windows 10 >Reporter: Georg Dietrich >Assignee: Georg Dietrich >Priority: Minor > Labels: easyfix, windows > Fix For: 3.11.x, 4.x, 4.0.x > > > While administering a single node Cassandra on Windows, I noticed that the > stop-server.bat script returns before the cassandra process has actually > terminated. For use cases like creating a script "shut down & create backup > of data directory without having to worry about open files, then restart", it > would be good to make stop-server.bat wait for Cassandra to terminate. > All that is needed for that is to change in > apache-cassandra-3.11.3\bin\stop-server.bat "start /B powershell /file ..." > to "start /WAIT /B powershell /file ..." (additional /WAIT parameter). > Does this sound reasonable? > Here is the pull request: https://github.com/apache/cassandra/pull/287 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown resolved CASSANDRA-14896. - Resolution: Fixed Committed v2 patch as {{f3609995c09570d523527d9bd0fd69c2bc65d986}} with updated comments per [~aweisberg]'s recommendation. > 3.0 schema migration pulls from later version incompatible nodes > > > Key: CASSANDRA-14896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14896 > Project: Cassandra > Issue Type: Bug > Components: Core, CQL >Reporter: Ariel Weisberg >Assignee: Jason Brown >Priority: Blocker > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > > I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly > different and 3.0 in some scenarios it is pulling schema from a later > version. This causes upgrade tests to have errors in the logs due to > additional columns from configurable storage port. > {noformat} > Failed: Error details: > Errors seen in logs for: node2 > node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 > CassandraDaemon.java:207 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.RuntimeException: Unknown column additional_write_policy during > deserialization > at > org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) > 
~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.17.jar:3.0.17] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703901#comment-16703901 ] Jason Brown commented on CASSANDRA-14896: - The problem with my first patch is that we need the peer's messaging version in order to serialize the {{InetAddressAndPort}} correctly to the peer. We still need to write the local node's messaging version into the message, however. Patch here: ||v2|| |[branch|https://github.com/jasobrown/cassandra/tree/14896-v2]| |[utests dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14896-v2]| || > 3.0 schema migration pulls from later version incompatible nodes > > > Key: CASSANDRA-14896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14896 > Project: Cassandra > Issue Type: Bug > Components: Core, CQL >Reporter: Ariel Weisberg >Assignee: Jason Brown >Priority: Blocker > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > > I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly > different and 3.0 in some scenarios it is pulling schema from a later > version. This causes upgrade tests to have errors in the logs due to > additional columns from configurable storage port. 
> {noformat} > Failed: Error details: > Errors seen in logs for: node2 > node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 > CassandraDaemon.java:207 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.RuntimeException: Unknown column additional_write_policy during > deserialization > at > org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.17.jar:3.0.17] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14909) Netty IOExceptions caused by unclean client disconnects being logged at INFO instead of TRACE
[ https://issues.apache.org/jira/browse/CASSANDRA-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14909: Resolution: Fixed Reviewer: Jason Brown Fix Version/s: 3.11.x Status: Resolved (was: Patch Available) +1 committed as sha {{e4d0ce6ba2d6088c7edf8475f02462e1606f606d}}. Thanks! > Netty IOExceptions caused by unclean client disconnects being logged at INFO > instead of TRACE > - > > Key: CASSANDRA-14909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14909 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Minor > Fix For: 4.0, 3.0.x, 3.11.x > > > Observed spam logs on 3.0.17 cluster with redundant Netty IOExceptions caused > due to client-side disconnections. > {code:java} > INFO [epollEventLoopGroup-2-28] 2018-11-20 23:23:04,386 Message.java:619 - > Unexpected exception during request; channel = [id: 0x12995bc1, > L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xxx.xxx:33754] > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {code} > {code:java} > INFO [epollEventLoopGroup-2-23] 2018-11-20 13:16:33,263 Message.java:619 - > Unexpected exception during request; channel = [id: 0x98bd7c0e, > L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xx.xx:33350] > io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: > Connection timed out > at io.netty.channel.unix.Errors.newIOException(Errors.java:117) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at io.netty.channel.unix.Errors.ioResult(Errors.java:138) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.unix.FileDescriptor.readAddress(FileDescriptor.java:175) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > 
io.netty.channel.epoll.AbstractEpollChannel.doReadBytes(AbstractEpollChannel.java:238) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:926) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:397) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:302) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > {code} > [CASSANDRA-7849|https://issues.apache.org/jira/browse/CASSANDRA-7849] > addresses this for JAVA IO Exception like "java.io.IOException: Connection > reset by peer", but not for Netty IOException since the exception message in > Netty includes method name. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14909) Netty IOExceptions caused by unclean client disconnects being logged at INFO instead of TRACE
[ https://issues.apache.org/jira/browse/CASSANDRA-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703276#comment-16703276 ] Jason Brown commented on CASSANDRA-14909: - [~sumanth.pasupuleti] added wrt use of the Java stream API on the PR > Netty IOExceptions caused by unclean client disconnects being logged at INFO > instead of TRACE > - > > Key: CASSANDRA-14909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14909 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Minor > Fix For: 4.0, 3.0.x > > > Observed spam logs on 3.0.17 cluster with redundant Netty IOExceptions caused > due to client-side disconnections. > {code:java} > INFO [epollEventLoopGroup-2-28] 2018-11-20 23:23:04,386 Message.java:619 - > Unexpected exception during request; channel = [id: 0x12995bc1, > L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xxx.xxx:33754] > io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: > Connection reset by peer > at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown > Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > {code} > {code:java} > INFO [epollEventLoopGroup-2-23] 2018-11-20 13:16:33,263 Message.java:619 - > Unexpected exception during request; channel = [id: 0x98bd7c0e, > L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xx.xx:33350] > io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: > Connection timed out > at io.netty.channel.unix.Errors.newIOException(Errors.java:117) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at io.netty.channel.unix.Errors.ioResult(Errors.java:138) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.unix.FileDescriptor.readAddress(FileDescriptor.java:175) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.epoll.AbstractEpollChannel.doReadBytes(AbstractEpollChannel.java:238) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > 
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:926) > ~[netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:397) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:302) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > {code} > [CASSANDRA-7849|https://issues.apache.org/jira/browse/CASSANDRA-7849] > addresses this for JAVA IO Exception like "java.io.IOException: Connection > reset by peer", but not for Netty IOException since the exception message in > Netty includes method name. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns
[ https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14897: Reviewer: Jason Brown > In mixed 3.x/4 version clusters write tracing and repair history information > without new columns > > > Key: CASSANDRA-14897 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14897 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Major > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > Attachments: 14897.diff > > > In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't > generate any errors. Aleksey pointed out I could write just the old columns. > If a user manually adds the new columns to the old version nodes before > upgrade they will be able to query this information across the cluster. This > is a better situation than making it completely impossible for people to run > repairs or perform tracing in mixed version clusters. > This would avoid breaking repair and tracing in mixed version clusters. > I also want to properly document how to do this and maybe even provide a > script people can run to add the columns to old nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns
[ https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703219#comment-16703219 ] Jason Brown commented on CASSANDRA-14897: - +1 lgtm > In mixed 3.x/4 version clusters write tracing and repair history information > without new columns > > > Key: CASSANDRA-14897 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14897 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Major > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > Attachments: 14897.diff > > > In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't > generate any errors. Aleksey pointed out I could write just the old columns. > If a user manually adds the new columns to the old version nodes before > upgrade they will be able to query this information across the cluster. This > is a better situation than making it completely impossible for people to run > repairs or perform tracing in mixed version clusters. > This would avoid breaking repair and tracing in mixed version clusters. > I also want to properly document how to do this and maybe even provide a > script people can run to add the columns to old nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns
[ https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14897: Status: Ready to Commit (was: Patch Available) > In mixed 3.x/4 version clusters write tracing and repair history information > without new columns > > > Key: CASSANDRA-14897 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14897 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Major > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > Attachments: 14897.diff > > > In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't > generate any errors. Aleksey pointed out I could write just the old columns. > If a user manually adds the new columns to the old version nodes before > upgrade they will be able to query this information across the cluster. This > is a better situation than making it completely impossible for people to run > repairs or perform tracing in mixed version clusters. > This would avoid breaking repair and tracing in mixed version clusters. > I also want to properly document how to do this and maybe even provide a > script people can run to add the columns to old nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702503#comment-16702503 ] Jason Brown commented on CASSANDRA-14896: - The only utest that failed was {{DistributedReadWritePathTest.writeWithSchemaDisagreement}}, which failed with "Forked Java VM exited abnormally". I ran locally and all was fine, so chalking it up to a testing fluke. Will commit shortly. > 3.0 schema migration pulls from later version incompatible nodes > > > Key: CASSANDRA-14896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14896 > Project: Cassandra > Issue Type: Bug > Components: Core, CQL >Reporter: Ariel Weisberg >Assignee: Jason Brown >Priority: Blocker > Labels: 4.0-pre-rc-bugs > Fix For: 3.0.x > > > I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly > different and 3.0 in some scenarios it is pulling schema from a later > version. This causes upgrade tests to have errors in the logs due to > additional columns from configurable storage port. 
> {noformat} > Failed: Error details: > Errors seen in logs for: node2 > node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 > CassandraDaemon.java:207 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.RuntimeException: Unknown column additional_write_policy during > deserialization > at > org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.17.jar:3.0.17] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14896: Resolution: Fixed Fix Version/s: (was: 3.0.x) 4.0 Status: Resolved (was: Ready to Commit) Committed as sha \{{c5dee08dfb791ba28fecc8ca8b25a4a4d7e9cb07}} > 3.0 schema migration pulls from later version incompatible nodes > > > Key: CASSANDRA-14896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14896 > Project: Cassandra > Issue Type: Bug > Components: Core, CQL >Reporter: Ariel Weisberg >Assignee: Jason Brown >Priority: Blocker > Labels: 4.0-pre-rc-bugs > Fix For: 4.0 > > > I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly > different and 3.0 in some scenarios it is pulling schema from a later > version. This causes upgrade tests to have errors in the logs due to > additional columns from configurable storage port. > {noformat} > Failed: Error details: > Errors seen in logs for: node2 > node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 > CassandraDaemon.java:207 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.RuntimeException: Unknown column additional_write_policy during > deserialization > at > org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) > 
~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.17.jar:3.0.17] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702486#comment-16702486 ] Jason Brown commented on CASSANDRA-14896: - [~aweisberg] is correct. On the third (and last) message of the internode messaging handshake, the node is [incorrectly sending back|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/OutboundHandshakeHandler.java#L180] the messaging version it received from the peer; it should be sending back its own {{MessagingService.current_version}}. Here's a one-line fix for sending the correct messaging version in {{ThirdHandshakeMessage}}, as well as a fix to the unit test that checks the version sent from {{OutboundHandshakeHandler}}: ||14896|| |[branch|https://github.com/jasobrown/cassandra/tree/14896]| |[utests & dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14896]| > 3.0 schema migration pulls from later version incompatible nodes > > > Key: CASSANDRA-14896 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14896 > Project: Cassandra > Issue Type: Bug > Components: Core, CQL >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Blocker > Labels: 4.0-pre-rc-bugs > Fix For: 3.0.x > > > I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly > different and 3.0 in some scenarios it is pulling schema from a later > version. This causes upgrade tests to have errors in the logs due to > additional columns from configurable storage port. 
> {noformat} > Failed: Error details: > Errors seen in logs for: node2 > node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 > CassandraDaemon.java:207 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.RuntimeException: Unknown column additional_write_policy during > deserialization > at > org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) > ~[apache-cassandra-3.0.17.jar:3.0.17] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) > ~[apache-cassandra-3.0.17.jar:3.0.17] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
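The one-line nature of the fix described above can be illustrated with a stripped-down, self-contained model of the third handshake message. The class and method names here are hypothetical stand-ins, not the real {{OutboundHandshakeHandler}} API: the buggy variant echoes back the version the peer advertised, while the fixed variant always advertises this node's own {{MessagingService.current_version}}.

```java
// Toy model of the third internode-handshake message. Names are illustrative
// only. The bug: echoing the peer's version back means a 3.0 peer sees its own
// version mirrored and assumes the 4.0 node speaks exactly its dialect, so it
// pulls schema it cannot deserialize.
public class HandshakeModel {
    static final int OWN_VERSION = 12; // stand-in for MessagingService.current_version

    // Buggy: returns the messaging version received from the peer.
    static int thirdMessageVersionBuggy(int peerVersion) {
        return peerVersion;
    }

    // Fixed (the one-line change): returns this node's own version.
    static int thirdMessageVersionFixed(int peerVersion) {
        return OWN_VERSION;
    }

    public static void main(String[] args) {
        int peerVersion = 10; // a 3.0-era peer
        System.out.println(thirdMessageVersionBuggy(peerVersion)); // 10: peer cannot tell this node is newer
        System.out.println(thirdMessageVersionFixed(peerVersion)); // 12: peer detects the version difference
    }
}
```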
[jira] [Commented] (CASSANDRA-14485) Optimize internode messaging protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-14485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662353#comment-16662353 ] Jason Brown commented on CASSANDRA-14485: - bq. make it easier to defer deserialization until the entire contents are in memory Correct, as we never want to block (for deserialization) on the netty event loop > Optimize internode messaging protocol > - > > Key: CASSANDRA-14485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14485 > Project: Cassandra > Issue Type: Improvement > Components: Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.0 > > > There's some dead wood and places for optimization in the internode messaging > protocol. Currently, we include the sender's {{IPAddressAndPort}} in *every* > internode message, even though we already sent that in the handshake that > established the connection/session. Further, there are several places where > we can use vints instead of a fixed, 4-byte integer value, especially as > those values will almost always fit in a single byte. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
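As a sketch of why vints help here: a LEB128-style variable-length encoding stores a value in 7-bit groups with a continuation bit, so any value under 128 costs one byte on the wire instead of a fixed four. This is an illustration of the general technique only, not Cassandra's actual {{VIntCoding}} implementation.

```java
import java.io.ByteArrayOutputStream;

// Illustrative LEB128-style unsigned varint codec (not Cassandra's VIntCoding).
public class VintSketch {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {               // more than 7 bits remain
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits + continuation bit
            value >>>= 7;
        }
        out.write((int) value);                       // final byte, continuation bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;      // reassemble 7 bits at a time
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(VintSketch.encode(42).length);      // a small value costs 1 byte, not 4
        System.out.println(VintSketch.decode(VintSketch.encode(300))); // round-trips to 300
    }
}
```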
[jira] [Comment Edited] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662265#comment-16662265 ] Jason Brown edited comment on CASSANDRA-14503 at 10/24/18 1:09 PM: --- Based on testing conducted with [~jolynch] and [~vinaykumarcse], here's an updated branch with performance fixes and code improvements: ||v2|| |[branch|https://github.com/jasobrown/cassandra/tree/14503-v2]| |[utests dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503-v2]| |[pull request|https://github.com/apache/cassandra/pull/289]| The major change in this branch is that I experimented with aggregating the messages to send into a single ByteBuf, instead of sending the messages individually to the netty pipeline. Since we can send up to (the current hard-coded size of) 64 messages per iteration of OMC.dequeueMessages(), that's 64 invocations of the pipeline mechanics, 64 ByteBuf allocations (and releases), and 64 promises to fulfill, one per message. If, instead, we send one ByteBuf (with the data serialized into it), then it's just one message into the pipeline, one allocation, and one promise fulfillment. The primary trade-off is that the single buffer will be, of course, large; perhaps large enough to not be efficient with the netty allocator. To that end, I wrote a JMH benchmark, and the results are compelling: TL;DR a single buffer is significantly faster than multiple smaller buffers. In the closest case the single buffer is twice as fast, with the typical percentile difference being about 10-20 times faster for the single buffer (1.5 micros vs. 23 micros). To make this work, I needed the allocation and serialization code to be moved outside of the pipeline handler (as it now needs to be invoked from OMC). I had already done this work with CASSANDRA-13630, so I pulled that patch into this branch. 
That patch also greatly reduced the need for the ChannelWriter abstraction, and combined with the outstanding work in this branch, I was able to eliminate ChannelWriter and the confusion it added. However, I still need to handle large messages separately (as we don't want to use our blocking serializers on the event loop), so I've preserved the "move large-message serialization to a separate thread" behavior from CASSANDRA-13630 by creating a new abstraction in OMC: the (not cleverly named) MessageDequeuer interface, with implementations for large messages and "small messages" (basically the current behavior of this patch that we've been riffing on). One feature that we've been debating again is whether to include the message coalescing feature. The current branch does not include it - mostly because we've been iterating quite quickly over this code, and I broke it when incorporating the CASSANDRA-13630 patch (and killing off ChannelWriter). There is some testing happening to reevaluate the efficacy of message coalescing with the netty internode messaging. Some other points of interest: - switched OMC#backlog from ConcurrentLinkedQueue to MpscLinkedQueue from jctools. MpscLinkedQueue is dramatically better, and ConcurrentLinkedQueue#isEmpty was a CPU drain. - improved scheduling of the consumerTask in OutboundMessagingConnection, though it still needs a bit more refinement - ditched the OMC.State from the last branch - added [~jolynch]'s fixes wrt not setting a default SO_SNDBUF value - OMC - introduced a consumerTaskThread member field, distinct from the eventLoop - ditched the auto-read in RebufferingByteBufDataInputPlus - I need to document this In general I have a small bit of documenting to add, but the branch is ready for review. 
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662265#comment-16662265 ] Jason Brown commented on CASSANDRA-14503: - Based on testing conducted with [~jolynch] and [~vinaykumarcse], here's an updated branch with performance fixes and code improvements: ||v2|| |[branch|https://github.com/jasobrown/cassandra/tree/14503-v2]| |[utests dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503-v2]| |[pull request|https://github.com/apache/cassandra/pull/289]| The major change in this branch is that I experimented with aggregating the messages to send into a single ByteBuf, instead of sending the messages individually to the netty pipeline. Since we can send up to (the current hard-coded size of) 64 messages per iteration of OMC.dequeueMessages(), that's 64 invocations of the pipeline mechanics, 64 ByteBuf allocations (and releases), and 64 promises to fulfill, one per message. If, instead, we send one ByteBuf (with the data serialized into it), then it's just one message into the pipeline, one allocation, and one promise fulfillment. The primary trade-off is that the single buffer will be, of course, large; perhaps large enough to not be efficient with the netty allocator. To that end, I wrote a JMH benchmark, and the results are compelling: TL;DR a single buffer is significantly faster than multiple smaller buffers. In the closest case the single buffer is twice as fast, with the typical percentile difference being about 10-20 times faster for the single buffer (1.5 micros vs. 23 micros). To make this work, I needed the allocation and serialization code to be moved outside of the pipeline handler (as it now needs to be invoked from OMC). I had already done this work with CASSANDRA-13630, so I pulled that patch into this branch. 
That patch also greatly reduced the need for the ChannelWriter abstraction, and combined with the outstanding work in this branch, I was able to eliminate ChannelWriter and the confusion it added. However, I still need to handle large messages separately (as we don't want to use our blocking serializers on the event loop), so I've preserved the "move large-message serialization to a separate thread" behavior from CASSANDRA-13630 by creating a new abstraction in OMC: the (not cleverly named) MessageDequeuer interface, with implementations for large messages and "small messages" (basically the current behavior of this patch that we've been riffing on). One feature that we've been debating again is whether to include the message coalescing feature. The current branch does not include it - mostly because we've been iterating quite quickly over this code, and I broke it when incorporating the CASSANDRA-13630 patch (and killing off ChannelWriter). There is some testing happening to reevaluate the efficacy of message coalescing with the netty internode messaging. Some other points of interest: - switched OMC#backlog from ConcurrentLinkedQueue to MpscLinkedQueue from jctools. MpscLinkedQueue is dramatically better, and ConcurrentLinkedQueue#isEmpty was a CPU drain. - improved scheduling of the consumerTask in OutboundMessagingConnection, though it still needs a bit more refinement - ditched the OMC.State from the last branch - added [~jolynch]'s fixes wrt not setting a default SO_SNDBUF value - OMC - introduced a consumerTaskThread member field, distinct from the eventLoop - ditched the auto-read in RebufferingByteBufDataInputPlus - I need to document this In general I have a small bit of documenting to add, but the branch is ready for review. 
> Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might
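The single-ByteBuf aggregation described in the comments above can be sketched framework-free, roughly as follows. A plain {{ByteBuffer}} stands in for a netty {{ByteBuf}}, and all names are illustrative, not the actual patch: the point is one up-front sized allocation and one downstream write per batch, instead of one per message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the single-buffer idea: serialize a whole batch of messages into
// one buffer so the pipeline sees one write, one allocation, one completion.
public class AggregatedFlush {
    static ByteBuffer aggregate(List<byte[]> batch) {
        int total = 0;
        for (byte[] m : batch)
            total += m.length;                       // size the buffer up-front
        ByteBuffer buf = ByteBuffer.allocate(total); // one allocation for the whole batch
        for (byte[] m : batch)
            buf.put(m);                              // serialize each message into it
        buf.flip();
        return buf;                                  // one object handed to the channel
    }

    public static void main(String[] args) {
        List<byte[]> batch = new ArrayList<>();
        for (int i = 0; i < 64; i++)                 // the hard-coded 64-message cap from the comment
            batch.add(("msg" + i).getBytes(StandardCharsets.UTF_8));
        System.out.println(aggregate(batch).remaining()); // total bytes, sent as a single write
    }
}
```

The trade-off noted in the comment applies here too: a batch-sized buffer is larger than any single message, which may interact poorly with a pooled allocator's size classes.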
[jira] [Assigned] (CASSANDRA-12823) dtest failure in topology_test.TestTopology.crash_during_decommission_test
[ https://issues.apache.org/jira/browse/CASSANDRA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-12823: --- Assignee: (was: Jason Brown) > dtest failure in topology_test.TestTopology.crash_during_decommission_test > -- > > Key: CASSANDRA-12823 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12823 > Project: Cassandra > Issue Type: Bug >Reporter: Sean McCarthy >Priority: Major > Labels: dtest, test-failure > Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, > node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log > > > example failure: > http://cassci.datastax.com/job/trunk_novnode_dtest/489/testReport/topology_test/TestTopology/crash_during_decommission_test > {code} > Stacktrace > File "/usr/lib/python2.7/unittest/case.py", line 358, in run > self.tearDown() > File "/home/automaton/cassandra-dtest/dtest.py", line 581, in tearDown > raise AssertionError('Unexpected error in log, see stdout') > "Unexpected error in log, see stdout > {code}{code} > Standard Output > Unexpected error in node2 log, error: > ERROR [GossipStage:1] 2016-10-19 15:44:14,820 CassandraDaemon.java:229 - > Exception in thread Thread[GossipStage:1,5,main] > java.lang.NullPointerException: null > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) > ~[na:1.8.0_45] > at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:89) > ~[main/:na] > at > org.apache.cassandra.hints.HintsService.excise(HintsService.java:313) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.excise(StorageService.java:2458) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.excise(StorageService.java:2471) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2375) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1905) > ~[main/:na] > at > 
org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1222) > ~[main/:na] > at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1205) > ~[main/:na] > at > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1168) > ~[main/:na] > at > org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58) > ~[main/:na] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_45] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_45] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_45] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-13517) dtest failure in paxos_tests.TestPaxos.contention_test_many_threads
[ https://issues.apache.org/jira/browse/CASSANDRA-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown resolved CASSANDRA-13517. - Resolution: Cannot Reproduce Closing for now as it doesn't seem to be a problem of late. > dtest failure in paxos_tests.TestPaxos.contention_test_many_threads > --- > > Key: CASSANDRA-13517 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13517 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Ariel Weisberg >Assignee: Jason Brown >Priority: Major > Labels: dtest, test-failure, test-failure-fresh > Attachments: test_failure.txt > > > See attachment for details -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-11809) IV misuse in commit log encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-11809: --- Assignee: (was: Jason Brown) > IV misuse in commit log encryption > -- > > Key: CASSANDRA-11809 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11809 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Priority: Major > Fix For: 3.11.x > > > Commit log segments share iv values between encrypted chunks. The cipher > should be reinitialized with a new iv for each discrete piece of data it > encrypts, otherwise it gives attackers something to compare between chunks of > data. Also, some cipher configurations don't support initialization vectors > ('AES/ECB/NoPadding'), so some logic should be added to determine if the > cipher should be initialized with an iv. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-13856) Optimize ByteBuf reallocations in the native protocol pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13856: --- Assignee: (was: Jason Brown) > Optimize ByteBuf reallocations in the native protocol pipeline > -- > > Key: CASSANDRA-13856 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13856 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Jason Brown >Priority: Minor > > This is a follow up to CASSANDRA-13789. I discovered we reallocate the > {{ByteBuf}} when writing data to it, and it would be nice to size the buffer > correctly up-front to avoid reallocating it. I'm not sure how easy that is, > nor if the cost of the realloc is cheaper than calculating the size needed > for the buffer. Adding this ticket, nonetheless, to explore that optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
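The trade-off the ticket weighs, computing the serialized size up front versus paying for a mid-write reallocation and copy, can be sketched with plain java.nio buffers. The real pipeline uses Netty's {{ByteBuf}}; this encoder and its length-prefixed format are purely illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PresizedEncoder {

    // First pass: compute the exact encoded size
    // (a 4-byte length prefix per string plus its UTF-8 bytes).
    static int encodedSize(List<String> values) {
        int size = 0;
        for (String v : values)
            size += Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
        return size;
    }

    // Second pass: allocate once at the exact size, so no reallocation can
    // ever happen while writing. Whether the sizing pass is cheaper than the
    // realloc is exactly the open question from the ticket.
    static ByteBuffer encode(List<String> values) {
        ByteBuffer buf = ByteBuffer.allocate(encodedSize(values));
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = encode(List.of("ab", "cde"));
        // 4 + 2 + 4 + 3 = 13 bytes, with zero spare capacity.
        System.out.println(buf.remaining() == buf.capacity() && buf.capacity() == 13); // prints "true"
    }
}
```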
[jira] [Assigned] (CASSANDRA-11810) IV misuse in hints encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-11810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-11810: --- Assignee: (was: Jason Brown) > IV misuse in hints encryption > - > > Key: CASSANDRA-11810 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11810 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Priority: Major > Fix For: 3.11.x > > > Encrypted hint files share iv values between encrypted chunks. The cipher > should be reinitialized with a new iv for each discrete piece of data it > encrypts, otherwise it gives attackers something to compare between chunks of > data. Also, some cipher configurations don't support initialization vectors > ('AES/ECB/NoPadding'), so some logic should be added to determine if the > cipher should be initialized with an iv. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-7922) Add file-level encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-7922: -- Assignee: (was: Jason Brown) > Add file-level encryption > - > > Key: CASSANDRA-7922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7922 > Project: Cassandra > Issue Type: New Feature >Reporter: Jason Brown >Priority: Major > Labels: encryption, security > Fix For: 4.x > > > Umbrella ticket for file-level encryption > Some use cases require encrypting files at rest for certain compliance needs: > the healthcare industry (HIPAA regulations), the card payment industry (PCI > DSS regulations) or the US government (FISMA regulations). File system > encryption can be used in some situations, but does not solve all problems. > I can foresee the following components needing at-rest encryption: > - sstables (data, index, and summary files) (CASSANDRA-9633) > - commit log (CASSANDRA-6018) > - hints (CASSANDRA-11040) > - some systems tables (batches, not sure if any others) > - index/row cache > - secondary indexes > The work for those items would be separate tickets, of course. I have a > working version of most of the above components working in 2.0, which I need > to ship in production now, but it's too late for the 2.0 branch and unclear > for 2.1. > Other products, such as Oracle/SqlServer/Datastax Enterprise commonly refer > to at-rest encryption as Transparent Data Encryption (TDE), and I'm happy to > stick with that convention, here, as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-9633) Add ability to encrypt sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-9633: -- Assignee: (was: Jason Brown) > Add ability to encrypt sstables > --- > > Key: CASSANDRA-9633 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9633 > Project: Cassandra > Issue Type: New Feature >Reporter: Jason Brown >Priority: Major > Labels: encryption, security, sstable > Fix For: 4.x > > > Add option to allow encrypting of sstables. > I have a version of this functionality built on cassandra 2.0 that > piggy-backs on the existing sstable compression functionality and ICompressor > interface (similar in nature to what DataStax Enterprise does). However, if > we're adding the feature to the main OSS product, I'm not sure if we want to > use the pluggable compression framework or if it's worth investigating a > different path. I think there's a lot of upside in reusing the sstable > compression scheme, but perhaps add a new component in cqlsh for table > encryption and a corresponding field in CFMD. > Encryption configuration in the yaml can use the same mechanism as > CASSANDRA-6018 (which is currently pending internal review). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638181#comment-16638181 ] Jason Brown edited comment on CASSANDRA-14747 at 10/4/18 12:45 PM: --- Excellent find, [~jolynch]. Looks like we added the ability to set the send/recv buffer size in CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 we [set the SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444] if the operator provided a value in the yaml, but we did not set a default value. However, it does appear I added a hard-coded default in 4.0 with CASSANDRA-8457. As it's been nearly two years since I wrote that part of the patch, I have no recollection of why I added a default. Removing it is trivial and has huge benefits, as [~jolynch] has proven. I'm working on combining the findings [~jolynch] and I have discovered over the last weeks and should have a patch ready in a few days (which will probably be part CASSANDRA-14503, as most of this work was based on that work-in-progress). was (Author: jasobrown): Excellent find, [~jolynch]. Looks like we added the ability to set the send/recv buffer size in CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 we [set the SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444] if the operator provided a value in the yaml, but we did not set a default value. However, it does appear I added a hard-coded default in 4.0 with CASSANDRA-8457. As it's been nearly two years since I wrote that part of the patch, I have no recollection of why I added a default. Removing it is trivial and has huge benefits, as has proven. 
I'm working on combining the findings [~jolynch] and I have discovered over the last weeks and should have a patch ready in a few days (which will probably be part CASSANDRA-14503, as most of this work was based on that work-in-progress). > Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, > 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > trunk_vs_3.0.17_latency_under_load.png, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
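The 3.11 behavior described in the comment, applying SO_SNDBUF only when the operator provided a value and otherwise letting the kernel autotune, can be sketched with a plain {{java.net.Socket}} (the real code sets the option on a Netty channel; the method name and the "0 means unconfigured" convention here are assumptions for illustration):

```java
import java.net.Socket;
import java.net.SocketException;

public class SendBufferConfig {

    // Apply SO_SNDBUF only when the operator configured a value (> 0 stands in
    // for "present in the yaml"); otherwise leave the OS default so the kernel
    // can autotune the send window.
    static int applySendBuffer(Socket socket, int configuredBytes) throws SocketException {
        if (configuredBytes > 0)
            socket.setSendBufferSize(configuredBytes);
        return socket.getSendBufferSize();
    }

    static boolean demo() {
        try (Socket s = new Socket()) {
            int osDefault = applySendBuffer(s, 0);     // untouched: kernel default
            int explicit = applySendBuffer(s, 16384);  // operator-provided value
            // The kernel may round the explicit value up, but never below the request.
            return osDefault > 0 && explicit >= 16384;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true"
    }
}
```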
[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638181#comment-16638181 ] Jason Brown commented on CASSANDRA-14747: - Excellent find, [~jolynch]. Looks like we added the ability to set the send/recv buffer size in CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 we [set the SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444] if the operator provided a value in the yaml, but we did not set a default value. However, it does appear I added a hard-coded default in 4.0 with CASSANDRA-8457. As it's been nearly two years since I wrote that part of the patch, I have no recollection of why I added a default. Removing it is trivial and has huge benefits, as has proven. I'm working on combining the findings [~jolynch] and I have discovered over the last weeks and should have a patch ready in a few days (which will probably be part CASSANDRA-14503, as most of this work was based on that work-in-progress). 
> Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, > 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > trunk_vs_3.0.17_latency_under_load.png, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-12297) Privacy Violation - Heap Inspection
[ https://issues.apache.org/jira/browse/CASSANDRA-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-12297: --- Assignee: (was: Jason Brown) > Privacy Violation - Heap Inspection > --- > > Key: CASSANDRA-12297 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12297 > Project: Cassandra > Issue Type: Sub-task >Reporter: Eduardo Aguinaga >Priority: Major > > Overview: > In May through June of 2016 a static analysis was performed on version 3.0.5 > of the Cassandra source code. The analysis included > an automated analysis using HP Fortify v4.21 SCA and a manual analysis > utilizing SciTools Understand v4. The results of that > analysis include the issue below. > Issue: > In the file PasswordAuthenticator.java on lines 129, 164 and 222 a string > object is used to store sensitive data. String objects are immutable and > should not be used to store sensitive data. Sensitive data should be stored > in char or byte arrays and the contents of those arrays should be cleared > ASAP. Operations performed on string objects will require that the original > object be copied and the operation be applied in the new copy of the string > object. This results in the likelihood that multiple copies of sensitive data > will be present in the heap until garbage collection takes place. 
> The snippet below shows the issue on line 129: > PasswordAuthenticator.java, lines 123-134: > {code:java} > 123 public AuthenticatedUser legacyAuthenticate(Map<String, String> credentials) throws AuthenticationException > 124 { > 125 String username = credentials.get(USERNAME_KEY); > 126 if (username == null) > 127 throw new AuthenticationException(String.format("Required key > '%s' is missing", USERNAME_KEY)); > 128 > 129 String password = credentials.get(PASSWORD_KEY); > 130 if (password == null) > 131 throw new AuthenticationException(String.format("Required key > '%s' is missing", PASSWORD_KEY)); > 132 > 133 return authenticate(username, password); > 134 } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
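The remediation the report recommends, keeping credentials in a char array and wiping it as soon as the check completes, can be sketched as follows. This is a generic illustration of the pattern, not the actual PasswordAuthenticator fix; the method names here are invented.

```java
import java.util.Arrays;

public class SensitiveCredential {

    // Holding the password in a char[] lets the caller wipe it deterministically;
    // a String copy would linger on the heap until garbage collection reclaims it.
    static boolean authenticate(String username, char[] password) {
        try {
            // ... hash and verify the password here; never convert it to a String ...
            return username != null && password.length > 0;
        } finally {
            Arrays.fill(password, '\0'); // clear the credential as soon as the check is done
        }
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        boolean ok = authenticate("alice", password);
        // The credential material is zeroed by the time the call returns.
        System.out.println(ok && password[0] == '\0'); // prints "true"
    }
}
```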
[jira] [Assigned] (CASSANDRA-8060) Geography-aware, distributed replication
[ https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-8060: -- Assignee: (was: Jason Brown) > Geography-aware, distributed replication > > > Key: CASSANDRA-8060 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8060 > Project: Cassandra > Issue Type: Wish >Reporter: Donald Smith >Priority: Major > > We have three data centers in the US (CA in California, TX in Texas, and NJ > in NJ), two in Europe (UK and DE), and two in Asia (JP and CH1). We do all > our writing to CA. That represents a bottleneck, since the coordinator nodes > in CA are responsible for all the replication to every data center. > Far better if we had the option of setting things up so that CA replicated to > TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible > for replicating to UK, which should replicate to DE. Etc, etc. > This could be controlled by the topology file. > The replication could be organized in a tree-like structure instead of a > daisy-chain. > It would require architectural changes and would have major ramifications for > latency but might be appropriate for some scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-12298) Privacy Violation - Heap Inspection
[ https://issues.apache.org/jira/browse/CASSANDRA-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-12298: --- Assignee: (was: Jason Brown) > Privacy Violation - Heap Inspection > --- > > Key: CASSANDRA-12298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12298 > Project: Cassandra > Issue Type: Sub-task >Reporter: Eduardo Aguinaga >Priority: Major > > Overview: > In May through June of 2016 a static analysis was performed on version 3.0.5 > of the Cassandra source code. The analysis included > an automated analysis using HP Fortify v4.21 SCA and a manual analysis > utilizing SciTools Understand v4. The results of that > analysis include the issue below. > Issue: > In the file RoleOptions.java on line 89 a string object is used to store > sensitive data. String objects are immutable and should not be used to store > sensitive data. Sensitive data should be stored in char or byte arrays and > the contents of those arrays should be cleared ASAP. Operations performed on > string objects will require that the original object be copied and the > operation be applied in the new copy of the string object. This results in > the likelihood that multiple copies of sensitive data will be present in the > heap until garbage collection takes place. > The snippet below shows the issue on line 89: > RoleOptions.java, lines 87-90: > {code:java} > 87 public Optional<String> getPassword() > 88 { > 89 return > Optional.fromNullable((String)options.get(IRoleManager.Option.PASSWORD)); > 90 } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634301#comment-16634301 ] Jason Brown commented on CASSANDRA-14747: - [~jolynch] Nice work. I agree the time bounding of dequeueMessages is somewhat questionable - I added it in when we were making a bunch of other changes for dealing with CPU/task starvation. In your gist, I think we can run into some serious overscheduling (re-enqueueing of the consumer task) when the channel is unwritable. In that case, it will break out of dequeueMessages's while loop immediately, but then immediately reschedule (assuming backlog > 0). We'll keep doing this, very aggressively, until the channel becomes writable again - yet we cannot make any meaningful progress. To counteract this, that's why I had dequeueMessages not reschedule, but instead had handleMessageResult reschedule because at that point (remember, we only attach the listener to that last message of the bunch) we know the bytes have been written to the socket and that channel should be writable again. In this case we only schedule (or directly execute) dequeueMessages when we need to. (Note: this was probably not apparent from the current code's comments, so I should definitely improve that.) 
> Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.7-before-my-changes.svg, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
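The scheduling rule the comment argues for, stop draining when the channel becomes unwritable and let the write-completion callback (handleMessageResult in the discussion) be the only thing that reschedules the drain, can be modeled with a deliberately simplified single-threaded sketch. This is not the Netty-based Cassandra code; the class and the writability flag are stand-ins.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Single-threaded model: when the channel goes unwritable, dequeueMessages()
// returns WITHOUT re-enqueueing itself (avoiding the aggressive reschedule
// loop described in the comment); only the completion callback restarts it.
public class DrainScheduler {
    final Queue<String> backlog = new ArrayDeque<>();
    boolean writable = true;
    int drainsScheduled = 0;

    void dequeueMessages() {
        drainsScheduled++;
        while (writable && !backlog.isEmpty()) {
            backlog.poll();   // "write" one message to the channel
            writable = false; // socket buffer fills: channel goes unwritable
        }
        // No reschedule here: spinning while unwritable makes no progress.
    }

    // Invoked once the flushed bytes reach the socket and the channel is
    // writable again; schedules exactly one follow-up drain if work remains.
    void handleMessageResult() {
        writable = true;
        if (!backlog.isEmpty())
            dequeueMessages();
    }

    public static void main(String[] args) {
        DrainScheduler ds = new DrainScheduler();
        ds.backlog.add("m1");
        ds.backlog.add("m2");
        ds.dequeueMessages();     // writes m1, channel goes unwritable, returns
        ds.handleMessageResult(); // writes m2
        // Two drains total: one per writable window, no busy re-enqueueing.
        System.out.println(ds.backlog.isEmpty() && ds.drainsScheduled == 2); // prints "true"
    }
}
```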
[jira] [Commented] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3
[ https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622024#comment-16622024 ] Jason Brown commented on CASSANDRA-14760: - Cool, I'll go ahead and close. > CVE-2018-10237 Security vulnerability in 3.11.3 > --- > > Key: CASSANDRA-14760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14760 > Project: Cassandra > Issue Type: Bug >Reporter: John F. Gbruoski >Priority: Major > > As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a > security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be > patched to support Guava 24.1.1 or later? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3
[ https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown resolved CASSANDRA-14760. - Resolution: Not A Problem > CVE-2018-10237 Security vulnerability in 3.11.3 > --- > > Key: CASSANDRA-14760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14760 > Project: Cassandra > Issue Type: Bug >Reporter: John F. Gbruoski >Priority: Major > > As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a > security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be > patched to support Guava 24.1.1 or later? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3
[ https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-14760: --- Assignee: Jason Brown > CVE-2018-10237 Security vulnerability in 3.11.3 > --- > > Key: CASSANDRA-14760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14760 > Project: Cassandra > Issue Type: Bug >Reporter: John F. Gbruoski >Assignee: Jason Brown >Priority: Major > > As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a > security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be > patched to support Guava 24.1.1 or later? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14767) Embedded cassandra not working after jdk10 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622018#comment-16622018 ] Jason Brown commented on CASSANDRA-14767: - Cassandra only supports java 8. Java 11 support has been added with CASSANDRA-9608, but that is a feature only on trunk (soon to be cassandra 4.0). > Embedded cassandra not working after jdk10 upgrade > -- > > Key: CASSANDRA-14767 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14767 > Project: Cassandra > Issue Type: Bug >Reporter: parthiban >Priority: Blocker > > Embedded cassandra not working after jdk10 upgrade. Could some one help me on > this. > Cassandra config: > {{try \{ EmbeddedCassandraServerHelper.startEmbeddedCassandra(); }catch > (Exception e) \{ LOGGER.error(" CommonConfig ", " cluster()::Exception while > creating cluster ", e); System.setProperty("cassandra.config", > "cassandra.yaml"); DatabaseDescriptor.daemonInitialization(); > EmbeddedCassandraServerHelper.startEmbeddedCassandra(); } Cluster cluster = > Cluster.builder() > .addContactPoints(environment.getProperty(TextToClipConstants.CASSANDRA_CONTACT_POINTS)).withPort(Integer.parseInt(environment.getProperty(TextToClipConstants.CASSANDRA_PORT))).build(); > Session session = cluster.connect(); > session.execute(KEYSPACE_CREATION_QUERY); > session.execute(KEYSPACE_ACTIVATE_QUERY); }} > > {{build.gradle}} > {{buildscript \{ ext { springBootVersion = '2.0.1.RELEASE' } repositories \{ > mavenCentral() mavenLocal() } dependencies \{ > classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}") > classpath ("com.bmuschko:gradle-docker-plugin:3.2.1") classpath > ("org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.5") > classpath("au.com.dius:pact-jvm-provider-gradle_2.12:3.5.13") classpath > ("com.moowork.gradle:gradle-node-plugin:1.2.0") } } plugins \{ //id > "au.com.dius.pact" version "3.5.7" id "com.gorylenko.gradle-git-properties" > version "1.4.17" id 
"de.undercouch.download" version "3.4.2" } apply plugin: > 'java' apply plugin: 'eclipse' apply plugin: 'org.springframework.boot' apply > plugin: 'io.spring.dependency-management' apply plugin: > 'com.bmuschko.docker-remote-api' apply plugin: 'jacoco' apply plugin: > 'maven-publish' apply plugin: 'org.sonarqube' apply plugin: > 'au.com.dius.pact' apply plugin: 'scala' sourceCompatibility = 1.8 > repositories \{ mavenCentral() maven { url "https://repo.spring.io/milestone; > } mavenLocal() } ext \{ springCloudVersion = 'Finchley.RELEASE' } pact \{ > serviceProviders { rxorder { publish { pactDirectory = > '/Users/sv/Documents/wag-doc-text2clip/target/pacts' // defaults to > $buildDir/pacts pactBrokerUrl = 'http://localhost:80' version=2.0 } } } } > //start of integration tests changes sourceSets \{ integrationTest { java { > compileClasspath += main.output + test.output runtimeClasspath += main.output > + test.output srcDir file('test/functional-api/java') } resources.srcDir > file('test/functional-api/resources') } } configurations \{ > integrationTestCompile.extendsFrom testCompile > integrationTestRuntime.extendsFrom testRuntime } //end of integration tests > changes dependencies \{ //web (Tomcat, Logging, Rest) compile group: > 'org.springframework.boot', name: 'spring-boot-starter-web' // Redis > //compile group: 'org.springframework.boot', name: > 'spring-boot-starter-data-redis' //Mongo Starter compile group: > 'org.springframework.boot', name:'spring-boot-starter-data-mongodb' // > Configuration processor - To Generate MetaData Files. 
The files are designed > to let developers offer "code completion" as users are working with > application.properties compile group: 'org.springframework.boot', name: > 'spring-boot-configuration-processor' // Actuator - Monitoring compile group: > 'org.springframework.boot', name: 'spring-boot-starter-actuator' //Sleuth - > Tracing compile group: 'org.springframework.cloud', name: > 'spring-cloud-starter-sleuth' //Hystrix - Circuit Breaker compile group: > 'org.springframework.cloud', name: 'spring-cloud-starter-netflix-hystrix' // > Hystrix - Dashboard compile group: 'org.springframework.cloud', name: > 'spring-cloud-starter-netflix-hystrix-dashboard' // Thymeleaf compile group: > 'org.springframework.boot', name: 'spring-boot-starter-thymeleaf' //Voltage > // Device Detection compile group: 'org.springframework.boot', name: > 'spring-boot-starter-data-cassandra', version:'2.0.4.RELEASE' compile group: > 'com.google.guava', name: 'guava', version: '23.2-jre' > compile('com.google.code.gson:gson:2.8.0') compile('org.json:json:20170516') > //Swagger compile group: 'io.springfox', name: 'springfox-swagger2', > version:'2.8.0' compile group: 'io.springfox', name:
[jira] [Commented] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3
[ https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619378#comment-16619378 ] Jason Brown commented on CASSANDRA-14760: - The CVE seems to apply to only to: - AtomicDoubleArray (when serialized with Java serialization) - CompoundOrdering (when serialized with GWT serialization) Cassandra uses neither of those classes, nor do we use Java nor GWT serialization. Thus, it's not clear this CVE is a problem for us. > CVE-2018-10237 Security vulnerability in 3.11.3 > --- > > Key: CASSANDRA-14760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14760 > Project: Cassandra > Issue Type: Bug >Reporter: John F. Gbruoski >Priority: Major > > As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a > security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be > patched to support Guava 24.1.1 or later? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619368#comment-16619368 ] Jason Brown commented on CASSANDRA-14685: - {quote}One weird behavior of streaming is that when the coordinator goes down, "nodetool netstats" still shows progress on the replicas until it reaches 100% and it stays like this. It even starts streaming new files although the target node is still down {quote} I discovered that, as well, when investigating this one. I have a working fix for it, as well as CASSANDRA-14520, and am working out the kinks. Hoping to get it out ASAP.. > Incremental repair 4.0 : SSTables remain locked forever if the coordinator > dies during streaming > - > > Key: CASSANDRA-14685 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14685 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Alexander Dejanovski >Assignee: Jason Brown >Priority: Critical > > The changes in CASSANDRA-9143 modified the way incremental repair performs by > applying the following sequence of events : > * Anticompaction is executed on all replicas for all SSTables overlapping > the repaired ranges > * Anticompacted SSTables are then marked as "Pending repair" and cannot be > compacted anymore, nor part of another repair session > * Merkle trees are generated and compared > * Streaming takes place if needed > * Anticompaction is committed and "pending repair" table are marked as > repaired if it succeeded, or they are released if the repair session failed. > If the repair coordinator dies during the streaming phase, *the SSTables on > the replicas will remain in "pending repair" state and will never be eligible > for repair or compaction*, even after all the nodes in the cluster are > restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming > errors) : > {noformat} > ccm create inc-repair-issue -v github:jasobrown/13938 -n 3 > # Allow jmx access and remove all rpc_ settings in yaml > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh; > do > sed -i'' -e > 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' > $f > done > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml; > do > grep -v "rpc_" $f > ${f}.tmp > cat ${f}.tmp > $f > done > ccm start > {noformat} > I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a > few 10s of MBs of data (killed it after some time). Obviously > cassandra-stress works as well : > {noformat} > bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 > --replication "{'class':'SimpleStrategy', 'replication_factor':2}" > --compaction "{'class': 'SizeTieredCompactionStrategy'}" --host > 127.0.0.1 > {noformat} > Flush and delete all SSTables in node1 : > {noformat} > ccm node1 nodetool flush > ccm node1 stop > rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.* > ccm node1 start{noformat} > Then throttle streaming throughput to 1MB/s so we have time to take node1 > down during the streaming phase and run repair: > {noformat} > ccm node1 nodetool setstreamthroughput 1 > ccm node2 nodetool setstreamthroughput 1 > ccm node3 nodetool setstreamthroughput 1 > ccm node1 nodetool repair tlp_stress > {noformat} > Once streaming starts, shut down node1 and start it again : > {noformat} > ccm node1 stop > ccm node1 start > {noformat} > Run repair again : > {noformat} > ccm node1 nodetool repair tlp_stress > {noformat} > The command will return very quickly, showing that it skipped all sstables : > {noformat} > [2018-08-31 19:05:16,292] Repair completed successfully > [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds > $ ccm node1 nodetool status > Datacenter: datacenter1 > === > 
Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address  Load  Tokens  Owns  Host ID  Rack > UN 127.0.0.1 228,64 KiB 256 ? > 437dc9cd-b1a1-41a5-961e-cfc99763e29f rack1 > UN 127.0.0.2 60,09 MiB 256 ? > fbcbbdbb-e32a-4716-8230-8ca59aa93e62 rack1 > UN 127.0.0.3 57,59 MiB 256 ? > a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0 rack1 > {noformat} > sstablemetadata will then show that nodes 2 and 3 have SSTables still in > "pending repair" state: > {noformat} > ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | > grep repair > SSTable: > /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big > Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62 > {noformat} > Restarting these nodes wouldn't help either.
[jira] [Commented] (CASSANDRA-14758) Remove "audit" entry from .gitignore
[ https://issues.apache.org/jira/browse/CASSANDRA-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619075#comment-16619075 ] Jason Brown commented on CASSANDRA-14758: - +1 > Remove "audit" entry from .gitignore > > > Key: CASSANDRA-14758 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14758 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Minor > > Seems there was an "audit" entry added to the .gitignore file in > CASSANDRA-9608, not sure why, but it makes it kind of hard to work with files > in the {{org.apache.cassandra.audit}} package -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14757) GCInspector "Error accessing field of java.nio.Bits" under java11
[ https://issues.apache.org/jira/browse/CASSANDRA-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14757: Description: Running under java11, {{GCInspector}} throws the following exception: {noformat} DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing field of java.nio.Bits java.lang.NoSuchFieldException: totalCapacity at java.base/java.lang.Class.getDeclaredField(Class.java:2412) at org.apache.cassandra.service.GCInspector.(GCInspector.java:72) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) {noformat} This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere between java8 and java11. Note: this is a rather harmless error, as we only look at {{Bits.totalCapacity}} for metrics collection on how much direct memory is being used by {{ByteBuffer}}s. If we fail to read the field, we simply return -1 for the metric value. was: Running under java11, {{GCInspector}} throws the following exception: {noformat} DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing field of java.nio.Bits java.lang.NoSuchFieldException: totalCapacity at java.base/java.lang.Class.getDeclaredField(Class.java:2412) at org.apache.cassandra.service.GCInspector.(GCInspector.java:72) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) {noformat} This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere between java8 and java11. 
> GCInspector "Error accessing field of java.nio.Bits" under java11 > - > > Key: CASSANDRA-14757 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14757 > Project: Cassandra > Issue Type: Bug > Components: Metrics >Reporter: Jason Brown >Priority: Trivial > Labels: Java11 > Fix For: 4.0 > > > Running under java11, {{GCInspector}} throws the following exception: > {noformat} > DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing > field of java.nio.Bits > java.lang.NoSuchFieldException: totalCapacity > at java.base/java.lang.Class.getDeclaredField(Class.java:2412) > at > org.apache.cassandra.service.GCInspector.(GCInspector.java:72) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) > {noformat} > This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} > from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} > somewhere between java8 and java11. > Note: this is a rather harmless error, as we only look at > {{Bits.totalCapacity}} for metrics collection on how much direct memory is > being used by {{ByteBuffer}}s. If we fail to read the field, we simply return > -1 for the metric value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
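A field lookup that tolerates the java8-to-java11 rename described above can try each candidate name in turn and fall back to -1, mirroring the metric fallback the ticket mentions. The sketch below is illustrative only: the nested {{Bits}} class is a stand-in for {{java.nio.Bits}} (which cannot be freely reflected on under newer JDKs), and the helper names are made up, not Cassandra's actual fix.

```java
import java.lang.reflect.Field;

public class FieldProbe
{
    // Stand-in for java.nio.Bits, whose totalCapacity field was renamed
    // to TOTAL_CAPACITY somewhere between java8 and java11.
    static class Bits
    {
        private static long TOTAL_CAPACITY = 42;
    }

    /** Try each candidate field name in order; return null if none exists. */
    static Field findField(Class<?> clazz, String... names)
    {
        for (String name : names)
        {
            try
            {
                Field f = clazz.getDeclaredField(name);
                f.setAccessible(true);
                return f;
            }
            catch (NoSuchFieldException e)
            {
                // fall through to the next candidate name
            }
        }
        return null;
    }

    /** Read the field as a long, or -1 if unresolvable (the metric fallback described above). */
    static long readCapacity(Class<?> clazz)
    {
        Field f = findField(clazz, "totalCapacity", "TOTAL_CAPACITY");
        if (f == null)
            return -1;
        try
        {
            return f.getLong(null);
        }
        catch (IllegalAccessException e)
        {
            return -1;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(readCapacity(Bits.class));   // 42
        System.out.println(readCapacity(String.class)); // -1 (no such field)
    }
}
```

Trying the old name first keeps java8 behavior unchanged while picking up the renamed field under java11, and the -1 fallback keeps the failure harmless, as the ticket notes.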
[jira] [Created] (CASSANDRA-14757) GCInspector "Error accessing field of java.nio.Bits" under java11
Jason Brown created CASSANDRA-14757: --- Summary: GCInspector "Error accessing field of java.nio.Bits" under java11 Key: CASSANDRA-14757 URL: https://issues.apache.org/jira/browse/CASSANDRA-14757 Project: Cassandra Issue Type: Bug Components: Metrics Reporter: Jason Brown Fix For: 4.0 Running under java11, {{GCInspector}} throws the following exception: {noformat} DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing field of java.nio.Bits java.lang.NoSuchFieldException: totalCapacity at java.base/java.lang.Class.getDeclaredField(Class.java:2412) at org.apache.cassandra.service.GCInspector.(GCInspector.java:72) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) {noformat} This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere between java8 and java11. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14754) Add verification of state machine in StreamSession
Jason Brown created CASSANDRA-14754: --- Summary: Add verification of state machine in StreamSession Key: CASSANDRA-14754 URL: https://issues.apache.org/jira/browse/CASSANDRA-14754 Project: Cassandra Issue Type: Task Components: Streaming and Messaging Reporter: Jason Brown Assignee: Jason Brown Fix For: 4.0 {{StreamSession}} contains an implicit state machine, but we have no verification of the safety of the transitions between states. For example, we have no checks to ensure we cannot leave the final states (COMPLETED, FAILED). I propose we add some program logic in {{StreamSession}}, tests, and documentation to ensure the correctness of the state transitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
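One possible shape for such a state-machine check is an explicit allowed-transition table, with empty sets for the terminal states so that COMPLETED and FAILED can never be left. The states and the transition table below are illustrative, not the actual {{StreamSession}} rules.

```java
import java.util.EnumSet;
import java.util.Map;

public class StateMachine
{
    enum State { INITIALIZED, PREPARING, STREAMING, WAIT_COMPLETE, COMPLETED, FAILED }

    // Illustrative allowed-transition table; the real StreamSession rules may differ.
    static final Map<State, EnumSet<State>> ALLOWED = Map.of(
        State.INITIALIZED,   EnumSet.of(State.PREPARING, State.FAILED),
        State.PREPARING,     EnumSet.of(State.STREAMING, State.FAILED),
        State.STREAMING,     EnumSet.of(State.WAIT_COMPLETE, State.FAILED),
        State.WAIT_COMPLETE, EnumSet.of(State.COMPLETED, State.FAILED),
        // Terminal states map to empty sets, so they can never be left.
        State.COMPLETED,     EnumSet.noneOf(State.class),
        State.FAILED,        EnumSet.noneOf(State.class));

    private State state = State.INITIALIZED;

    State state() { return state; }

    /** Reject any transition the table does not allow, instead of silently applying it. */
    void transition(State next)
    {
        if (!ALLOWED.get(state).contains(next))
            throw new IllegalStateException("illegal transition: " + state + " -> " + next);
        state = next;
    }

    public static void main(String[] args)
    {
        StateMachine sm = new StateMachine();
        sm.transition(State.PREPARING);
        sm.transition(State.STREAMING);
        sm.transition(State.FAILED);
        System.out.println(sm.state()); // FAILED
    }
}
```

Keeping the table next to the enum makes the "cannot leave a final state" property a single membership check rather than conditionals scattered through the session code.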
[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612937#comment-16612937 ] Jason Brown commented on CASSANDRA-14747: - [~jolynch] When you have a chance, please take the branch linked on CASSANDRA-14503 and give it a spin. That has the fix for queue bounds. > Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610564#comment-16610564 ] Jason Brown commented on CASSANDRA-14685: - [~adejanovski] ughh, missed the update on this. Definitely looks like something isn't timing out properly in streaming. I'll start digging into the streaming part of this. [~bdeggleston], can you comment about this part: bq. replicas will remain in "pending repair" state and will never be eligible for repair or compaction, even after all the nodes in the cluster are restarted. > Incremental repair 4.0 : SSTables remain locked forever if the coordinator > dies during streaming > - > > Key: CASSANDRA-14685 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14685 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Alexander Dejanovski >Assignee: Jason Brown >Priority: Critical > > The changes in CASSANDRA-9143 modified the way incremental repair works by > applying the following sequence of events: > * Anticompaction is executed on all replicas for all SSTables overlapping > the repaired ranges > * Anticompacted SSTables are then marked as "Pending repair" and cannot be > compacted anymore, nor be part of another repair session > * Merkle trees are generated and compared > * Streaming takes place if needed > * Anticompaction is committed and "pending repair" tables are marked as > repaired if the session succeeded, or they are released if it failed. > If the repair coordinator dies during the streaming phase, *the SSTables on > the replicas will remain in "pending repair" state and will never be eligible > for repair or compaction*, even after all the nodes in the cluster are > restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming > errors) : > {noformat} > ccm create inc-repair-issue -v github:jasobrown/13938 -n 3 > # Allow jmx access and remove all rpc_ settings in yaml > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh; > do > sed -i'' -e > 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' > $f > done > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml; > do > grep -v "rpc_" $f > ${f}.tmp > cat ${f}.tmp > $f > done > ccm start > {noformat} > I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a > few 10s of MBs of data (killed it after some time). Obviously > cassandra-stress works as well : > {noformat} > bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 > --replication "{'class':'SimpleStrategy', 'replication_factor':2}" > --compaction "{'class': 'SizeTieredCompactionStrategy'}" --host > 127.0.0.1 > {noformat} > Flush and delete all SSTables in node1 : > {noformat} > ccm node1 nodetool flush > ccm node1 stop > rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.* > ccm node1 start{noformat} > Then throttle streaming throughput to 1MB/s so we have time to take node1 > down during the streaming phase and run repair: > {noformat} > ccm node1 nodetool setstreamthroughput 1 > ccm node2 nodetool setstreamthroughput 1 > ccm node3 nodetool setstreamthroughput 1 > ccm node1 nodetool repair tlp_stress > {noformat} > Once streaming starts, shut down node1 and start it again : > {noformat} > ccm node1 stop > ccm node1 start > {noformat} > Run repair again : > {noformat} > ccm node1 nodetool repair tlp_stress > {noformat} > The command will return very quickly, showing that it skipped all sstables : > {noformat} > [2018-08-31 19:05:16,292] Repair completed successfully > [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds > $ ccm node1 nodetool status > Datacenter: datacenter1 > === > 
Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address  Load  Tokens  Owns  Host ID  Rack > UN 127.0.0.1 228,64 KiB 256 ? > 437dc9cd-b1a1-41a5-961e-cfc99763e29f rack1 > UN 127.0.0.2 60,09 MiB 256 ? > fbcbbdbb-e32a-4716-8230-8ca59aa93e62 rack1 > UN 127.0.0.3 57,59 MiB 256 ? > a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0 rack1 > {noformat} > sstablemetadata will then show that nodes 2 and 3 have SSTables still in > "pending repair" state: > {noformat} > ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | > grep repair > SSTable: > /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big > Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62 > {noformat} > Restarting these nodes wouldn't help either. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
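For a larger cluster, the per-sstable sstablemetadata check above can be looped over every node's data directory to find all sstables still locked by a pending-repair session. This is a rough helper, not part of the ticket: the ccm cluster name, data path, and table glob are assumptions matching the repro steps above, and the {{sstablemetadata}} tool is assumed to be on PATH.

```shell
#!/bin/sh
# Extract the pending-repair session id (if any) from sstablemetadata output.
pending_of() {
    awk '/^Pending repair:/ { print $3 }'
}

# Walk every node's data directory and report sstables stuck in
# "pending repair" state. Paths below assume the ccm cluster from the repro.
for db in "$HOME"/.ccm/inc-repair-issue/node*/data0/tlp_stress/sensor*/*-Data.db; do
    if [ -e "$db" ]; then
        id=$(sstablemetadata "$db" | pending_of)
        # sstablemetadata prints "--" (or nothing) when no session is pending
        if [ -n "$id" ] && [ "$id" != "--" ]; then
            echo "$db $id"
        fi
    fi
done
```

Any line of output names an sstable that a restart alone will not release, matching the behavior described in the report.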
[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105 ] Jason Brown commented on CASSANDRA-13938: - [~dimitarndimitrov], Thanks for your comments, and apologies for the late response. While your proposed simplification indeed clarifies the logic, unfortunately it doesn't resolve the bug (my dtest still fails - this is due to the need to reset some value, like the currentBufferOffset, after rebuffering). However, your observation about simplifying this patch (in particular eliminating {{currentBufferOffset}}) made me reconsider the needs of this class. Basically, we just need to correctly track the streamOffset for the current buffer, and that's all. When I ported this class from 3.11, I over-complicated the offsets and counters in the first version of this class (committed with CASSANDRA-12229), and then confused it again (while resolving the error) with the first patch. In short: as long as I correctly calculate streamOffset, that should satisfy the needs of the class. Thus, I eliminated both {{current}} and {{currentBufferOffset}}, and the result is clearer and correct. I've pushed a cleaned-up branch (which has been rebased to trunk). Please note that, as with the first patch, the majority of this patch is refactoring to clean up the class in general. I've also updated my dtest patch, as my version required a stress profile (based on [~zznate]'s original) to be committed as well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as before, I'm unable to get that to fail on trunk.) 
> Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13938 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Nate McCall >Assignee: Jason Brown >Priority: Critical > Fix For: 4.x > > Attachments: 13938.yaml, test.sh > > > Running through a simple scenario to test some of the new repair features, I > was not able to make a repair command work. Further, the exception seemed to > trigger a nasty failure state that basically shuts down the netty connections > for messaging *and* CQL on the nodes transferring back data to the node being > repaired. The following steps reproduce this issue consistently. > Cassandra stress profile (probably not necessary, but this one provides a > really simple schema and consistent data shape): > {noformat} > keyspace: standard_long > keyspace_definition: | > CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', > 'replication_factor':3}; > table: test_data > table_definition: | > CREATE TABLE test_data ( > key text, > ts bigint, > val text, > PRIMARY KEY (key, ts) > ) WITH COMPACT STORAGE AND > CLUSTERING ORDER BY (ts DESC) AND > bloom_filter_fp_chance=0.01 AND > caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND > comment='' AND > dclocal_read_repair_chance=0.00 AND > gc_grace_seconds=864000 AND > read_repair_chance=0.00 AND > compaction={'class': 'SizeTieredCompactionStrategy'} AND > compression={'sstable_compression': 'LZ4Compressor'}; > columnspec: > - name: key > population: uniform(1..5000) # 50 million records available > - name: ts > cluster: gaussian(1..50) # Up to 50 inserts per record > - name: val > population: gaussian(128..1024) # varrying size of value data > insert: > partitions: fixed(1) # only one insert per batch for individual partitions > select: fixed(1)/1 # each insert comes in one at a time > batchtype: UNLOGGED > queries: > single: > cql: select * from 
test_data where key = ? and ts = ? limit 1; > series: > cql: select key,ts,val from test_data where key = ? limit 10; > {noformat} > The commands to build and run: > {noformat} > ccm create 4_0_test -v git:trunk -n 3 -s > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4 > # flush the memtable just to get everything on disk > ccm node1 nodetool flush > ccm node2 nodetool flush > ccm node3 nodetool flush > # disable hints for nodes 2 and 3 > ccm node2 nodetool disablehandoff > ccm node3 nodetool disablehandoff > # stop node1 > ccm node1 stop > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4 > # wait 10 seconds > ccm node1 start > # Note that we are local to ccm's nodetool install 'cause repair preview is > not reported yet > node1/bin/nodetool repair --preview > node1/bin/nodetool repair standard_long test_data > {noformat} > The error outputs from the last
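The offset bookkeeping discussed in the comment above — tracking only the stream offset of the current buffer, and resetting the per-buffer position on each rebuffer — can be illustrated with a toy rebuffering reader. This is a sketch of the idea only, not Cassandra's actual class; the byte-array "stream" stands in for the underlying channel.

```java
import java.util.Arrays;

public class RebufferingReader
{
    private final byte[] stream;     // stands in for the underlying channel
    private final int bufferSize;
    private byte[] buffer = new byte[0];
    private long streamOffset = 0;   // stream position of buffer[0] -- the only offset tracked
    private int position = 0;        // read position within the current buffer

    RebufferingReader(byte[] stream, int bufferSize)
    {
        this.stream = stream;
        this.bufferSize = bufferSize;
    }

    /** Absolute position in the stream, derived from the single tracked offset. */
    long currentPosition()
    {
        return streamOffset + position;
    }

    int read()
    {
        if (position == buffer.length)
        {
            // Rebuffer: advance streamOffset past the consumed buffer and reset
            // the per-buffer position. Forgetting a reset like this after
            // rebuffering is the kind of bug the comment above describes.
            streamOffset += buffer.length;
            if (streamOffset == stream.length)
                return -1;
            int end = (int) Math.min(stream.length, streamOffset + bufferSize);
            buffer = Arrays.copyOfRange(stream, (int) streamOffset, end);
            position = 0;
        }
        return buffer[position++] & 0xFF;
    }

    public static void main(String[] args)
    {
        RebufferingReader r = new RebufferingReader(new byte[]{ 1, 2, 3 }, 2);
        int b;
        while ((b = r.read()) != -1)
            System.out.print(b + " ");                 // 1 2 3
        System.out.println("pos=" + r.currentPosition()); // pos=3
    }
}
```

With a single {{streamOffset}} plus a per-buffer {{position}}, every absolute offset is derived rather than separately maintained, which is the simplification the comment arrives at.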
[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105 ] Jason Brown edited comment on CASSANDRA-13938 at 9/11/18 5:01 AM: -- [~dimitarndimitrov], Thanks for your comments, and apologies for the late response. While your proposed simplification indeed clarifies the logic, unfortunately it doesn't resolve the bug (my dtest still fails - this is due to the need to reset some value, like the currentBufferOffset, after rebuffering). However, your observation about simplifying this patch (in particular eliminating {{currentBufferOffset}}) made me reconsider the needs of this class. Basically, we just need to correctly track the streamOffset for the current buffer, and that's all. When I ported this class from 3.11, I over-complicated the offsets and counters in the first version of this class (committed with CASSANDRA-12229), and then confused it again (while resolving the error) with the first patch. In short: as long as I correctly calculate streamOffset, that should satisfy the needs of the class. Thus, I eliminated both {{current}} and {{currentBufferOffset}}, and the result is clearer and correct. I've pushed a cleaned-up branch (which has been rebased to trunk). Please note that, as with the first patch, the majority of this patch is refactoring to clean up the class in general. I've also updated my dtest patch, as my version required a stress profile (based on [~zznate]'s original) to be committed as well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as before, I'm unable to get that to fail on trunk.) was (Author: jasobrown): [~dimitarndimitrov], Thanks for your comments, and apologies for the late response. While your proposed simplification indeed clarifies the logic, unfortunately it doesn't resolve the bug (my dtest still fails - this is due to the need to reset a 'some' value, like the currentBufferOffset, after rebufferring). 
However, your observation about simplifying this patch (in particular eliminate {{currentBufferOffset}} made me reconsider the needs of this class. Basically, we just need to correctly track the streamOffset for the current buffer, and that's all. When I ported this clas from 3.11, I over-complicated the offsets and counters into the first version of this class (committed with CASSANDRA-12229), and then confused it again (while resolving the error) with the first patch. In short: as long as I correctly calculate streamOffset, that should satisfy the needs for the class. Thus, I eliminated both {{current}} and {{currentBufferOffset}}, and the result is clearer and correct. I've pushed a cleaned up branch (which has been rebased to trunk). Please note that, as with the first patch, the majority of this patch is refactoring to clean up the class in general. I've also updated my dtest patch as my version required a stress profile (based on [~zznate]'s original) to be committed, as well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as before, I'm unable to get that to fail on trunk.) > Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13938 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Nate McCall >Assignee: Jason Brown >Priority: Critical > Fix For: 4.x > > Attachments: 13938.yaml, test.sh > > > Running through a simple scenario to test some of the new repair features, I > was not able to make a repair command work. Further, the exception seemed to > trigger a nasty failure state that basically shuts down the netty connections > for messaging *and* CQL on the nodes transferring back data to the node being > repaired. The following steps reproduce this issue consistently. 
> Cassandra stress profile (probably not necessary, but this one provides a > really simple schema and consistent data shape): > {noformat} > keyspace: standard_long > keyspace_definition: | > CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', > 'replication_factor':3}; > table: test_data > table_definition: | > CREATE TABLE test_data ( > key text, > ts bigint, > val text, > PRIMARY KEY (key, ts) > ) WITH COMPACT STORAGE AND > CLUSTERING ORDER BY (ts DESC) AND > bloom_filter_fp_chance=0.01 AND > caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND > comment='' AND > dclocal_read_repair_chance=0.00 AND > gc_grace_seconds=864000 AND > read_repair_chance=0.00 AND > compaction={'class': 'SizeTieredCompactionStrategy'} AND >
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609973#comment-16609973 ] Jason Brown commented on CASSANDRA-14346: - Somehow this got marked as Ready to Commit; switched back to Patch Available. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.x > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. 
I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: Patch Available (was: Awaiting Feedback) > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.x > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. 
I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: Awaiting Feedback (was: In Progress) > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.x > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. 
I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: In Progress (was: Ready to Commit) > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.x > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. 
I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14714: Labels: Java11 (was: ) > `ant artifacts` broken on trunk (4.0); creates no tar artifacts > --- > > Key: CASSANDRA-14714 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14714 > Project: Cassandra > Issue Type: Bug >Reporter: Michael Shuler >Priority: Blocker > Labels: Java11 > Fix For: 4.0 > > > `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. > Additionally, the target does not exit non-zero, so the result is: > {noformat} > <...> > artifacts: > BUILD SUCCESSFUL > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14712) Cassandra 4.0 packaging support
[ https://issues.apache.org/jira/browse/CASSANDRA-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14712: Labels: Java11 (was: ) > Cassandra 4.0 packaging support > --- > > Key: CASSANDRA-14712 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14712 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Stefan Podkowinski >Priority: Major > Labels: Java11 > Fix For: 4.x > > > Currently it's not possible to build any native packages (.deb/.rpm) for > trunk. > cassandra-builds - docker/*-image.docker > * Add Java11 to debian+centos build image > * (packaged ant scripts won't work with Java 11 on centos, so we may have to > install ant from tarballs) > cassandra-builds - docker/build-*.sh > * set JAVA8_HOME to Java8 > * set JAVA_HOME to Java11 (4.0) or Java8 (<4.0) > cassandra - redhat/cassandra.spec > * Check if patches still apply after CASSANDRA-14707 > * Add fqltool as %files > We may also have to change the version handling in build.xml or build-*.sh, > depending how we plan to release packages during beta, or if we plan to do so > at all before GA. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14503: Fix Version/s: 4.0 Status: Patch Available (was: Open) > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609099#comment-16609099 ] Jason Brown commented on CASSANDRA-14503: - Patch available here: ||14503|| |[branch|https://github.com/jasobrown/cassandra/tree/14503]| |[utests dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503]| || Additionally, I've [created a Pull Request|https://github.com/apache/cassandra/pull/264] for review, as well. Note: this patch will need to be rebased when CASSANDRA-13630 is committed, and incorporate the ChannelWriter changes for large messages, but that should not affect this patch much (I've been keeping that in mind as I worked on this) - OutboundMessagingConnection changes -- All producer threads queue messages into the backlog, and messages are only consumed by a task on a fixed thread (the event loop). Producers will contend to schedule the consumer, but have no further involvement in sending a message (unlike the current implementation). -- All netty-related activity (setting up a remote connection, connection-related callbacks and timeouts, consuming from the backlog, and writing to the channel and its associated callbacks) is handled on the event loop. OutboundMessagingConnection gets a reference to an event loop in its constructor, and uses it for the duration of its lifetime. -- Finally, I forward-ported the queue bounding functionality of CASSANDRA-13265. In short, we want to limit the size of queued messages in order not to OOM. Thus, we schedule a task for the consumer thread that examines the queue looking for elements to prune. Further, I've added a naive upper bound to the queue so that producers drop messages before enqueuing if the backlog is in a *really* bad state. @djoshi3 has recommended bounding by message size rather than by message count, which I agree with, but propose saving that for a follow-up ticket. 
-- A cleaner, better-documented, and better-tested state machine to manage state transitions for the class. - ChannelWriter and MessageOutHandler became much simpler as we can control the flush behaviors from the OMC (instead of the previous complicated CW/MOH dance) because we're already on the event loop when consuming from the backlog and writing to the channel. - I was able to clean up/remove a bunch of extra code due to this simplification, as well (ExpiredException, OutboundMessagingParameters, MessageResult) - Updated all the javadoc documentation for these changes (mostly OMC and ChannelWriter) > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). 
> Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
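The single-consumer model described in the comment above (producers only enqueue into the backlog and contend to schedule a drain task; all dequeuing and channel writes happen on one event-loop thread) can be sketched with a plain single-threaded executor. This is a hypothetical stand-in, not the actual OutboundMessagingConnection code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: producers enqueue and merely *schedule* the consumer; all
// dequeue/"write" work runs on a single event-loop thread, so connection
// state is only ever touched from that one thread.
public class SingleThreadedSender
{
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private final AtomicBoolean consumerScheduled = new AtomicBoolean(false);
    public final AtomicInteger sent = new AtomicInteger(0);

    public void enqueue(String message)
    {
        backlog.add(message);
        // producers contend only on this CAS; at most one drain task is pending
        if (consumerScheduled.compareAndSet(false, true))
            eventLoop.execute(this::drain);
    }

    private void drain()
    {
        // reset the flag *before* draining so a late producer re-schedules us
        consumerScheduled.set(false);
        String msg;
        while ((msg = backlog.poll()) != null)
            sent.incrementAndGet(); // stand-in for writing to the channel
    }

    public void shutdown()
    {
        eventLoop.shutdown();
        try
        {
            eventLoop.awaitTermination(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args)
    {
        SingleThreadedSender sender = new SingleThreadedSender();
        for (int i = 0; i < 100; i++)
            sender.enqueue("msg-" + i);
        sender.shutdown();
        if (sender.sent.get() != 100)
            throw new AssertionError("expected 100, got " + sender.sent.get());
        System.out.println("all messages consumed on the event loop");
    }
}
```

Resetting the scheduled flag before polling is what makes the hand-off safe: any message added while a drain is pending is either seen by that drain or triggers a fresh one.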
[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
[ https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609050#comment-16609050 ] Jason Brown commented on CASSANDRA-14711: - So, the first thing to know is that 3.2 is an, old unsupported release. 3.11.3 is the currently supported 3.X release. > Apache Cassandra 3.2 crashing with exception > org.apache.cassandra.db.marshal.TimestampType.compareCustom > > > Key: CASSANDRA-14711 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14711 > Project: Cassandra > Issue Type: Bug >Reporter: Saurabh >Priority: Major > Attachments: hs_err_pid32069.log > > > Hi Team, > I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.. > Issue: > Cassandra is continuously crashing with generating an HEAP dump log. There > are no errors reported in system.log OR Debug.log. > Exception in hs_err_PID.log: > # Problematic frame: > # J 8283 C2 > org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I > (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334] > Java Threads: ( => current thread ) > 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon > [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)] > 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon > [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)] > 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon > [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)] > 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon > [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)] > 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon > : > : > lot of threads in BLOCKED status > Other Threads: > 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] > [id=32098] > 0x2b7d38fa9de0 WatcherThread [stack: > 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108] > VM state:not at safepoint (normal execution) > VM Mutex/Monitor currently owned 
by a thread: None > Heap: > garbage-first heap total 8388608K, used 6791168K [0x0003c000, > 0x0003c0404000, 0x0007c000) > region size 4096K, 785 young (3215360K), 55 survivors (225280K) > Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K > class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K > Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), > HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, > PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start) > AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, > 100% used [0x0003c000, 0x0003c040) > AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c040, 0x0003c080) > AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c080, 0x0003c0c0) > AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, > 100% used [0x0003c0c0, 0x0003c100) > AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, > 100% used [0x0003c100, 0x0003c140) > AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, > 100% used [0x0003c140, 0x0003c180) > : > : > lot of such messages -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13630) support large internode messages with netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607603#comment-16607603 ] Jason Brown commented on CASSANDRA-13630: - [~djoshi3] made a few comments on the PR, and in response I have: - moved the autoRead check out of the {{RebufferingByteBufDataInputPlus.available}} method and into its own method; also added tests - refactored {{MessageInProcessor.process}} to move the main loop logic into the base class, and moved the logic for each case statement into sub-methods rather than directly in-line with the loop > support large internode messages with netty > --- > > Key: CASSANDRA-13630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13630 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.0 > > > As part of CASSANDRA-8457, we decided to punt on large messages to reduce the > scope of that ticket. However, we still need that functionality to ship a > correctly operating internode messaging subsystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14285) Comma at the end of the seed list is interpreted as localhost
[ https://issues.apache.org/jira/browse/CASSANDRA-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14285: Reviewer: Jordan West > Comma at the end of the seed list is interpreted as localhost > -- > > Key: CASSANDRA-14285 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14285 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: Marco >Assignee: Nicolas Guyomar >Priority: Minor > Fix For: 4.0 > > > Seeds: '10.1.20.10,10.1.21.10,10.1.22.10,' causes a flood of the debug log > with messages like this one. > DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2018-02-28 > 15:53:57,314 OutboundTcpConnection.java:545 - Unable to connect to > localhost/[127.0.0.1|http://127.0.0.1/] > This code, provided by Nicolas Guyomar, shows the cause of the issue. > In SimpleSeedProvider: > > String[] hosts = "10.1.20.10,10.1.21.10,10.1.22.10,".split(",", -1); > List<InetAddress> seeds = new ArrayList<>(hosts.length); > for (String host : hosts) > { > System.out.println(InetAddress.getByName(host.trim())); > } > > output: > /[10.1.20.10|http://10.1.20.10/] > /[10.1.21.10|http://10.1.21.10/] > /[10.1.22.10|http://10.1.22.10/] > localhost/[127.0.0.1|http://127.0.0.1/] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
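The mechanism behind the bug above: `split(",", -1)` keeps the trailing empty string, and `InetAddress.getByName("")` resolves to localhost, which is where the phantom 127.0.0.1 seed comes from. A minimal, illustrative sketch of the split behavior and the obvious guard (not the actual SimpleSeedProvider patch):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a trailing comma plus split(",", -1) yields a
// trailing empty element, which must be filtered before name resolution.
public class SeedListDemo
{
    public static List<String> parseSeeds(String csv)
    {
        List<String> seeds = new ArrayList<>();
        for (String host : csv.split(",", -1))
        {
            String trimmed = host.trim();
            if (!trimmed.isEmpty())   // the fix: skip empty entries
                seeds.add(trimmed);
        }
        return seeds;
    }

    public static void main(String[] args)
    {
        String csv = "10.1.20.10,10.1.21.10,10.1.22.10,";
        // raw split keeps 4 elements, the last one empty
        String[] raw = csv.split(",", -1);
        if (raw.length != 4 || !raw[3].isEmpty())
            throw new AssertionError("unexpected split result");
        // filtered parse drops the empty trailing entry
        List<String> seeds = parseSeeds(csv);
        if (seeds.size() != 3)
            throw new AssertionError("expected 3 seeds, got " + seeds.size());
        System.out.println(seeds);
    }
}
```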
[jira] [Commented] (CASSANDRA-14618) Create fqltool replay command
[ https://issues.apache.org/jira/browse/CASSANDRA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599274#comment-16599274 ] Jason Brown commented on CASSANDRA-14618: - +1, and please commit with CASSANDRA-14619 (as both patches are linked, code and review wise) > Create fqltool replay command > - > > Key: CASSANDRA-14618 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14618 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > Make it possible to replay the full query logs from CASSANDRA-13983 against > one or several clusters. The goal is to be able to compare different runs of > production traffic against different versions/configurations of Cassandra. > * It should be possible to take logs from several machines and replay them in > "order" by the timestamps recorded > * Record the results from each run to be able to compare different runs > (against different clusters/versions/etc) > * If {{fqltool replay}} is run against 2 or more clusters, the results should > be compared as we go -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14618) Create fqltool replay command
[ https://issues.apache.org/jira/browse/CASSANDRA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14618: Status: Ready to Commit (was: Patch Available) > Create fqltool replay command > - > > Key: CASSANDRA-14618 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14618 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > Make it possible to replay the full query logs from CASSANDRA-13983 against > one or several clusters. The goal is to be able to compare different runs of > production traffic against different versions/configurations of Cassandra. > * It should be possible to take logs from several machines and replay them in > "order" by the timestamps recorded > * Record the results from each run to be able to compare different runs > (against different clusters/versions/etc) > * If {{fqltool replay}} is run against 2 or more clusters, the results should > be compared as we go -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
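The first replay requirement above, taking logs from several machines and replaying them in "order" by recorded timestamp, is essentially a k-way merge of per-node streams that are each already time-ordered. A hypothetical sketch of that merge (names invented for illustration; this is not the fqltool implementation):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge per-node query logs (each already sorted by
// timestamp) into one globally timestamp-ordered replay stream.
public class LogMerge
{
    public record Entry(long timestamp, String query) {}
    // a cursor pairs the next unconsumed entry with the rest of its log
    public record Cursor(Entry head, Iterator<Entry> rest) {}

    public static List<Entry> mergeByTimestamp(List<List<Entry>> perNodeLogs)
    {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparingLong(c -> c.head().timestamp()));
        for (List<Entry> log : perNodeLogs)
        {
            Iterator<Entry> it = log.iterator();
            if (it.hasNext())
                heap.add(new Cursor(it.next(), it));
        }
        List<Entry> merged = new ArrayList<>();
        while (!heap.isEmpty())
        {
            Cursor c = heap.poll();
            merged.add(c.head());           // replay this query next
            if (c.rest().hasNext())
                heap.add(new Cursor(c.rest().next(), c.rest()));
        }
        return merged;
    }

    public static void main(String[] args)
    {
        List<Entry> node1 = List.of(new Entry(1, "q1"), new Entry(4, "q4"));
        List<Entry> node2 = List.of(new Entry(2, "q2"), new Entry(3, "q3"));
        List<Entry> merged = mergeByTimestamp(List.of(node1, node2));
        for (int i = 0; i < merged.size(); i++)
            if (merged.get(i).timestamp() != i + 1)
                throw new AssertionError("out of order at " + i);
        System.out.println("merged " + merged.size() + " entries in timestamp order");
    }
}
```

The heap keeps one cursor per node, so the merge is O(n log k) for n total entries across k nodes and never needs all logs in memory at once if the iterators stream from disk.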
[jira] [Updated] (CASSANDRA-14619) Create fqltool compare command
[ https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14619: Status: Ready to Commit (was: Patch Available) > Create fqltool compare command > -- > > Key: CASSANDRA-14619 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14619 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > We need a {{fqltool compare}} command that can take the recorded runs from > CASSANDRA-14618 and compares them, it should output any differences and > potentially all queries against the mismatching partition up until the > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command
[ https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599273#comment-16599273 ] Jason Brown commented on CASSANDRA-14619: - - ColumnDefsReader.readMarshallable - you read an int32 value, but in ColumnDefsWriter.writeMarshallable, you wrote an int16. Is this correct? The unit tests pass, but I'm not sure if RecordStore is being fully exercised. The same thing happens in RowReader vs RowWriter. UPDATE: I stepped through the chronicle code and it looks like the library can optimize the value it writes out (it only gets written as a byte, basically, since your value is zero). So, while your API calls are incongruous, the library does a correct thing under the hood. I would still prefer you to switch the reads to int16(), but that can be done on commit. I also had a few trivial comments on the PR linked above. They are minor, so just address them on commit (if you choose). Otherwise, +1 from me. > Create fqltool compare command > -- > > Key: CASSANDRA-14619 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14619 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > We need a {{fqltool compare}} command that can take the recorded runs from > CASSANDRA-14618 and compares them, it should output any differences and > potentially all queries against the mismatching partition up until the > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
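To see why the reviewer still wants the read and write widths to match even though Chronicle's variable-length encoding happened to mask the mismatch: with a plain fixed-width encoding, writing an int32 and reading back an int16 leaves the stream misaligned, corrupting everything that follows. An illustrative sketch using ordinary java.io streams (not the chronicle-wire API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Fixed-width encodings have no self-describing lengths: a 2-byte read of a
// 4-byte field shifts every subsequent read by 2 bytes.
public class WidthMismatchDemo
{
    public static String misalignedPayload()
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes))
            {
                out.writeInt(0);       // 4-byte prefix (like the int32 write)
                out.writeUTF("row-1"); // the payload we actually want back
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())))
            {
                in.readShort();        // consumes only 2 of the 4 prefix bytes
                return in.readUTF();   // starts 2 bytes early: sees length 0
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        String payload = misalignedPayload();
        if (!payload.isEmpty())
            throw new AssertionError("expected a misaligned empty read, got: " + payload);
        System.out.println("payload read back as \"\" instead of \"row-1\"");
    }
}
```

Here the misaligned `readUTF` lands on the two remaining zero bytes of the prefix, interprets them as a zero length, and silently returns an empty string, which is exactly the kind of quiet corruption a non-zero value would expose.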
[jira] [Assigned] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-14685: --- Assignee: Jason Brown > Incremental repair 4.0 : SSTables remain locked forever if the coordinator > dies during streaming > - > > Key: CASSANDRA-14685 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14685 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Alexander Dejanovski >Assignee: Jason Brown >Priority: Critical > > The changes in CASSANDRA-9143 modified the way incremental repair performs by > applying the following sequence of events : > * Anticompaction is executed on all replicas for all SSTables overlapping > the repaired ranges > * Anticompacted SSTables are then marked as "Pending repair" and cannot be > compacted anymore, nor part of another repair session > * Merkle trees are generated and compared > * Streaming takes place if needed > * Anticompaction is committed and "pending repair" table are marked as > repaired if it succeeded, or they are released if the repair session failed. > If the repair coordinator dies during the streaming phase, *the SSTables on > the replicas will remain in "pending repair" state and will never be eligible > for repair or compaction*, even after all the nodes in the cluster are > restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming > errors) : > {noformat} > ccm create inc-repair-issue -v github:jasobrown/13938 -n 3 > # Allow jmx access and remove all rpc_ settings in yaml > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh; > do > sed -i'' -e > 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' > $f > done > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml; > do > grep -v "rpc_" $f > ${f}.tmp > cat ${f}.tmp > $f > done > ccm start > {noformat} > I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a > few 10s of MBs of data (killed it after some time). Obviously > cassandra-stress works as well : > {noformat} > bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 > --replication "{'class':'SimpleStrategy', 'replication_factor':2}" > --compaction "{'class': 'SizeTieredCompactionStrategy'}" --host > 127.0.0.1 > {noformat} > Flush and delete all SSTables in node1 : > {noformat} > ccm node1 nodetool flush > rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.* > {noformat} > Then throttle streaming throughput to 1MB/s so we have time to take node1 > down during the streaming phase and run repair: > {noformat} > ccm node1 nodetool setstreamthroughput 1 > ccm node2 nodetool setstreamthroughput 1 > ccm node3 nodetool setstreamthroughput 1 > ccm node1 nodetool repair tlp_stress > {noformat} > Once streaming starts, shut down node1 and start it again : > {noformat} > ccm node1 stop > ccm node1 start > {noformat} > Run repair again : > {noformat} > ccm node1 nodetool repair tlp_stress > {noformat} > The command will return very quickly, showing that it skipped all sstables : > {noformat} > [2018-08-31 19:05:16,292] Repair completed successfully > [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds > $ ccm node1 nodetool status > Datacenter: datacenter1 > === > Status=Up/Down > |/ 
State=Normal/Leaving/Joining/Moving > -- AddressLoad Tokens OwnsHost ID >Rack > UN 127.0.0.1 228,64 KiB 256 ? > 437dc9cd-b1a1-41a5-961e-cfc99763e29f rack1 > UN 127.0.0.2 60,09 MiB 256 ? > fbcbbdbb-e32a-4716-8230-8ca59aa93e62 rack1 > UN 127.0.0.3 57,59 MiB 256 ? > a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0 rack1 > {noformat} > sstablemetadata will then show that nodes 2 and 3 have SSTables still in > "pending repair" state : > {noformat} > ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | > grep repair > SSTable: > /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big > Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62 > {noformat} > Restarting these nodes wouldn't help either. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599086#comment-16599086 ] Jason Brown commented on CASSANDRA-14685: - Thanks for the report, [~adejanovski]. I'll be able to look into this next week, and I'm assigning the ticket to myself as a reminder. I'm not sure [~bdeggleston] can get to it before next week either. I'm not sure if this is due to the stream sessions on nodes 2 and 3 not properly closing (and thus not informing the repair sessions they are part of), or if it's something getting lost in the repair session. Do nodes 2/3 show any streaming or repair activities (via nodetool cmds) after the repair coordinator dies? > Incremental repair 4.0 : SSTables remain locked forever if the coordinator > dies during streaming > - > > Key: CASSANDRA-14685 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14685 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Alexander Dejanovski >Priority: Critical > > The changes in CASSANDRA-9143 modified the way incremental repair performs by > applying the following sequence of events : > * Anticompaction is executed on all replicas for all SSTables overlapping > the repaired ranges > * Anticompacted SSTables are then marked as "Pending repair" and cannot be > compacted anymore, nor part of another repair session > * Merkle trees are generated and compared > * Streaming takes place if needed > * Anticompaction is committed and "pending repair" table are marked as > repaired if it succeeded, or they are released if the repair session failed. > If the repair coordinator dies during the streaming phase, *the SSTables on > the replicas will remain in "pending repair" state and will never be eligible > for repair or compaction*, even after all the nodes in the cluster are > restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming > errors) : > {noformat} > ccm create inc-repair-issue -v github:jasobrown/13938 -n 3 > # Allow jmx access and remove all rpc_ settings in yaml > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh; > do > sed -i'' -e > 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' > $f > done > for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml; > do > grep -v "rpc_" $f > ${f}.tmp > cat ${f}.tmp > $f > done > ccm start > {noformat} > I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a > few 10s of MBs of data (killed it after some time). Obviously > cassandra-stress works as well : > {noformat} > bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 > --replication "{'class':'SimpleStrategy', 'replication_factor':2}" > --compaction "{'class': 'SizeTieredCompactionStrategy'}" --host > 127.0.0.1 > {noformat} > Flush and delete all SSTables in node1 : > {noformat} > ccm node1 nodetool flush > rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.* > {noformat} > Then throttle streaming throughput to 1MB/s so we have time to take node1 > down during the streaming phase and run repair: > {noformat} > ccm node1 nodetool setstreamthroughput 1 > ccm node2 nodetool setstreamthroughput 1 > ccm node3 nodetool setstreamthroughput 1 > ccm node1 nodetool repair tlp_stress > {noformat} > Once streaming starts, shut down node1 and start it again : > {noformat} > ccm node1 stop > ccm node1 start > {noformat} > Run repair again : > {noformat} > ccm node1 nodetool repair tlp_stress > {noformat} > The command will return very quickly, showing that it skipped all sstables : > {noformat} > [2018-08-31 19:05:16,292] Repair completed successfully > [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds > $ ccm node1 nodetool status > Datacenter: datacenter1 > === > Status=Up/Down > |/ 
State=Normal/Leaving/Joining/Moving > -- AddressLoad Tokens OwnsHost ID >Rack > UN 127.0.0.1 228,64 KiB 256 ? > 437dc9cd-b1a1-41a5-961e-cfc99763e29f rack1 > UN 127.0.0.2 60,09 MiB 256 ? > fbcbbdbb-e32a-4716-8230-8ca59aa93e62 rack1 > UN 127.0.0.3 57,59 MiB 256 ? > a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0 rack1 > {noformat} > sstablemetadata will then show that nodes 2 and 3 have SSTables still in > "pending repair" state : > {noformat} > ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | > grep repair > SSTable: > /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big > Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62 > {noformat} > Restarting
[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command
[ https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598970#comment-16598970 ] Jason Brown commented on CASSANDRA-14619: - [~krummas] added an extra commit (e608fb24d3b00cb623fa9ca7b826b7a3bf2b9064) for versioning the replay output and querylog files. In that commit, every columnDefinition and row entry that is written out is prefixed with a 4-byte version number. Instead of writing out the (presumably) same version number many times in the file, can you write it once at the beginning of the file? I think you'd save a lot on file size that way. > Create fqltool compare command > -- > > Key: CASSANDRA-14619 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14619 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > We need a {{fqltool compare}} command that can take the recorded runs from > CASSANDRA-14618 and compares them, it should output any differences and > potentially all queries against the mismatching partition up until the > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
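A back-of-the-envelope illustration of the saving suggested above, comparing a 4-byte version prefix on every entry against a single version in a file header. This uses plain java.io streams as a stand-in, not the chronicle-based fqltool format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: the per-entry layout pays 4 bytes per row; the header layout pays
// 4 bytes per file. The delta is 4 * (rows - 1).
public class VersionHeaderDemo
{
    static final int VERSION = 1;

    public static byte[] perEntryVersion(String[] rows)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes))
            {
                for (String row : rows)
                {
                    out.writeInt(VERSION); // repeated 4-byte prefix
                    out.writeUTF(row);
                }
            }
            return bytes.toByteArray();
        }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static byte[] headerVersion(String[] rows)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes))
            {
                out.writeInt(VERSION); // written once, up front
                for (String row : rows)
                    out.writeUTF(row);
            }
            return bytes.toByteArray();
        }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args)
    {
        String[] rows = new String[1000];
        for (int i = 0; i < rows.length; i++)
            rows[i] = "row-" + i;
        int perEntry = perEntryVersion(rows).length;
        int header = headerVersion(rows).length;
        if (perEntry - header != 4 * (rows.length - 1))
            throw new AssertionError("unexpected size delta");
        System.out.println("saved " + (perEntry - header) + " bytes over " + rows.length + " rows");
    }
}
```

The trade-off, of course, is that a per-file header forbids mixing versions within one file, which is usually what you want for a log segment anyway.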
[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command
[ https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598087#comment-16598087 ] Jason Brown commented on CASSANDRA-14619: - Created a [Pull Request|https://github.com/apache/cassandra/pull/256] for commenting. > Create fqltool compare command > -- > > Key: CASSANDRA-14619 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14619 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > We need a {{fqltool compare}} command that can take the recorded runs from > CASSANDRA-14618 and compares them, it should output any differences and > potentially all queries against the mismatching partition up until the > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14639) Fix a few complaints from eclipse-warnings for 2.2
[ https://issues.apache.org/jira/browse/CASSANDRA-14639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown resolved CASSANDRA-14639. - Resolution: Invalid Fix Version/s: (was: 2.2.x) Sadly, this ended up being a problem in my primary local repo. I cloned a fresh repo (into a different directory) under both macOS and linux, and they both produced no {{ant eclipse-warning}} errors. Thank you, [~sumanth.pasupuleti], for digging into this, and for confirming that 2.2 is clean. Sorry that it ended up being a problem on my end. > Fix a few complaints from eclipse-warnings for 2.2 > -- > > Key: CASSANDRA-14639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14639 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Sumanth Pasupuleti >Priority: Minor > > These failed on 2.2 > [circleci|https://circleci.com/gh/jasobrown/cassandra/1375] > {noformat} > eclipse-warnings: > [mkdir] Created dir: /home/cassandra/cassandra/build/ecj > [echo] Running Eclipse Code Analysis. Output logged to > /home/cassandra/cassandra/build/ecj/eclipse_compiler_checks.txt > [java] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 > [java] incorrect classpath: > /home/cassandra/cassandra/build/cobertura/classes > [java] -- > [java] 1. ERROR in > /home/cassandra/cassandra/src/java/org/apache/cassandra/tools/SSTableExport.java > (at line 315) > [java] ISSTableScanner scanner = reader.getScanner(); > [java] ^^^ > [java] Resource 'scanner' should be managed by try-with-resource > [java] -- > [java] -- > [java] 2. ERROR in > /home/cassandra/cassandra/src/java/org/apache/cassandra/db/compaction/CompactionManager.java > (at line 888) > [java] ISSTableScanner scanner = cleanupStrategy.getScanner(sstable, > getRateLimiter()); > [java] ^^^ > [java] Resource 'scanner' should be managed by try-with-resource > [java] -- > [java] -- > [java] 3. 
ERROR in > /home/cassandra/cassandra/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java > (at line 257) > [java] scanners.add(new LeveledScanner(intersecting, range)); > [java]^^^ > [java] Potential resource leak: '' may not > be closed > [java] -- > [java] 3 problems (3 errors) > BUILD FAILED > /home/cassandra/cassandra/build.xml:1915: Java returned: 255 > {noformat} > Not failing on 3.0+. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
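All three eclipse-warnings errors in the report above are the same class of problem: an {{ISSTableScanner}} obtained but not managed by try-with-resources. A minimal illustration of the fix follows; {{CloseTrackingScanner}} is a stand-in type for the example, not Cassandra's {{ISSTableScanner}}.

```java
public class TryWithResourceSketch
{
    static class CloseTrackingScanner implements AutoCloseable
    {
        boolean closed = false;

        @Override
        public void close()
        {
            closed = true;
        }
    }

    public static boolean scan()
    {
        CloseTrackingScanner tracker = new CloseTrackingScanner();
        // The resource declared in the try header is closed automatically
        // when the block exits, whether normally or via an exception --
        // which is exactly what the eclipse-warnings check demands.
        try (CloseTrackingScanner scanner = tracker)
        {
            // ... iterate over rows with the scanner ...
        }
        return tracker.closed;
    }
}
```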
[jira] [Updated] (CASSANDRA-14659) Disable old native protocol versions on demand
[ https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14659: Resolution: Fixed Fix Version/s: 4.0 Status: Resolved (was: Patch Available) +1. Committed as sha {{7b61b0be88ef1fcc29646ae8bdbb05da825bc1b2}}. Thanks! > Disable old native protocol versions on demand > -- > > Key: CASSANDRA-14659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14659 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Labels: usability > Fix For: 4.0 > > > This patch allows operators to disable older protocol versions on demand. > To use it, you can set {{native_transport_allow_older_protocols}} to false or > use nodetool disableolderprotocolversions. Cassandra will reject requests > from clients coming in on any version except the current version. This will > help operators selectively reject connections from clients that do not > support the latest protocol. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14659) Disable old native protocol versions on demand
[ https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14659: Summary: Disable old native protocol versions on demand (was: Disable old protocol versions on demand) > Disable old native protocol versions on demand > -- > > Key: CASSANDRA-14659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14659 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Labels: usability > > This patch allows operators to disable older protocol versions on demand. > To use it, you can set {{native_transport_allow_older_protocols}} to false or > use nodetool disableolderprotocolversions. Cassandra will reject requests > from clients coming in on any version except the current version. This will > help operators selectively reject connections from clients that do not > support the latest protocol. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command
[ https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597908#comment-16597908 ] Jason Brown commented on CASSANDRA-14619: - I can work on this today. > Create fqltool compare command > -- > > Key: CASSANDRA-14619 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14619 > Project: Cassandra > Issue Type: New Feature >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Labels: fqltool > Fix For: 4.x > > > We need a {{fqltool compare}} command that can take the recorded runs from > CASSANDRA-14618 and compare them; it should output any differences and > potentially all queries against the mismatching partition up until the > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14681) SafeMemoryWriterTest doesn't compile on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14681: Labels: Java11 (was: ) > SafeMemoryWriterTest doesn't compile on trunk > - > > Key: CASSANDRA-14681 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14681 > Project: Cassandra > Issue Type: Bug >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Trivial > Labels: Java11 > Fix For: 4.0 > > > {{SafeMemoryWriterTest}} references {{sun.misc.VM}}, which doesn't exist in > Java 11, so the build fails. > Proposed patch makes the test work against Java 8 + 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14659) Disable old protocol versions on demand
[ https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14659: Status: In Progress (was: Ready to Commit) > Disable old protocol versions on demand > --- > > Key: CASSANDRA-14659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14659 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Labels: usability > > This patch allows operators to disable older protocol versions on demand. > To use it, you can set {{native_transport_allow_older_protocols}} to false or > use nodetool disableolderprotocolversions. Cassandra will reject requests > from clients coming in on any version except the current version. This will > help operators selectively reject connections from clients that do not > support the latest protocol. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14659) Disable old protocol versions on demand
[ https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597468#comment-16597468 ] Jason Brown commented on CASSANDRA-14659: - On the whole, this is almost there. I think the version check you have in {{Message}} would be best located in {{ProtocolVersion.decode()}}, as that is the section where we already do the general version check, and yours is an extension to that. > Disable old protocol versions on demand > --- > > Key: CASSANDRA-14659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14659 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Labels: usability > > This patch allows operators to disable older protocol versions on demand. > To use it, you can set {{native_transport_allow_older_protocols}} to false or > use nodetool disableolderprotocolversions. Cassandra will reject requests > from clients coming in on any version except the current version. This will > help operators selectively reject connections from clients that do not > support the latest protocol. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
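The review suggestion above -- fold the "older versions disabled" check into the method that already validates the version range -- can be sketched roughly as below. All names, version numbers, and exception types here are illustrative stand-ins, not Cassandra's actual {{ProtocolVersion}} API.

```java
public class ProtocolVersionSketch
{
    static final int MIN_SUPPORTED = 3;
    static final int CURRENT = 4;
    static volatile boolean allowOlderProtocols = true;

    public static int decode(int rawVersion)
    {
        // the pre-existing general version-range check
        if (rawVersion < MIN_SUPPORTED || rawVersion > CURRENT)
            throw new IllegalArgumentException("Unsupported protocol version: " + rawVersion);
        // the new check lives alongside it, as an extension of the same validation
        if (!allowOlderProtocols && rawVersion != CURRENT)
            throw new IllegalArgumentException("Older protocol versions are currently disabled");
        return rawVersion;
    }
}
```

Keeping both checks in one place means callers like {{Message}} never see a version that has not passed the full validation.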
[jira] [Comment Edited] (CASSANDRA-14677) Clean up Message.Request implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951 ] Jason Brown edited comment on CASSANDRA-14677 at 8/30/18 12:28 AM: --- I took a decent look at the patch provided. Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once per instance. With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} every time you need to check if logging is still enabled, which references a volatile variable ({{AuditLogManager.isAuditLogEnabled}}). You might consider memoizing the value again. Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem typical of how we usually name things. You should add a comment that {{perform()}} is now the main entry point for running the {{Request}}, and perhaps make {{execute()}} protected (instead of public). I think it would be helpful for committers and for future reviewers to have a better understanding of what is meant by "big mess". Perhaps you could update the description to better outline the specific issues with the {{execute()}} method implementations. This would also make the changes in the patch more clear for review. was (Author: jasobrown): I took a decent look at the patch provided. Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once per instance. With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} every time you need to reference the volatile variable ({{AuditLogManager.isAuditLogEnabled}}). You might consider memoizing the value again. Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem typical of how we usually name things.
You should add a comment that {{perform()}} is now the main entry point for running the {{Request}}, and perhaps make {{execute()}} protected (instead of public). I think it would be helpful for committers and for future reviewers to have a better understanding of what is meant by "big mess". Perhaps you could update the description to better outline the specific issues with the {{execute()}} method implementations. This would also make the changes in the patch more clear for review. > Clean up Message.Request implementations > > > Key: CASSANDRA-14677 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14677 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 4.0.x > > > First tracing support, many years ago, then most recently audit log, made a > big mess out of {{Message.Request.execute()}} implementations. > This patch tries to clean up some of it by removing tracing logic from > {{QueryState}} and moving shared tracing functionality to > {{Message.Request.perform()}}. It also moves out tracing and audit log boiler > plate into their own small methods instead of polluting {{execute()}} > implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14677) Clean up Message.Request implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951 ] Jason Brown edited comment on CASSANDRA-14677 at 8/30/18 12:27 AM: --- I took a decent look at the patch provided. Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once per instance. With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} every time you need to reference the volatile variable ({{AuditLogManager.isAuditLogEnabled}}). You might consider memoizing the value again. Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem typical of how we usually name things. You should add a comment that {{perform()}} is now the main entry point for running the {{Request}}, and perhaps make {{execute()}} protected (instead of public). I think it would be helpful for committers and for future reviewers to have a better understanding of what is meant by "big mess". Perhaps you could update the description to better outline the specific issues with the {{execute()}} method implementations. This would also make the changes in the patch more clear for review. was (Author: jasobrown): I took a decent look at the patch provided. Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once per instance. With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} every time you need to reference the variable. You might consider memoizing the value again. Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem typical of how we usually name things. You should add a comment that {{perform()}} is now the main entry point for running the {{Request}}, and perhaps make {{execute()}} protected (instead of public). 
I think it would be helpful for committers and for future reviewers to have a better understanding of what is meant by "big mess". Perhaps you could update the description to better outline the specific issues with the {{execute()}} method implementations. This would also make the changes in the patch more clear for review. > Clean up Message.Request implementations > > > Key: CASSANDRA-14677 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14677 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 4.0.x > > > First tracing support, many years ago, then most recently audit log, made a > big mess out of {{Message.Request.execute()}} implementations. > This patch tries to clean up some of it by removing tracing logic from > {{QueryState}} and moving shared tracing functionality to > {{Message.Request.perform()}}. It also moves out tracing and audit log boiler > plate into their own small methods instead of polluting {{execute()}} > implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14677) Clean up Message.Request implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951 ] Jason Brown commented on CASSANDRA-14677: - I took a decent look at the patch provided. Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once per instance. With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} every time you need to reference the variable. You might consider memoizing the value again. Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem typical of how we usually name things. You should add a comment that {{perform()}} is now the main entry point for running the {{Request}}, and perhaps make {{execute()}} protected (instead of public). I think it would be helpful for committers and for future reviewers to have a better understanding of what is meant by "big mess". Perhaps you could update the description to better outline the specific issues with the {{execute()}} method implementations. This would also make the changes in the patch more clear for review. > Clean up Message.Request implementations > > > Key: CASSANDRA-14677 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14677 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 4.0.x > > > First tracing support, many years ago, then most recently audit log, made a > big mess out of {{Message.Request.execute()}} implementations. > This patch tries to clean up some of it by removing tracing logic from > {{QueryState}} and moving shared tracing functionality to > {{Message.Request.perform()}}. It also moves out tracing and audit log boiler > plate into their own small methods instead of polluting {{execute()}} > implementations. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
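The memoization recommended in the review comment above can be sketched as follows: read the volatile flag once when the request object is constructed, then reuse the plain final field instead of re-reading the volatile on every check. Names here are hypothetical stand-ins for the Cassandra classes mentioned in the comment.

```java
public class AuditFlagSketch
{
    // stand-in for AuditLogManager.isAuditLogEnabled
    static volatile boolean auditLogEnabled = true;

    static class Request
    {
        // exactly one volatile read per instance, captured at construction time
        private final boolean auditEnabled = auditLogEnabled;

        boolean shouldAudit()
        {
            return auditEnabled; // plain field read; no volatile access on the hot path
        }
    }
}
```

The trade-off is the usual one for memoization: a request constructed before the flag flips keeps its original answer for its lifetime, which is acceptable here since the flag should apply per request, not per check.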
[jira] [Updated] (CASSANDRA-14677) Clean up Message.Request implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14677: Reviewers: Dinesh Joshi, Jason Brown > Clean up Message.Request implementations > > > Key: CASSANDRA-14677 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14677 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 4.0.x > > > First tracing support, many years ago, then most recently audit log, made a > big mess out of {{Message.Request.execute()}} implementations. > This patch tries to clean up some of it by removing tracing logic from > {{QueryState}} and moving shared tracing functionality to > {{Message.Request.perform()}}. It also moves out tracing and audit log boiler > plate into their own small methods instead of polluting {{execute()}} > implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org