[jira] [Assigned] (CASSANDRA-12362) dtest failure in upgrade_tests.paging_test.TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x.test_row_TTL_expiry_during_paging

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-12362:
---

Assignee: (was: Jason Brown)

> dtest failure in 
> upgrade_tests.paging_test.TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x.test_row_TTL_expiry_during_paging
> ---
>
> Key: CASSANDRA-12362
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12362
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Sean McCarthy
>Priority: Normal
>  Labels: dtest
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_upgrade/5/testReport/upgrade_tests.paging_test/TestPagingDatasetChangesNodes2RF1_Upgrade_current_3_x_To_indev_3_x/test_row_TTL_expiry_during_paging
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/upgrade_tests/paging_test.py", line 
> 1217, in test_row_TTL_expiry_during_paging
> self.assertEqual(pf.pagecount(), 3)
>   File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
> assertion_func(first, second, msg=msg)
>   File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
> raise self.failureException(msg)
> 2 != 3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13507) dtest failure in paging_test.TestPagingWithDeletions.test_ttl_deletions

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-13507:
---

Assignee: (was: Jason Brown)

> dtest failure in paging_test.TestPagingWithDeletions.test_ttl_deletions 
> 
>
> Key: CASSANDRA-13507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Ariel Weisberg
>Priority: Normal
>  Labels: dtest, test-failure, test-failure-fresh
> Attachments: test_ttl_deletions_fail.txt
>
>
> {noformat}
> Failed 7 times in the last 30 runs. Flakiness: 34%, Stability: 76%
> Error Message
> 4 != 8
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-z1xodw
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.5, 
> scheduling retry in 600.0 seconds: [Errno 111] Tried connecting to 
> [('127.0.0.5', 9042)]. Last error: Connection refused
> cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.3, 
> scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to 
> [('127.0.0.3', 9042)]. Last error: Connection refused
> cassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.3, 
> scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to 
> [('127.0.0.3', 9042)]. Last error: Connection refused
> {noformat}
> Most output omitted. It's attached.






[jira] [Updated] (CASSANDRA-12347) Gossip 2.0 - broadcast tree for data dissemination

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-12347:

Resolution: Won't Fix
Status: Resolved  (was: Open)

> Gossip 2.0 - broadcast tree for data dissemination
> --
>
> Key: CASSANDRA-12347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12347
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Distributed Metadata
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
>
> Description: A broadcast tree (spanning tree) allows an originating node to 
> efficiently send out updates to all of the peers in the cluster by 
> constructing a balanced, self-healing tree based upon the view it gets from 
> the peer sampling service (CASSANDRA-12346). 
> I propose we use an algorithm based on the [Thicket 
> paper|http://www.gsd.inesc-id.pt/%7Ejleitao/pdf/srds10-mario.pdf], which 
> describes a dynamic, self-healing broadcast tree. When a given node needs to 
> send out a message, it dynamically builds a tree for each node in the 
> cluster; thus giving us a unique tree for every node in the cluster (a tree 
> rooted at every cluster node). The trees, of course, would be reusable until 
> the cluster configuration changes or failures are detected (by the mechanism 
> described in the paper). Additionally, Thicket includes a mechanism for 
> load-balancing the trees such that nodes spread out the work amongst 
> themselves.
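The per-root tree idea above can be sketched as a plain BFS spanning tree built over the overlay links supplied by the peer sampling service. This is only an illustration: the node names, the overlay graph, and BFS itself are assumptions for the sketch; Thicket's actual tree construction, load balancing, and repair are considerably more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch: build one spanning tree per originating node over the overlay
// graph from the peer sampling service. Returns a parent map: for each node,
// the peer that forwards messages to it (the root maps to null).
class BroadcastTree
{
    static Map<String, String> build(Map<String, List<String>> overlay, String root)
    {
        Map<String, String> parent = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();
        parent.put(root, null);
        frontier.add(root);
        while (!frontier.isEmpty())
        {
            String node = frontier.poll();
            for (String peer : overlay.getOrDefault(node, List.of()))
            {
                if (!parent.containsKey(peer))
                {
                    parent.put(peer, node); // peer receives the broadcast via node
                    frontier.add(peer);
                }
            }
        }
        return parent;
    }
}
```

Because a tree is rooted at every node, each originator gets its own dissemination paths, and the trees are cached until membership changes invalidate them.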






[jira] [Updated] (CASSANDRA-12346) Gossip 2.0 - introduce a Peer Sampling Service for partial cluster views

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-12346:

Resolution: Won't Fix
Status: Resolved  (was: Open)

> Gossip 2.0 - introduce a Peer Sampling Service for partial cluster views
> 
>
> Key: CASSANDRA-12346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
>  Labels: gossip
>
> A [Peer Sampling 
> Service|http://infoscience.epfl.ch/record/83409/files/neg--1184036295all.pdf] 
> is a module that provides a partial view of a cluster to dependent modules. A 
> node's partial view, together with all other nodes' partial views, combines to 
> create a fully-connected mesh over the cluster. This way, a given node does 
> not need to have direct connections to every other node in the cluster, and 
> can be much more efficient in terms of resource management as well as 
> information dissemination. Peer Sampling Services by their nature must be 
> self-healing and self-balancing to maintain the fully-connected mesh.
> I propose we use an algorithm based on 
> [HyParView|http://asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf], which is 
> a concrete algorithm for a Peer Sampling Service. HyParView has a clearly 
> defined protocol, and is reasonably simple to implement.
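As a rough illustration of the partial-view idea, each node keeps a small active view of open connections plus a larger passive view of backup peers used to heal the active view on failure. The view sizes and promotion policy below are made-up assumptions for the sketch; HyParView's real join, shuffle, and neighbor protocols are richer.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy sketch of a peer sampling service's two views. The union of all nodes'
// small active views forms the fully-connected mesh over the cluster.
class PartialView
{
    static final int ACTIVE_MAX = 4;   // illustrative; tiny relative to cluster size
    static final int PASSIVE_MAX = 16;

    final Set<String> active = new LinkedHashSet<>();
    final Set<String> passive = new LinkedHashSet<>();

    void addPeer(String peer)
    {
        if (active.size() < ACTIVE_MAX)
            active.add(peer);
        else if (passive.size() < PASSIVE_MAX)
            passive.add(peer);
    }

    // Self-healing: on failure, promote a passive peer so the mesh stays connected.
    void onPeerFailure(String peer)
    {
        if (active.remove(peer) && !passive.isEmpty())
        {
            String promote = passive.iterator().next();
            passive.remove(promote);
            active.add(promote);
        }
    }
}
```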






[jira] [Updated] (CASSANDRA-12345) Gossip 2.0

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-12345:

Resolution: Won't Fix
Status: Resolved  (was: Open)

> Gossip 2.0
> --
>
> Key: CASSANDRA-12345
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12345
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
>  Labels: gossip
>
> This is a parent ticket covering changes to the dissemination aspects of the 
> current gossip subsystem. (Changes to the actual data being exchanged by the 
> current gossip (the cluster metadata) will be handled elsewhere, but the 
> current primary ticket covering that work is CASSANDRA-9667.)
> This work requires several components, which largely need to be completed in 
> this order:
> - a peer sampling service to create partial cluster views (CASSANDRA-12346). 
> This forms the basis of the next two components
> - a broadcast tree, which creates dynamic spanning trees given the partial 
> views provided by the peer sampling service (CASSANDRA-12347)
> - an anti-entropy component, which is similar to the pair-wise exchange and 
> reconciliation of the existing gossip implementation (CASSANDRA-???)
> These base components (primarily the broadcast and anti-entropy) can allow 
> for generic consumers to simply and effectively share a body of data across 
> an entire cluster. The most obvious consumer will be a cluster metadata 
> component, which can replace the existing gossip system, but also other 
> components like CASSANDRA-12106.






[jira] [Updated] (CASSANDRA-13628) switch peer-to-peer networking to non-blocking I/O via netty

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13628:

Resolution: Fixed
Status: Resolved  (was: Open)

> switch peer-to-peer networking to non-blocking I/O via netty
> 
>
> Key: CASSANDRA-13628
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13628
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core, Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
> Fix For: 4.0
>
>
> This is a parent ticket for linking all the work to be done for switching 
> peer-to-peer networking to use non-blocking I/O via netty






[jira] [Updated] (CASSANDRA-13630) support large internode messages with netty

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13630:

Resolution: Won't Fix
Status: Resolved  (was: Open)

> support large internode messages with netty
> ---
>
> Key: CASSANDRA-13630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the 
> scope of that ticket. However, we still need that functionality to ship a 
> correctly operating internode messaging subsystem.






[jira] [Updated] (CASSANDRA-13630) support large internode messages with netty

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13630:

Status: Open  (was: Patch Available)

> support large internode messages with netty
> ---
>
> Key: CASSANDRA-13630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the 
> scope of that ticket. However, we still need that functionality to ship a 
> correctly operating internode messaging subsystem.






[jira] [Assigned] (CASSANDRA-13989) Update security docs for 4.0

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-13989:
---

Assignee: (was: Jason Brown)

> Update security docs for 4.0
> 
>
> Key: CASSANDRA-13989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13989
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Documentation and Website
>Reporter: Jason Brown
>Priority: Low
> Fix For: 4.x
>
>
> CASSANDRA-8457 and CASSANDRA-10404 have brought changes to the way SSL works 
> for both internode messaging and the native protocol. Update the docs to 
> reflect information that is important to users/operators.






[jira] [Assigned] (CASSANDRA-14754) Add verification of state machine in StreamSession

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-14754:
---

Assignee: (was: Jason Brown)

> Add verification of state machine in StreamSession
> --
>
> Key: CASSANDRA-14754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14754
> Project: Cassandra
>  Issue Type: Task
>  Components: Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamSession}} contains an implicit state machine, but we have no 
> verification of the safety of the transitions between states. For example, we 
> have no checks to ensure we cannot leave the final states (COMPLETED, FAILED).
> I propose we add some program logic in {{StreamSession}}, tests, and 
> documentation to ensure the correctness of the state transitions.
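A minimal sketch of what such verification could look like: encode the legal transitions in an explicit table and reject everything else, which makes the final states trivially enforceable. The state names and transition table below are illustrative assumptions, not StreamSession's actual protocol.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical explicit state machine with verified transitions.
class StreamStateMachine
{
    enum State { INITIALIZED, PREPARING, STREAMING, COMPLETED, FAILED }

    private static final Map<State, Set<State>> LEGAL = new EnumMap<>(State.class);
    static
    {
        LEGAL.put(State.INITIALIZED, EnumSet.of(State.PREPARING, State.FAILED));
        LEGAL.put(State.PREPARING,   EnumSet.of(State.STREAMING, State.FAILED));
        LEGAL.put(State.STREAMING,   EnumSet.of(State.COMPLETED, State.FAILED));
        // Final states: nothing may leave COMPLETED or FAILED.
        LEGAL.put(State.COMPLETED,   EnumSet.noneOf(State.class));
        LEGAL.put(State.FAILED,      EnumSet.noneOf(State.class));
    }

    private State current = State.INITIALIZED;

    State state() { return current; }

    void transition(State next)
    {
        if (!LEGAL.get(current).contains(next))
            throw new IllegalStateException("illegal transition " + current + " -> " + next);
        current = next;
    }
}
```

With the table in one place, tests can enumerate every (state, state) pair and assert exactly the expected set is accepted.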






[jira] [Assigned] (CASSANDRA-14575) Reevaluate when to drop an internode connection on message error

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-14575:
---

Assignee: (was: Jason Brown)

> Reevaluate when to drop an internode connection on message error
> 
>
> Key: CASSANDRA-14575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14575
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Priority: Low
> Fix For: 4.0
>
>
> As mentioned in CASSANDRA-14574, explore if and when we can safely ignore an 
> incoming internode message on certain classes of failure.






[jira] [Updated] (CASSANDRA-14575) Reevaluate when to drop an internode connection on message error

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14575:

Status: Open  (was: Patch Available)

> Reevaluate when to drop an internode connection on message error
> 
>
> Key: CASSANDRA-14575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14575
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Low
> Fix For: 4.0
>
>
> As mentioned in CASSANDRA-14574, explore if and when we can safely ignore an 
> incoming internode message on certain classes of failure.






[jira] [Assigned] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-14760:
---

Assignee: (was: Jason Brown)

> CVE-2018-10237 Security vulnerability in 3.11.3
> ---
>
> Key: CASSANDRA-14760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14760
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: John F. Gbruoski
>Priority: Normal
>
> As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a 
> security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be 
> patched to support Guava  24.1.1 or later?






[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14503:

Resolution: Won't Fix
Status: Resolved  (was: Open)

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.
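The suggested single-thread model could look roughly like this toy sketch, in which every state-changing action is enqueued on one executor, so {{finishHandshake()}} and {{close()}} can never interleave and the NPE race disappears. The class and field names are stand-ins, not the real {{OutboundMessagingConnection}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: all mutations of connection state run on a single-threaded
// scheduler, so transitions are serialized and easy to reason about.
class SingleThreadedConnection
{
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private Object channelWriter;   // stand-in for the real channel writer
    private boolean closed;

    void finishHandshake()
    {
        eventLoop.execute(() -> {
            if (closed)                    // close() already ran; nothing to do
                return;
            channelWriter = new Object();  // safe: no concurrent nulling possible
        });
    }

    void close()
    {
        eventLoop.execute(() -> {
            channelWriter = null;
            closed = true;
        });
    }

    // Drain the event loop, then read the final state (for the demo only).
    boolean awaitClosed()
    {
        eventLoop.shutdown();
        try
        {
            eventLoop.awaitTermination(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return closed;
    }
}
```

Timeout and retry futures would be enqueued the same way, so a cancelled-but-already-running task can no longer mutate state behind another method's back.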






[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2019-06-12 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14503:

Status: Open  (was: Patch Available)

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.






[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-01 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807254#comment-16807254
 ] 

Jason Brown commented on CASSANDRA-15066:
-

I believe many of these changes indeed improve the quality of the code and 
strengthen the system long-term, but they seem best targeted at 4.NEXT. I and 
others would like to collaborate on this effort going forward.

The introduction of a slew of new features (checksumming, reimplementing parts 
of netty (AsyncPromise, LZ4 compression, replacing netty’s ByteBufAllocator 
with c*’s)) and major reimplementations (droppable verbs/verb priority, 
semantic changes to connection types) seven months after the community declared 
a feature freeze for 4.0 seems ill-advised, at best. The size, scope, and depth 
of this patch, which touches many vital components, invalidates most 4.0 
testing performed hitherto.

In my estimation, a fair and thorough review of the current patch alone, by 
myself and others, would take at least two solid months, as there is a lot of 
new complexity introduced. Significant additional time would be required for 
integration testing. At the barest minimum, this patch should be broken up into 
separate tickets, reviewed individually, and merged incrementally. 
Additionally, I think having a discussion on dev@, as you proposed, would be 
highly beneficial.

Further, CASSANDRA-14503 was posted for REVIEW, in the hopes that we could have 
a discussion around the current state of trunk and the patch I submitted. I 
appreciate the reporting of the bugs you found and work you invested. Beyond 
that, however, there has not been any meaningful discussion or engagement. I 
would have appreciated the opportunity to collaborate on this effort, 
especially as I have personally invested much time and effort into this work.

To sum up, I am -1 on this work in its current form *for 4.0*, as the new 
features violate the freeze, and many of the new implementations violate the 
principle of reducing risk and increasing stability as we run up to 4.0.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: Normal
> Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of which long standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> packages itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements






[jira] [Updated] (CASSANDRA-15030) Add support for SSL and bindable address to sidecar

2019-02-19 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-15030:

Reviewers: Chris Lohfink, Vinay Chella
 Reviewer:   (was: Chris Lohfink)

> Add support for SSL and bindable address to sidecar
> ---
>
> Key: CASSANDRA-15030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15030
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Sidecar
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Minor
>
> We need to support SSL for the sidecar's REST interface. We should also have 
> the ability to bind the sidecar's API to a specific network interface. This 
> patch adds support for both.






[jira] [Updated] (CASSANDRA-14395) C* Management process

2019-02-18 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14395:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1

I made two minor changes on commit:
 * made {{logger}} instances {{static final}}
 * removed the jolokia license file. [~djoshi3] sorry if I confused you; what 
I meant to say is that if we ship the jar in-tree, we should have the 
license file as well (like we do in cassandra proper). However, this raises 
the question of how to correctly address transitive dependencies that we don't 
ship in-tree. Admittedly, I've been doing it "the cassandra way" for a long 
time (with jars in-tree), so I'm not sure how to properly include licenses 
with a maven-like system. I'll create a follow-up ticket to figure it out.

Otherwise, this is a good first step toward shipping a working sidecar. 
Committed as sha {{a15ed267d1977e38ba36d061139839fad7b865f2}}. Thanks, 
[~djoshi3]!

> C* Management process
> -
>
> Key: CASSANDRA-14395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14395
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Attachments: Looking towards an Official Cassandra Sidecar - 
> Netflix.pdf
>
>
> I would like to propose amending Cassandra's architecture to include a 
> management process. The detailed description is here: 
> https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit
> I'd like to propose seeding this with a few simple use-cases such as Health 
> Checks, Bulk Commands with a simple REST API interface. 






[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769340#comment-16769340
 ] 

Jason Brown commented on CASSANDRA-15013:
-

Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are an 
open map, basically. I thought they were a fixed listing (primarily because we 
only support a fixed set of compression types). OK, so any version works for me 
:).

> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> ---
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Major
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue 
> bounded, since, in the current state, items get added to the queue without 
> any checks on queue size, nor any check of the netty outbound buffer's 
> isWritable state.
> We are seeing this issue hit our production 3.0 clusters quite often.
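A minimal sketch of the proposed bound: back the flusher with a bounded queue and refuse new items rather than grow without limit. The capacity and the reject-on-full policy here are illustrative assumptions, not the committed fix (backpressure via the channel's writability would be another option).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a flush queue that cannot grow unbounded. offer() fails fast when
// the queue is full instead of accumulating items until the JVM runs out
// of memory.
class BoundedFlusher
{
    private final BlockingQueue<String> queue;

    BoundedFlusher(int capacity)
    {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false (caller can drop or apply backpressure) when full. */
    boolean enqueue(String item)
    {
        return queue.offer(item);
    }

    int size()
    {
        return queue.size();
    }
}
```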






[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769340#comment-16769340
 ] 

Jason Brown edited comment on CASSANDRA-15013 at 2/15/19 2:05 PM:
--

Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are a 
semi-defined map, basically. I thought they were a fixed listing (primarily 
because we only support a fixed set of compression types). OK, so any version 
works for me :).


was (Author: jasobrown):
Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are an 
open map, basically. I thought they were a fixed listing (primarily because we 
only support a fixed set of compression types). OK, so any version works for me 
:).




[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769331#comment-16769331
 ] 

Jason Brown commented on CASSANDRA-15013:
-

Yup, I agree the harder part, programming-wise, is the {{requestExecutor}} stuff, 
and let's plow through that first. The {{OptionsMessage}}/client protocol work 
is significantly easier, as I think we agree, but would that qualify as a 
change to the native protocol, for which we need to wait for a major rev (as 
in, 4.0)? Or are additive changes acceptable for previous native protocol 
versions? We might have a policy or general advice around this, but I don't 
know.




[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769331#comment-16769331
 ] 

Jason Brown edited comment on CASSANDRA-15013 at 2/15/19 1:54 PM:
--

Yup, I agree the harder part, programming-wise, is the {{requestExecutor}} stuff, 
and let's plow through that first. The {{OptionsMessage}}/client protocol work 
is significantly easier, as I think we agree, but would that qualify as a 
change to the native protocol, for which we need to wait for a major rev (as 
in, 4.0)? Or are additive changes acceptable for previous native protocol 
versions? We might have a policy or general advice around this, but I don't 
know.

 

Either way, [~sumanth.pasupuleti] has enough to work with for now, and we 
can figure out the native protocol-impacting stuff in parallel.


was (Author: jasobrown):
Yup, I agree the harder part, programming wise, is {{requestExecutor}} stuffs, 
and let's plow through that first. The {{OptionsMessage/client protocol work}} 
is significantly easier, as I think we agree, but would that qualify as a 
change to the native protocol, for which we need to wait for a major rev (as 
in, 4.0)? Or are additive additions ok acceptable for previous native protocol 
versions? We might have a policy or general advice around this, but I don't 
know.




[jira] [Commented] (CASSANDRA-14395) C* Management process

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769326#comment-16769326
 ] 

Jason Brown commented on CASSANDRA-14395:
-

- inspecting the tarball produced via {{gradlew distTar}}, I don't see the 
jolokia jar packaged in it. Admittedly, I didn't check on the last version of 
this patch either.
- need a license file when including the jolokia jar. I propose we start simple 
for now, and since there's only one jar for now (which is hopefully being 
removed in an upcoming patch) - just add the license in a subfolder like we do 
in cassandra.
- in {{HealthCheck::check}}, when we get a {{NoHostAvailableException}} from 
the driver (which is thrown when we cannot connect), it would be preferable to 
not litter the logs with the stack trace. Or maybe log the exception at 
{{DEBUG}} or {{TRACE}}. I discovered this by running c* locally, then 
terminating it, and watching the logs from the sidecar.
- petty nit: sometimes a {{catch}} keyword is on the same line as the closing 
brace of a {{try}} block. See {{HealthCheck::createCluster}} for an example.

> C* Management process
> -
>
> Key: CASSANDRA-14395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14395
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Attachments: Looking towards an Official Cassandra Sidecar - 
> Netflix.pdf
>
>
> I would like to propose amending Cassandra's architecture to include a 
> management process. The detailed description is here: 
> https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit
> I'd like to propose seeding this with a few simple use-cases such as Health 
> Checks, Bulk Commands with a simple REST API interface. 






[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-15 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769292#comment-16769292
 ] 

Jason Brown commented on CASSANDRA-15013:
-

[~benedict] Ah, I see now that's what you intended by 
{{connection-configurable option}}. I'm fine with that.

I'm not sure if specifying the 'backpressure type' would require a change to 
the native protocol. I think it would be most appropriate in the OPTIONS 
section (and thus {{OptionsMessage}}), but I might be mistaken. However, I 
wonder if we should break that work out into a separate ticket to unblock the 
other work here, so that it can be backported and fixed in production. wdyt?




[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-02-14 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768913#comment-16768913
 ] 

Jason Brown commented on CASSANDRA-15013:
-

I agree with upping the max queue depth (or unbounded plus size monitoring) as 
well as stop reading from the socket (by setting netty's {{autoRead}} to 
false). I'm not, however, convinced about adding yet another configuration 
option; adding more config options only complicates the lives of operators. 
How will an operator know how to set it most appropriately to their use 
case(s)? We should choose the best solution, *document it*, and go with that as 
a built-in behavior. (Note: I'm amenable to throwing the OverloadedException, 
as well.)
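The "stop reading from the socket" behavior mentioned above can be sketched with a watermark pair: a flag analogous to Netty's {{channel.config().setAutoRead(false)}} is cleared when pending work crosses a high watermark and restored at a low watermark. This is a pure-stdlib illustration under invented names and watermark values; real code would flip autoRead on the actual Netty channel.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative backpressure sketch: disable "autoRead" when pending work
// crosses a high watermark, re-enable once drained below a low watermark.
// The high/low gap (hysteresis) avoids rapid on/off flapping.
public class ReadThrottle {
    private final int highWatermark, lowWatermark;
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean autoRead = true; // stand-in for channel.config().isAutoRead()

    public ReadThrottle(int high, int low) {
        this.highWatermark = high;
        this.lowWatermark = low;
    }

    public void onMessage(String msg) {
        pending.addLast(msg);
        if (autoRead && pending.size() >= highWatermark)
            autoRead = false; // real code: channel.config().setAutoRead(false)
    }

    public String drainOne() {
        String msg = pending.pollFirst();
        if (!autoRead && pending.size() <= lowWatermark)
            autoRead = true; // real code: channel.config().setAutoRead(true)
        return msg;
    }

    public boolean isReading() { return autoRead; }

    public static void main(String[] args) {
        ReadThrottle t = new ReadThrottle(3, 1);
        t.onMessage("a"); t.onMessage("b"); t.onMessage("c");
        System.out.println(t.isReading()); // false: hit high watermark
        t.drainOne(); t.drainOne();
        System.out.println(t.isReading()); // true: back below low watermark
    }
}
```

Because Netty stops calling the read handler while autoRead is off, the kernel socket buffer fills and backpressure propagates to the client without any extra configuration.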




[jira] [Commented] (CASSANDRA-14395) C* Management process

2019-02-11 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765375#comment-16765375
 ] 

Jason Brown commented on CASSANDRA-14395:
-

I've taken a decent look through [~djoshi3]'s first patch, and on the whole, I 
think this is a good first step for this project. I'm not digging too far into 
nit picking at this early stage as I feel it's more important to make forward 
progress overall rather than get tripped up over minor points.

general comments:

- I'm passing over the gradle scripts for now, in lieu of reviewing everything 
else first
- do we need a lib directory with checked-in jars, especially with 
gradle/mvn-style build files that pull jars from Maven Central?
- need a script to run the app from the command line :) I was able to use 
{{gradlew run}} to see it work.

code comments:
- Configuration - let's add comments to make it easier to distinguish between 
{{getCassandraPort}} and {{getPort}}; maybe update the method names, as well. 
I needed to read {{MainModule::configuration()}} to figure out what each 
method actually represented.
- in general, I think we want to execute scheduled tasks with 
{{scheduleWithFixedDelay()}} rather than {{scheduleAtFixedRate()}}. This way 
tasks don't end up piling up on top of each other if one takes a looong time to 
execute.
- there are a couple of nit-picky things where an instance's final fields are in 
caps, like a constant. trivial at this point.
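The scheduling point above can be demonstrated with the stdlib executors: {{scheduleWithFixedDelay()}} measures the delay from the end of one run to the start of the next, so a slow task cannot have runs pile up behind it, whereas {{scheduleAtFixedRate()}} keeps scheduling on the original cadence. A small self-contained demo (the timings are illustrative, not from the sidecar patch):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelayDemo {
    // Runs a slow task (~100 ms) every 50 ms for about one second and returns
    // how many times it ran. With scheduleWithFixedDelay the next run starts
    // 50 ms AFTER the previous one finishes, so each cycle takes ~150 ms and
    // runs never queue up behind a slow execution.
    static int runWithFixedDelay() throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ses.scheduleWithFixedDelay(() -> {
            runs.incrementAndGet();
            try { Thread.sleep(100); } // simulate a task that overruns its period
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(1000);
        ses.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = runWithFixedDelay();
        // ~150 ms per cycle over ~1 s: roughly 6-7 runs, never a backlog.
        System.out.println("runs: " + n);
    }
}
```

With {{scheduleAtFixedRate()}} the same task would be considered "behind schedule" after every run, and the executor would fire the next run immediately, back to back, which is the pile-up behavior the comment warns about.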

The only thing I'm not entirely thrilled with is how each URL/handler will need 
to be explicitly wired into the {{router}} in 
{{CassandraSidecarDaemon::start()}}. I'm not sure if there's further guice 
magick that can mitigate this. However, I don't feel this is a huge problem for 
the usefulness of the project as a whole, nor do I think we need to tackle it in 
the early stages or anytime soon.

Looking forward to more activity on this ticket. 




[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone

2019-01-22 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749356#comment-16749356
 ] 

Jason Brown commented on CASSANDRA-14503:
-

[~benedict] / [~djoshi3] Any update on reviewing this latest patch? This seems 
to be a blocker for 4.0.

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.
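The single-thread model suggested above can be sketched with a one-thread executor that serializes every state transition, so e.g. {{finishHandshake()}} and {{close()}} can never interleave. This is an illustrative stand-in, not the actual {{OutboundMessagingConnection}}; the states and method bodies here are invented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the proposed single-threaded connection state model: every state
// transition is enqueued on one executor thread, so no two transitions run
// concurrently and each can safely check the current state before acting.
public class ConnectionStateModel {
    enum State { CREATING, READY, CLOSED }

    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private State state = State.CREATING; // only ever touched on the event loop

    public void finishHandshake() {
        eventLoop.execute(() -> {
            if (state == State.CREATING) // close() may already have won the race
                state = State.READY;
        });
    }

    public void close() {
        eventLoop.execute(() -> state = State.CLOSED);
    }

    // Read the state on the event loop thread too, so reads are also ordered.
    public State awaitState() throws Exception {
        return eventLoop.submit(() -> state).get();
    }

    public void shutdown() throws Exception {
        eventLoop.shutdown();
        eventLoop.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ConnectionStateModel conn = new ConnectionStateModel();
        conn.close();
        conn.finishHandshake(); // late handshake is a no-op: state is CLOSED
        System.out.println(conn.awaitState());
        conn.shutdown();
    }
}
```

The NPE scenario from the description cannot occur here: a {{finishHandshake()}} that arrives after {{close()}} simply observes the CLOSED state and does nothing, instead of racing with a concurrent null-out of shared fields.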






[jira] [Updated] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate

2018-12-04 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14829:

Fix Version/s: (was: 4.0.x)
   (was: 3.11.x)
   4.0
   3.11.4

> Make stop-server.bat wait for Cassandra to terminate
> 
>
> Key: CASSANDRA-14829
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14829
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: Windows 10
>Reporter: Georg Dietrich
>Assignee: Georg Dietrich
>Priority: Minor
>  Labels: easyfix, windows
> Fix For: 3.11.4, 4.0
>
>
> While administering a single node Cassandra on Windows, I noticed that the 
> stop-server.bat script returns before the cassandra process has actually 
> terminated. For use cases like creating a script "shut down & create backup 
> of data directory without having to worry about open files, then restart", it 
> would be good to make stop-server.bat wait for Cassandra to terminate.
> All that is needed for that is to change in 
> apache-cassandra-3.11.3\bin\stop-server.bat "start /B powershell /file ..." 
> to "start /WAIT /B powershell /file ..." (additional /WAIT parameter).
> Does this sound reasonable?
> Here is the pull request: https://github.com/apache/cassandra/pull/287






[jira] [Updated] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate

2018-12-03 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14829:

   Resolution: Fixed
 Reviewer: Dinesh Joshi
Fix Version/s: (was: 4.x)
   Status: Resolved  (was: Ready to Commit)

committed as sha {{85e402a7fda59110aeea181924035d69db693240}}. Thanks!




[jira] [Commented] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate

2018-12-03 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707176#comment-16707176
 ] 

Jason Brown commented on CASSANDRA-14829:
-

[~djoshi3] I'll commit.




[jira] [Resolved] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes

2018-11-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-14896.
-
Resolution: Fixed

Committed v2 patch as {{f3609995c09570d523527d9bd0fd69c2bc65d986}} with updated 
comments per [~aweisberg]'s recommendation.

> 3.0 schema migration pulls from later version incompatible nodes
> 
>
> Key: CASSANDRA-14896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14896
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Ariel Weisberg
>Assignee: Jason Brown
>Priority: Blocker
>  Labels: 4.0-pre-rc-bugs
> Fix For: 4.0
>
>
> I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly 
> different and 3.0 in some scenarios it is pulling schema from a later 
> version. This causes upgrade tests to have errors in the logs due to 
> additional columns from configurable storage port.
> {noformat}
> Failed: Error details: 
> Errors seen in logs for: node2
> node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 
> CassandraDaemon.java:207 - Exception in thread 
> Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.RuntimeException: Unknown column additional_write_policy during 
> deserialization
>   at 
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
> {noformat}






[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes

2018-11-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703901#comment-16703901
 ] 

Jason Brown commented on CASSANDRA-14896:
-

The problem with my first patch is that we need the peer's messaging version in 
order to serialize the {{InetAddressAndPort}} correctly to the peer. We still 
need to write the local node's messaging version into the message, however.

Patch here:

||v2||
|[branch|https://github.com/jasobrown/cassandra/tree/14896-v2]|
|[utests / dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14896-v2]|




[jira] [Updated] (CASSANDRA-14909) Netty IOExceptions caused by unclean client disconnects being logged at INFO instead of TRACE

2018-11-29 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14909:

   Resolution: Fixed
 Reviewer: Jason Brown
Fix Version/s: 3.11.x
   Status: Resolved  (was: Patch Available)

+1

committed as sha {{e4d0ce6ba2d6088c7edf8475f02462e1606f606d}}. Thanks!

> Netty IOExceptions caused by unclean client disconnects being logged at INFO 
> instead of TRACE
> -
>
> Key: CASSANDRA-14909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14909
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Minor
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Observed spam logs on 3.0.17 cluster with redundant Netty IOExceptions caused 
> due to client-side disconnections.
> {code:java}
> INFO  [epollEventLoopGroup-2-28] 2018-11-20 23:23:04,386 Message.java:619 - 
> Unexpected exception during request; channel = [id: 0x12995bc1, 
> L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xxx.xxx:33754]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {code}
> {code:java}
> INFO  [epollEventLoopGroup-2-23] 2018-11-20 13:16:33,263 Message.java:619 - 
> Unexpected exception during request; channel = [id: 0x98bd7c0e, 
> L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xx.xx:33350]
> io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: 
> Connection timed out
>   at io.netty.channel.unix.Errors.newIOException(Errors.java:117) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at io.netty.channel.unix.Errors.ioResult(Errors.java:138) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.unix.FileDescriptor.readAddress(FileDescriptor.java:175) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.AbstractEpollChannel.doReadBytes(AbstractEpollChannel.java:238)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:926)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:397) 
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:302) 
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> {code}
> [CASSANDRA-7849|https://issues.apache.org/jira/browse/CASSANDRA-7849] 
> addresses this for Java IOExceptions like "java.io.IOException: Connection 
> reset by peer", but not for Netty IOExceptions, since the exception message 
> in Netty includes the method name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14909) Netty IOExceptions caused by unclean client disconnects being logged at INFO instead of TRACE

2018-11-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703276#comment-16703276
 ] 

Jason Brown commented on CASSANDRA-14909:
-

[~sumanth.pasupuleti] added comments wrt use of the Java stream API on the PR

> Netty IOExceptions caused by unclean client disconnects being logged at INFO 
> instead of TRACE
> -
>
> Key: CASSANDRA-14909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14909
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Minor
> Fix For: 4.0, 3.0.x
>
>
> Observed spam logs on 3.0.17 cluster with redundant Netty IOExceptions caused 
> due to client-side disconnections.
> {code:java}
> INFO  [epollEventLoopGroup-2-28] 2018-11-20 23:23:04,386 Message.java:619 - 
> Unexpected exception during request; channel = [id: 0x12995bc1, 
> L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xxx.xxx:33754]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {code}
> {code:java}
> INFO  [epollEventLoopGroup-2-23] 2018-11-20 13:16:33,263 Message.java:619 - 
> Unexpected exception during request; channel = [id: 0x98bd7c0e, 
> L:/xxx.xx.xxx.xxx:7104 - R:/xxx.xx.xx.xx:33350]
> io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: 
> Connection timed out
>   at io.netty.channel.unix.Errors.newIOException(Errors.java:117) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at io.netty.channel.unix.Errors.ioResult(Errors.java:138) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.unix.FileDescriptor.readAddress(FileDescriptor.java:175) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.AbstractEpollChannel.doReadBytes(AbstractEpollChannel.java:238)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:926)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:397) 
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:302) 
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> {code}
> [CASSANDRA-7849|https://issues.apache.org/jira/browse/CASSANDRA-7849] 
> addresses this for Java IOExceptions like "java.io.IOException: Connection 
> reset by peer", but not for Netty IOExceptions, since the exception message 
> in Netty includes the method name.
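As a rough illustration of why a suffix match works where the exact-message check from CASSANDRA-7849 does not (class name and message set here are invented for this sketch, not Cassandra's actual implementation): Netty prepends the failing method, e.g. "readAddress() failed: ", so matching the tail of the message catches both forms.

```java
import java.util.Set;

// Hypothetical sketch: classify unclean client disconnects by the tail of
// the exception message, so Netty's "syscall:read(...)() failed: Connection
// reset by peer" is treated the same as the plain java.io message.
public final class DisconnectClassifier
{
    private static final Set<String> BENIGN_SUFFIXES = Set.of(
        "Connection reset by peer",
        "Broken pipe",
        "Connection timed out");

    public static boolean isUncleanDisconnect(String exceptionMessage)
    {
        if (exceptionMessage == null)
            return false;
        // endsWith tolerates Netty's "<method>() failed: " prefix
        return BENIGN_SUFFIXES.stream().anyMatch(exceptionMessage::endsWith);
    }
}
```

A classifier like this would let the logging path demote matching exceptions to TRACE while keeping genuinely unexpected ones at INFO.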






[jira] [Updated] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns

2018-11-29 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14897:

Reviewer: Jason Brown

> In mixed 3.x/4 version clusters write tracing and repair history information 
> without new columns
> 
>
> Key: CASSANDRA-14897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Major
>  Labels: 4.0-pre-rc-bugs
> Fix For: 4.0
>
> Attachments: 14897.diff
>
>
> In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't 
> generate any errors. Aleksey pointed out I could write just the old columns. 
> If a user manually adds the new columns to the old version nodes before 
> upgrade they will be able to query this information across the cluster. This 
> is a better situation than making it completely impossible for people to run 
> repairs or perform tracing in mixed version clusters.
> This would avoid breaking repair and tracing in mixed version clusters.
> I also want to properly document how to do this and maybe even provide a 
> script people can run to add the columns to old nodes.
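The column-selection idea can be sketched as follows. All table and column names here are illustrative stand-ins, not the actual system_traces / system_distributed schema: the point is simply that in a mixed-version cluster the write is built against only the columns every node understands.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: choose the column set for a tracing/repair-history
// write based on whether every node in the cluster knows the new columns,
// so 3.x nodes can still deserialize writes coming from 4.0 coordinators.
public final class MixedVersionColumns
{
    static final List<String> LEGACY_COLUMNS = List.of("session_id", "started_at", "duration");
    static final List<String> NEW_COLUMNS = List.of("coordinator_port"); // hypothetical 4.0-only column

    /** Columns safe to include in a write given the current cluster state. */
    static List<String> writableColumns(boolean clusterFullyUpgraded)
    {
        if (!clusterFullyUpgraded)
            return LEGACY_COLUMNS; // old columns only: every node can apply these
        List<String> all = new ArrayList<>(LEGACY_COLUMNS);
        all.addAll(NEW_COLUMNS);
        return all;
    }
}
```

Per the last paragraph of the ticket, a user who manually adds the new columns on old nodes effectively moves the cluster into the "fully upgraded" branch early.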






[jira] [Commented] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns

2018-11-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703219#comment-16703219
 ] 

Jason Brown commented on CASSANDRA-14897:
-

+1 lgtm

> In mixed 3.x/4 version clusters write tracing and repair history information 
> without new columns
> 
>
> Key: CASSANDRA-14897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Major
>  Labels: 4.0-pre-rc-bugs
> Fix For: 4.0
>
> Attachments: 14897.diff
>
>
> In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't 
> generate any errors. Aleksey pointed out I could write just the old columns. 
> If a user manually adds the new columns to the old version nodes before 
> upgrade they will be able to query this information across the cluster. This 
> is a better situation than making it completely impossible for people to run 
> repairs or perform tracing in mixed version clusters.
> This would avoid breaking repair and tracing in mixed version clusters.
> I also want to properly document how to do this and maybe even provide a 
> script people can run to add the columns to old nodes.






[jira] [Updated] (CASSANDRA-14897) In mixed 3.x/4 version clusters write tracing and repair history information without new columns

2018-11-29 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14897:

Status: Ready to Commit  (was: Patch Available)

> In mixed 3.x/4 version clusters write tracing and repair history information 
> without new columns
> 
>
> Key: CASSANDRA-14897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Major
>  Labels: 4.0-pre-rc-bugs
> Fix For: 4.0
>
> Attachments: 14897.diff
>
>
> In CASSANDRA-14841 I stopped it from writing to those tables so it wouldn't 
> generate any errors. Aleksey pointed out I could write just the old columns. 
> If a user manually adds the new columns to the old version nodes before 
> upgrade they will be able to query this information across the cluster. This 
> is a better situation than making it completely impossible for people to run 
> repairs or perform tracing in mixed version clusters.
> This would avoid breaking repair and tracing in mixed version clusters.
> I also want to properly document how to do this and maybe even provide a 
> script people can run to add the columns to old nodes.






[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes

2018-11-28 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702503#comment-16702503
 ] 

Jason Brown commented on CASSANDRA-14896:
-

The only utest that failed was 
{{DistributedReadWritePathTest.writeWithSchemaDisagreement}}, which failed with 
"Forked Java VM exited abnormally". I ran it locally and all was fine, so I'm 
chalking it up to a testing fluke. Will commit shortly.

> 3.0 schema migration pulls from later version incompatible nodes
> 
>
> Key: CASSANDRA-14896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14896
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Ariel Weisberg
>Assignee: Jason Brown
>Priority: Blocker
>  Labels: 4.0-pre-rc-bugs
> Fix For: 3.0.x
>
>
> I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly 
> different and 3.0 in some scenarios it is pulling schema from a later 
> version. This causes upgrade tests to have errors in the logs due to 
> additional columns from configurable storage port.
> {noformat}
> Failed: Error details: 
> Errors seen in logs for: node2
> node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 
> CassandraDaemon.java:207 - Exception in thread 
> Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.RuntimeException: Unknown column additional_write_policy during 
> deserialization
>   at 
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
> {noformat}






[jira] [Updated] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes

2018-11-28 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14896:

   Resolution: Fixed
Fix Version/s: (was: 3.0.x)
   4.0
   Status: Resolved  (was: Ready to Commit)

Committed as sha {{c5dee08dfb791ba28fecc8ca8b25a4a4d7e9cb07}}

> 3.0 schema migration pulls from later version incompatible nodes
> 
>
> Key: CASSANDRA-14896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14896
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Ariel Weisberg
>Assignee: Jason Brown
>Priority: Blocker
>  Labels: 4.0-pre-rc-bugs
> Fix For: 4.0
>
>
> I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly 
> different and 3.0 in some scenarios it is pulling schema from a later 
> version. This causes upgrade tests to have errors in the logs due to 
> additional columns from configurable storage port.
> {noformat}
> Failed: Error details: 
> Errors seen in logs for: node2
> node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 
> CassandraDaemon.java:207 - Exception in thread 
> Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.RuntimeException: Unknown column additional_write_policy during 
> deserialization
>   at 
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
> {noformat}






[jira] [Commented] (CASSANDRA-14896) 3.0 schema migration pulls from later version incompatible nodes

2018-11-28 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702486#comment-16702486
 ] 

Jason Brown commented on CASSANDRA-14896:
-

[~aweisberg] is correct. On the third (and last) message of the internode 
messaging handshake, the node is [incorrectly sending 
back|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/OutboundHandshakeHandler.java#L180]
 the messaging version it received from the peer; it should be sending back 
its own {{MessagingService.current_version}}.

Here's a one-line fix for sending the correct messaging version in 
{{ThirdHandshakeMessage}}, as well as a fix for the unit test that verifies the 
version sent from {{OutboundHandshakeHandler}}:
||14896||
|[branch|https://github.com/jasobrown/cassandra/tree/14896]|
|[utests & 
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14896]|
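The gist of the one-line fix can be sketched as below. Class and field names are illustrative, not the actual handler code: the third handshake message must carry this node's own messaging version rather than echo the version announced by the peer.

```java
// Minimal sketch of the CASSANDRA-14896 fix. CURRENT_VERSION stands in for
// MessagingService.current_version on this node.
public final class ThirdHandshakeSketch
{
    static final int CURRENT_VERSION = 12; // stand-in value for this sketch

    // buggy behavior: echoes whatever version the peer announced, so a 3.0
    // peer believes the 4.0 node speaks 3.0 and pulls incompatible schema
    static int buggyResponse(int peerVersion)
    {
        return peerVersion;
    }

    // fixed behavior: always advertise our own current version
    static int fixedResponse(int peerVersion)
    {
        return CURRENT_VERSION;
    }
}
```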

> 3.0 schema migration pulls from later version incompatible nodes
> 
>
> Key: CASSANDRA-14896
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14896
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Blocker
>  Labels: 4.0-pre-rc-bugs
> Fix For: 3.0.x
>
>
> I saw this in upgrade tests. The checks in 3.0 and 3.11 are slightly 
> different and 3.0 in some scenarios it is pulling schema from a later 
> version. This causes upgrade tests to have errors in the logs due to 
> additional columns from configurable storage port.
> {noformat}
> Failed: Error details: 
> Errors seen in logs for: node2
> node2: ERROR [MessagingService-Incoming-/127.0.0.1] 2018-11-15 21:17:46,739 
> CassandraDaemon.java:207 - Exception in thread 
> Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.RuntimeException: Unknown column additional_write_policy during 
> deserialization
>   at 
> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.17.jar:3.0.17]
> {noformat}






[jira] [Commented] (CASSANDRA-14485) Optimize internode messaging protocol

2018-10-24 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662353#comment-16662353
 ] 

Jason Brown commented on CASSANDRA-14485:
-

bq. make it easier to defer deserialization until the entire contents are in 
memory

Correct, as we never want to block (for deserialization) on the Netty event loop.
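As a rough illustration of that constraint (plain `java.nio` stands in for Netty's ByteBuf machinery, and the class name is invented for this sketch): the receive path can keep buffering bytes and only release a message for deserialization once the whole length-prefixed frame is in memory, so no partial parse ever blocks the event loop.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulate incoming chunks; emit only complete
// frames (4-byte big-endian length prefix + payload), retaining any
// partial tail for the next read.
public final class FrameAccumulator
{
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Feed newly-read bytes; returns complete frames, buffering partial data. */
    public List<byte[]> feed(byte[] chunk)
    {
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + chunk.length);
        merged.put(pending).put(chunk).flip();

        List<byte[]> frames = new ArrayList<>();
        while (merged.remaining() >= 4)
        {
            merged.mark();
            int len = merged.getInt();
            if (merged.remaining() < len)
            {
                merged.reset(); // incomplete frame: keep buffering
                break;
            }
            byte[] frame = new byte[len];
            merged.get(frame);
            frames.add(frame); // whole frame in memory: safe to deserialize
        }
        pending = merged.slice(); // retain the unconsumed tail
        return frames;
    }
}
```

Deserialization of each returned frame can then be dispatched off the I/O thread.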

> Optimize internode messaging protocol
> -
>
> Key: CASSANDRA-14485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14485
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.0
>
>
> There's some dead wood and places for optimization in the internode messaging 
> protocol. Currently, we include the sender's {{IPAddressAndPort}} in *every* 
> internode message, even though we already sent that in the handshake that 
> established the connection/session. Further, there are several places where 
> we can use vints instead of a fixed, 4-byte integer value, especially as 
> those values will almost always be less than one byte.
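To see why this saves space, here is a minimal unsigned-varint encoder in the common LEB128 style (Cassandra's actual vint codec differs in its exact bit layout; this sketch only shows the size effect): values under 128 take one byte on the wire instead of a fixed four.

```java
import java.io.ByteArrayOutputStream;

// Illustrative LEB128-style unsigned varint: 7 payload bits per byte,
// high bit set on every byte except the last.
public final class VarInt
{
    public static byte[] encode(long value)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0)
        {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte, high bit clear
        return out.toByteArray();
    }
}
```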






[jira] [Comment Edited] (CASSANDRA-14503) Internode connection management is race-prone

2018-10-24 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662265#comment-16662265
 ] 

Jason Brown edited comment on CASSANDRA-14503 at 10/24/18 1:09 PM:
---

Based on testing conducted with [~jolynch] and [~vinaykumarcse], here's an 
updated branch with performance fixes and code improvements:

||v2||
|[branch|https://github.com/jasobrown/cassandra/tree/14503-v2]|
|[utests  
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503-v2]|
|[pull request|https://github.com/apache/cassandra/pull/289]|

The major change in this branch is I experimented with aggregating the messages 
to send into a single ByteBuf, instead of sending the messages individually to 
the netty pipeline. Since we can send up to (the current hard-coded size of) 64 
messages per each iteration of OMC.dequeueMessages(), that's 64 times to invoke 
the pipeline mechanics, 64 ByteBuf allocations (and releases), and 64 times to 
fulfill the promise corresponding to each message. If, instead, we send one 
ByteBuf (with data serialized into it) then it's just one message into the 
pipeline, one allocation, and one promise fulfillment. The primary trade-off is 
that the single buffer will be, of course, large; perhaps large enough to not 
be efficient with the netty allocator. To that end, I wrote a JMH benchmark, and 
the results are compelling: TL;DR a single buffer is significantly faster than 
multiple smaller buffers. The closest case is a single buffer is twice as fast, 
with the typical percentile difference being about 10-20 times faster for the 
single buffer (1.5 micros vs. 23 micros).
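The aggregation idea can be sketched as follows (a plain byte-array stream stands in for a Netty ByteBuf, and the class name is invented for this sketch): instead of one allocation, one pipeline traversal, and one promise per message, the whole dequeued batch is serialized into a single buffer and handed to the pipeline as one write.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical sketch: serialize a batch of already-encoded messages into
// one buffer, each with a 4-byte big-endian length prefix, so the channel
// sees a single write for the whole batch.
public final class BatchSerializer
{
    public static byte[] serializeBatch(List<byte[]> messages)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] msg : messages)
        {
            byte[] prefix = {
                (byte) (msg.length >>> 24), (byte) (msg.length >>> 16),
                (byte) (msg.length >>> 8), (byte) msg.length };
            out.write(prefix, 0, prefix.length); // length prefix
            out.write(msg, 0, msg.length);       // payload
        }
        return out.toByteArray(); // single buffer for the whole batch
    }
}
```

The trade-off described above still applies: the combined buffer can grow large enough to interact poorly with a pooled allocator, which is what the JMH benchmark was measuring.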

To make this work, I need the allocation and serialization code to be moved 
outside of the pipeline handler (as it now needs to be invoked from OMC). I had 
already done this work with CASSANDRA-13630. Thus, I pulled that patch into 
this branch. That patch also greatly reduced the need for the ChannelWriter 
abstraction, and combined with the outstanding work in this branch, I am able 
to eliminate ChannelWriter and the confusion it added. However, I still need to 
handle large messages separately (as we don't want to use our blocking 
serializers on the event loop), so I've preserved the "move large message 
serialization on a separate thread" behavior from CASSANDRA-13630 by adding a 
new abstraction in OMC, the (not cleverly named) MessageDequeuer 
interface, with implementations for large messages and "small messages" 
(basically the current behavior of this patch that we've been riffing on).

One feature that we've been debating again is whether to include message 
coalescing. The current branch does not include it - mostly due 
to the fact that we've been iterating quite quickly over this code, and I broke 
it when incorporating the CASSANDRA-13630 patch (and killing off 
ChannelWriter). There is some testing happening to reevaluate the efficacy of 
message coalescing with the netty internode messaging.

Some other points of interest:

- switched OMC#backlog from ConcurrentLinkedQueue to MpscLinkedQueue from 
jctools. MpscLinkedQueue is dramatically better, and 
ConcurrentLinkedQueue#isEmpty was a CPU drain.
- improved scheduling of the consumerTask in OutboundMessagingConnection, 
though still needs a bit more refinement
- ditched the OMC.State from the last branch
- added [~jolynch]'s fixes wrt not setting a default SO_SNDBUF value
- OMC - introduced consumerTaskThread vs eventLoop member field
- ditched the auto-read in RebufferingByteBufDataInputPlus - I need to document 
this

In general I have a small bit of documenting to add, but the branch is ready 
for review.



[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone

2018-10-24 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662265#comment-16662265
 ] 

Jason Brown commented on CASSANDRA-14503:
-

Based on testing conducted with [~jolynch] and [~vinaykumarcse], here's an 
updated branch with performance fixes and code improvements:

||v2||
|[branch|https://github.com/jasobrown/cassandra/tree/14503-v2]|
|[utests & 
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503-v2]|
|[pull request|https://github.com/apache/cassandra/pull/289]|

The major change in this branch is I experimented with aggregating the messages 
to send into a single ByteBuf, instead of sending the messages individually to 
the netty pipeline. Since we can send up to (the current hard-coded size of) 64 
messages per each iteration of OMC.dequeueMessages(), that's 64 times to invoke 
the pipeline mechanics, 64 ByteBuf allocations (and releases), and 64 times to 
fulfill the promise corresponding to each message. If, instead, we send one 
ByteBuf (with data serialized into it) then it's just one message into the 
pipeline, one allocation, and one promise fulfillment. The primary trade-off is 
that the single buffer will be, of course, large; perhaps large enough to not 
be efficient with the netty allocator. To that end, I wrote a JMH benchmark, and 
the results are compelling: TL;DR a single buffer is significantly faster than 
multiple smaller buffers. The closest case is a single buffer is twice as fast, 
with the typical percentile difference being about 10-20 times faster for the 
single buffer (1.5 micros vs. 23 micros).

To make this work, I need the allocation and serialization code to be moved 
outside of the pipeline handler (as it now needs to be invoked from OMC). I had 
already done this work with CASSANDRA-13630. Thus, I pulled that patch into 
this branch. That patch also greatly reduced the need for the ChannelWriter 
abstraction, and combined with the outstanding work in this branch, I am able 
to eliminate ChannelWriter and the confusion it added. However, I still need to 
handle large messages separately (as we don't want to use our blocking 
serializers on the event loop), so I've preserved the "move large message 
serialization on a separate thread" behavior from CASSANDRA-13630 by adding a 
new abstraction in OMC, the (not cleverly named) MessageDequeuer 
interface, with implementations for large messages and "small messages" 
(basically the current behavior of this patch that we've been riffing on).

One feature that we've been debating again is whether to include message 
coalescing. The current branch does not include it - mostly due 
to the fact that we've been iterating quite quickly over this code, and I broke 
it when incorporating the CASSANDRA-13630 patch (and killing off 
ChannelWriter). There is some testing happening to reevaluate the efficacy of 
message coalescing with the netty internode messaging.

Some other points of interest:

- switched OMC#backlog from ConcurrentLinkedQueue to MpscLinkedQueue from 
jctools. MpscLinkedQueue is dramatically better, and 
ConcurrentLinkedQueue#isEmpty was a CPU drain.
- improved scheduling of the consumerTask in OutboundMessagingConnection, 
though still needs a bit more refinement
- ditched the OMC.State from the last branch
- added [~jolynch]'s fixes wrt not setting a default SO_SNDBUF value
- OMC - introduced consumerTaskThread vs eventLoop member field
- ditched the auto-read in RebufferingByteBufDataInputPlus - I need to document 
this

In general I have a small bit of documenting to add, but the branch is ready 
for review.

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such a case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might 

[jira] [Assigned] (CASSANDRA-12823) dtest failure in topology_test.TestTopology.crash_during_decommission_test

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-12823:
---

Assignee: (was: Jason Brown)

> dtest failure in topology_test.TestTopology.crash_during_decommission_test
> --
>
> Key: CASSANDRA-12823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12823
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Priority: Major
>  Labels: dtest, test-failure
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/489/testReport/topology_test/TestTopology/crash_during_decommission_test
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/automaton/cassandra-dtest/dtest.py", line 581, in tearDown
> raise AssertionError('Unexpected error in log, see stdout')
> "Unexpected error in log, see stdout
> {code}{code}
> Standard Output
> Unexpected error in node2 log, error: 
> ERROR [GossipStage:1] 2016-10-19 15:44:14,820 CassandraDaemon.java:229 - 
> Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) 
> ~[na:1.8.0_45]
>   at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:89) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsService.excise(HintsService.java:313) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2458) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2471) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2375)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1905)
>  ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1222) 
> ~[main/:na]
>   at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1205) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1168) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>  ~[main/:na]
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_45]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-13517) dtest failure in paxos_tests.TestPaxos.contention_test_many_threads

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-13517.
-
Resolution: Cannot Reproduce

Closing for now as it doesn't seem to be a problem of late.

> dtest failure in paxos_tests.TestPaxos.contention_test_many_threads
> ---
>
> Key: CASSANDRA-13517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13517
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Ariel Weisberg
>Assignee: Jason Brown
>Priority: Major
>  Labels: dtest, test-failure, test-failure-fresh
> Attachments: test_failure.txt
>
>
> See attachment for details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-11809) IV misuse in commit log encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-11809:
---

Assignee: (was: Jason Brown)

> IV misuse in commit log encryption
> --
>
> Key: CASSANDRA-11809
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11809
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Priority: Major
> Fix For: 3.11.x
>
>
> Commit log segments share iv values between encrypted chunks. The cipher 
> should be reinitialized with a new iv for each discrete piece of data it 
> encrypts, otherwise it gives attackers something to compare between chunks of 
> data. Also, some cipher configurations don't support initialization vectors 
> ('AES/ECB/NoPadding'), so some logic should be added to determine if the 
> cipher should be initialized with an iv.
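The fix described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not Cassandra's actual commit log encryption code: the class name and key handling are hypothetical, and a transformation that takes no IV (such as 'AES/ECB/NoPadding') would simply skip the IvParameterSpec step.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class ChunkEncryptor {
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    ChunkEncryptor(SecretKey key) { this.key = key; }

    // Encrypt one chunk under a freshly generated IV, prepending the IV to
    // the ciphertext so a reader can recover it. Reusing one IV across
    // chunks (the bug described above) would let an attacker compare them.
    byte[] encryptChunk(byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        byte[] iv = new byte[cipher.getBlockSize()];
        random.nextBytes(iv);                       // fresh IV per chunk
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Two encryptions of the same chunk must differ when each gets a new IV.
    static boolean selfTest() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            ChunkEncryptor enc = new ChunkEncryptor(gen.generateKey());
            byte[] chunk = "same chunk twice".getBytes("UTF-8");
            return !Arrays.equals(enc.encryptChunk(chunk), enc.encryptChunk(chunk));
        } catch (Exception e) {
            return false;
        }
    }
}
```

With a per-chunk IV, encrypting identical chunks twice yields distinct ciphertexts, which removes the comparison opportunity the ticket describes.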



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13856) Optimize ByteBuf reallocations in the native protocol pipeline

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-13856:
---

Assignee: (was: Jason Brown)

> Optimize ByteBuf reallocations in the native protocol pipeline
> --
>
> Key: CASSANDRA-13856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13856
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Priority: Minor
>
> This is a follow up to CASSANDRA-13789. I discovered we reallocate the 
> {{ByteBuf}} when writing data to it, and it would be nice to size the buffer 
> correctly up-front to avoid reallocating it. I'm not sure how easy that is, 
> nor if the cost of the realloc is cheaper than calculating the size needed 
> for the buffer. Adding this ticket, nonetheless, to explore that optimization.
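A rough sketch of the up-front sizing idea, using plain java.nio buffers rather than Netty's {{ByteBuf}} (the method name and length-prefixed framing are illustrative, not the native protocol's actual encoding): compute the total encoded size in a first pass, then allocate exactly once, so no growth-and-copy cycle is ever triggered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PreSizedWriter {
    // Encode fields as [int length][bytes] records. The first pass sums the
    // exact size needed; the second pass writes into a buffer allocated at
    // exactly that size, so it is never reallocated mid-write.
    static ByteBuffer encode(String... fields) {
        int size = 0;
        byte[][] encoded = new byte[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length; // 4-byte length prefix + payload
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // exact, no realloc
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        buf.flip();
        return buf;
    }
}
```

The open question in the ticket is exactly this trade-off: the sizing pass costs one extra walk over the data, which may or may not be cheaper than letting the buffer grow.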



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-11810) IV misuse in hints encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-11810:
---

Assignee: (was: Jason Brown)

> IV misuse in hints encryption
> -
>
> Key: CASSANDRA-11810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11810
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Priority: Major
> Fix For: 3.11.x
>
>
> Encrypted hint files share iv values between encrypted chunks. The cipher 
> should be reinitialized with a new iv for each discrete piece of data it 
> encrypts, otherwise it gives attackers something to compare between chunks of 
> data. Also, some cipher configurations don't support initialization vectors 
> ('AES/ECB/NoPadding'), so some logic should be added to determine if the 
> cipher should be initialized with an iv.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-7922) Add file-level encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-7922:
--

Assignee: (was: Jason Brown)

> Add file-level encryption
> -
>
> Key: CASSANDRA-7922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7922
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jason Brown
>Priority: Major
>  Labels: encryption, security
> Fix For: 4.x
>
>
> Umbrella ticket for file-level encryption
> Some use cases require encrypting files at rest for certain compliance needs: 
> the healthcare industry (HIPAA regulations), the card payment industry (PCI 
> DSS regulations) or the US government (FISMA regulations). File system 
> encryption can be used in some situations, but does not solve all problems. 
> I can foresee the following components needing at-rest encryption:
> - sstables (data, index, and summary files) (CASSANDRA-9633)
> - commit log (CASSANDRA-6018)
> - hints (CASSANDRA-11040)
> - some systems tables (batches, not sure if any others)
> - index/row cache
> - secondary indexes
> The work for those items would be separate tickets, of course. I have a 
> working version of most of the above components working in 2.0, which I need 
> to ship in production now, but it's too late for the 2.0 branch and unclear 
> for 2.1.
> Other products, such as Oracle/SqlServer/Datastax Enterprise commonly refer 
> to at-rest encryption as Transparent Data Encryption (TDE), and I'm happy to 
> stick with that convention, here, as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-9633) Add ability to encrypt sstables

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-9633:
--

Assignee: (was: Jason Brown)

> Add ability to encrypt sstables
> ---
>
> Key: CASSANDRA-9633
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9633
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jason Brown
>Priority: Major
>  Labels: encryption, security, sstable
> Fix For: 4.x
>
>
> Add option to allow encrypting of sstables.
> I have a version of this functionality built on cassandra 2.0 that 
> piggy-backs on the existing sstable compression functionality and ICompressor 
> interface (similar in nature to what DataStax Enterprise does). However, if 
> we're adding the feature to the main OSS product, I'm not sure if we want to 
> use the pluggable compression framework or if it's worth investigating a 
> different path. I think there's a lot of upside in reusing the sstable 
> compression scheme, but perhaps add a new component in cqlsh for table 
> encryption and a corresponding field in CFMD.
> Encryption configuration in the yaml can use the same mechanism as 
> CASSANDRA-6018 (which is currently pending internal review).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off

2018-10-04 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638181#comment-16638181
 ] 

Jason Brown edited comment on CASSANDRA-14747 at 10/4/18 12:45 PM:
---

Excellent find, [~jolynch].

Looks like we added the ability to set the send/recv buffer size in 
CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 
we [set the 
SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444]
 if the operator provided a value in the yaml, but we did not set a default 
value. However, it does appear I added a hard-coded default in 4.0 with 
CASSANDRA-8457. As it's been nearly two years since I wrote that part of the 
patch, I have no recollection of why I added a default. Removing it is trivial 
and has huge benefits, as [~jolynch] has proven. I'm working on combining the 
findings [~jolynch] and I have discovered over the last weeks and should have a 
patch ready in a few days (which will probably be part CASSANDRA-14503, as most 
of this work was based on that work-in-progress).



was (Author: jasobrown):
Excellent find, [~jolynch].

Looks like we added the ability to set the send/recv buffer size in 
CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 
we [set the 
SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444]
 if the operator provided a value in the yaml, but we did not set a default 
value. However, it does appear I added a hard-coded default in 4.0 with 
CASSANDRA-8457. As it's been nearly two years since I wrote that part of the 
patch, I have no recollection of why I added a default. Removing it is trivial 
and has huge benefits, as  has proven. I'm working on combining the findings 
[~jolynch] and I have discovered over the last weeks and should have a patch 
ready in a few days (which will probably be part CASSANDRA-14503, as most of 
this work was based on that work-in-progress).


> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, 
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg, 
> trunk_vs_3.0.17_latency_under_load.png, 
> ttop_NettyOutbound-Thread_spinning.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, 
> useast1e-i-08635fa1631601538_flamegraph_96node.svg, 
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, 
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off

2018-10-04 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638181#comment-16638181
 ] 

Jason Brown commented on CASSANDRA-14747:
-

Excellent find, [~jolynch].

Looks like we added the ability to set the send/recv buffer size in 
CASSANDRA-3378 (which apparently I reviewed, 5.5 years ago). Looks like in 3.11 
we [set the 
SO_SNDBUF|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L444]
 if the operator provided a value in the yaml, but we did not set a default 
value. However, it does appear I added a hard-coded default in 4.0 with 
CASSANDRA-8457. As it's been nearly two years since I wrote that part of the 
patch, I have no recollection of why I added a default. Removing it is trivial 
and has huge benefits, as  has proven. I'm working on combining the findings 
[~jolynch] and I have discovered over the last weeks and should have a patch 
ready in a few days (which will probably be part CASSANDRA-14503, as most of 
this work was based on that work-in-progress).


> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, 
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg, 
> trunk_vs_3.0.17_latency_under_load.png, 
> ttop_NettyOutbound-Thread_spinning.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, 
> useast1e-i-08635fa1631601538_flamegraph_96node.svg, 
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, 
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-12297) Privacy Violation - Heap Inspection

2018-10-03 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-12297:
---

Assignee: (was: Jason Brown)

> Privacy Violation - Heap Inspection
> ---
>
> Key: CASSANDRA-12297
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12297
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Eduardo Aguinaga
>Priority: Major
>
> Overview:
> In May through June of 2016 a static analysis was performed on version 3.0.5 
> of the Cassandra source code. The analysis included 
> an automated analysis using HP Fortify v4.21 SCA and a manual analysis 
> utilizing SciTools Understand v4. The results of that 
> analysis include the issue below.
> Issue:
> In the file PasswordAuthenticator.java on line 129, 164 and 222 a string 
> object is used to store sensitive data. String objects are immutable and 
> should not be used to store sensitive data. Sensitive data should be stored 
> in char or byte arrays and the contents of those arrays should be cleared 
> ASAP. Operations performed on string objects will require that the original 
> object be copied and the operation be applied in the new copy of the string 
> object. This results in the likelihood that multiple copies of sensitive data 
> will be present in the heap until garbage collection takes place.
> The snippet below shows the issue on line 129:
> PasswordAuthenticator.java, lines 123-134:
> {code:java}
> 123 public AuthenticatedUser legacyAuthenticate(Map<String, String> 
> credentials) throws AuthenticationException
> 124 {
> 125 String username = credentials.get(USERNAME_KEY);
> 126 if (username == null)
> 127 throw new AuthenticationException(String.format("Required key 
> '%s' is missing", USERNAME_KEY));
> 128 
> 129 String password = credentials.get(PASSWORD_KEY);
> 130 if (password == null)
> 131 throw new AuthenticationException(String.format("Required key 
> '%s' is missing", PASSWORD_KEY));
> 132 
> 133 return authenticate(username, password);
> 134 }
> {code}
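The remediation the report recommends - holding the secret in a char or byte array and clearing it as soon as possible - can be sketched roughly as below. The class and the stand-in check are hypothetical, not the actual PasswordAuthenticator code:

```java
import java.util.Arrays;

public class CredentialScrubber {
    // Accept the password as char[] rather than String, so the secret can be
    // wiped in place the moment it has been consumed, instead of lingering
    // in the heap (possibly in several copies) until garbage collection.
    static boolean authenticate(String username, char[] password) {
        try {
            // stand-in check; a real authenticator would hash and compare
            return username != null && password.length > 0;
        } finally {
            Arrays.fill(password, '\0'); // scrub the secret in place
        }
    }
}
```

A String cannot be scrubbed this way because it is immutable; the char[] version gives the caller direct control over the secret's lifetime.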



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-8060) Geography-aware, distributed replication

2018-10-03 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-8060:
--

Assignee: (was: Jason Brown)

> Geography-aware, distributed replication
> 
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Donald Smith
>Priority: Major
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ 
> in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
> our writing to CA.  That represents a bottleneck, since the coordinator nodes 
> in CA are responsible for all the replication to every data center.
> Far better if we had the option of setting things up so that CA replicated to 
> TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
> for replicating to UK, which should replicate to DE.  Etc, etc.
> This could be controlled by the topology file.
> The replication could be organized in a tree-like structure instead of a 
> daisy-chain.
> It would require architectural changes and would have major ramifications for 
> latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-12298) Privacy Violation - Heap Inspection

2018-10-03 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-12298:
---

Assignee: (was: Jason Brown)

> Privacy Violation - Heap Inspection
> ---
>
> Key: CASSANDRA-12298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12298
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Eduardo Aguinaga
>Priority: Major
>
> Overview:
> In May through June of 2016 a static analysis was performed on version 3.0.5 
> of the Cassandra source code. The analysis included 
> an automated analysis using HP Fortify v4.21 SCA and a manual analysis 
> utilizing SciTools Understand v4. The results of that 
> analysis include the issue below.
> Issue:
> In the file RoleOptions.java on line 89 a string object is used to store 
> sensitive data. String objects are immutable and should not be used to store 
> sensitive data. Sensitive data should be stored in char or byte arrays and 
> the contents of those arrays should be cleared ASAP. Operations performed on 
> string objects will require that the original object be copied and the 
> operation be applied in the new copy of the string object. This results in 
> the likelihood that multiple copies of sensitive data will be present in the 
> heap until garbage collection takes place.
> The snippet below shows the issue on line 89:
> RoleOptions.java, lines 87-90:
> {code:java}
> 87 public Optional<String> getPassword()
> 88 {
> 89 return 
> Optional.fromNullable((String)options.get(IRoleManager.Option.PASSWORD));
> 90 }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off

2018-10-01 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634301#comment-16634301
 ] 

Jason Brown commented on CASSANDRA-14747:
-

[~jolynch] Nice work. I agree the time bounding of dequeueMessages is somewhat 
questionable - I added it in when we were making a bunch of other changes for 
dealing with CPU/task starvation. 

In your gist, I think we can run into some serious overscheduling 
(re-enqueueing of the consumer task) when the channel is unwritable. In that 
case, it will break out of dequeueMessages's while loop immediately, but then 
immediately reschedule (assuming backlog > 0).  We'll keep doing this, very 
aggressively, until the channel becomes writable again - yet we cannot make any 
meaningful progress. To counteract this, I had dequeueMessages not reschedule, 
and instead had handleMessageResult reschedule, because at that 
point (remember, we only attach the listener to that last message of the bunch) 
we know the bytes have been written to the socket and that channel should be 
writable again. In this case we only schedule (or directly execute) 
dequeueMessages when we need to. (Note: this was probably not apparent from the 
current code's comments, so I should definitely improve that.)
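A single-threaded sketch of that scheduling scheme (names and structure are illustrative, not the actual OutboundMessagingConnection code): the consumer stops draining as soon as the channel is unwritable and does not re-enqueue itself; the write-completion callback re-schedules it only once the channel is writable again, so the unwritable interval burns no CPU.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class BackpressuredConsumer {
    final Queue<String> backlog = new ConcurrentLinkedQueue<>();
    final AtomicBoolean scheduled = new AtomicBoolean();
    volatile boolean channelWritable = true;
    int drained;

    // Drain the backlog while the channel accepts writes. Crucially, when
    // the channel is unwritable we just stop; we do NOT reschedule ourselves,
    // which is what causes the aggressive overscheduling described above.
    void dequeueMessages() {
        scheduled.set(false);
        while (channelWritable && backlog.poll() != null)
            drained++;          // stand-in for writing the message out
    }

    // Invoked when the last flushed batch reaches the socket, i.e. the
    // channel is writable again (mirrors handleMessageResult in the patch).
    void handleMessageResult() {
        channelWritable = true;
        if (!backlog.isEmpty() && scheduled.compareAndSet(false, true))
            dequeueMessages();  // in real code: eventLoop.execute(this::dequeueMessages)
    }
}
```

The real implementation runs this on a Netty event loop; the compareAndSet guard stands in for "at most one consumer task enqueued at a time."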


> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0.11-after-jolynch-tweaks.svg, 4.0.7-before-my-changes.svg, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg, 
> ttop_NettyOutbound-Thread_spinning.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, 
> useast1e-i-08635fa1631601538_flamegraph_96node.svg, 
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, 
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3

2018-09-20 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622024#comment-16622024
 ] 

Jason Brown commented on CASSANDRA-14760:
-

Cool, I'll go ahead and close.

> CVE-2018-10237 Security vulnerability in 3.11.3
> ---
>
> Key: CASSANDRA-14760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: John F. Gbruoski
>Priority: Major
>
> As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a 
> security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be 
> patched to support Guava  24.1.1 or later?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3

2018-09-20 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-14760.
-
Resolution: Not A Problem

> CVE-2018-10237 Security vulnerability in 3.11.3
> ---
>
> Key: CASSANDRA-14760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: John F. Gbruoski
>Priority: Major
>
> As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a 
> security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be 
> patched to support Guava  24.1.1 or later?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3

2018-09-20 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-14760:
---

Assignee: Jason Brown

> CVE-2018-10237 Security vulnerability in 3.11.3
> ---
>
> Key: CASSANDRA-14760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: John F. Gbruoski
>Assignee: Jason Brown
>Priority: Major
>
> As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a 
> security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be 
> patched to support Guava  24.1.1 or later?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14767) Embedded cassandra not working after jdk10 upgrade

2018-09-20 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622018#comment-16622018
 ] 

Jason Brown commented on CASSANDRA-14767:
-

Cassandra only supports Java 8. Java 11 support has been added with 
CASSANDRA-9608, but that feature is only on trunk (soon to be Cassandra 4.0).

> Embedded cassandra not working after jdk10 upgrade
> --
>
> Key: CASSANDRA-14767
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14767
> Project: Cassandra
>  Issue Type: Bug
>Reporter: parthiban
>Priority: Blocker
>
> Embedded cassandra not working after jdk10 upgrade. Could some one help me on 
> this.
> Cassandra config:
> {code:java}
> try {
>     EmbeddedCassandraServerHelper.startEmbeddedCassandra();
> } catch (Exception e) {
>     LOGGER.error(" CommonConfig ", " cluster()::Exception while creating cluster ", e);
>     System.setProperty("cassandra.config", "cassandra.yaml");
>     DatabaseDescriptor.daemonInitialization();
>     EmbeddedCassandraServerHelper.startEmbeddedCassandra();
> }
> Cluster cluster = Cluster.builder()
>     .addContactPoints(environment.getProperty(TextToClipConstants.CASSANDRA_CONTACT_POINTS))
>     .withPort(Integer.parseInt(environment.getProperty(TextToClipConstants.CASSANDRA_PORT)))
>     .build();
> Session session = cluster.connect();
> session.execute(KEYSPACE_CREATION_QUERY);
> session.execute(KEYSPACE_ACTIVATE_QUERY);
> {code}
>  
> {{build.gradle}}
> {{buildscript \{ ext { springBootVersion = '2.0.1.RELEASE' } repositories \{ 
> mavenCentral() mavenLocal() } dependencies \{ 
> classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
>  classpath ("com.bmuschko:gradle-docker-plugin:3.2.1") classpath 
> ("org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.5") 
> classpath("au.com.dius:pact-jvm-provider-gradle_2.12:3.5.13") classpath 
> ("com.moowork.gradle:gradle-node-plugin:1.2.0") } } plugins \{ //id 
> "au.com.dius.pact" version "3.5.7" id "com.gorylenko.gradle-git-properties" 
> version "1.4.17" id "de.undercouch.download" version "3.4.2" } apply plugin: 
> 'java' apply plugin: 'eclipse' apply plugin: 'org.springframework.boot' apply 
> plugin: 'io.spring.dependency-management' apply plugin: 
> 'com.bmuschko.docker-remote-api' apply plugin: 'jacoco' apply plugin: 
> 'maven-publish' apply plugin: 'org.sonarqube' apply plugin: 
> 'au.com.dius.pact' apply plugin: 'scala' sourceCompatibility = 1.8 
> repositories \{ mavenCentral() maven { url "https://repo.spring.io/milestone; 
> } mavenLocal() } ext \{ springCloudVersion = 'Finchley.RELEASE' } pact \{ 
> serviceProviders { rxorder { publish { pactDirectory = 
> '/Users/sv/Documents/wag-doc-text2clip/target/pacts' // defaults to 
> $buildDir/pacts pactBrokerUrl = 'http://localhost:80' version=2.0 } } } } 
> //start of integration tests changes sourceSets \{ integrationTest { java { 
> compileClasspath += main.output + test.output runtimeClasspath += main.output 
> + test.output srcDir file('test/functional-api/java') } resources.srcDir 
> file('test/functional-api/resources') } } configurations \{ 
> integrationTestCompile.extendsFrom testCompile 
> integrationTestRuntime.extendsFrom testRuntime } //end of integration tests 
> changes dependencies \{ //web (Tomcat, Logging, Rest) compile group: 
> 'org.springframework.boot', name: 'spring-boot-starter-web' // Redis 
> //compile group: 'org.springframework.boot', name: 
> 'spring-boot-starter-data-redis' //Mongo Starter compile group: 
> 'org.springframework.boot', name:'spring-boot-starter-data-mongodb' // 
> Configuration processor - To Generate MetaData Files. The files are designed 
> to let developers offer "code completion" as users are working with 
> application.properties compile group: 'org.springframework.boot', name: 
> 'spring-boot-configuration-processor' // Actuator - Monitoring compile group: 
> 'org.springframework.boot', name: 'spring-boot-starter-actuator' //Sleuth - 
> Tracing compile group: 'org.springframework.cloud', name: 
> 'spring-cloud-starter-sleuth' //Hystrix - Circuit Breaker compile group: 
> 'org.springframework.cloud', name: 'spring-cloud-starter-netflix-hystrix' // 
> Hystrix - Dashboard compile group: 'org.springframework.cloud', name: 
> 'spring-cloud-starter-netflix-hystrix-dashboard' // Thymeleaf compile group: 
> 'org.springframework.boot', name: 'spring-boot-starter-thymeleaf' //Voltage 
> // Device Detection compile group: 'org.springframework.boot', name: 
> 'spring-boot-starter-data-cassandra', version:'2.0.4.RELEASE' compile group: 
> 'com.google.guava', name: 'guava', version: '23.2-jre' 
> compile('com.google.code.gson:gson:2.8.0') compile('org.json:json:20170516') 
> //Swagger compile group: 'io.springfox', name: 'springfox-swagger2', 
> version:'2.8.0' compile group: 'io.springfox', name: 

[jira] [Commented] (CASSANDRA-14760) CVE-2018-10237 Security vulnerability in 3.11.3

2018-09-18 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619378#comment-16619378
 ] 

Jason Brown commented on CASSANDRA-14760:
-

The CVE seems to apply only to:

- AtomicDoubleArray (when serialized with Java serialization)
- CompoundOrdering (when serialized with GWT serialization)

Cassandra uses neither of those classes, nor do we use Java or GWT 
serialization. Thus, it's not clear this CVE is a problem for us.

 

> CVE-2018-10237 Security vulnerability in 3.11.3
> ---
>
> Key: CASSANDRA-14760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: John F. Gbruoski
>Priority: Major
>
> As described in the CVE, Guava 11.0 through 24.x before 24.1.1 have a 
> security exposure. Cassandra 3.11.3 uses Guava 18.0. Can Cassandra 3.11 be 
> patched to support Guava  24.1.1 or later?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming

2018-09-18 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619368#comment-16619368
 ] 

Jason Brown commented on CASSANDRA-14685:
-

{quote}One weird behavior of streaming is that when the coordinator goes down, 
"nodetool netstats" still shows progress on the replicas until it reaches 100% 
and it stays like this. It even starts streaming new files although the target 
node is still down
{quote}
I discovered that, as well, when investigating this one. I have a working fix 
for it, as well as CASSANDRA-14520, and am working out the kinks. Hoping to get 
it out ASAP.

> Incremental repair 4.0 : SSTables remain locked forever if the coordinator 
> dies during streaming 
> -
>
> Key: CASSANDRA-14685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14685
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Alexander Dejanovski
>Assignee: Jason Brown
>Priority: Critical
>
> The changes in CASSANDRA-9143 modified the way incremental repair performs by 
> applying the following sequence of events : 
>  * Anticompaction is executed on all replicas for all SSTables overlapping 
> the repaired ranges
>  * Anticompacted SSTables are then marked as "Pending repair" and cannot be 
> compacted anymore, nor part of another repair session
>  * Merkle trees are generated and compared
>  * Streaming takes place if needed
>  * Anticompaction is committed and "pending repair" tables are marked as 
> repaired if it succeeded, or they are released if the repair session failed.
> If the repair coordinator dies during the streaming phase, *the SSTables on 
> the replicas will remain in "pending repair" state and will never be eligible 
> for repair or compaction*, even after all the nodes in the cluster are 
> restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming 
> errors) : 
> {noformat}
> ccm create inc-repair-issue -v github:jasobrown/13938 -n 3
> # Allow jmx access and remove all rpc_ settings in yaml
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh;
> do
>   sed -i'' -e 
> 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g'
>  $f
> done
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml;
> do
>   grep -v "rpc_" $f > ${f}.tmp
>   cat ${f}.tmp > $f
> done
> ccm start
> {noformat}
> I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a 
> few 10s of MBs of data (killed it after some time). Obviously 
> cassandra-stress works as well :
> {noformat}
> bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000  
> --replication "{'class':'SimpleStrategy', 'replication_factor':2}"   
> --compaction "{'class': 'SizeTieredCompactionStrategy'}"   --host 
> 127.0.0.1
> {noformat}
> Flush and delete all SSTables in node1 :
> {noformat}
> ccm node1 nodetool flush
> ccm node1 stop
> rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
> ccm node1 start{noformat}
> Then throttle streaming throughput to 1MB/s so we have time to take node1 
> down during the streaming phase and run repair:
> {noformat}
> ccm node1 nodetool setstreamthroughput 1
> ccm node2 nodetool setstreamthroughput 1
> ccm node3 nodetool setstreamthroughput 1
> ccm node1 nodetool repair tlp_stress
> {noformat}
> Once streaming starts, shut down node1 and start it again :
> {noformat}
> ccm node1 stop
> ccm node1 start
> {noformat}
> Run repair again :
> {noformat}
> ccm node1 nodetool repair tlp_stress
> {noformat}
> The command will return very quickly, showing that it skipped all sstables :
> {noformat}
> [2018-08-31 19:05:16,292] Repair completed successfully
> [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds
> $ ccm node1 nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   OwnsHost ID
>Rack
> UN  127.0.0.1  228,64 KiB  256  ?   
> 437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
> UN  127.0.0.2  60,09 MiB  256  ?   
> fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
> UN  127.0.0.3  57,59 MiB  256  ?   
> a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
> {noformat}
> sstablemetadata will then show that nodes 2 and 3 have SSTables still in 
> "pending repair" state :
> {noformat}
> ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | 
> grep repair
> SSTable: 
> /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
> Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
> {noformat}
> Restarting these nodes 

[jira] [Commented] (CASSANDRA-14758) Remove "audit" entry from .gitignore

2018-09-18 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619075#comment-16619075
 ] 

Jason Brown commented on CASSANDRA-14758:
-

+1

> Remove "audit" entry from .gitignore
> 
>
> Key: CASSANDRA-14758
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14758
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Minor
>
> Seems there was an "audit" entry added to the .gitignore file in 
> CASSANDRA-9608, not sure why, but it makes it kind of hard to work with files 
> in the {{org.apache.cassandra.audit}} package



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14757) GCInspector "Error accessing field of java.nio.Bits" under java11

2018-09-18 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14757:

Description: 
Running under java11, {{GCInspector}} throws the following exception:
{noformat}
DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing 
field of java.nio.Bits
java.lang.NoSuchFieldException: totalCapacity
at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
at 
org.apache.cassandra.service.GCInspector.<init>(GCInspector.java:72)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
{noformat}
This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} 
from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere 
between java8 and java11.

Note: this is a rather harmless error, as we only look at 
{{Bits.totalCapacity}} for metrics collection on how much direct memory is 
being used by {{ByteBuffer}}s. If we fail to read the field, we simply return 
-1 for the metric value.

  was:
Running under java11, {{GCInspector}} throws the following exception:

{noformat}
DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing 
field of java.nio.Bits
java.lang.NoSuchFieldException: totalCapacity
at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
at 
org.apache.cassandra.service.GCInspector.<init>(GCInspector.java:72)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
{noformat}

This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} 
from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere 
between java8 and java11.


> GCInspector "Error accessing field of java.nio.Bits" under java11
> -
>
> Key: CASSANDRA-14757
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14757
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Jason Brown
>Priority: Trivial
>  Labels: Java11
> Fix For: 4.0
>
>
> Running under java11, {{GCInspector}} throws the following exception:
> {noformat}
> DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing 
> field of java.nio.Bits
> java.lang.NoSuchFieldException: totalCapacity
> at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
> at 
> org.apache.cassandra.service.GCInspector.<init>(GCInspector.java:72)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
> {noformat}
> This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} 
> from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} 
> somewhere between java8 and java11.
> Note: this is a rather harmless error, as we only look at 
> {{Bits.totalCapacity}} for metrics collection on how much direct memory is 
> being used by {{ByteBuffer}}s. If we fail to read the field, we simply return 
> -1 for the metric value.
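
The fallback described above can be sketched as a small reflective lookup: try each candidate field name in order and return a sentinel when none is readable. This is an illustrative sketch only, not GCInspector's actual code; the nested BitsV8/BitsV11 classes are hypothetical stand-ins for java.nio.Bits before and after the rename.

```java
import java.lang.reflect.Field;

public class FieldFallback {
    // Hypothetical stand-ins for java.nio.Bits across JDK versions.
    static class BitsV8  { static long totalCapacity = 42; }
    static class BitsV11 { static long TOTAL_CAPACITY = 42; }

    /** Try each candidate field name in order; return -1 if none can be read. */
    static long readStaticLong(Class<?> cls, String... names) {
        for (String name : names) {
            try {
                Field f = cls.getDeclaredField(name);
                f.setAccessible(true);
                return f.getLong(null);
            } catch (ReflectiveOperationException | RuntimeException e) {
                // Field absent or unreadable under this runtime; try the next name.
            }
        }
        return -1; // sentinel, mirroring what the metric reports on failure
    }

    public static void main(String[] args) {
        // The same call succeeds against either "version" of the class.
        System.out.println(readStaticLong(BitsV8.class, "totalCapacity", "TOTAL_CAPACITY"));  // 42
        System.out.println(readStaticLong(BitsV11.class, "totalCapacity", "TOTAL_CAPACITY")); // 42
        System.out.println(readStaticLong(String.class, "noSuchField"));                      // -1
    }
}
```

Looking up both names lets the same metrics code run unchanged on java8 and java11, degrading to the sentinel instead of throwing.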



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14757) GCInspector "Error accessing field of java.nio.Bits" under java11

2018-09-18 Thread Jason Brown (JIRA)
Jason Brown created CASSANDRA-14757:
---

 Summary: GCInspector "Error accessing field of java.nio.Bits" 
under java11
 Key: CASSANDRA-14757
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14757
 Project: Cassandra
  Issue Type: Bug
  Components: Metrics
Reporter: Jason Brown
 Fix For: 4.0


Running under java11, {{GCInspector}} throws the following exception:

{noformat}
DEBUG [main] 2018-09-18 05:18:25,905 GCInspector.java:78 - Error accessing 
field of java.nio.Bits
java.lang.NoSuchFieldException: totalCapacity
at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
at 
org.apache.cassandra.service.GCInspector.<init>(GCInspector.java:72)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:590)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679)
{noformat}

This is because {{GCInspector}} uses reflection to read the {{totalCapacity}} 
from {{java.nio.Bits}}. This field was renamed to {{TOTAL_CAPACITY}} somewhere 
between java8 and java11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14754) Add verification of state machine in StreamSession

2018-09-15 Thread Jason Brown (JIRA)
Jason Brown created CASSANDRA-14754:
---

 Summary: Add verification of state machine in StreamSession
 Key: CASSANDRA-14754
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14754
 Project: Cassandra
  Issue Type: Task
  Components: Streaming and Messaging
Reporter: Jason Brown
Assignee: Jason Brown
 Fix For: 4.0


{{StreamSession}} contains an implicit state machine, but we have no 
verification of the safety of the transitions between states. For example, we 
have no checks to ensure we cannot leave the final states (COMPLETED, FAILED).

I propose we add some program logic in {{StreamSession}}, tests, and 
documentation to ensure the correctness of the state transitions.
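
A minimal sketch of such verification - note that the enum and transition table below are simplified illustrations, not StreamSession's actual states or logic - is a map from each state to its legal successors, where the final states map to the empty set:

```java
import java.util.EnumSet;
import java.util.Map;

public class StreamStateMachine {
    // Simplified stand-in for StreamSession's implicit states.
    enum State { INITIALIZED, PREPARING, STREAMING, COMPLETED, FAILED }

    // Each state's legal successors; final states have none, so they cannot be left.
    private static final Map<State, EnumSet<State>> LEGAL = Map.of(
        State.INITIALIZED, EnumSet.of(State.PREPARING, State.FAILED),
        State.PREPARING,   EnumSet.of(State.STREAMING, State.FAILED),
        State.STREAMING,   EnumSet.of(State.COMPLETED, State.FAILED),
        State.COMPLETED,   EnumSet.noneOf(State.class),
        State.FAILED,      EnumSet.noneOf(State.class));

    private State state = State.INITIALIZED;

    State state() { return state; }

    /** Reject any transition not in the table - in particular, leaving a final state. */
    void transition(State next) {
        if (!LEGAL.get(state).contains(next))
            throw new IllegalStateException(state + " -> " + next + " is not a legal transition");
        state = next;
    }

    public static void main(String[] args) {
        StreamStateMachine s = new StreamStateMachine();
        s.transition(State.PREPARING);
        s.transition(State.STREAMING);
        s.transition(State.COMPLETED);
        try {
            s.transition(State.FAILED); // leaving a final state must be rejected
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Centralizing the legality check in one method makes the state machine explicit and gives tests a single seam to exercise every illegal edge.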



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off

2018-09-12 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612937#comment-16612937
 ] 

Jason Brown commented on CASSANDRA-14747:
-

[~jolynch] When you have a chance, please take the branch linked on 
CASSANDRA-14503 and give it a spin. That has the fix for queue bounds.

> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Joseph Lynch
>Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming

2018-09-11 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610564#comment-16610564
 ] 

Jason Brown commented on CASSANDRA-14685:
-

[~adejanovski] ughh, missed the update on this. Definitely looks like something 
isn't timing out properly in streaming. I'll start digging into the streaming 
part of this. 

[~bdeggleston], can you comment about this part:

bq. replicas will remain in "pending repair" state and will never be eligible 
for repair or compaction, even after all the nodes in the cluster are 
restarted. 


> Incremental repair 4.0 : SSTables remain locked forever if the coordinator 
> dies during streaming 
> -
>
> Key: CASSANDRA-14685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14685
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Alexander Dejanovski
>Assignee: Jason Brown
>Priority: Critical
>
> The changes in CASSANDRA-9143 modified the way incremental repair performs by 
> applying the following sequence of events : 
>  * Anticompaction is executed on all replicas for all SSTables overlapping 
> the repaired ranges
>  * Anticompacted SSTables are then marked as "Pending repair" and cannot be 
> compacted anymore, nor part of another repair session
>  * Merkle trees are generated and compared
>  * Streaming takes place if needed
>  * Anticompaction is committed and "pending repair" tables are marked as 
> repaired if it succeeded, or they are released if the repair session failed.
> If the repair coordinator dies during the streaming phase, *the SSTables on 
> the replicas will remain in "pending repair" state and will never be eligible 
> for repair or compaction*, even after all the nodes in the cluster are 
> restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming 
> errors) : 
> {noformat}
> ccm create inc-repair-issue -v github:jasobrown/13938 -n 3
> # Allow jmx access and remove all rpc_ settings in yaml
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh;
> do
>   sed -i'' -e 
> 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g'
>  $f
> done
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml;
> do
>   grep -v "rpc_" $f > ${f}.tmp
>   cat ${f}.tmp > $f
> done
> ccm start
> {noformat}
> I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a 
> few 10s of MBs of data (killed it after some time). Obviously 
> cassandra-stress works as well :
> {noformat}
> bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000  
> --replication "{'class':'SimpleStrategy', 'replication_factor':2}"   
> --compaction "{'class': 'SizeTieredCompactionStrategy'}"   --host 
> 127.0.0.1
> {noformat}
> Flush and delete all SSTables in node1 :
> {noformat}
> ccm node1 nodetool flush
> ccm node1 stop
> rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
> ccm node1 start{noformat}
> Then throttle streaming throughput to 1MB/s so we have time to take node1 
> down during the streaming phase and run repair:
> {noformat}
> ccm node1 nodetool setstreamthroughput 1
> ccm node2 nodetool setstreamthroughput 1
> ccm node3 nodetool setstreamthroughput 1
> ccm node1 nodetool repair tlp_stress
> {noformat}
> Once streaming starts, shut down node1 and start it again :
> {noformat}
> ccm node1 stop
> ccm node1 start
> {noformat}
> Run repair again :
> {noformat}
> ccm node1 nodetool repair tlp_stress
> {noformat}
> The command will return very quickly, showing that it skipped all sstables :
> {noformat}
> [2018-08-31 19:05:16,292] Repair completed successfully
> [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds
> $ ccm node1 nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   OwnsHost ID
>Rack
> UN  127.0.0.1  228,64 KiB  256  ?   
> 437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
> UN  127.0.0.2  60,09 MiB  256  ?   
> fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
> UN  127.0.0.3  57,59 MiB  256  ?   
> a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
> {noformat}
> sstablemetadata will then show that nodes 2 and 3 have SSTables still in 
> "pending repair" state :
> {noformat}
> ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | 
> grep repair
> SSTable: 
> /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
> Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
> {noformat}
> Restarting these nodes wouldn't help either.



--
This message was sent by 

[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105
 ] 

Jason Brown commented on CASSANDRA-13938:
-

[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails - this is due to the need to 
reset some value, like the currentBufferOffset, after rebuffering). However, 
your observation about simplifying this patch (in particular, eliminating 
{{currentBufferOffset}}) made me reconsider the needs of this class. Basically, 
we just need to correctly track the streamOffset for the current buffer, and 
that's all. When I ported this class from 3.11, I over-complicated the offsets 
and counters in the first version of this class (committed with 
CASSANDRA-12229), and then confused it again (while resolving the error) with 
the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)
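
The offset bookkeeping described above - keep only the stream offset at which the current buffer starts, and advance it on each rebuffer - can be illustrated with a minimal stand-in. This is not Cassandra's actual class; the names and structure here are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class RebufferingReader {
    private final InputStream in;
    private final byte[] chunk;
    private ByteBuffer current = ByteBuffer.allocate(0);
    private long streamOffset = 0; // stream offset at which 'current' begins

    RebufferingReader(InputStream in, int chunkSize) {
        this.in = in;
        this.chunk = new byte[chunkSize];
    }

    /** Absolute position in the stream: buffer start plus position within the buffer. */
    long position() { return streamOffset + current.position(); }

    int read() throws IOException {
        if (!current.hasRemaining()) {
            streamOffset += current.limit();        // old buffer is fully consumed
            current = ByteBuffer.allocate(0);       // reset before refilling
            int n = in.read(chunk);
            if (n < 0) return -1;                   // EOF: position stays at stream length
            current = ByteBuffer.wrap(chunk, 0, n); // new buffer starts at streamOffset
        }
        return current.get() & 0xff;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "streamed-bytes".getBytes(); // 14 bytes
        RebufferingReader r = new RebufferingReader(new ByteArrayInputStream(data), 4);
        while (r.read() != -1) { /* drain across several rebuffers */ }
        System.out.println(r.position()); // 14
    }
}
```

The point of the sketch is that position() never depends on how many rebuffers have happened: the ByteBuffer already tracks the in-buffer position, so streamOffset is the only extra counter needed.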

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Critical
> Fix For: 4.x
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
> population: uniform(1..5000) # 50 million records available
>   - name: ts
> cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
> population: gaussian(128..1024) # varrying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
> cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
> cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last 

[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105
 ] 

Jason Brown edited comment on CASSANDRA-13938 at 9/11/18 5:01 AM:
--

[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails - this is due to the need to 
reset some value, like the currentBufferOffset, after rebuffering). However, 
your observation about simplifying this patch (in particular, eliminating 
{{currentBufferOffset}}) made me reconsider the needs of this class. Basically, 
we just need to correctly track the streamOffset for the current buffer, and 
that's all. When I ported this class from 3.11, I over-complicated the offsets 
and counters in the first version of this class (committed with 
CASSANDRA-12229), and then confused it again (while resolving the error) with 
the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)


was (Author: jasobrown):
[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails - this is due to the need to 
reset a 'some' value, like the currentBufferOffset, after rebufferring). 
However, your observation about simplifying this patch (in particular eliminate 
{{currentBufferOffset}} made me reconsider the needs of this class. Basically, 
we just need to correctly track the streamOffset for the current buffer, and 
that's all. When I ported this clas from 3.11, I over-complicated the offsets 
and
 counters into the first version of this class (committed with 
CASSANDRA-12229), and then confused it again (while resolving the error) with 
the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Critical
> Fix For: 4.x
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   

[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609973#comment-16609973
 ] 

Jason Brown commented on CASSANDRA-14346:
-

Somehow this got marked as Ready to Commit; switched back to Patch Available.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: Patch Available  (was: Awaiting Feedback)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: Awaiting Feedback  (was: In Progress)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: In Progress  (was: Ready to Commit)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14714:

Labels: Java11  (was: )

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
>  Labels: Java11
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Updated] (CASSANDRA-14712) Cassandra 4.0 packaging support

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14712:

Labels: Java11  (was: )

> Cassandra 4.0 packaging support
> ---
>
> Key: CASSANDRA-14712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Stefan Podkowinski
>Priority: Major
>  Labels: Java11
> Fix For: 4.x
>
>
> Currently it's not possible to build any native packages (.deb/.rpm) for 
> trunk.
> cassandra-builds - docker/*-image.docker
>  * Add Java11 to debian+centos build image
>  * (packaged ant scripts won't work with Java 11 on centos, so we may have to 
> install ant from tarballs)
> cassandra-builds - docker/build-*.sh
>  * set JAVA8_HOME to Java8
>  * set JAVA_HOME to Java11 (4.0) or Java8 (<4.0)
> cassandra - redhat/cassandra.spec
>  * Check if patches still apply after CASSANDRA-14707
>  * Add fqltool as %files
> We may also have to change the version handling in build.xml or build-*.sh, 
> depending on how we plan to release packages during beta, or if we plan to do so 
> at all before GA.






[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14503:

Fix Version/s: 4.0
   Status: Patch Available  (was: Open)

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.






[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609099#comment-16609099
 ] 

Jason Brown commented on CASSANDRA-14503:
-

Patch available here:

||14503||
|[branch|https://github.com/jasobrown/cassandra/tree/14503]|
|[utests  
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503]|
||

Additionally, I've [created a Pull 
Request|https://github.com/apache/cassandra/pull/264] for review, as well.

Note: this patch will need to be rebased when CASSANDRA-13630 is committed, and 
to incorporate the ChannelWriter changes for large messages, but that should not 
affect this patch much (I've been keeping that in mind as I worked on this).

- OutboundMessagingConnection changes 
-- All producer threads queue messages into the backlog, and messages are only 
consumed by a task on a fixed thread (the event loop). Producers will contend 
to schedule the consumer, but have no further involvement in sending a message 
(unlike the current implementation).
-- All netty-related activity (setting up a remote connection, 
connection-related callbacks and timeouts, consuming from the backlog, and 
writing to the channel and its associated callbacks) is handled on the event 
loop. OutboundMessagingConnection gets a reference to an event loop in its 
constructor, and uses it for the duration of its lifetime.
-- Finally forward-ported the queue-bounding functionality of CASSANDRA-13265. 
In short, we want to limit the size of queued messages in order to not OOM. 
Thus, we schedule a task on the consumer thread that examines the queue looking 
for elements to prune. Further, I've added a naive upper bound to the queue so 
that producers drop messages before enqueuing if the backlog is in a *really* 
bad state.
@djoshi3 has recommended bounding by message size rather than by message count, 
which I agree with, but I propose saving that for a follow-up ticket.
-- A cleaner, better-documented, and better-tested state machine to manage the 
class's state transitions.

- ChannelWriter and MessageOutHandler became much simpler, as we can control the 
flush behavior from the OMC (instead of the previous complicated CW/MOH dance) 
because we're already on the event loop when consuming from the backlog and 
writing to the channel.

- I was also able to clean up/remove a bunch of extra code due to this 
simplification (ExpiredException, OutboundMessagingParameters, MessageResult).

- Updated all the javadoc documentation for these changes (mostly OMC and 
ChannelWriter)
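The single-consumer model described above can be sketched roughly as follows. This is a hedged illustration only: the class name `Backlog`, the `drain` method, and the plain single-threaded executor standing in for the Netty event loop are all illustrative, not Cassandra's actual API.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the single-consumer model: producers only enqueue and, if needed,
// schedule the drain task; all consumption happens on one thread.
public class Backlog {
    private static final int MAX_QUEUED = 1024; // naive upper bound: drop before enqueue

    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();
    private final AtomicBoolean drainScheduled = new AtomicBoolean();
    // Stand-in for the Netty event loop the connection is bound to.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();

    final AtomicInteger sent = new AtomicInteger();
    final AtomicInteger dropped = new AtomicInteger();

    /** Called from any producer thread. */
    public void enqueue(String message) {
        if (size.incrementAndGet() > MAX_QUEUED) {
            size.decrementAndGet();
            dropped.incrementAndGet(); // backlog in a really bad state: shed load
            return;
        }
        queue.add(message);
        // Producers contend only to schedule the consumer, never to write.
        if (drainScheduled.compareAndSet(false, true))
            eventLoop.execute(this::drain);
    }

    /** Runs only on the event-loop thread: the sole place messages are consumed. */
    private void drain() {
        // Clear the flag before polling so a producer that enqueues after our
        // last poll can schedule a fresh drain (no message is ever stranded).
        drainScheduled.set(false);
        String msg;
        while ((msg = queue.poll()) != null) {
            size.decrementAndGet();
            sent.incrementAndGet(); // stand-in for writing msg to the channel
        }
    }

    public void shutdown() {
        eventLoop.shutdown();
        try { eventLoop.awaitTermination(5, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        Backlog backlog = new Backlog();
        for (int i = 0; i < 10; i++)
            backlog.enqueue("msg-" + i);
        backlog.shutdown();
        System.out.println(backlog.sent.get()); // 10: nothing dropped at this scale
    }
}
```

Because state only changes on the event-loop thread, the transitions are easy to reason about; the cost is that producers must never block on the queue, hence the drop-before-enqueue bound.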

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.






[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609050#comment-16609050
 ] 

Jason Brown commented on CASSANDRA-14711:
-

So, the first thing to know is that 3.2 is an old, unsupported release; 3.11.3 
is the currently supported 3.x release.

> Apache Cassandra 3.2 crashing with exception 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom
> 
>
> Key: CASSANDRA-14711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Saurabh
>Priority: Major
> Attachments: hs_err_pid32069.log
>
>
> Hi Team,
> I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12..
> Issue:
> Cassandra is continuously crashing with generating an HEAP dump log. There 
> are no errors reported in system.log OR Debug.log.
> Exception in hs_err_PID.log:
>  # Problematic frame:
>  # J 8283 C2 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
>  (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
> Java Threads: ( => current thread )
>  0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
> [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
>  0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
> [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
>  0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon 
> [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
>  0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
> [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
>  0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
>  :
>  :
>  lot of threads in BLOCKED status
> Other Threads:
>  0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
> [id=32098]
>  0x2b7d38fa9de0 WatcherThread [stack: 
> 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]
> VM state:not at safepoint (normal execution)
> VM Mutex/Monitor currently owned by a thread: None
> Heap:
>  garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
> 0x0003c0404000, 0x0007c000)
>  region size 4096K, 785 young (3215360K), 55 survivors (225280K)
>  Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
>  class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K
> Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
> HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
> PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
>  AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
> 100% used [0x0003c000, 0x0003c040)
>  AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c040, 0x0003c080)
>  AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c080, 0x0003c0c0)
>  AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
> 100% used [0x0003c0c0, 0x0003c100)
>  AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
> 100% used [0x0003c100, 0x0003c140)
>  AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
> 100% used [0x0003c140, 0x0003c180)
>  :
>  :
>  lot of such messages






[jira] [Commented] (CASSANDRA-13630) support large internode messages with netty

2018-09-07 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607603#comment-16607603
 ] 

Jason Brown commented on CASSANDRA-13630:
-

[~djoshi3] Made a few comments on the PR, and in response I have:

- moved the autoRead check out of the {{RebufferingByteBufDataInputPlus.available}} 
method and into its own method; also added tests
- refactored {{MessageInProcessor.process}} to move the main loop logic into 
the base class, and moved the logic for each case statement into sub-methods 
rather than inlining it in the loop

> support large internode messages with netty
> ---
>
> Key: CASSANDRA-13630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.0
>
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the 
> scope of that ticket. However, we still need that functionality to ship a 
> correctly operating internode messaging subsystem.






[jira] [Updated] (CASSANDRA-14285) Comma at the end of the seed list is interpreted as localhost

2018-09-06 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14285:

Reviewer: Jordan West

> Comma at the end of the seed list is interpreted as localhost
> --
>
> Key: CASSANDRA-14285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14285
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Marco
>Assignee: Nicolas Guyomar
>Priority: Minor
> Fix For: 4.0
>
>
> Seeds: '10.1.20.10,10.1.21.10,10.1.22.10,' causes a flood of messages like 
> this one in the debug log:
> DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2018-02-28 
> 15:53:57,314 OutboundTcpConnection.java:545 - Unable to connect to 
> localhost/[127.0.0.1|http://127.0.0.1/]
> This code, provided by Nicolas Guyomar, demonstrates the cause of the issue. 
> In SimpleSeedProvider: 
>  
> String[] hosts = "10.1.20.10,10.1.21.10,10.1.22.10,".split(",", -1);
> List<InetAddress> seeds = new ArrayList<>(hosts.length);
> for (String host : hosts)
> {
> System.out.println(InetAddress.getByName(host.trim()));
> }
>  
> output : 
> /[10.1.20.10|http://10.1.20.10/]
> /[10.1.21.10|http://10.1.21.10/]
> /[10.1.22.10|http://10.1.22.10/]
> localhost/[127.0.0.1|http://127.0.0.1/]
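The trailing-comma behaviour described above is easy to reproduce in isolation. The following standalone sketch (the class and `splitHosts` are my names, not SimpleSeedProvider's) shows why the empty trailing token ends up as localhost: `split(",", -1)` keeps trailing empty strings, and `InetAddress.getByName` on an empty string is documented to return a loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Demonstrates how a trailing comma in the seed list resolves to localhost.
public class TrailingCommaSeeds {
    static String[] splitHosts(String seeds) {
        return seeds.split(",", -1); // limit -1 keeps trailing empty tokens
    }

    public static void main(String[] args) throws UnknownHostException {
        String[] hosts = splitHosts("10.1.20.10,10.1.21.10,10.1.22.10,");
        System.out.println(hosts.length);       // 4, not 3
        System.out.println(hosts[3].isEmpty()); // true
        // getByName("") returns a loopback address, hence the
        // "Unable to connect to localhost/127.0.0.1" gossip messages.
        System.out.println(InetAddress.getByName(hosts[3].trim()));
    }
}
```

Trimming empty tokens before resolving them (or splitting with the default limit of 0, which drops trailing empty strings) would avoid the phantom localhost seed.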






[jira] [Commented] (CASSANDRA-14618) Create fqltool replay command

2018-08-31 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599274#comment-16599274
 ] 

Jason Brown commented on CASSANDRA-14618:
-

+1, and please commit together with CASSANDRA-14619 (as both patches are 
linked, code- and review-wise)

> Create fqltool replay command
> -
>
> Key: CASSANDRA-14618
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14618
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> Make it possible to replay the full query logs from CASSANDRA-13983 against 
> one or several clusters. The goal is to be able to compare different runs of 
> production traffic against different versions/configurations of Cassandra.
> * It should be possible to take logs from several machines and replay them in 
> "order" by the timestamps recorded
> * Record the results from each run to be able to compare different runs 
> (against different clusters/versions/etc)
> * If {{fqltool replay}} is run against 2 or more clusters, the results should 
> be compared as we go






[jira] [Updated] (CASSANDRA-14618) Create fqltool replay command

2018-08-31 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14618:

Status: Ready to Commit  (was: Patch Available)

> Create fqltool replay command
> -
>
> Key: CASSANDRA-14618
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14618
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> Make it possible to replay the full query logs from CASSANDRA-13983 against 
> one or several clusters. The goal is to be able to compare different runs of 
> production traffic against different versions/configurations of Cassandra.
> * It should be possible to take logs from several machines and replay them in 
> "order" by the timestamps recorded
> * Record the results from each run to be able to compare different runs 
> (against different clusters/versions/etc)
> * If {{fqltool replay}} is run against 2 or more clusters, the results should 
> be compared as we go






[jira] [Updated] (CASSANDRA-14619) Create fqltool compare command

2018-08-31 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14619:

Status: Ready to Commit  (was: Patch Available)

> Create fqltool compare command
> --
>
> Key: CASSANDRA-14619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14619
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> We need a {{fqltool compare}} command that can take the recorded runs from 
> CASSANDRA-14618 and compare them; it should output any differences and 
> potentially all queries against the mismatching partition up until the 
> mismatch






[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command

2018-08-31 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599273#comment-16599273
 ] 

Jason Brown commented on CASSANDRA-14619:
-

- In ColumnDefsReader.readMarshallable you read an int32 value, but in 
ColumnDefsWriter.writeMarshallable you wrote an int16. Is this correct? The 
unit tests pass, but I'm not sure RecordStore is being fully exercised. The 
same thing happens in RowReader vs RowWriter.

UPDATE: I stepped through the chronicle code and it looks like the library can 
optimize the value it writes out (it only gets written as a byte, basically, 
since your value is zero). So, while your API calls are incongruous, the 
library does the correct thing under the hood. I would still prefer you to 
switch the reads to int16(), but that can be done on commit.

I also had a few trivial comments on the PR linked above. They are minor, so 
just address them on commit (if you choose).

Otherwise, +1 from me.

> Create fqltool compare command
> --
>
> Key: CASSANDRA-14619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14619
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> We need a {{fqltool compare}} command that can take the recorded runs from 
> CASSANDRA-14618 and compare them; it should output any differences and 
> potentially all queries against the mismatching partition up until the 
> mismatch






[jira] [Assigned] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming

2018-08-31 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-14685:
---

Assignee: Jason Brown

> Incremental repair 4.0 : SSTables remain locked forever if the coordinator 
> dies during streaming 
> -
>
> Key: CASSANDRA-14685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14685
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Alexander Dejanovski
>Assignee: Jason Brown
>Priority: Critical
>
> The changes in CASSANDRA-9143 modified the way incremental repair works by 
> applying the following sequence of events : 
>  * Anticompaction is executed on all replicas for all SSTables overlapping 
> the repaired ranges
>  * Anticompacted SSTables are then marked as "Pending repair" and cannot be 
> compacted anymore, nor part of another repair session
>  * Merkle trees are generated and compared
>  * Streaming takes place if needed
> Anticompaction is committed and "pending repair" tables are marked as 
> repaired if it succeeded, or they are released if the repair session failed.
> If the repair coordinator dies during the streaming phase, *the SSTables on 
> the replicas will remain in "pending repair" state and will never be eligible 
> for repair or compaction*, even after all the nodes in the cluster are 
> restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming 
> errors) : 
> {noformat}
> ccm create inc-repair-issue -v github:jasobrown/13938 -n 3
> # Allow jmx access and remove all rpc_ settings in yaml
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh;
> do
>   sed -i'' -e 
> 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g'
>  $f
> done
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml;
> do
>   grep -v "rpc_" $f > ${f}.tmp
>   cat ${f}.tmp > $f
> done
> ccm start
> {noformat}
> I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a 
> few 10s of MBs of data (killed it after some time). Obviously 
> cassandra-stress works as well :
> {noformat}
> bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000  
> --replication "{'class':'SimpleStrategy', 'replication_factor':2}"   
> --compaction "{'class': 'SizeTieredCompactionStrategy'}"   --host 
> 127.0.0.1
> {noformat}
> Flush and delete all SSTables in node1 :
> {noformat}
> ccm node1 nodetool flush
> rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
> {noformat}
> Then throttle streaming throughput to 1MB/s so we have time to take node1 
> down during the streaming phase and run repair:
> {noformat}
> ccm node1 nodetool setstreamthroughput 1
> ccm node2 nodetool setstreamthroughput 1
> ccm node3 nodetool setstreamthroughput 1
> ccm node1 nodetool repair tlp_stress
> {noformat}
> Once streaming starts, shut down node1 and start it again :
> {noformat}
> ccm node1 stop
> ccm node1 start
> {noformat}
> Run repair again :
> {noformat}
> ccm node1 nodetool repair tlp_stress
> {noformat}
> The command will return very quickly, showing that it skipped all sstables :
> {noformat}
> [2018-08-31 19:05:16,292] Repair completed successfully
> [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds
> $ ccm node1 nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   OwnsHost ID
>Rack
> UN  127.0.0.1  228,64 KiB  256  ?   
> 437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
> UN  127.0.0.2  60,09 MiB  256  ?   
> fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
> UN  127.0.0.3  57,59 MiB  256  ?   
> a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
> {noformat}
> sstablemetadata will then show that nodes 2 and 3 have SSTables still in 
> "pending repair" state :
> {noformat}
> ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | 
> grep repair
> SSTable: 
> /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
> Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
> {noformat}
> Restarting these nodes wouldn't help either.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14685) Incremental repair 4.0 : SSTables remain locked forever if the coordinator dies during streaming

2018-08-31 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599086#comment-16599086
 ] 

Jason Brown commented on CASSANDRA-14685:
-

Thanks for the report, [~adejanovski]. I'll be able to look into this next 
week, and I'm assigning the ticket to myself as a reminder. I'm not sure 
[~bdeggleston] can get to it before next week either.

I'm not sure if this is due to the stream sessions on nodes 2 and 3 not 
properly closing (and thus not informing the repair sessions they are part of), 
or if it's something getting lost in the repair session. Do nodes 2/3 show any 
streaming or repair activities (via nodetool cmds) after the repair coordinator 
dies? 

> Incremental repair 4.0 : SSTables remain locked forever if the coordinator 
> dies during streaming 
> -
>
> Key: CASSANDRA-14685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14685
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Alexander Dejanovski
>Priority: Critical
>
> The changes in CASSANDRA-9143 modified the way incremental repair works by 
> applying the following sequence of events:
>  * Anticompaction is executed on all replicas for all SSTables overlapping 
> the repaired ranges
>  * Anticompacted SSTables are then marked as "Pending repair" and cannot be 
> compacted anymore, nor be part of another repair session
>  * Merkle trees are generated and compared
>  * Streaming takes place if needed
>  * Anticompaction is committed and "pending repair" tables are marked as 
> repaired if the session succeeded, or they are released if it failed.
> If the repair coordinator dies during the streaming phase, *the SSTables on 
> the replicas will remain in "pending repair" state and will never be eligible 
> for repair or compaction*, even after all the nodes in the cluster are 
> restarted. 
> Steps to reproduce (I've used Jason's 13938 branch that fixes streaming 
> errors) : 
> {noformat}
> ccm create inc-repair-issue -v github:jasobrown/13938 -n 3
> # Allow jmx access and remove all rpc_ settings in yaml
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh;
> do
>   sed -i'' -e 
> 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g'
>  $f
> done
> for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml;
> do
>   grep -v "rpc_" $f > ${f}.tmp
>   cat ${f}.tmp > $f
> done
> ccm start
> {noformat}
> I used [tlp-stress|https://github.com/thelastpickle/tlp-stress] to generate a 
> few 10s of MBs of data (killed it after some time). Obviously 
> cassandra-stress works as well :
> {noformat}
> bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000  
> --replication "{'class':'SimpleStrategy', 'replication_factor':2}"   
> --compaction "{'class': 'SizeTieredCompactionStrategy'}"   --host 
> 127.0.0.1
> {noformat}
> Flush and delete all SSTables in node1 :
> {noformat}
> ccm node1 nodetool flush
> rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
> {noformat}
> Then throttle streaming throughput to 1MB/s so we have time to take node1 
> down during the streaming phase and run repair:
> {noformat}
> ccm node1 nodetool setstreamthroughput 1
> ccm node2 nodetool setstreamthroughput 1
> ccm node3 nodetool setstreamthroughput 1
> ccm node1 nodetool repair tlp_stress
> {noformat}
> Once streaming starts, shut down node1 and start it again :
> {noformat}
> ccm node1 stop
> ccm node1 start
> {noformat}
> Run repair again :
> {noformat}
> ccm node1 nodetool repair tlp_stress
> {noformat}
> The command will return very quickly, showing that it skipped all sstables :
> {noformat}
> [2018-08-31 19:05:16,292] Repair completed successfully
> [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds
> $ ccm node1 nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   OwnsHost ID
>Rack
> UN  127.0.0.1  228,64 KiB  256  ?   
> 437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
> UN  127.0.0.2  60,09 MiB  256  ?   
> fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
> UN  127.0.0.3  57,59 MiB  256  ?   
> a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
> {noformat}
> sstablemetadata will then show that nodes 2 and 3 have SSTables still in 
> "pending repair" state :
> {noformat}
> ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | 
> grep repair
> SSTable: 
> /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
> Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
> {noformat}
> Restarting 

[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command

2018-08-31 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598970#comment-16598970
 ] 

Jason Brown commented on CASSANDRA-14619:
-

[~krummas] added an extra commit (e608fb24d3b00cb623fa9ca7b826b7a3bf2b9064) for 
versioning the replay output and querylog files. 

In that commit, every columnDefinition and row entry that is written out is 
prefixed with a 4-byte version number. Instead of writing out the (presumably) 
same version number many times in the file, can you write it once at the 
beginning of the file? I think you'd save a lot on file size that way.
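A minimal sketch of the suggested layout change (class and method names here are hypothetical, not the actual fqltool code): write the 4-byte version once as a file header, then only length-prefixed entries, instead of prefixing every entry with the version.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: the version is written exactly once as a header,
// so each entry saves the 4-byte version prefix.
public class VersionedLogWriter {
    static final int VERSION = 1;

    static byte[] write(byte[][] entries) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(VERSION);          // header: version written once
            for (byte[] e : entries) {
                out.writeInt(e.length);     // per-entry length only, no version
                out.write(e);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] entries = { "SELECT 1".getBytes(), "SELECT 2".getBytes() };
        byte[] file = write(entries);
        // 4 (header) + 2 * (4 length + 8 payload) = 28 bytes; a per-entry
        // version prefix would have added another 4 bytes per entry.
        assert file.length == 4 + 2 * (4 + 8);
    }
}
```

With N entries this saves 4 * (N - 1) bytes, which adds up quickly for large query logs.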

> Create fqltool compare command
> --
>
> Key: CASSANDRA-14619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14619
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> We need a {{fqltool compare}} command that can take the recorded runs from 
> CASSANDRA-14618 and compares them, it should output any differences and 
> potentially all queries against the mismatching partition up until the 
> mismatch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command

2018-08-30 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598087#comment-16598087
 ] 

Jason Brown commented on CASSANDRA-14619:
-

Created a [Pull Request|https://github.com/apache/cassandra/pull/256] for 
commenting.

> Create fqltool compare command
> --
>
> Key: CASSANDRA-14619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14619
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> We need a {{fqltool compare}} command that can take the recorded runs from 
> CASSANDRA-14618 and compares them, it should output any differences and 
> potentially all queries against the mismatching partition up until the 
> mismatch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-14639) Fix a few complaints from eclipse-warnings for 2.2

2018-08-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-14639.
-
   Resolution: Invalid
Fix Version/s: (was: 2.2.x)

Sadly, this ended up being a problem in my primary local repo. I cloned a fresh 
repo (into a different directory) under both macOS and linux, and they both 
produced no {{ant eclipse-warnings}} errors.

Thank you, [~sumanth.pasupuleti], for digging into this, and for confirming 
that 2.2 is clean. Sorry that it ended up being a problem on my end.

> Fix a few complaints from eclipse-warnings for 2.2
> --
>
> Key: CASSANDRA-14639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14639
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Sumanth Pasupuleti
>Priority: Minor
>
> These failed on 2.2 
> [circleci|https://circleci.com/gh/jasobrown/cassandra/1375]
> {noformat}
> eclipse-warnings:
> [mkdir] Created dir: /home/cassandra/cassandra/build/ecj
>  [echo] Running Eclipse Code Analysis.  Output logged to 
> /home/cassandra/cassandra/build/ecj/eclipse_compiler_checks.txt
>  [java] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
>  [java] incorrect classpath: 
> /home/cassandra/cassandra/build/cobertura/classes
>  [java] --
>  [java] 1. ERROR in 
> /home/cassandra/cassandra/src/java/org/apache/cassandra/tools/SSTableExport.java
>  (at line 315)
>  [java]   ISSTableScanner scanner = reader.getScanner();
>  [java]   ^^^
>  [java] Resource 'scanner' should be managed by try-with-resource
>  [java] --
>  [java] --
>  [java] 2. ERROR in 
> /home/cassandra/cassandra/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
>  (at line 888)
>  [java]   ISSTableScanner scanner = cleanupStrategy.getScanner(sstable, 
> getRateLimiter());
>  [java]   ^^^
>  [java] Resource 'scanner' should be managed by try-with-resource
>  [java] --
>  [java] --
>  [java] 3. ERROR in 
> /home/cassandra/cassandra/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java
>  (at line 257)
>  [java]   scanners.add(new LeveledScanner(intersecting, range));
>  [java]^^^
>  [java] Potential resource leak: '' may not 
> be closed
>  [java] --
>  [java] 3 problems (3 errors)
> BUILD FAILED
> /home/cassandra/cassandra/build.xml:1915: Java returned: 255
> {noformat}
> Not failing on 3.0+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14659) Disable old native protocol versions on demand

2018-08-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14659:

   Resolution: Fixed
Fix Version/s: 4.0
   Status: Resolved  (was: Patch Available)

+1. Committed as sha {{7b61b0be88ef1fcc29646ae8bdbb05da825bc1b2}}. Thanks!

> Disable old native protocol versions on demand
> --
>
> Key: CASSANDRA-14659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: usability
> Fix For: 4.0
>
>
> This patch allows the operators to disable older protocol versions on demand. 
> To use it, you can set {{native_transport_allow_older_protocols}} to false or 
> use nodetool disableolderprotocolversions. Cassandra will reject requests 
> from clients coming in on any version except the current version. This will 
> help operators selectively reject connections from clients that do not 
> support the latest protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14659) Disable old native protocol versions on demand

2018-08-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14659:

Summary: Disable old native protocol versions on demand  (was: Disable old 
protocol versions on demand)

> Disable old native protocol versions on demand
> --
>
> Key: CASSANDRA-14659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: usability
>
> This patch allows the operators to disable older protocol versions on demand. 
> To use it, you can set {{native_transport_allow_older_protocols}} to false or 
> use nodetool disableolderprotocolversions. Cassandra will reject requests 
> from clients coming in on any version except the current version. This will 
> help operators selectively reject connections from clients that do not 
> support the latest protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14619) Create fqltool compare command

2018-08-30 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597908#comment-16597908
 ] 

Jason Brown commented on CASSANDRA-14619:
-

I can work on this today.

> Create fqltool compare command
> --
>
> Key: CASSANDRA-14619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14619
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool
> Fix For: 4.x
>
>
> We need a {{fqltool compare}} command that can take the recorded runs from 
> CASSANDRA-14618 and compares them, it should output any differences and 
> potentially all queries against the mismatching partition up until the 
> mismatch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14681) SafeMemoryWriterTest doesn't compile on trunk

2018-08-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14681:

Labels: Java11  (was: )

> SafeMemoryWriterTest doesn't compile on trunk
> -
>
> Key: CASSANDRA-14681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14681
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Trivial
>  Labels: Java11
> Fix For: 4.0
>
>
> {{SafeMemoryWriterTest}} references {{sun.misc.VM}}, which doesn't exist in 
> Java 11, so the build fails.
> Proposed patch makes the test work against Java 8 + 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14659) Disable old protocol versions on demand

2018-08-30 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14659:

Status: In Progress  (was: Ready to Commit)

> Disable old protocol versions on demand
> ---
>
> Key: CASSANDRA-14659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: usability
>
> This patch allows the operators to disable older protocol versions on demand. 
> To use it, you can set {{native_transport_allow_older_protocols}} to false or 
> use nodetool disableolderprotocolversions. Cassandra will reject requests 
> from clients coming in on any version except the current version. This will 
> help operators selectively reject connections from clients that do not 
> support the latest protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14659) Disable old protocol versions on demand

2018-08-30 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597468#comment-16597468
 ] 

Jason Brown commented on CASSANDRA-14659:
-

On the whole, this is almost there. I think the version check you have in 
{{Message}} would be best located in {{ProtocolVersion.decode()}}, as that is 
the section where we already do the general version check, and yours is an 
extension to that.
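The suggested placement of the check could look roughly like this (a hypothetical sketch with illustrative names, not the actual Cassandra 4.0 code): the new "older protocols disabled" rejection sits right next to the existing range check in the decode path.

```java
// Hypothetical sketch of folding the new check into ProtocolVersion.decode():
// field and constant names are illustrative, not the real Cassandra code.
public class ProtocolVersions {
    static final int MIN_SUPPORTED = 3;
    static final int CURRENT = 5;
    static volatile boolean allowOlderProtocols = true; // toggled via nodetool

    static int decode(int version) {
        // existing general version-range check
        if (version < MIN_SUPPORTED || version > CURRENT)
            throw new IllegalArgumentException("unsupported protocol version: " + version);
        // extension: reject anything but the current version once older
        // protocol versions have been disabled on demand
        if (!allowOlderProtocols && version != CURRENT)
            throw new IllegalArgumentException("older protocol versions are disabled: " + version);
        return version;
    }

    public static void main(String[] args) {
        assert decode(4) == 4;                 // older version allowed by default
        allowOlderProtocols = false;
        boolean rejected = false;
        try { decode(4); } catch (IllegalArgumentException e) { rejected = true; }
        assert rejected;                        // v4 rejected once disabled
        assert decode(CURRENT) == CURRENT;      // current version still accepted
    }
}
```

Keeping both rejections in one method means every code path that decodes a version gets the new behavior for free, rather than each {{Message}} call site duplicating the check.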

> Disable old protocol versions on demand
> ---
>
> Key: CASSANDRA-14659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: usability
>
> This patch allows the operators to disable older protocol versions on demand. 
> To use it, you can set {{native_transport_allow_older_protocols}} to false or 
> use nodetool disableolderprotocolversions. Cassandra will reject requests 
> from clients coming in on any version except the current version. This will 
> help operators selectively reject connections from clients that do not 
> support the latest protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14677) Clean up Message.Request implementations

2018-08-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951
 ] 

Jason Brown edited comment on CASSANDRA-14677 at 8/30/18 12:28 AM:
---

I took a decent look at the patch provided.

Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class 
in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once 
per instance. 
 With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} 
every time you need to check if logging is still enabled, which references a 
volatile variable ({{AuditLogManager.isAuditLogEnabled}}). You might consider 
memoizing the value again.

Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem 
typical of how we usually name things. You should add a comment that 
{{perform()}} is now the main entry point for running the {{Request}}, and 
perhaps make {{execute()}} protected (instead of public).

I think it would be helpful for committers and for future reviewers to have a 
better understanding of what is meant by "big mess". Perhaps you could update 
the description to better outline the specific issues with the {{execute()}} 
method implementations. This would also make the changes in the patch more 
clear for review.
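The memoization described above could be sketched like this (hypothetical, simplified names — not the actual {{Message.Request}} code): the volatile flag is read once at request construction and the cached copy is used for the rest of the request's lifetime.

```java
// Hypothetical sketch of memoizing a volatile flag per Request instance,
// so the flag is read once at construction instead of on every check.
public class AuditExample {
    static class AuditLogManager {
        volatile boolean enabled = true;      // mutated by operators at runtime
        boolean isAuditingEnabled() { return enabled; }
    }

    static class Request {
        // single volatile read, memoized for this request's lifetime
        final boolean auditLogEnabled;

        Request(AuditLogManager mgr) {
            this.auditLogEnabled = mgr.isAuditingEnabled();
        }

        boolean shouldAudit() { return auditLogEnabled; } // no volatile read here
    }

    public static void main(String[] args) {
        AuditLogManager mgr = new AuditLogManager();
        Request inFlight = new Request(mgr);
        mgr.enabled = false;                  // flag flipped mid-request
        // the in-flight request keeps the value it observed at construction
        assert inFlight.shouldAudit();
        // new requests observe the updated flag
        assert !new Request(mgr).shouldAudit();
    }
}
```

The trade-off is that an in-flight request keeps the setting it started with, which is usually the desired behavior and avoids repeated volatile reads on the hot path.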


was (Author: jasobrown):
I took a decent look at the patch provided.

Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class 
in order read the volatile {{auditLogManager.isAuditingEnabled()}} only once 
per instance. 
 With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} 
everytime you need to reference the volatile variable 
(\{{AuditLogManager.isAuditLogEnabled}}). You might consider memoizing the 
value again.

Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem 
typical of how we usually name things. You should add a comment that 
{{perform()}} is now the main entry point for running the {{Request}}, and 
perhaps make {{execute()}} protected (instead of public).

I think it would be helpful for committers and for future reviewers to have a 
better understanding of what is meant by "big mess". Perhaps you could update 
the description to better outline the specific issues with the {{execute()}} 
method implementations. This would also make the changes in the patch more 
clear for review.

> Clean up Message.Request implementations
> 
>
> Key: CASSANDRA-14677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14677
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 4.0.x
>
>
> First tracing support, many years ago, then most recently audit log, made a 
> big mess out of {{Message.Request.execute()}} implementations.
> This patch tries to clean up some of it by removing tracing logic from 
> {{QueryState}} and moving shared tracing functionality to 
> {{Message.Request.perform()}}. It also moves out tracing and audit log boiler 
> plate into their own small methods instead of polluting {{execute()}} 
> implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14677) Clean up Message.Request implementations

2018-08-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951
 ] 

Jason Brown edited comment on CASSANDRA-14677 at 8/30/18 12:27 AM:
---

I took a decent look at the patch provided.

Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class 
in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once 
per instance. 
 With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} 
every time you need to reference the volatile variable 
(\{{AuditLogManager.isAuditLogEnabled}}). You might consider memoizing the 
value again.

Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem 
typical of how we usually name things. You should add a comment that 
{{perform()}} is now the main entry point for running the {{Request}}, and 
perhaps make {{execute()}} protected (instead of public).

I think it would be helpful for committers and for future reviewers to have a 
better understanding of what is meant by "big mess". Perhaps you could update 
the description to better outline the specific issues with the {{execute()}} 
method implementations. This would also make the changes in the patch more 
clear for review.


was (Author: jasobrown):
I took a decent look at the patch provided.

Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class 
in order read the volatile {{auditLogManager.isAuditingEnabled()}} only once 
per instance. 
 With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} 
everytime you need to reference the variable. You might consider memoizing the 
value again.

Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem 
typical of how we usually name things. You should add a comment that 
{{perform()}} is now the main entry point for running the {{Request}}, and 
perhaps make {{execute()}} protected (instead of public).

I think it would be helpful for committers and for future reviewers to have a 
better understanding of what is meant by "big mess". Perhaps you could update 
the description to better outline the specific issues with the {{execute()}} 
method implementations. This would also make the changes in the patch more 
clear for review.

> Clean up Message.Request implementations
> 
>
> Key: CASSANDRA-14677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14677
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 4.0.x
>
>
> First tracing support, many years ago, then most recently audit log, made a 
> big mess out of {{Message.Request.execute()}} implementations.
> This patch tries to clean up some of it by removing tracing logic from 
> {{QueryState}} and moving shared tracing functionality to 
> {{Message.Request.perform()}}. It also moves out tracing and audit log boiler 
> plate into their own small methods instead of polluting {{execute()}} 
> implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14677) Clean up Message.Request implementations

2018-08-29 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596951#comment-16596951
 ] 

Jason Brown commented on CASSANDRA-14677:
-

I took a decent look at the patch provided.

Previously, we had memoized {{auditLogEnabled}} in the parent {{Request}} class 
in order to read the volatile {{auditLogManager.isAuditingEnabled()}} only once 
per instance. 
 With this refactor, you are calling {{auditLogManager.isAuditingEnabled()}} 
every time you need to reference the variable. You might consider memoizing the 
value again.

Also, {{Request.perform()}} is an unexpected naming choice, and doesn't seem 
typical of how we usually name things. You should add a comment that 
{{perform()}} is now the main entry point for running the {{Request}}, and 
perhaps make {{execute()}} protected (instead of public).

I think it would be helpful for committers and for future reviewers to have a 
better understanding of what is meant by "big mess". Perhaps you could update 
the description to better outline the specific issues with the {{execute()}} 
method implementations. This would also make the changes in the patch more 
clear for review.

> Clean up Message.Request implementations
> 
>
> Key: CASSANDRA-14677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14677
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 4.0.x
>
>
> First tracing support, many years ago, then most recently audit log, made a 
> big mess out of {{Message.Request.execute()}} implementations.
> This patch tries to clean up some of it by removing tracing logic from 
> {{QueryState}} and moving shared tracing functionality to 
> {{Message.Request.perform()}}. It also moves out tracing and audit log boiler 
> plate into their own small methods instead of polluting {{execute()}} 
> implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14677) Clean up Message.Request implementations

2018-08-29 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14677:

Reviewers: Dinesh Joshi, Jason Brown

> Clean up Message.Request implementations
> 
>
> Key: CASSANDRA-14677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14677
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 4.0.x
>
>
> First tracing support, many years ago, then most recently audit log, made a 
> big mess out of {{Message.Request.execute()}} implementations.
> This patch tries to clean up some of it by removing tracing logic from 
> {{QueryState}} and moving shared tracing functionality to 
> {{Message.Request.perform()}}. It also moves out tracing and audit log boiler 
> plate into their own small methods instead of polluting {{execute()}} 
> implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


