[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-09-05 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153528#comment-16153528
 ] 

Jason Brown commented on CASSANDRA-8457:


[~spo...@gmail.com] Ahh, you are correct. Ninja-fixed as sha 
{{e38015ed1fcf7d95694683601b121c918c5f6976}}.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.0
>
> Attachments: 8457-load.tgz
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.






[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-09-04 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152574#comment-16152574
 ] 

Stefan Podkowinski commented on CASSANDRA-8457:
---

[~jasobrown], did you forget to bump netty to 4.1.14 in build.xml?




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-08-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136520#comment-16136520
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

Yes, I'm +1 on the patch on this specific ticket, and great job on that btw! 
But I do want to clarify that imo CASSANDRA-13630 should also be more or less 
finished before we call this all complete/committable (I've already mentioned 
that to Jason offline and we seem to be on the same page, just mentioning it 
for the record).




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-08-21 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135790#comment-16135790
 ] 

sankalp kohli commented on CASSANDRA-8457:
--

[~slebresne] I know you and Jason chatted on IRC. Are you +1 on this? The 
streaming patch has a +1 from Ariel.




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-07-26 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102128#comment-16102128
 ] 

Jason Brown commented on CASSANDRA-8457:


dtests running locally, including the upgrade and no_vnodes tests, are passing 
(with the exception of those not working on trunk, and one other that is 
related to CASSANDRA-12229).

[~slebresne] and I discussed offline, and he agreed that having 
{{OutboundConnectionParams}} take a {{Consumer}} was reasonable, so that is 
what was implemented. The current branch also has a minor fix for reconnecting 
when a connection is rejected due to a protocol version mismatch (discovered 
via the dtest upgrade tests).




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-06-24 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062157#comment-16062157
 ] 

Jason Brown commented on CASSANDRA-8457:


bq. Nit: there have been a few Netty minor releases since the one in the 
branch, maybe worth upgrading (since we change it anyway).
There were a few unstable netty releases, so I skipped those. 4.1.12 is good, 
but I need a fix in {{Lz4FrameEncoder}} which should land in 4.1.13.

bq. MessageOutHandler.AUTO_FLUSH_THRESHOLD feels like a magic constant. At 
least a comment on why it's a reasonable value would be nice.
I've added a comment to the constant; the value was the one intended for the 
{{Lz4FrameEncoder}} (16k). The value from the [existing 
code|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L85]
 is 64k, so I've reinstated that. Note that the previous {{OTC.BUFFER_SIZE}} 
can also be seen as a flush threshold, as it is a [parameter to the output 
stream|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L471].
 I also discovered that {{OTC.BUFFER_SIZE}} is configurable, and made the new 
version configurable as well.

bq. OutboundHandshakeHandler.WRITE_IDLE_MS: what's the rationale behind 10 
seconds? Not saying it feels wrong per-se, but wondering if there is more than 
a gut-feeling to it, and I'd kind of suggest exposing a system property to 
change that default.
I believe 10 seconds was a value [~aweisberg] and I came up with, but I've made 
it configurable now.

bq. There are still a few TODO:JEB in the code: would be good to resolve/clear 
them if we're going to a more final branch.
Done; most of them I had already addressed on the CASSANDRA-12229 branch, and 
they are now backported to this one.

bq. Largely a nit, but it's a tad confusing that OutboundConnectionParams 
contains a OutboundMessagingConnection. It feels like saying "The parameters 
you need for an outbound connection is ... an outbound connection". Maybe a 
simple renaming would make this clearer, though it feels maybe this could be 
cleaned up a bit further.
Hmm, yeah. I need the {{OutboundMessagingConnection}} for 
{{ChannelWriter#handleMessageFuture()}} so it can react to the outcome of 
attempting to send the message. wrt a cleanup, the least-worst idea I have is 
to move the error handling functionality of 
{{ChannelWriter#handleMessageFuture()}} to {{OutboundMessagingConnection}}, as 
that's the only thing in {{ChannelWriter}} that requires 
{{OutboundMessagingConnection}}. In {{OutboundConnectionParams}} we can pass a 
reference to an "error consumer" (not fully thought out, but something like a 
{{BiConsumer}} that is a function on {{OutboundMessagingConnection}}), and 
{{ChannelWriter#handleMessageFuture()}} can invoke that instead of holding a 
reference to an {{OutboundMessagingConnection}}. wdyt?
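
To make the idea concrete, here is a minimal sketch of that error-consumer 
shape, assuming hypothetical generic parameters and class names (the actual 
types were stripped from the comment above):

{code}
// Illustrative sketch only: pass an error callback into the params object so the writer
// does not need a reference back to OutboundMessagingConnection. The generic parameters
// (Object, Throwable) and class shapes are assumptions, not the branch's actual types.
import java.util.function.BiConsumer;

final class OutboundConnectionParamsSketch
{
    // invoked by the writer when a send fails; typically a method reference on the connection
    final BiConsumer<Object, Throwable> errorConsumer;

    OutboundConnectionParamsSketch(BiConsumer<Object, Throwable> errorConsumer)
    {
        this.errorConsumer = errorConsumer;
    }
}

final class ChannelWriterSketch
{
    private final OutboundConnectionParamsSketch params;

    ChannelWriterSketch(OutboundConnectionParamsSketch params)
    {
        this.params = params;
    }

    void handleMessageFuture(Object message, Throwable cause)
    {
        // react to the outcome of the send without knowing about the connection object itself
        if (cause != null)
            params.errorConsumer.accept(message, cause);
    }
}
{code}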

bq. The handling of "backlogged" messages and channel writability feels a bit 
complex. For instance, it looks like 
MessageOutHandler.channelWritabilityChanged can potentially silently drop a 
message every time it's called.
Good catch, fixed.

bq. The CI links listed with the branch a bunch of comments ago are completely 
out of date.
Updated the ticket with a pointer to the circleci utests.

bq. Some basic benchmark results wouldn't hurt either ...
Attaching to this ticket, as it's a non-trivial matrix to test all the 
combinations of compression, coalescing, jdk-based TLS, and openssl-based TLS.

bq. Maybe you could create a parent ticket of which this and CASSANDRA-12229 
would be sub-tasks where we could focus the testing/benchmarking/final 
discussions on the whole thing?
done as CASSANDRA-13628





[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-05-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028159#comment-16028159
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

Had a quick look at the last changes. A few remaining questions/remarks:
* There are still a few {{TODO:JEB}} in the code: would be good to 
resolve/clear them if we're going to a more final branch.
* MessageOutHandler.AUTO_FLUSH_THRESHOLD feels like a magic constant. At least 
a comment on why it's a reasonable value would be nice.
* Largely a nit, but it's a tad confusing that {{OutboundConnectionParams}} 
contains a {{OutboundMessagingConnection}}. It feels like saying "The 
parameters you need for an outbound connection is ... an outbound connection". 
Maybe a simple renaming would make this clearer, though it feels maybe this 
could be cleaned up a bit further.
* {{OutboundHandshakeHandler.WRITE_IDLE_MS}}: what's the rationale behind 10 
seconds? Not saying it feels wrong per-se, but wondering if there is more than 
a gut-feeling to it, and I'd kind of suggest exposing a system property to 
change that default.
* The handling of "backlogged" messages and channel writability feels a bit 
complex. For instance, it looks like 
{{MessageOutHandler.channelWritabilityChanged}} can potentially silently drop a 
message every time it's called, and I'll admit not being sure if this can have 
unintended consequences in overload situations. In general, it would be really 
great to have some testing of this patch under a fair amount of load (and 
overload) to make sure things work as intended.
* Nit: there have been a few Netty minor releases since the one in the branch, 
maybe worth upgrading (since we change it anyway).

Other than that, and related to one comment above, it would be nice to see some 
tests run for this (including upgrade tests, as this is particularly sensitive 
to any MessagingService changes). The CI links listed with the branch a bunch 
of comments ago are completely out of date. Some basic benchmark results 
wouldn't hurt either, since this completely changes a fairly sensitive part of 
the code. Of course, it's not that meaningful without CASSANDRA-12229, which 
makes this a bit awkward. Maybe you could create a parent ticket of which this 
and CASSANDRA-12229 would be sub-tasks, where we could focus the 
testing/benchmarking/final discussions on the whole thing?





[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-05-04 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996536#comment-15996536
 ] 

Jason Brown commented on CASSANDRA-8457:


latest changes:

- add netty's {{IdleStateHandler}} to the outbound pipeline. This allows us to 
monitor for a socket not being able to make progress sending bytes (or, at 
least, putting them into the socket buffer). When we don't make progress, we 
can close the channel and purge the backlog of messages.
- make better use of netty's high/low water marks. Instead of naively using 
them as a flush indicator (the flush now happens, properly, at the end of 
{{MessageOutHandler#write()}}), I'm using the high/low water marks to decide 
when to stop writing to the channel 
({{OutboundMessagingConnection#sendMessage()}}). I then add those messages to 
the backlog queue in {{OutboundMessagingConnection}}. When the channel becomes 
writable again (because bytes were shipped out), we can purge some of the 
backlogged messages from {{MessageOutHandler#channelWritabilityChanged()}}. See 
the sketch at the end of this comment.
- add support for CASSANDRA-9748 (somehow got missed before)

[~slebresne] I'm adding a few more unit tests and minor doc updates, but please 
review the latest set of changes. 
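
For illustration, a minimal sketch (not the branch's actual code) of wiring an 
{{IdleStateHandler}} and write-buffer water marks into a netty outbound 
pipeline; the handler names and the 32k/64k thresholds are assumptions:

{code}
// Illustrative sketch only: an IdleStateHandler that closes a channel which makes no write
// progress, plus explicit write-buffer water marks. Names and thresholds are assumptions.
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.WriteBufferWaterMark;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

class OutboundPipelineSketch extends ChannelInitializer<SocketChannel>
{
    private static final int WRITE_IDLE_SECONDS = 10; // assumed default; configurable in the patch

    @Override
    protected void initChannel(SocketChannel ch)
    {
        // channel reports "not writable" above 64k of buffered bytes, "writable" again below 32k
        ch.config().setWriteBufferWaterMark(new WriteBufferWaterMark(32 * 1024, 64 * 1024));

        // fires an IdleStateEvent if no bytes have been written for WRITE_IDLE_SECONDS
        ch.pipeline().addLast("idleState", new IdleStateHandler(0, WRITE_IDLE_SECONDS, 0));
        ch.pipeline().addLast("idleReaction", new ChannelDuplexHandler()
        {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception
            {
                // the socket made no progress: close it so the owner can purge the backlog
                if (evt instanceof IdleStateEvent && ((IdleStateEvent) evt).state() == IdleState.WRITER_IDLE)
                    ctx.close();
                else
                    ctx.fireUserEventTriggered(evt);
            }
        });
    }
}
{code}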




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968472#comment-15968472
 ] 

Jason Brown commented on CASSANDRA-8457:


So, [~aweisberg] and I spent some time talking offline about the expiring 
messages on the outbound side, and came up with the following: 

1. run a periodic, scheduled task in each channel that checks to make sure the 
channel is making progress wrt sending bytes. If we fail to see any progress 
being made after some number of seconds, we should close the connection/socket 
and throw away the messages.
2. repurpose the high/low water mark (and arguably use it more correctly) to 
indicate when we should stop writing messages to the channel (at the 
{{ChannelWriter}} layer). Currently, I'm just using the water mark to indicate 
when we should flush, but a simple check elsewhere would accomplish the same 
thing. Instead, the water marks should indicate when we really shouldn't write 
to the channel anymore, and either queue up those messages in something like 
{{OutboundMessageConnection#backlog}} or perhaps drop them (I'd prefer to 
queue).
3. When we've exceeded the high water mark, we can disable reading incoming 
messages from the same peer (achievable by disabling auto read for the 
channel). This would prevent the current node from executing more work on 
behalf of a peer to which we cannot send any data. Then, when the channel drops 
below the low water mark (and the channel is 'writable'), we re-enable netty 
auto read on the read channels for the peer (see the sketch at the end of this 
comment).

1 and 2 are reasonably easy to do (and I'll do them asap), but I'd prefer to 
defer 3 until later as it has a lot of races and other complexities/subtleties 
I'd like to put off for the scope of this ticket (especially as sockets are not 
bidirectional yet). Thoughts?

Note: items 1 & 2 are significantly simpler than my earlier comments wrt 
message expiration, so please disregard them for now.
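
For illustration, a rough sketch of the deferred item 3 above, assuming a 
hypothetical handler that holds a reference to the peer's inbound channel (not 
actual branch code):

{code}
// Illustrative sketch: toggle netty auto-read on a peer's inbound channel based on the
// writability of the outbound channel to that same peer. Wiring and names are assumptions.
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

class WritabilityBackPressureSketch extends ChannelInboundHandlerAdapter
{
    private final Channel inboundChannel; // the channel we read the same peer's messages from

    WritabilityBackPressureSketch(Channel inboundChannel)
    {
        this.inboundChannel = inboundChannel;
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx)
    {
        // above the high water mark: stop doing work for a peer we cannot write to;
        // once buffered bytes drop below the low water mark: resume reading its messages
        inboundChannel.config().setAutoRead(ctx.channel().isWritable());
        ctx.fireChannelWritabilityChanged();
    }
}
{code}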




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967621#comment-15967621
 ] 

Jason Brown commented on CASSANDRA-8457:


bq. doesn't Netty already have a standard way to deal with that problem of 
messages piling up in its queues?

Netty has a high/low water mark mechanism that looks at the number of bytes in 
the channel and sends a "writability changed" event through the channel once 
one of those thresholds has been crossed. I'm currently using that feature in 
{{MessageOutHandler#channelWritabilityChanged()}} to know when we've hit a 
decent amount of buffered data before we explicitly call flush. Beyond this, I 
do not think netty has any other explicit back pressure mechanism built in 
(like a handler or something similar).

We could expand our use of the high/low water mark to say "if there are more 
than some number of bytes in the channel, drop 'some' messages". If we want to 
drop older messages for which we feel the client has timed out (or reasonably 
will), we'll have to do something like what I've proposed in my most recent 
comments (and which I'm working on right now). This message expiration behavior 
can then become not only about timeouts (I still think there's reasonable use 
in that), but also protect against the amount of data sitting in the channel.

One other thing we can do is bound the number of tasks that can be queued 
in to the channel 
([{{SingleThreadEventLoop#DEFAULT_MAX_PENDING_TASKS}}|https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/SingleThreadEventLoop.java#L35]).
 I quickly traced the netty code, and I *think* a 
{{RejectedExecutionException}} is thrown when you try to add a message to a 
channel which is filled to its capacity. I'm not sure we want to use this as 
the only backpressure mechanism, but, as unbounded queues are awful (the 
default queue size is {{Integer.MAX_VALUE}}), it might not be a bad idea to 
bound this to at least something sane (16k-32k as an upper bound can't be 
unreasonable for a single channel). This, of course, would require us to be 
more resilient to dropped messages on the enqueuing side, which is probably a 
good idea anyway.
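
As a minimal sketch of the water-mark idea, assuming a hypothetical enqueue 
path (not the patch's actual code):

{code}
// Illustrative sketch: consult channel writability before writing, and let the caller
// backlog or drop the message once the high water mark has been exceeded.
import io.netty.channel.Channel;

final class EnqueueSketch
{
    static boolean tryEnqueue(Channel channel, Object message)
    {
        // isWritable() flips to false once buffered bytes exceed the configured high water
        // mark, and back to true once they drop below the low water mark
        if (!channel.isWritable())
            return false; // caller decides: add to a backlog queue, or drop/expire the message

        channel.writeAndFlush(message);
        return true;
    }
}
{code}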



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967586#comment-15967586
 ] 

Jason Brown commented on CASSANDRA-8457:


bq. doesn't Netty already have a standard way to deal with that problem of 
messages piling up in its queues?

Netty has a high/low water mark mechanism that looks at the number of bytes in 
the channel and sends a "writability changed" event through the channel once 
one of those thresholds has been crossed. I'm currently using that feature to 
know when we've hit a decent amount of buffered data before we explicitly call 
flush.

We could expand this to say "if there are more than some number of bytes in 
the channel, drop older messages".



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967568#comment-15967568
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

bq. I think it's important that a single slow node or network issue resulting 
in a socket that isn't writable shouldn't allow an arbitrary amount of data to 
collect on the heap. Right now there is nothing that can drop the data in that 
scenario.

I don't necessarily disagree with that somewhat general statement, but I'm far 
from convinced that checking for expired messages is the right tool for the job 
in the first place. The fact is that expiration is time-based and that default 
timeouts are in multiples of seconds, so there is plenty of time for messages 
to accumulate and blow the heap without any of them being droppable. On top of 
that, not all messages have timeouts, which actually makes sense because a 
message timeout isn't a back-pressure mechanism; it's about how long we're 
willing to wait for an answer to a request message, and hence one-way messages 
have no reason to have such a timeout. And that's part of the point: I dislike 
using a concept that isn't meant to be related to back-pressure to do 
back-pressure, especially when it's as flawed as this one. Users shouldn't have 
to worry that nodes could OOM because they set the write timeout high; it's 
just not intuitive.

Don't get me wrong, I don't disagree that some back-pressure mechanism should 
be added for that problem, but that should be more based on the amount of 
message data (or, at the very least the number of such messages) in the Netty 
queue. Surely we're not the only ones facing this problem, though; doesn't 
Netty already have a standard way to deal with messages piling up in its 
queues?



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967513#comment-15967513
 ] 

Jason Brown commented on CASSANDRA-8457:


[~aweisberg] and I spoke with Scott Mitchell of the netty team and we have a 
potential solution better than what I put in the last comment. The TL;DR is to 
add an iterator to netty's {{ChannelOutboundBuffer}} and walk the "queue" 
looking for expired items. It requires a patch to netty (which I'm working on), 
but it should be straightforward. More details soon...



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966390#comment-15966390
 ] 

Jason Brown commented on CASSANDRA-8457:


[~aweisberg] So, I've kicked this idea around a bit (haven't coded anything up 
yet), but I thought about using something like {{ExpiringMap}} for the 
{{QueuedMessage}}s, and when the message times out, either updating a (possibly 
new) field in the {{QueuedMessage}} or failing the {{ChannelPromise}}. Failing 
the {{ChannelPromise}} is a bit racy, as the netty event loop may be in the 
process of actually serializing and sending the message, and failing the 
promise (from another thread) might not be great while we're actually doing the 
work. Meanwhile, updating some new {{volatile}} field in the {{QueuedMessage}} 
requires yet another volatile update on the send path (better than a 
synchronized call for sure, but ...).

{{ExpiringMap}}, of course, creates more garbage ({{CacheableObject}}), and has 
its own internal {{ConcurrentHashMap}}, with more volatiles and such. Not sure 
if this is a great idea, but it's the current state of my thinking. Thoughts?
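
To make the two options concrete, here is a rough sketch using a simplified 
stand-in for {{QueuedMessage}}; the field and method names are illustrative, 
not the branch's actual shape:

{code}
// Illustrative sketch of the two expiration options: a volatile flag set by a reaper
// (checked on the event loop before sending), versus failing the ChannelPromise directly.
import java.util.concurrent.TimeoutException;
import io.netty.channel.ChannelPromise;

final class QueuedMessageSketch
{
    final Object message;          // the payload to serialize (type simplified here)
    final ChannelPromise promise;
    final long expirationNanos;
    volatile boolean expired;      // option 1: flag flipped by an ExpiringMap-style reaper

    QueuedMessageSketch(Object message, ChannelPromise promise, long expirationNanos)
    {
        this.message = message;
        this.promise = promise;
        this.expirationNanos = expirationNanos;
    }

    // called from a background thread when the timeout elapses
    void expire()
    {
        expired = true;                                 // option 1: cheap, but adds a volatile read to the send path
        // promise.tryFailure(new TimeoutException()); // option 2: racy if the event loop is mid-send
    }

    // checked on the netty event loop just before serializing/sending
    boolean shouldSend(long nowNanos)
    {
        return !expired && nowNanos < expirationNanos;
    }
}
{code}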



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-12 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966293#comment-15966293
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

bq. But was that even a good idea? I'm not convinced. The chance of a message 
expiring on the sending node is low to start with, but having it expire on the 
producer thread is kind of remote, and it doesn't feel like it's worth 
bothering. Especially since we'll expire the message a tiny bit later when 
processing it anyway. Overall, the benefit to code clarity of not checking for 
expiration in too many places outweighs imo any benefit we'd have here.

In the context of the entire system maybe there are other mechanisms in play 
that prevent messages from piling up. But without any mechanism what will 
happen is that the messages can queue on the heap. With the old blocking 
implementation the thread would eventually block because the TCP buffer is full 
and with the NIO implementation the queued messages will not be pulled and 
discarded because the socket will not signal ready to write.

I think it's important that a single slow node or network issue resulting in a 
socket that isn't writable shouldn't allow an arbitrary amount of data to 
collect on the heap. Right now there is nothing that can drop the data in that 
scenario. The producer doesn't have to do the dropping in Netty land, but maybe 
it makes more sense to make producers the victims for that work rather than 
network threads which have other responsibilities.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965767#comment-15965767
 ] 

Jason Brown commented on CASSANDRA-8457:


bq. you seem to have squashed everything ...

Yes, that's correct, as it makes rebasing a more manageable task with these 
longer-lived projects. However, I always make a backup of a branch before 
squashing and rebasing, and here's the last one: 
[8457_pre-2017Mar16-rebase|https://github.com/jasobrown/cassandra/tree/8457_pre-2017Mar16-rebase].
 It has everything the squashed version does except for the commit to fix ssl 
hostname verification (sha {{5ccdd50624ef166eb46ea559aca3c4b264d8b124}}).

I'll address your other comments ASAP



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963989#comment-15963989
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

bq. One remaining open issue is the current patch does not expire messages on 
the producer thread, like what we used to do in 
{{OutboundTcpConnection#enqueue}}

But was that even a good idea? I'm not convinced. The chance of a message 
expiring on the sending node is low to start with, but having it expire _on 
the producer thread_ is kind of remote, and it doesn't feel like it's worth 
bothering. Especially since we'll expire the message a tiny bit later when 
processing it anyway. Overall, the benefit to code clarity of not checking for 
expiration in too many places outweighs imo any benefit we'd have here.

{quote}
bq. In {{connectionTimeout()}}, what happens when a connection attempt times out?
This is what {{OutboundTcpConnection}} does, and I've replicated it here.
{quote}

Fair enough, but can't we at least log a warning? Sounds to me this would be an 
improvement.

bq. TBH, I think both of these behaviors are incorrect

I agree and never fully understood the reasoning here; it's just not something 
that seems to have created problems. Imo, there are just 2 cases:
# we have a version match: success, just use the connection.
# we have a version mismatch: disconnect from that connection and reconnect 
with what we know will be a proper version. Don't do anything to the backlog; 
we're still just negotiating our connection and there is no reason to do 
anything with outstanding messages.

Other than that, I had a quick look over the branch and it looks good, but 
unfortunately you seem to have squashed everything, which makes it really hard 
to check the changes made since my last review, and I don't have the time right 
now to redo a full careful read of the whole patch.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-04-05 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956759#comment-15956759
 ] 

Jason Brown commented on CASSANDRA-8457:


Belatedly adding support for ssl hostname verification (CASSANDRA-9220). 
Thanks to the sslnodetonode_test.py dtest for finding this (that dtest will 
need an update as well).



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-31 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950929#comment-15950929
 ] 

Jason Brown commented on CASSANDRA-8457:


rebased my branch on trunk as of 2017-Mar-28, and added in CASSANDRA-13018 and 
CASSANDRA-13324



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-27 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944223#comment-15944223
 ] 

Jason Brown commented on CASSANDRA-8457:


[~norman] if you want to give it another shot, I'm always open to more input.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943172#comment-15943172
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] let me know if I should review the code again.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-27 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943160#comment-15943160
 ] 

Jason Brown commented on CASSANDRA-8457:


- I really like what you've done with {{HandshakeProtocol}}. It's a nice 
clarification/centralization of the handshake data.
- I'm cool with ditching the {{DisabledCoalescingStrategy}}, but didn't like 
passing {{null}} around. Thus, I switched to passing 
{{Optional}} as the intent is a bit more explicit.
- With the refactorings, it became easier to remove {{OutboundConnection}}, as 
its reduced functionality can now live simply in 
{{OutboundMessagingConnection}} without complicating it.
- addressed the comments about timeouts on the outbound handshake side, and 
reduced them to a single timeout (the one that was already in 
{{OutboundMessagingConnection}})
- upgraded to netty 4.1.9
- updated a lot of comments and documentation.
- added in a lot more tests. code coverage is now > 90% for the 
{{org.apache.cassandra.net.async}} package.

One remaining open issue is the current patch does not expire messages on the 
producer thread, like what we used to do in {{OutboundTcpConnection#enqueue}}. 
I'm awaiting the outcome of CASSANDRA-13324, and will apply the result here.
   

bq. In {{connectionTimeout()}}, what happens when a connection attempt times out?

This is what 
[{{OutboundTcpConnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L248]
 does, and I've replicated it here. Note that there could be several connection 
attempts before the connection timeout triggers.

bq. in the DISCONNECT/NEGOTIATION_FAILURE case, we don't seem to be actively 
trying to reconnect

The {{DISCONNECT}} case handles the cases when the protocol versions do not 
match. In {{OutboundTcpConnection}}, if the [peer's version is lesser 
than|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L465]
 what we expected it to be, we clear the backlog and do not attempt to 
reconnect until more messages are {{#enqueue()}}'d. On the contrary, if the 
[peer's version is greater 
than|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L488]
 what we expected it to be, we'll finish the handshake, send what's currently 
in the backlog, and then disconnect. 

TBH, I think both of these behaviors are incorrect. If the peer's version is 
lower than what we expect, we know how to send messages (both handshake and 
message serialization), so we could send what we have in the backlog without 
throwing it away. If the peer's version is greater than what we expect and we 
try to send messages, they might fail on the receiver side due to a change in 
the handshake sequence (unlikely) or the message framing/format (likely). WDYT? 
Either way, I'll need to update {{OutboundMessagingConnection}} as it's not 
quite correct right now.

{{NEGOTIATION_FAILURE}} does what 
[{{OutboundTcpConnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L453]
 does: if we can't get a version from the peer, either we couldn't connect or 
the handshake failed. I think the current behavior (throw away the backlog) is 
desirable.

bq. In {{InboundHandshakeHandler.handleMessagingStartResponse()}}, maybe we 
should check the return of handshakeTimeout.cancel()

As the {{handshakeTimeout}} executes in the netty event loop, it won't compete 
with the {{#decode}} method. As {{#failHandshake()}} closes the channel, we 
won't even get the {{ThirdHandshakeMessage}} into 
{{handleMessagingStartResponse}} as {{#decode()}} would never be triggered.

bq. Regarding the sending and receiving buffer size handling, 

I agree with all your points and have implemented them.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925803#comment-15925803
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

Finally got time to look at the new version, sorry it wasn't sooner.

In general, that looks pretty good and while I have a number of remarks and 
suggestions, those are somewhat minor.

First, I took a stab at addressing some of my comments/suggestions when it 
felt simpler than writing a wordy comment, and pushed the resulting branch 
[here|https://github.com/pcmanus/cassandra/commits/8457-review]. I won't go 
into too much detail as I tried to put the rationale for each commit in the 
commit messages, but I want to mention the 3rd commit as it's the only 
substantial one. Basically, I felt {{OutboundMessagingConnection}} was a bit 
dense, and that having the exact details around flushing behavior split across 
different classes ({{OutboundMessagingConnection}} and {{MessageOutHandler}} 
mostly, but still) was making things slightly less easily understandable. So 
that commit is mostly my attempt at grouping flush behavior in a single place. 
Hopefully, I didn't break anything.

Anyway, I tried to break those suggested commits up as much as I could so it's 
easy to skip something if you don't agree with it.

Outside of that, a few additional remarks:
* In {{SSLFactory}}, it's confusing as to what is the difference between 
{{createSslContext}} and {{getSslContext}} and when one is supposed to use one 
or the other.
* In {{OutboundMessagingConnection}}:
** I'd prefer using an {{AtomicReference}} for {{state}} (and {{AtomicBoolean}} 
for {{scheduledFlushState}} but my branch above already does this one). We're 
not supposed to create new connections all the time and we're not creating that 
much of them overall, so I doubt saving a few small allocations makes a 
meaningful difference here, but it makes things slightly less readable (don't 
get me wrong, using field updaters is a good trick, and we do already use it in 
other places, but I prefer keeping it for where it demonstrably make a 
difference, or it's premature optimization).
** At the beginning of {{sendMessage()}}, shouldn't we be checking people 
aren't using a closed connection?
** The naming of {{MIN_MESSAGES_FOR_COALESCE}} and more generally the phrasing 
of the comments on {{otc_coalescing_enough_coalesced_messages}} in 
{{cassandra.yaml}} wasn't all that clear to me on first read. Not necessarily 
new to this patch though.
** We set a connection timeout in {{connect()}}, but we also have the exact 
same timeout in {{OutboundHandshakeHandler}}. Can't we simplify a bit and 
remove the latter one?
** In {{connectionTimeout()}}, what happens when a connection attempt times 
out? It looks like we drop the backlog on the floor without logging any message 
or attempting to reconnect. Am I missing something?
** In {{finishHandshake}}, in the DISCONNECT/NEGOTIATION_FAILURE case, we don't 
seem to be actively trying to reconnect. I suppose we rely on the next message 
triggering that connection? Shouldn't we reconnect more actively? We may have a 
message in the backlog and we don't know when the next message will come.
* In {{InboundHandshakeHandler.handleMessagingStartResponse()}}, maybe we 
should check the return of {{handshakeTimeout.cancel()}} and exit early if it's 
{{false}}, rather than reading from a closed socket and getting a random error. 
Also, as we cancel the timeout before setting the state to "complete", I think 
it would be safer to check for {{if (state == COMPLETE || (handshakeTimeout != 
null && handshakeTimeout.isCancelled()))}} in {{failHandshake()}} (admittedly 
we're not really racy because we're on the netty event loop, but it doesn't 
feel harmful to future proof in case this changes).
* Regarding the sending and receiving buffer size handling:
** we don't seem to be handling the default for these buffers the same way for 
inbound and outbound connections. For outbound connections, we use the default 
defined in {{OutboundConnectionParams}}, but for inbound ones it appears we 
just default to whatever Netty defaults to (in 
{{NettyFactory.createInboundChannel}}). Can we consolidate that (probably want 
to always use our own defaults)?
** since we use separate connections for inbound and outbound, I assume we 
barely ever use the receive buffer for outbound connections and vice-versa. 
That is, there is handshaking which sends messages both ways in both cases, but 
the messages exchanged are tiny. Would it make sense to hardcode the size of 
the buffer that each side barely uses to something big enough for the handshake 
but small otherwise? That way, when users configure 
{{DatabaseDescriptor.getInternodeRecvBufferSize/getInternodeSendBufferSize}}, 
it only impacts what I assume they truly mean to impact.
** in {{Builder.build}} method, feels weird to only check 
{{sendReceiveBufferSize}} as 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868315#comment-15868315
 ] 

Jason Brown commented on CASSANDRA-8457:


[~jjordan] That's a fair assessment. I'll table the large messages for the 
moment, and focus my energies a) on getting CASSANDRA-12229 posted and b) 
pinging [~slebresne] for more comments when he can :)



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-15 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868140#comment-15868140
 ] 

Jeremiah Jordan commented on CASSANDRA-8457:


Is adding more complexity to this already complex patch really the answer here 
(handling large messages)?  It seems to me like we should finish up the patch 
as is and make "handle large messages" an optimization ticket after we get this 
in.

Getting streaming to work with this seems like a better place to spend our time.



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868091#comment-15868091
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

{quote}
Thus, I propose to bring back the SwappingByteBufDataOutputStreamPlus that I 
had in an earlier commit. To recap, the basic idea is to provide a DataOutputPlus 
that has a backing ByteBuffer that is written to, and when it is filled, it is 
written to the netty context and flushed, then allocate a new buffer for more 
writes - kinda similar to a BufferedOutputStream, but replacing the backing 
buffer when full. Bringing this idea back is also what underpins one of the 
major performance things I wanted to address: buffering up smaller messages 
into one buffer to avoid going back to the netty allocator for every tiny 
buffer we might need - think Mutation acks.
{quote}

What thread is going to be writing to the output stream to serialize the 
messages? If it's a Netty thread you can't block inside a serialization method 
waiting for the bytes to drain to the socket that is not keeping up. You also 
can't wind out of the serialization method and continue it later.

If it's an application thread then it's no longer asynchronous and a slow 
connection can block the application and prevent it from doing say a quorum 
write to just the fast nodes. You would also need to lock during serialization 
or queue concurrently sent messages behind the one currently being written.

With large messages we aren't really fully eliminating the issue, only making it 
a factor better. At the other end you still need to materialize a buffer 
containing the message + the object graph you are going to materialize. This is 
different from how things worked previously where we had a dedicated thread 
that would read fixed size buffers and then materialize just the object graph 
from that. 

To really solve this we need to be able to avoid buffering the entire message 
at both sending and receiving side. The buffering is worse because we are 
allocating contiguous memory and not just doubling the space impact. We could 
make it incrementally better by using chains of fixed size buffers so there is 
less external fragmentation and allocator overhead. That's still committing 
additional memory compared to pre-8457, but at least it's being committed in a 
more reasonable way.

I think the most elegant solution is to use a lightweight thread 
implementation. What we will probably be boxed into doing is making the 
serialization of result data and other large message portions able to yield. 
This will bound the memory committed to large messages to the largest atomic 
portion we have to serialize (Cell?).

Something like an output stream being able to say "shouldYield". If you 
continue to write it will continue to buffer and not fail, but use memory. Then 
serializers can implement a return value for serialize which indicates whether 
there is more to serialize. You would check shouldYield after each Cell or some 
unit of work when serializing. Most of these large things being serialized are 
iterators. The trick will be that most serialization is stateless, and objects 
are serialized concurrently, so you can't safely store the serialization state 
in the object being serialized.
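
As a purely illustrative sketch of the "shouldYield" idea, with all names and 
the threshold being assumptions rather than an existing API:

{code}
// Illustrative sketch: an output stream that keeps buffering (never fails) but exposes a
// hint when the serializer should yield after its current unit of work (e.g. a Cell).
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class YieldingOutputStreamSketch extends OutputStream
{
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final int yieldThresholdBytes;

    YieldingOutputStreamSketch(int yieldThresholdBytes)
    {
        this.yieldThresholdBytes = yieldThresholdBytes;
    }

    @Override
    public void write(int b) throws IOException
    {
        buffer.write(b); // never fails, just keeps buffering (and using memory)
    }

    // a serializer would check this after each unit of work and report "more to serialize"
    // so the caller can resume the serialization later
    boolean shouldYield()
    {
        return buffer.size() >= yieldThresholdBytes;
    }
}
{code}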



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-14 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865754#comment-15865754
 ] 

Jason Brown commented on CASSANDRA-8457:


rebased, and pulled in CASSANDRA-13090



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864943#comment-15864943
 ] 

Jason Brown commented on CASSANDRA-8457:


One other thing [~aweisberg] and I discussed was large messages and how they 
would be handled. In the current 8457 implementation, we will simply allocate a 
buffer of the {{serializedSize}}. If a message is supposed to be 50Mb, we'll 
allocate that and roll on. With enough large messages, sent to enough peers, we 
could OOM or get into some serious memory pressure problems. For comparison, 
the existing {{OutboundTcpConnection}} uses a {{BufferedOutputStream}}, which 
defaults to 64k and which we constantly reuse and never need to reallocate.

Thus, I propose to bring back the {{SwappingByteBufDataOutputStreamPlus}} that 
I had in an earlier commit. To recap, the basic idea is to provide a 
{{DataOutputPlus}} that has a backing {{ByteBuffer}} that is written to, and 
when it is filled, it is written to the netty context and flushed, then 
allocate a new buffer for more writes - kinda similar to a 
{{BufferedOutputStream}}, but replacing the backing buffer when full. Bringing 
this idea back is also what underpins one of the major performance things I 
wanted to address: buffering up smaller messages into one buffer to avoid going 
back to the netty allocator for every tiny buffer we might need - think 
Mutation acks.

We definitely need to address the large buffer issue, and I wouldn't mind 
knocking out the "buffering small messages" as it's really the same code (that 
I've written before). wdyt [~slebresne] and [~aweisberg]?
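
For illustration, a minimal sketch of the swapping-buffer idea (the real class 
would implement {{DataOutputPlus}}); the buffer size and names are assumptions:

{code}
// Illustrative sketch: buffer writes into a netty ByteBuf and, when it fills, flush it to the
// channel and swap in a fresh buffer, roughly like a BufferedOutputStream that replaces its
// backing buffer instead of blocking.
import java.io.IOException;
import java.io.OutputStream;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;

class SwappingByteBufOutputStreamSketch extends OutputStream
{
    private final ChannelHandlerContext ctx;
    private final int bufferSize;
    private ByteBuf current;

    SwappingByteBufOutputStreamSketch(ChannelHandlerContext ctx, int bufferSize)
    {
        this.ctx = ctx;
        this.bufferSize = bufferSize;
        this.current = ctx.alloc().buffer(bufferSize);
    }

    @Override
    public void write(int b) throws IOException
    {
        if (!current.isWritable())
        {
            // hand the filled buffer to netty (released once written) and swap in a new one
            ctx.writeAndFlush(current);
            current = ctx.alloc().buffer(bufferSize);
        }
        current.writeByte(b);
    }

    @Override
    public void flush()
    {
        if (current.readableBytes() > 0)
        {
            ctx.writeAndFlush(current);
            current = ctx.alloc().buffer(bufferSize);
        }
    }
}
{code}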




[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864888#comment-15864888
 ] 

Jason Brown commented on CASSANDRA-8457:


Thanks to [~aweisberg], caught another race condition in 
{{OutboundMessagingHandler}} and pushed. Needs to be updated when 
CASSANDRA-13090 is committed (probably today).

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863760#comment-15863760
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] whooot :) I will do another review as well, just to double-check 
again.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-13 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863758#comment-15863758
 ] 

Jason Brown commented on CASSANDRA-8457:


OK, so I've been performance load testing the snot out of this code for the last 
several weeks, with help from netty committers, flight recorder, and flame 
graphs. As a result, I've made some major and some minor tweaks, and now I'm 
slightly faster than trunk with slightly better throughput. I have some 
optimizations in my back pocket that will increase performance even more, but as 
Sylvain has stated before, we'll save those for follow-up tickets.

trunk
{code}
 id, type   total ops,op/s,pk/s,   row/s,mean, med, .95, .99,.999, max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,  mb
  4 threadCount, total,233344,3889,3889,3889, 1.0, 1.0, 1.2, 1.3, 1.5,68.2,   60.0,  0.01549,  0,  9, 538, 538,   4,5381
  8 threadCount, total,544637,9076,9076,9076, 0.8, 0.8, 1.0, 1.1, 1.4,73.8,   60.0,  0.00978,  0, 20,1267,1267,   5,   11848
 16 threadCount, total,   1126627,   18774,   18774,   18774, 0.8, 0.8, 0.9, 1.0, 5.5,78.2,   60.0,  0.01882,  0, 40,2665,2665,   6,   23666
 24 threadCount, total,   1562460,   26036,   26036,   26036, 0.9, 0.8, 1.0, 1.1, 9.1,81.3,   60.0,  0.00837,  0, 55,3543,3543,   9,   32619
 36 threadCount, total,   2098097,   34962,   34962,   34962, 1.0, 0.9, 1.1, 1.3,60.9,83.0,   60.0,  0.00793,  0, 73,4665,4665,   7,   43144
 54 threadCount, total,   2741814,   45686,   45686,   45686, 1.1, 1.0, 1.4, 1.7,62.2,   131.7,   60.0,  0.01321,  0, 93,5748,5748,   7,   55097
 81 threadCount, total,   3851131,   64166,   64166,   64166, 1.2, 1.0, 1.6, 2.6,62.3,   151.7,   60.0,  0.01152,  0,159,8190,8521,  14,  106805
121 threadCount, total,   4798169,   79947,   79947,   79947, 1.5, 1.1, 2.0, 3.0,63.5,   117.8,   60.0,  0.05689,  0,165,9323,9439,   5,   97536
181 threadCount, total,   5647043,   94088,   94088,   94088, 1.9, 1.4, 2.6, 4.9,68.5,   169.2,   60.0,  0.01639,  0,195,   10106,   11011,  11,  126422
271 threadCount, total,   6450510,  107461,  107461,  107461, 2.5, 1.8, 3.7,12.0,75.4,   155.8,   60.0,  0.01542,  0,228,   10304,   12789,   9,  143857
406 threadCount, total,   6700764,  111635,  111635,  111635, 3.6, 2.5, 5.3,55.8,75.6,   196.5,   60.0,  0.01800,  0,243,9995,   13170,   7,  144166
609 threadCount, total,   7119535,  118477,  118477,  118477, 5.1, 3.5, 7.9,62.8,85.1,   170.0,   60.1,  0.01775,  0,250,   10149,   13781,   7,  148118
913 threadCount, total,   7093347,  117981,  117981,  117981, 7.7, 4.9,15.7,71.3,   101.1,   173.4,   60.1,  0.02780,  0,246,   10327,   13859,   8,  155896
{code}

8457
{code}
 id, type   total ops,op/s,pk/s,   row/s,mean, med, .95, .99,.999, max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,  mb
  4 threadCount, total,161668,2694,2694,2694, 1.4, 1.4, 1.6, 1.7, 3.2,68.2,   60.0,  0.01264,  0,  6, 363, 363,   4,3631
  8 threadCount, total,498139,8301,8301,8301, 0.9, 0.9, 1.1, 1.3, 1.8,73.5,   60.0,  0.00446,  0, 19,1164,1164,   6,   11266
 16 threadCount, total,765437,   12756,   12756,   12756, 1.2, 1.2, 1.4, 1.5, 5.7,74.8,   60.0,  0.01251,  0, 29,1819,1819,   5,   17238
 24 threadCount, total,   1122768,   18710,   18710,   18710, 1.2, 1.2, 1.4, 1.5, 8.5,   127.7,   60.0,  0.00871,  0, 42,2538,2538,   5,   25054
 36 threadCount, total,   1649658,   27489,   27489,   27489, 1.3, 1.2, 1.4, 1.6,60.1,77.7,   60.0,  0.00627,  0, 57,3652,3652,   7,   33743
 54 threadCount, total,   2258999,   37641,   37641,   37641, 1.4, 1.3, 1.6, 1.8,62.5,81.7,   60.0,  0.00771,  0, 79,4908,4908,   6,   46789
 81 threadCount, total,   3255005,   54220,   54220,   54220, 1.5, 1.2, 1.7, 2.2,63.8,   133.4,   60.0,  0.02030,  0,117,6953,7008,   9,   72208
121 threadCount, total,   4643184,   77293,   77293,   77293, 1.5,

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-12-09 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735327#comment-15735327
 ] 

Jason Brown commented on CASSANDRA-8457:


Ok, [~slebresne], next round of review is ready :) I'm currently ferreting out 
a regression with {{TriggersTest}}, but that shouldn't hold up things any 
further.

bq. {{MessageOutHandler.SwappingByteBufDataOutputStreamPlus}} - I'm wondering 
if it's not premature optimization

Yeah, you may be correct here. This is one of the first parts I wrote, and 
clearly I was worried about the "pack multiple messages into a single flush" 
issue. It's possible that a simpler implementation, where each message gets its 
own buffer, might just be best. 

However, one counter argument here could be that when we have a very large 
message to serialize, think tens of MBs, we would allocate that one large 
buffer, wait to serialize data into it, and then send/flush it. 
{{OutboundTcpConnection}} and my current branch will buffer a certain subset of 
that data (say 64k) and send/flush that, rather than wait for it all to be 
serialized. I'm not sure if that's compelling enough at this time, or if we 
consider that followup ticket/work. Either way, I've removed 
{{MessageOutHandler.SwappingByteBufDataOutputStreamPlus}}.

bq. OMC.enqueue and background queue
Unfortunately, there are a few things that get in the way of making it that 
simple, assuming we don't want to block threads. The biggest is that we don't 
want to write general messages to the channel until the internode messaging 
handshake completes successfully; it's a three-way handshake, and if we start 
sending general messages in the middle of it, the handshake will break. 
Further, it's not so clean or obvious how to get the ChannelFuture 
instance without *some* degree of blocking (some of which we might be able to 
avoid with some clever tricks @normanm showed me). Ultimately, we need a 
singular {{Channel}} instance, which would require blocking to ensure we only 
ever create one.

Another reason why I opted for the backlog queue was to allow as close to an 
ordered sending of the messages as possible, to a given peer: in other words, 
to write to the channel as many backlogged messages as possible before sending 
the new message. For the vast majority of cassandra's functionality, ordering 
of messages is irrelevant. However, I'm wondering how sensitive repair and 
cluster membership changes might be to this potential edge-case of message 
reordering.

bq. I'd expect {{enqueue()}} (which might warrant a rename) to just do a 
{{ctx.write()}}

Makes sense, done.

That being said, I've left the code in newly-named {{#sendMessage}} and 
{{#finishHandshake}} a bit tentative, with a note that you and I should hammer 
this part out further :)

bq. ... flushing ...

I really like what you've done wrt flushing in {{FlushingHandler}} and 
{{CoalescingStrategies}}. I think that code got twisted and complex because I 
was trying to maintain both blocking and non-blocking behaviors in 
{{CoalescingStrategies}}. Thus, I've eliminated {{CoalescingMessageOutHandler}} and 
brought in your {{FlushingHandler}} and {{CoalescingStrategies}} changes. One 
thing to point out is that when scheduling a task to a netty executor thread, 
it will execute on the same IO thread as the current context. Thus, in 
{{FlushingHandler}} you don't have to worry about any concurrency between a new 
message arrival and the execution of the task, as they execute on the same 
thread. I've removed your (well-documented!) javadoc comments and the volatile 
keyword on the {{scheduledFlush}} member field.
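
For illustration, the event-loop scheduling being described boils down to roughly the following (a hand-wavy sketch, not the code in the branch; the handler name and the 100-microsecond window are made up):

{code}
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

// Illustrative only: the flush task is scheduled on the channel's own event loop,
// so it runs on the same thread as subsequent write()s and the scheduledFlush
// field needs no volatile/synchronization.
public class ScheduledFlushHandler extends ChannelOutboundHandlerAdapter
{
    private Future<?> scheduledFlush;   // only ever touched on the event-loop thread

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        ctx.write(msg, promise);
        if (scheduledFlush == null)
        {
            // hypothetical 100 microsecond coalescing window
            scheduledFlush = ctx.executor().schedule(() ->
            {
                scheduledFlush = null;
                ctx.flush();
            }, 100, TimeUnit.MICROSECONDS);
        }
    }
}
{code}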

I agree we can address "targeted benchmarking ... in a separate ticket", as I 
think getting the essential behavior of flush and coalesce is most important 
here.

bq. Regarding dropped messages, the current implementation was going through 
the {{MessagingService.incrementDroppedMessages()}} ...

Hmm, {{OutboundTcpConnection}} has its own [counter for connection-specific 
dropped 
messages|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L136],
 which I've maintained in {{OutboundMessagingConnection}}. It seems like 
{{OutboundTcpConnection}} should probably be reporting its dropped messages to 
{{MessagingService.incrementDroppedMessages()}} in addition to what it 
currently does. If you think that's legit, I'll fix it in this patch and I'll 
open a separate ticket to fix it in 3.0+. wdyt?

bq. InboundHandshakeHandler#handshakeHandlerChannelHandlerName

Removed this. I think I needed it with the earlier version of netty? Either 
way, killed it.

bq. In {{OutboundMessagingConnection#writeBacklogToChannel}}, we seem to be 
sending timed out messages if those are retried

This behavior was changed in CASSANDRA-12192, so I've retained it here.

bq. {{MessagingService.MessageSender}}

Removed

bq. 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-11-07 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643999#comment-15643999
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

Alright, here's a new round of reviews:
* On {{MessageOutHandler}}:
** I'm a bit unclear on the benefits of the 
{{SwappingByteBufDataOutputStreamPlus}} (whose name I'd btw probably shorten 
given it's private), compared to the simple solution of just allocating a 
{{ByteBuf}} per-message. My analysis is this: if we have no coalescing (which 
may end up being the default back in 4.0, CASSANDRA-12676), then for any 
message smaller than {{bufferCapacity}}, we end up allocating more than needed 
since we flush for each message, so it's less optimal. For larger messages, we 
end up allocating multiple small buffers (instead of a larger), which may be 
better but not sure how much. With coalescing, I guess we do end up potentially 
packing multiple messages into a {{ByteBuf}}, but does that make such a big 
difference (I mean, even if we were to have one {{ByteBuf}} per message, we 
would still not flush on every message, which feels the most important)? So to 
sum up, I can buy that it's slightly better in some cases, but it seems also 
slightly less optimal in others and it's unclear how much it helps when it does, 
so in the absence of any benchmarking, I'm wondering if it's not premature 
optimization. Shouldn't we leave that to later and keep it simpler?
** If we do keep {{SwappingByteBufDataOutputStreamPlus}}, I'd at least inline 
{{WriteState}}. It creates a new indirection which doesn't seem to have much 
benefit, and I admit the naming confused me (a "state" is not the first place I 
looked to find the buffer).
* In {{OutboundMessagingConnection}}:
** In {{writeBacklogToChannel}}, we seem to be sending timed-out messages if 
those are retried: I don't think we want to do that (we should always drop a 
timed-out message).
** I'm not a fan of the use of the {{backlog}} queue for sending messages. I'm 
no Netty expert but that doesn't seem to fit Netty's "spirit". Once we're in a 
stable state (the connection is created), I'd expect {{enqueue()}} (which might 
warrant a rename) to just do a {{ctx.write()}} so it's confusing to me it 
doesn't. We do need to handle connection creation and retries, but it feels we 
can handle that easily enough through the future on channel creation. To 
clarify, I'd expect something along the lines of (it's simplified, just to 
clarify what I have in mind):
   {noformat}
void enqueue(MessageOut msg, int id)
{
    QueuedMessage qm = new QueuedMessage(msg, id);
    if (state == State.READY)
    {
        channel.write(qm);
    }
    else
    {
        ChannelFuture connectionFuture = connect(); // <- connect would deal with concurrency
        connectionFuture.addListener(f -> f.channel().write(qm));
    }
}
{noformat}
** I'm a bit unclear on the {{flush()}} strategy. The code seems to flush 
basically on every message (ok, once per emptying of the queue, but if we 
ignore connection creation, this can be very well on every message or very 
close to it), but this seems to defeat the coalescing handler since that 
handler will flush on flush (does that make sense?). I could be getting this 
wrong, but if I don't, it seems we shouldn't flush in 
{{OutboundMessagingConnection}} but delegate that task to the coalescing 
handler. But see next point for a more general remark on flush.
* In general, there are 2 important operations that I think could be better 
extracted (so it's cleaner when those happen): flushes and timing out messages. 
Here's my thinking:
** Flushes: it's my understanding that when you flush is pretty important to 
performance. So it would be great if we could centralize where we flush if at 
all possible. In particular, and as said above, I feel we should not flush in 
{{OutboundMessagingConnection}}. What I would do however is add a 
{{FlushingHandler}} for that task. That handler would mostly be the 
{{CoalescingMessageOutHandler}} renamed, but the point of the renaming is to 
make it clear that it's where the decision to flush happens.
** Timeouts: the patch currently looks for timed-out messages in different places 
and it would be great to consolidate that. In general, I think it's enough to 
check if a message has timed out once in the pipeline: messages will rarely 
time out on the sending side by design, and if they do, it's likely because the 
channel is overwhelmed and the message has sat on the Netty event executor 
queue too long, and I believe we'd catch that well enough with a simple 
{{MessageTimeoutingHandler}}.
* Regarding dropped messages, the current implementation was going through the 
{{MessagingService.incrementDroppedMessages()}} methods, which are still used by 
other parts of the code and distinguish between cross-node and local 
messages, but the new code seems to have its own separate 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-10-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583503#comment-15583503
 ] 

Jason Brown commented on CASSANDRA-8457:


Addressed a few problems due to failing dtests. The currently failing utests 
and dtests are almost *all* related to streaming; I'm working on the few 
remaining dtests.

* correctly handle canceled futures in 
{{OutboundMessagingConnection#handleMessageFuture}}
* Fixed NPE in {{OutboundMessagingConnection#reconnectWithNewIp}}
* refactored some code wrt outbound connection creation
* added package-level documentation, primarily describing the handshake and 
messaging format

Also, I have a patched version of netty 4.1.6 in this branch that fixes a flush 
problem in {{LZ4FrameEncoder}}. Netty folks are reviewing now, and we will 
hopefully get an updated netty lib soon.

[~slebresne] ready for your next round of comments when you have time :)

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-27 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15527775#comment-15527775
 ] 

Jason Brown commented on CASSANDRA-8457:


TL;DR - I've addressed everything except for the interaction between 
{{ClientHandshakeHandler}} and {{InternodeMessagingConnection}} (both are 
now renamed). I've noticed the odd rub there, as well, for a while, and I'll 
take some time to reconsider it.

re: "talking points"
- Backward compatibility - bit the bullet, and just yanked the old code
- streaming - [~slebresne] and I talked offline, and CASSANDRA-12229 will 
address the streaming parts, and will be worked on/reviewed concurrently. Both 
tickets will be committed together to avoid breaking streaming.

re: comments section 1
- Netty openssl - when I implemented this back in February, there was no 
mechanism to use {{KeyFactoryManager}} with the OpenSSL implementation. 
Fortunately, this has changed since I last checked in, so I've deleted the 
extra {{keyfile}} and friends entries from the yaml/{{Config}}.
- "old code" - deleted now
- package javadoc - I absolutely want this :), I just want things to be more 
solid code-wise before diving into that work.
- naming - names are now more consistent using In/Out (or Inbound/Outbound), 
and use of client/server is removed.

re: comments section 2
- {{getSocketThreads()}} - I've removed this for now, and will be resolved with 
CASSANDRA-12229
- {{MessagingService}} renames - done
- {{MessagingService#createConnection()}} In the previous implementation, 
{{OutboundTcpConnectionPool}} only blocked on creating the threads for its 
wrapped {{OutboundTcpConnection}} instances (gossip, large, and small 
messages). No sockets were actually opened until a message was actually sent to 
that peer (via {{OutboundTcpConnection#connect()}}). Since we do not spawn a separate 
thread for each connection type (even though we will have separate sockets), I 
don't think it's necessary to block {{MessagingService#createConnection()}}, or 
more correctly now, {{MessagingService.NettySender#getMessagingConnection()}}.
- "Seems {{NettySender.listen()}} always starts a non-secure connection" - You 
are correct; however, looks like we've always been doing it that way (for 
better or worse). I've gone ahead and made the change (it's a one liner, plus a 
couple extra for error checking).
- {{ClientConnect#connectComplete}} - I've renmaed the function to be more 
accurate ({{connectCallback}}).
- {{CoalescingMessageOutHandler}} - done

Other issues resolved, as well. Branch has been pushed (with several commits at 
the top) and tests running.


> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-26 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523138#comment-15523138
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

Here come more review bits, though it's still incomplete as I have to context 
switch a bit. I'll start with a number of general remarks, followed by some more 
specific comments on some classes, and a few random nits to finish.

The first thing that I think we should discuss is how we want to handle the 
transition to this (that is, what happens to the old implementation). Namely, I 
see 2 possible options:
# we simply switch to Netty, period, and the old implementation goes away;
# we keep both implementations in, with a configuration flag to pick which one 
is used as implemented in the patch, and this probably until 5.0.
I think the main advantage of keeping both implementations for a while is that 
it's somewhat safer and easier for users to evaluate outside of the 4.0 upgrade 
itself. But the downsides are obviously that it's a lot more code to maintain, 
it's more messy right off the bat, and given only the new implementation will 
stay eventually, it's only pushing the switch to later. Personally, since we'll 
ship this in a new major (4.0) and since the cost of maintaining 2 
implementations is pretty unpleasant and we don't have infinite dev resources, 
I'd have a slight preference for option 1, but I'm not going to argue 
strongly either way. Mostly, I want us to decide, because I think this 
influences the final product here.

The second thing I want to mention is streaming. I could be missing something 
here but it appears streaming is just not supported at all by the new code, 
meaning that the patch as is is unusable outside of toy testing (I mean by that 
that if you start a node with the Netty MessagingService, it seems you can't 
receive any streaming, so you're not a fully functioning node). So I'm kind of 
wondering what the plan is here. I'm happy to handle that on another ticket as 
streaming is a beast on its own, but I'm not sure I'm ok committing anything 
until it's a fully working product.

So anyway, with those talking points out of the way, a few more "technical" 
general remarks on the patch:
* Netty's openssl: what's the fundamental difference between the new 
{{keyfile}} and the old {{keystore}}? It sounds confusing to have to configure 
new things when there are already pre-existing parameters for SSL.
* Some "old" code is made public (many constants in particular) and reused by 
the "new" code, while other bits are copy-pasted, but it's not very consistent 
and makes things sometimes a bit hard to follow ({{QueuedMessage}} (used by new 
code too) being an internal class of {{OutboundTcpConnection}} (old code) is a 
particularly bad offender imo). That's typically where I want us to make a 
decision on the point I raise above, as whatever decision we take I feel there 
are some cleanups to be done.
* It would be really nice to have a package javadoc for the new "async" package 
that explains the main classes involved and how they interact.
* There are a few TODOs that need to be handled.
* I feel some of the naming could be made more consistent. In particular, the 
patch uses Client/Server for some classes (e.g. 
{{ClientHandshakeHandler}}/{{ClientConnector}}) but In/Out for others (e.g. 
{{MessageInHandler}}). And something like {{InternodeMessagingPool}} is imo less 
good than the old {{OutboundTcpConnectionPool}} in that it doesn't indicate 
it's only for outbound connections. I'd prefer avoiding Client/Server for 
internode stuff and reserving it for native protocol classes, sticking to In/Out 
(or Inbound/Outbound when that reads better) for internode classes, like we did 
before. I understand this may have been to avoid name clashes with existing 
classes, but maybe that's also where we should decide how long the old 
implementation is to stay in.

Then a few more specific comments:
* In {{MessagingService}}:
** We should probably move everything related to {{ServerSocket}} inside the 
new {{BlockingIOSender}} (though again, if we decide to just ditch the old 
implementation, it simply goes away).
** Related to the previous point, the {{getSocketThreads()}} is used by 
{{StreamingTransferTest}} to apparently wait for the streaming connections to 
close. Returning an empty list seems to kind of break that. I think we could 
replace the method by some {{awaitAllStreamingConnectionsClosed()}} (that would 
still just be for testing). Of course, that's kind of streaming related.
** I would probably rename {{switchIpAddress()}} to {{reconnectWithNewIp()}} as 
it makes it more explicit what actually happens (or at least have some javadoc). 
Similarly, {{getCurrentEndpoint()}} is imo not very clearly named (not this 
patch's fault though), maybe {{getPreferredAddress()}}?
** In {{createConnection()}}, it appears the old implementation was starting 
the 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514498#comment-15514498
 ] 

Jason Brown commented on CASSANDRA-8457:


I've pushed a couple commits to the same branch, and kicked off tests now.

re: {{LegacyClientHandler}} - it ended up not being too difficult to parse the 
message "header" to get through to the payload size. It's implemented in 
{{MessageReceiveHandler}}. I've also removed {{LegacyClientHandler}} and the 
original message-in classes, {{MessageInHandler}} and 
{{AppendingByteBufInputStream}}, and their tests.

re: flush: yup, dumped my counter thing, and am using 
{{FlushConsolidationHandler}}. There's still some finer semantics/behaviors to 
groom over wrt flushing, but this is a better solution already.

re: {{MessagingService}} - As a short-term solution, I created an interface to 
abstract out all the blocking IO vs netty related stuff in MS. It's not a 
thing of beauty, but hopefully it's cleaner and won't need to live all that 
long. I hope we can live with this during the scope of the review.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-20 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508103#comment-15508103
 ] 

Jason Brown commented on CASSANDRA-8457:


re: {{LegacyClientHandler}}, and v3 vs v4 (or, "to frame or not to frame")

I'll admit this was a more difficult decision to make in this patch. While 
I'm not worried (right now) about the size occurring twice in the message (it 
is only four bytes, and we could move it ;)), I was more concerned about being 
able to fish out the payload size from *after* the message's parameters, for 
which we have no size information. A quick scan of the current code shows we 
only use the message headers for tracing and multi-DC mutations. In both cases, 
the total data is small (tens of bytes). Of course, we don't know if we have 
all the bytes for deserializing without a size field. In the case of having 
headers in a message but not enough bytes in the buffer, we'll then need to 
wait until more bytes come in and try deserializing again. If we are OK with 
that possibility (looping a couple of times for more bytes, creating extra 
garbage), then I have no major concerns about switching to that model, 
especially if we avoid altering the protocol, which I agree would be great to 
defer to a different ticket if possible.

bq. Why not just submit a flush task to the channel {{EventLoop}} ...

Yeah, this makes sense, and is much more sane than my counters idea. I spoke 
with the netty folks, and they recommended using netty's 
{{FlushConsolidationHandler}}, so I'll give that a shot before trying something 
custom. (That spotify example was instructive, though, so thanks!)
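
For reference, wiring in netty's stock handler is roughly a one-liner in the pipeline setup. A sketch only (the initializer class and the threshold are illustrative, and it assumes a netty version that ships {{FlushConsolidationHandler}}); it is not the actual branch code:

{code}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.flush.FlushConsolidationHandler;

// Sketch: FlushConsolidationHandler consolidates flush() calls so that many small
// writes don't each hit the socket; the message serialization handlers would be
// added after it in the pipeline.
public class OutboundChannelInitializer extends ChannelInitializer<SocketChannel>
{
    @Override
    protected void initChannel(SocketChannel channel)
    {
        // consolidate up to 256 flushes, even when no read is in progress (values are illustrative)
        channel.pipeline().addLast("flushConsolidation", new FlushConsolidationHandler(256, true));
    }
}
{code}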

bq. The changes to {{MessagingService}} ...

tbh, I think we only need the two implementations until either CASSANDRA-12229 
or CASSANDRA-8911 resolves how streaming is affected by this ticket (streaming 
is effectively broken, for now, with the current internode messaging changes 
because internode messaging shares its listening socket with streaming). This 
assumes, of course, that either of those tickets ships in the same rev as this 
one, which might not be a great idea. So for now, at a minimum, I'll abstract 
out the implementations, and we can throw out the legacy one when either of 
the other two tickets is ready as well. I'm open for discussion on how/if 
changes to internode messaging should affect streaming (I've discussed it a bit 
on CASSANDRA-12229).

Thanks for the initial comments, I'll have some updated code in a day or so.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502872#comment-15502872
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
-

I'm still very much in the process of reviewing this and I'll have more on this 
later, but I wanted to provide a few early remarks instead of waiting to having 
fully digested it all:
* On {{LegacyClientHandler}}, I'm not sure it's really needed as current v3 
messages actually do ship with their payload size. Now, granted said size is 
actually within the {{MessageIn/MessageOut}} object rather than the "framing" 
(the distinction between the 2 being a bit of an implementation detail though 
tbh), and there is a little bit more work to get to it than if it was the first 
thing in the "frame" as done by this patch for v4, but it should still be a lot 
easier than what {{LegacyClientHandler}} does. This also means the v4 format 
include twice the (exact same) payload size, which feels wasteful. On that v4 
format, I actually think we could stand to rethink that framing/message format a 
bit as we can easily save a few bytes, *but* that's definitely for a different 
ticket. But maybe for this ticket it's actually simpler to stick to the v3 
format, using the payload size that prefixes said payload.
* Regarding the flushing, I agree relying on counters doesn't feel great. Why 
not just submit a flush task to the channel {{EventLoop}} (something like what 
[this|https://github.com/spotify/netty-zmtp/blob/master/src/main/java/com/spotify/netty4/util/BatchFlusher.java]
 does), which according to the Netty 4 thread model should ensure it happens 
after any other outstanding writes.
* The changes to {{MessagingService}} feel a bit ugly tbh given that we only 
ever use one implementation at runtime. I'd prefer making {{MessagingService}} 
abstract, abstracting the (relatively few) methods that use the connections, 
and having 2 sub-classes.


> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-08-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419276#comment-15419276
 ] 

Jason Brown commented on CASSANDRA-8457:


rebased and cleaned up some of the error handling/channel closing based on more 
discussion with [~norman] and Scott Mitchell. 

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-07-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372251#comment-15372251
 ] 

Jason Brown commented on CASSANDRA-8457:


I've pushed a new commit to the same branch after addressing the comments from 
[~norman] and Scott Mitchell - thanks guys for the feedback! Tests running now.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-07-10 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370179#comment-15370179
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] I did a first review-round and left some comments. All in all it 
looks solid from a Netty perspective, can't say a lot about the cassandra 
perspective here :) Let me know if you have any questions etc.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-07-08 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368604#comment-15368604
 ] 

Jason Brown commented on CASSANDRA-8457:


Here's the first pass at switching internode messaging to netty.
||8457||
|[branch|https://github.com/jasobrown/cassandra/tree/8457]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-testall/]|

I've tried to preserve as much of the functionality/behaviors of the existing 
implementation as I could, and some aspects were a bit tricky in the 
non-blocking IO, netty world. I've also documented the code as extensively 
as I can, and I still want to add more high-level docs on 1) the internode 
protocol itself, and 2) the use of netty in internode messaging. Hopefully the 
current state of documentation helps with understanding and reviewing the changes. 
Here's some high-level notes as points of departure/interest/discussion:

- I've left the existing {{OutboundTcpConnection}} code largely intact for the 
short term (read on for more detail). But mostly the new and existing behaviors 
can coexist in the code together (though not at run time).
- There is a yaml property to enable/disable using netty for internode 
messaging. If disabled, we'll fall back to the existing 
{{OutboundTcpConnection}} code. Part of this stems from the fact that streaming 
uses the same socket infrastructure and handshake as internode messaging, 
and streaming would be broken without the 
{{OutboundTcpConnection}} implementation. I am knee-deep in switching 
streaming over to a non-blocking, netty-based solution, but that is a separate 
ticket/body of work.
- In order to support non-blocking IO, I've altered the internode messaging 
protocol such that each message is framed, and the frame contains a message 
size (a small framing sketch follows at the end of this list). The protocol 
change is what forces these changes to happen at a major rev update, hence 4.0
- Backward compatibility - We will need to handle the case of cluster upgrade 
where some nodes are on the previous version of the protocol (not upgraded), 
and some are upgraded. The upgraded nodes will still need to behave and operate 
correctly with the older nodes, and that functionality is encapsulated and 
documented in {{LegacyClientHandler}} (for the receive side) and 
{{MessageOutHandler}} (for the send side).
- Message coalescing - The existing behaviors in {{CoalescingStrategies}} are 
predicated on parking the thread to allow outbound messages to arrive (and be 
coalesced). Parking a thread in a non-blocking/netty context is a bad thing, so 
I've inverted the behavior of message coalescing a bit. Instead of blocking a 
thread, I've extended the {{CoalescingStrategies.CoalescingStrategy}} 
implementations to return a 'time to wait' to let messages arrive for sending. 
I then schedule a task in the netty scheduler to execute that many nanoseconds 
in the future, queuing up incoming messages, and then sending them out when the 
scheduled task executes (this is {{CoalescingMessageOutHandler}}). I've also 
added callback functions to {{CoalescingStrategies.CoalescingStrategy}} 
implementations for the non-blocking paradigm to record updates to the strategy 
(for recalculation of the time window, etc).
- Message flushing - Currently in {{OutboundTcpConnection}}, we only call flush 
on the output stream if the backlog is empty (there are no more messages to send 
to the peer). Unfortunately there's no equivalent API in netty to know whether 
there are any messages in the channel waiting to be sent. The solution that I've gone 
with is to have a shared counter outside of the channel 
{{InternodeMessagingConnection#outboundCount}} and inside the channel 
{{CoalescingMessageOutHandler#outboundCounter}}, and when 
{{CoalescingMessageOutHandler}} sees the value of that counter is zero, it 
knows it can explicitly call flush. I'm not entirely thrilled with this 
approach, and there's some potential race/correctness problems (and 
complexity!) when reconnections occur, so I'm open to suggestions on how to 
achieve this functionality.
- I've included support for netty's OpenSSL library. The operator will need to 
deploy an extra netty jar (http://netty.io/wiki/forked-tomcat-native.html) to 
get the OpenSSL behavior (I'm not sure if we can or want to include it in our 
distro). {{SSLFactory}} needed to be refactored a bit to support the OpenSSL 
functionality.

I'll be doing some more extensive testing next week (including a more thorough 
exploration of the backward compatibility).
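
As promised in the framing bullet above, here's a minimal sketch of what size-prefixed framing looks like with netty's stock length-field handlers. The 4-byte length field and the 128MB cap are illustrative assumptions, not the actual protocol constants from the branch:

{code}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;
import io.netty.handler.codec.LengthFieldPrepender;

// Sketch of size-prefixed framing: the sender prepends a length to each serialized
// message, and the receiver waits until a whole frame has arrived before handing it
// to the message deserializer - which is what makes non-blocking reads practical.
public class FramingInitializer extends ChannelInitializer<SocketChannel>
{
    private static final int MAX_FRAME_BYTES = 128 * 1024 * 1024; // illustrative cap

    @Override
    protected void initChannel(SocketChannel channel)
    {
        channel.pipeline()
               // outbound: write a 4-byte frame length before the payload
               .addLast(new LengthFieldPrepender(4))
               // inbound: strip the 4-byte length and emit one ByteBuf per complete frame
               .addLast(new LengthFieldBasedFrameDecoder(MAX_FRAME_BYTES, 0, 4, 0, 4));
    }
}
{code}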

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Priority: Minor
>   

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-02-10 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141370#comment-15141370
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] you know where to find me ;)

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: performance
> Fix For: 3.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271701#comment-14271701
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

Took a stab at writing an adaptive approach to coalescing based on a moving 
average. Numbers look good for the workloads tested.
Code 
https://github.com/aweisberg/cassandra/compare/6be33289f34782e12229a7621022bb5ce66b2f1b...e48133c4d5acbaa6563ea48a0ca118c278b2f6f7
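
The gist of the moving-average idea, stripped of all the C* plumbing, is roughly the following. This is a toy restatement rather than the linked code; the ALPHA constant, the method names, and the "wait twice the average gap" rule are all made up for illustration:

{code}
import java.util.concurrent.TimeUnit;

// Toy sketch of adaptive coalescing driven by a moving average of message
// inter-arrival gaps: if messages are arriving close together, wait a little for
// more of them before flushing; if they are sparse, don't wait at all.
public class MovingAverageCoalescer
{
    private static final double ALPHA = 0.2;              // EWMA smoothing factor (made up)
    private final long maxWaitNanos;
    private double averageGapNanos = Double.MAX_VALUE;    // start pessimistic: no coalescing
    private long lastArrivalNanos = System.nanoTime();

    public MovingAverageCoalescer(long maxWaitMicros)
    {
        this.maxWaitNanos = TimeUnit.MICROSECONDS.toNanos(maxWaitMicros);
    }

    // record an outgoing message and return how long the sender should wait
    // for more messages before flushing (0 means flush immediately)
    public long recordAndGetWaitNanos()
    {
        long now = System.nanoTime();
        long gap = now - lastArrivalNanos;
        lastArrivalNanos = now;

        averageGapNanos = averageGapNanos == Double.MAX_VALUE
                        ? gap
                        : ALPHA * gap + (1 - ALPHA) * averageGapNanos;

        // only coalesce when the average gap is well inside the allowed window
        return averageGapNanos < maxWaitNanos
             ? Math.min(maxWaitNanos, (long) (2 * averageGapNanos))
             : 0;
    }
}
{code}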

Testing in AWS, 14 servers 6 clients.

Using a fixed coalescing window at low concurrency there is a drop of 
performance from 6746 to 3929. With adaptive coalescing I got 6758.

At medium concurrency (5 threads per client, 6 clients) I got 31097 with 
coalescing disabled and 31120 with coalescing.

At high concurrency (500 threads per client, 6 clients) I got 479532 with 
coalescing and 166010 without. This is with a maximum coalescing window of 200 
milliseconds.

I added debug output to log when coalescing starts and stops and it's 
interesting. At the beginning of the benchmark things flap, but they don't flap 
madly. After a few minutes it settles. I also notice a strange thing where CPU 
utilization at the start of a benchmark is 500% or so and then after a while it 
climbs. Like something somewhere is warming up or balancing. I recall seeing 
this in GCE as well.

I had one of the OutboundTcpConnections (first to get the permit) log a trace 
of all outgoing message times. I threw that into a histogram for informational 
purposes. 50% of messages are sent within 100 microseconds of each other and 
92% are sent within one millisecond. This is without any coalescing.

{noformat}
       Value   Percentile      TotalCount   1/(1-Percentile)

       0.000   0.0000                5554       1.00
       5.703   0.1000              124565       1.11
      13.263   0.2000              249128       1.25
      24.143   0.3000              373630       1.43
      40.607   0.4000              498108       1.67
      94.015   0.5000              622664       2.00
     158.463   0.5500              684867       2.22
     244.351   0.6000              747137       2.50
     305.407   0.6500              809631       2.86
     362.239   0.7000              871641       3.33
     428.031   0.7500              933978       4.00
     467.711   0.7750              965085       4.44
     520.703   0.8000              996254       5.00
     595.967   0.8250             1027359       5.71
     672.767   0.8500             1058457       6.67
     743.935   0.8750             1089573       8.00
     780.799   0.8875             1105290       8.89
     821.247   0.9000             1120774      10.00
     868.351   0.9125             1136261      11.43
     928.767   0.9250             1151889      13.33
    1006.079   0.9375             1167421      16.00
    1049.599   0.94375            1175260      17.78
    1095.679   0.9500             1183041      20.00
    1143.807   0.95625            1190779      22.86
    1198.079   0.9625             1198542      26.67
    1264.639   0.96875            1206301      32.00
    1305.599   0.971875           1210228      35.56
    1354.751   0.9750             1214090      40.00
    1407.999   0.978125           1217975      45.71
    1470.463   0.98125            1221854      53.33
    1542.143   0.984375           1225759      64.00
    1586.175   0.9859375          1227720      71.11
    1634.303   0.9875             1229643      80.00
    1688.575   0.9890625          1231596      91.43
    1756.159   0.990625           1233523     106.67
    1839.103   0.9921875          1235464     128.00
    1887.231   0.99296875         1236430     142.22
    1944.575   0.99375            1237409     160.00
    2007.039   0.99453125         1238384     182.86
    2084.863   0.9953125          1239358     213.33
    2174.975   0.99609375         1240326     256.00
    2230.271   0.996484375        1240818     284.44
    2293.759   0.996875           1241292     320.00
    2369.535   0.9972656250       1241785     365.71
    2455.551   0.99765625         1242271     426.67
    2578.431   0.998046875        1242752     512.00
    2656.255   0.9982421875       1242999     568.89
    2740.223   0.9984375          1243244     640.00
    2834.431   0.9986328125       1243482     731.43
    2957.311   0.998828125        1243725     853.33
    3131.391   0.9990234375       1243969    1024.00
    3235.839   0.99912109375      1244091    1137.78
    3336.191   0.99921875         1244212    1280.00
    3471.359   0.999316406250     1244332    1462.86
    3641.343   0.999414062500     1244455    1706.67
    3837.951   0.999511718750     1244576    2048.00
    4001.791   0.999560546875     1244636    2275.56
    4136.959   0.999609375000     1244697    2560.00
    4399.103   0.999658203125     1244758    2925.71
    4628.479 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-07 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268221#comment-14268221
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I wanted to account for the impact of coalescing at low concurrency. Low 
concurrency is not a recipe for great performance, but it is part of the out of 
the box experience and people do compare different databases at low concurrency.

In GCE coalescing provided a 12% increase in throughput in this specific 
message heavy high concurrency workload. The penalty is that at low concurrency 
there is an immediate loss of performance with any coalescing and a large 
window has a greater impact at low concurrency so there is tension there. The 
larger the window the better the performance bump.

Testing with 3 client threads each running on a dedicated client instance (3 
threads total). This is in GCE.

With TCP no delay on and coalescing
||Coalesce window microseconds|Throughput||
|0| 2191|
|6| 1910|
|12| 1873|
|25| 1867|
|50| 1779|
|100| 1667|
|150| 1566|
|200| 1491|

I also tried disabling coalescing when using HSHA and it didn't seem to make a 
difference. Surprising considering the impact of 25 microseconds of coalescing 
intra-cluster.

I also experimented with some other things, like binding interrupts to cores 0 
and 8 and tasksetting C* off of those cores. I didn't see a big impact.

I did see a small positive impact using 3 clients 8 servers which means the 
measurements with 2 clients might be a little suspect. With 3 clients and 200 
microseconds of coalescing it peaked at 165k in GCE.

I also found out that banned CPUs in irqbalance is broken and has no effect and 
this has been the case for some time.

RightScale does not offer instances with enhanced networking. To find out 
whether coalescing provides real benefits in EC2/Xen, or milder GCE-like 
benefits, I will have to get my hands on some.


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-06 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266963#comment-14266963
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I ran on GCE. 11 n1-highcpu-16, 2 clients 9 servers. 
ubuntu-1404-trusty-v20141212

GCE distributes interrupts to all cores by default.

The conclusion is that on GCE at that cluster config coalescing provides some 
benefit, although not to the extreme degree it was beneficial in EC2 without 
enhanced networking. The window to coalesce in doesn't have to be large to get 
the value. Binding interrupts to core 0 has a negative impact on throughput 
that doesn't change depending on whether no delay is enabled/disabled, but 
coalescing and binding yields similar throughput to distributing 
interrupts and coalescing.

It's very much an apples to oranges comparison to the EC2 c3.8xlarge, which is 
a much bigger instance (certainly in terms of RAM and exposed hardware threads) 
and more likely to benefit from more packet throughput. They are also 
completely different virtualization technologies and I guess it shows with 
things like toggling no delay having no impact in GCE even though performing 
the coalescing manually does.

Running with TCP no delay off
132254
132862

With TCP no delay on and coalescing
||Coalesce window microseconds| Throughput||
|200| 149791|
|150| 148755|
|100| 147678|
|50| 142565|
|25| 144542|
|12| 141679|
|6| 142626|
|0| 133905|
|0| 132849|

I then tested with all interrupts bound to core 0

With TCP no delay off 
116157
No delay on, no coalescing
118134
No delay on, coalescing 200 microseconds
147257
146453


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-06 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266422#comment-14266422
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

https://forums.aws.amazon.com/thread.jspa?messageID=459260
This looks like a Xen/EC2 issue. I'll bet bare metal never has this issue 
because it has multiple interrupt vectors for NICs. The only workaround is 
using multiple elastic network interfaces and doing something to make that 
practical for intra-cluster communication.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-06 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266402#comment-14266402
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I think I stumbled onto what is going on based on Benedict's suggestion to 
disable TCP no delay. It looks like there is a small message performance issue. 

This is something I have seen before in EC2 where you can only send a 
surprisingly small number of messages in/out of a VM. I don't have the numbers 
from when I micro benchmarked it, but it is something like 450k messages with 
TCP no delay and a million or so without. Adding more sockets helps but it 
doesn't  even double the number of messages in/out. Throwing more cores at the 
problem doesn't help you just end up with under utilized cores which matches 
the mysterious levels of starvation I was seeing in C* even though I was 
exposing sufficient concurrency.

14 server nodes. 6 client nodes. 500 threads per client. Server started with
row_cache_size_in_mb : 2000,
key_cache_size_in_mb:500,
rpc_max_threads : 1024,
rpc_min_threads : 16,
native_transport_max_threads : 1024
8-gig old gen, 2 gig new gen.

Client running CL=ALL and the same schema I have been using throughout this 
ticket.

With no delay off
First set of runs
390264
387958
392322
After replacing 10 instances
366579
365818
378221

No delay on 
162987

Modified trunk to fix a bug batching messages and add a configurable window for 
coalescing multiple messages into a socket; see 
https://github.com/aweisberg/cassandra/compare/f733996...49c6609
||Coalesce window microseconds|Throughput||
|250| 502614|
|200| 496206|
|150| 487195|
|100| 423415|
|50| 326648|
|25| 308175|
|12| 292894|
|6| 268456|
|0| 153688|

I did not expect to get mileage out of coalescing at the application level but it 
works extremely well. CPU utilization is still low at 1800%. There seems to be 
less correlation between CPU utilization and throughput as I vary the 
coalescing window and throughput changes dramatically. I do see that core 0 is 
looking pretty saturated and is only 10% idle. That might be the next or actual 
bottleneck.

What role this optimization plays at different cluster sizes is an important 
question. There has to be a tipping point where coalescing stops working 
because not enough packets go to each endpoint at the same time. With vnodes 
it wouldn't be unusual to be communicating with a large number of other hosts 
right?

It also takes a significant amount of additional latency to get the mileage at 
high levels of throughput, but at lower concurrency there is no benefit and it 
will probably show up as decreased throughput. It makes it tough to crank it up 
as a default. Either it is adaptive or most people don't get the benefit.

At high levels of throughput it is a clear latency win. Latency is much lower 
for individual requests on average. Making this a config option is viable as a 
starting point. Possibly a separate option for local/remote DC coalescing. 
Ideally we could make it adapt to the workload.

I am going to chase down what impact coalescing has at lower levels of 
concurrency so we can quantify the cost of turning it on. I'm also going to try 
and get to the bottom of all interrupts going to core 0. Maybe it is the real 
problem and coalescing is just a band-aid to get more throughput.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2015-01-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263096#comment-14263096
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I can't get performance counters for cache behaviors on EC2 as far as I can 
tell, and I don't have a good answer for why I get the performance numbers I am 
seeing.

I ran the measurements with CL.QUORUM, ONE, and ALL against trunk and my branch 
with/without rpc_max_threads increased to 1024.

This was prompted by measurements on a 15 node cluster where CL.ONE was 10x 
faster than CL.ALL. I measured the full matrix on a 9 node cluster and CL.ONE 
was 5x faster than CL.ALL, which with RF=5 is the expected performance delta. I 
definitely see under-utilization. With CL.ONE the nodes run right at 1600%, and 
with CL.ALL they don't make it up that high, although trunk does better in that 
respect.

The under-utilization is worse with the modified code that uses SEPExecutor. I 
may have to run with 15 nodes again to see if the jump from 9 to 15 is what 
causes CL.ALL to perform worse, or if the difference is that I was using a 
placement group and 14.04 in the 9 node cluster.

The change to use SEPExecutor for writes was slightly to a lot slower in the 
QUORUM and ALL cases at 9 nodes. I think that is a dead end, but I do wonder if 
that is because SEPExecutor might not have the same cache-friendly behavior 
that running dedicated threads does. Dedicated threads require signaling and 
context switching, but thread scheduling policies could result in the threads 
servicing each socket always running in the same spot.

I am going to try again with Netty. I should at least be able to match the 
performance of trunk with a non-blocking approach, so I think it is still worth 
digging.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-29 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260565#comment-14260565
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I reran this with placement groups. It took a while to get the cluster setup scripted. 
Unfortunately I couldn't go past 20 nodes due to a restriction on the number of spot 
instances. I might be able to start regular instances in the same placement 
group. Still working on it.

I upped the number of servers to 14, thinking the clients weren't heavily loaded, 
since I got more throughput per client with only 3 servers and there is not a 
lot of load on them. This is going from 9 to 14 servers. Throughput was worse, 
so something is wrong. This is a different kernel: I went from 12.04 with the 
3.2 kernel to 14.04 running 3.13, so more than one variable has changed.

I wanted to add more clients to see if that would get the throughput up. 
Something is very fishy. Clients claim they are fully connected to the cluster. 
They print "New Cassandra host /172.31.56.45:9042 added" for each server node.

||Run|trunk|modified||
|#1|101,395|114,981|
|#2|123,916|98,308|
|#3|117,831|109,442|
|#4|120,267|111,571|

Going to try and find out why adding capacity did not yield more throughput so 
I can start running valid benchmarks.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-22 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256293#comment-14256293
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

Testing on AWS with 6 clients and 9 servers: c3.8xlarge running Ubuntu 12.04 and 
no placement groups. I changed the workload to run RF=5 with the same data set 
size and increased the number of operations so the test runs longer. All tests 
were run on the same set of instances without restarting the instances.

Numbers for trunk are all over the map, ranging from fantastically faster to 
slower. The modified version was more consistent, but slower. When trunk was 
faster it had better CPU utilization. There is a correlation between low CPU 
utilization (not 1600% on a 16 core, 32 thread server) and lower throughput. It 
kind of suggests there is some kind of starvation or contention preventing 
tasks from flowing.

||Run|Trunk|Modified||
|#1| 147925|119058|
|#2|101740| 139155|
|#3| 197265|147973|
|#4|105774|#|

Trunk performance is not ideal, if only from a consistency perspective. I am 
going to try and get the performance counters for cache misses and such from 
perf, as well as a profile, tomorrow.


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-22 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256326#comment-14256326
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

Repeating these benchmarks, trunk can be slower as often as, or more often than, it is 
faster. It's one of those peak performance vs. average and worst case sort of 
decisions.

This is odd. Here are some measurements running trunk. There is a soft interrupt hotspot on one 
CPU, and I don't see why that should happen. Interrupt hotspots are usually related 
to not having enough TCP streams, and there should be several in this workload. It kind of 
looks like someone somewhere pinned interrupts to core 0.
{noformat}
11:20:04 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
11:20:05 PM  all   11.19    0.00    9.82    0.00    0.00    0.98    1.27    0.00   76.74
11:20:05 PM    0   18.97    0.00   12.07    0.00    0.00   32.76    1.72    0.00   34.48
11:20:05 PM    1    6.12    0.00   10.20    0.00    0.00    0.00    4.08    0.00   79.59
11:20:05 PM    2    7.55    0.00   11.32    0.00    0.00    0.00    3.77    0.00   77.36
11:20:05 PM    3   11.11    0.00    9.26    0.00    0.00    0.00    1.85    0.00   77.78
11:20:05 PM    4   17.86    0.00    5.36    0.00    0.00    0.00    0.00    0.00   76.79
11:20:05 PM    5   10.71    0.00    8.93    0.00    0.00    0.00    0.00    0.00   80.36
11:20:05 PM    6   10.17    0.00   10.17    0.00    0.00    0.00    1.69    0.00   77.97
11:20:05 PM    7   10.00    0.00   11.67    0.00    0.00    0.00    1.67    0.00   76.67
11:20:05 PM    8   13.56    0.00    6.78    0.00    0.00    0.00    1.69    0.00   77.97
11:20:05 PM    9   12.50    0.00   10.94    0.00    0.00    0.00    1.56    0.00   75.00
11:20:05 PM   10   15.15    0.00    9.09    0.00    0.00    0.00    1.52    0.00   74.24
11:20:05 PM   11    7.94    0.00   12.70    0.00    0.00    0.00    1.59    0.00   77.78
11:20:05 PM   12    9.52    0.00    9.52    0.00    0.00    0.00    1.59    0.00   79.37
11:20:05 PM   13    9.09    0.00   12.12    0.00    0.00    0.00    0.00    0.00   78.79
11:20:05 PM   14   11.76    0.00   10.29    0.00    0.00    0.00    1.47    0.00   76.47
11:20:05 PM   15    9.23    0.00    9.23    0.00    0.00    0.00    1.54    0.00   80.00
11:20:05 PM   16   10.29    0.00   14.71    0.00    0.00    0.00    1.47    0.00   73.53
11:20:05 PM   17   10.45    0.00   11.94    0.00    0.00    0.00    0.00    0.00   77.61
11:20:05 PM   18   11.94    0.00    8.96    0.00    0.00    0.00    1.49    0.00   77.61
11:20:05 PM   19   12.68    0.00   12.68    0.00    0.00    0.00    1.41    0.00   73.24
11:20:05 PM   20   13.24    0.00    8.82    0.00    0.00    0.00    0.00    0.00   77.94
11:20:05 PM   21    5.80    0.00   15.94    0.00    0.00    0.00    0.00    0.00   78.26
11:20:05 PM   22   13.89    0.00    9.72    0.00    0.00    0.00    1.39    0.00   75.00
11:20:05 PM   23    9.38    0.00    9.38    0.00    0.00    0.00    0.00    0.00   81.25
11:20:05 PM   24    7.81    0.00    9.38    0.00    0.00    0.00    1.56    0.00   81.25
11:20:05 PM   25    8.96    0.00    8.96    0.00    0.00    0.00    1.49    0.00   80.60
11:20:05 PM   26   11.59    0.00   10.14    0.00    0.00    0.00    0.00    0.00   78.26
11:20:05 PM   27   13.04    0.00    7.25    0.00    0.00    0.00    1.45    0.00   78.26
11:20:05 PM   28    7.81    0.00    7.81    0.00    0.00    0.00    0.00    0.00   84.38
11:20:05 PM   29   11.76    0.00    8.82    0.00    0.00    0.00    0.00    0.00   79.41
11:20:05 PM   30   11.43    0.00   10.00    0.00    0.00    0.00    1.43    0.00   77.14
11:20:05 PM   31   12.50    0.00    4.69    0.00    0.00    0.00    0.00    0.00   82.81
{noformat}  
 
Modified mpstat output is similar. perf stat doesn't have access to performance 
counters on these instances; I'll have to see if I can get instances that do. 
I have a flight recording of each, but not a flight recording of great 
performance on trunk.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249678#comment-14249678
 ] 

Benedict commented on CASSANDRA-8457:
-

A couple of comments on the code: 

# Requiring synchronized seems like it could _possibly_ negate any benefit, since we could 
easily get more threads competing. Certainly the cost seems higher than what we 
save from a lazySet; my typical strategy for this situation is to set the 
wakeup flag to true, check if any more work needs to be done, and _if so_ 
atomically re-adopt the state. So the dispatch task would be a loop, 
terminating only when there is definitely no work to do.
# If we're worrying about context switching, we should probably switch to CLQ 
so that producers never conflict. If we're worried about less efficient 
draining, we can later introduce an MPSC queue with an efficient drainTo() (a 
sketch of this producer/drain split follows at the end of this comment).
# droppedUpdater.decrementAndGet() looks like a typo

However I doubt any of these will have a _significant_ impact. 

The stress test you're running should exercise these pathways in a typical 
manner. I'm not terribly surprised at a lack of impact, but it is probably 
worth trying on a much larger cluster to see if more-often-empty queues and a 
surfeit of threads can elicit a result.
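
For what it's worth, a minimal sketch of what point 2 would look like (names are illustrative, not a proposed patch): producers offer onto a ConcurrentLinkedQueue and never contend on a lock or signal a condition, while the single writer drains by polling until empty; a later MPSC queue could replace the poll loop with a bulk drainTo().

{code:java}
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: lock-free producer path plus a single-consumer drain loop.
final class ClqBacklog
{
    private final ConcurrentLinkedQueue<Object> backlog = new ConcurrentLinkedQueue<>();

    // Producers never block each other and never take a lock here.
    void enqueue(Object message)
    {
        backlog.offer(message);
    }

    // Single writer thread: drain whatever has accumulated. An MPSC queue with
    // an efficient drainTo() would do this in one pass instead of a poll per message.
    void drain()
    {
        Object message;
        while ((message = backlog.poll()) != null)
            write(message);
    }

    private void write(Object message)
    {
        // Stand-in for serializing the message to the socket.
    }
}
{code}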

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249854#comment-14249854
 ] 

sankalp kohli commented on CASSANDRA-8457:
--

I agree with [~iamaleksey]. We should try this with many nodes. 

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249884#comment-14249884
 ] 

T Jake Luciani commented on CASSANDRA-8457:
---

This test doesn't deal with the signaling cost within the node.  We have a 
thread blocking on the readExecutors completing.  This does tie into 
CASSANDRA-5239 but part of the goal here is to push the callbacks to the edge 
without relying on explicit block/signalling calls.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249930#comment-14249930
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

#1 The wakeup is protected by a CAS, so in the common case there shouldn't be 
multiple threads contending to dispatch. The synchronized block is there for 
the case where the thread that is finishing up dispatching signals that it is 
going to sleep and a dispatch task will be necessary for the next submission. 
At that point it has to check the queue one more time to avoid lost wakeups, 
and it is possible a new dispatch task will be created while that is happening. 
The synchronized forces the new task to wait while the last check and drain 
completes. How often this race occurs and blocks a thread I have no idea; I 
could add a counter and check.

The only way to avoid it is to lock while checking the queue-empty condition 
and updating the needs-wakeup field, or to have a 1:1 mapping between sockets 
and dispatch threads (i.e. not SEP). This would force producers to lock on task 
submission as well. I don't see how the dispatch task can atomically check that 
there is no work to do and set the needs-wakeup flag at the same time. At that 
point, is there a reason to use a lock-free queue?

#2 I didn't replace the queue because I needed to maintain the size for the dropped-message 
functionality, and I didn't want to reason about maintaining the size 
non-atomically with queue operations like offer/poll/drainTo. I could give it a 
whirl. I am also not sure how well iterator.remove() in CLQ works, but I can 
check.

#3 Indeed this is a typo

Jake, it definitely doesn't address several sources of signaling, but it should 
reduce the total number of threads signaled per request.

I will profile the two versions today and then add more nodes. For benchmark 
purposes I could disable the message dropping functionality and use 
MPSCLinkedQueue from Netty.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249940#comment-14249940
 ] 

Benedict commented on CASSANDRA-8457:
-

bq. The only way to avoid it is to lock while checking the queue empty 
condition and updating the needs wakeup field, or to have a 1:1 mapping between 
sockets and dispatch threads (AKA not SEP). This would force producers to lock 
on task submission as well. I don't see how the dispatch task can atomically 
check that there is no work to do and set the needs wakeup flag at the same 
time. At that point is there a reason to use a lock free queue?

{code}
private final Runnable dispatchTask = new Runnable()
{
    @Override
    public void run()
    {
        while (true)
        {
            // Drain everything currently queued.
            while (!backlog.isEmpty())
                dispatchQueue();

            // Advertise that we are about to park, then re-check the queue to
            // avoid a lost wakeup: either the queue is still empty (a later
            // producer will reschedule us) or we win the CAS, re-adopt the
            // work, and go around the loop again.
            needsWakeup = true;
            if (backlog.isEmpty() ||
                !needsWakeupUpdater.compareAndSet(OutboundTcpConnection.this, TRUE, FALSE))
                return;
        }
    }
};
{code}
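
Putting the two halves together, a self-contained, hypothetical version of this protocol might look like the following. The class name, single-thread executor, and enqueue method are assumptions for illustration; the real change would live in OutboundTcpConnection and its executor.

{code:java}
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Illustrative only: wakeup protocol pairing a producer-side CAS with the
// dispatch loop sketched above, so at most one dispatcher is active at a time.
final class Connection
{
    private static final int TRUE = 1, FALSE = 0;
    private static final AtomicIntegerFieldUpdater<Connection> needsWakeupUpdater =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "needsWakeup");

    private final ConcurrentLinkedQueue<String> backlog = new ConcurrentLinkedQueue<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile int needsWakeup = TRUE;

    private final Runnable dispatchTask = () ->
    {
        while (true)
        {
            while (!backlog.isEmpty())
                dispatchQueue();

            // Advertise that we are about to park, then re-check: either the
            // backlog is really empty (a later producer will reschedule us) or
            // we win the CAS, re-adopt the work, and go around the loop again.
            needsWakeup = TRUE;
            if (backlog.isEmpty() || !needsWakeupUpdater.compareAndSet(this, TRUE, FALSE))
                return;
        }
    };

    // Producer side: enqueue, then schedule the dispatch task only if it has
    // parked and we win the CAS, so we never signal a dispatcher that is awake.
    void enqueue(String message)
    {
        backlog.add(message);
        if (needsWakeup == TRUE && needsWakeupUpdater.compareAndSet(this, TRUE, FALSE))
            executor.execute(dispatchTask);
    }

    private void dispatchQueue()
    {
        String message;
        while ((message = backlog.poll()) != null)
            System.out.println("write " + message); // stand-in for the socket write
    }
}
{code}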

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249943#comment-14249943
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

Well there you have it. Thanks Benedict.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249085#comment-14249085
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

I have some code and results. https://github.com/aweisberg/cassandra/tree/C-8457

I tested on AWS using a 3 node cluster of c3.8xlarge instances in the same 
placement group, using HVM with Ubuntu 14.04. Other than 
/etc/security/limits.conf I made no changes to the install, which was the 
Rightscale ServerTemplate Base ServerTemplate for Linux (v14.1.0).

Config provided to cstar bootstrap was 
{code:JavaScript}
{
    "revision": "aweisberg/C-8457",
    "label": "test",
    "yaml": "key_cache_size_in_mb: 256\nrow_cache_size_in_mb: 2000\ncommitlog_sync: periodic\ncommitlog_sync_batch_window_in_ms: null\ncommitlog_sync_period_in_ms: 1\ncompaction_throughput_mb_per_sec: 0\nconcurrent_compactors: 4",
    "env": "MAX_HEAP_SIZE=8g\nHEAP_NEWSIZE=2g",
    "options": {
        "use_vnodes": true
    }
}
{
    "commitlog_directory": "/mnt/ephemeral/commitlog",
    "data_file_directories": [
        "/mnt/ephemeral/datadir"
    ],
    "block_devices": [
        "/dev/mapper/vg--data-ephemeral0"
    ],
    "blockdev_readahead": 128,
    "hosts": {
        "ec2-54-175-1-84.compute-1.amazonaws.com": {
            "internal_ip": "172.31.49.199",
            "hostname": "ec2-54-175-1-84.compute-1.amazonaws.com",
            "seed": true
        },
        "ec2-54-175-32-238.compute-1.amazonaws.com": {
            "internal_ip": "172.31.53.77",
            "hostname": "ec2-54-175-32-238.compute-1.amazonaws.com",
            "seed": true
        },
        "ec2-54-175-32-206.compute-1.amazonaws.com": {
            "internal_ip": "172.31.57.63",
            "hostname": "ec2-54-175-32-206.compute-1.amazonaws.com",
            "seed": true
        }
    },
    "user": "ariel_weisberg",
    "name": "example1",
    "saved_caches_directory": "/mnt/ephemeral/caches"
}
{code}

To populate data I used
bq. ./cassandra-stress write n=10 -pop seq=1...10 no-wrap -rate 
threads=50 -col 'n=fixed(1)' -schema 'replication(factor=3)' -node 
file=$HOME/hosts

To read I used
bq. ./cassandra-stress read n=1000 cl=ALL -pop 'dist=UNIFORM(1...10)' 
-rate threads=200 -col 'n=fixed(1)' -schema 'replication(factor=3)' -node 
file=~/hosts

I ran two client instances on two nodes (one per node) also on c3.8xlarge in 
the same placement group.

Unmodified trunk
{noformat}
op rate   : 87497
partition rate: 87497
row rate  : 87497
latency mean  : 2.3
latency median: 2.1
latency 95th percentile   : 3.2
latency 99th percentile   : 3.7
latency 99.9th percentile : 4.5
latency max   : 124.0
total gc count: 28
total gc mb   : 44299
total gc time (s) : 1
avg gc time(ms)   : 21
stdev gc time(ms) : 17
Total operation time  : 00:01:54
END

op rate   : 87598
partition rate: 87598
row rate  : 87598
latency mean  : 2.3
latency median: 2.1
latency 95th percentile   : 3.2
latency 99th percentile   : 3.8
latency 99.9th percentile : 4.4
latency max   : 124.8
total gc count: 133
total gc mb   : 211358
total gc time (s) : 3
avg gc time(ms)   : 20
stdev gc time(ms) : 17
Total operation time  : 00:01:54
END
{noformat}

Modified
{noformat}
Results:
op rate   : 87476
partition rate: 87476
row rate  : 87476
latency mean  : 2.3
latency median: 2.1
latency 95th percentile   : 3.2
latency 99th percentile   : 3.7
latency 99.9th percentile : 4.0
latency max   : 130.2
total gc count: 102
total gc mb   : 165487
total gc time (s) : 3
avg gc time(ms)   : 25
stdev gc time(ms) : 21
Total operation time  : 00:01:54
END

Results:
op rate   : 87347
partition rate: 87347
row rate  : 87347
latency mean  : 2.3
latency median: 2.1
latency 95th percentile   : 3.1
latency 99th percentile   : 3.6
latency 99.9th percentile : 3.9
latency max   : 129.2
total gc count: 59
total gc mb   : 93416
total gc time (s) : 1
avg gc time(ms)   : 23
stdev gc time(ms) : 16
Total operation time  : 00:01:54
END
{noformat}

[~benedict] Can you look at the code and the stress params and validate that 
you think I am measuring what I think I am measuring?

I am going to profile the client and server tomorrow to get my bearings on what 
executing this workload actually looks like.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
 

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249139#comment-14249139
 ] 

Jonathan Ellis commented on CASSANDRA-8457:
---

TLDR throughput is about the same, but latency variance improves?

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249237#comment-14249237
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

For me the TLDR is that they look the same. I would have to run a bunch more 
times, restarting the DB each time, and then throw away the best and worst runs and 
average the rest to see if there is a difference; if there is one, it is small. I 
want to understand why the change had zero impact as opposed to an insignificant 
one.

The number is the run, so matching numbers ran at the same time.
Trunk
||Client|Throughput|99.9%tile||
|A #1| 87497|4.5|
|B #1| 87598|4.4|
|A#2| 89089|3.7|
|B#2| 88461|4.1|
|A#3| 89002|4.4|
|B#3| 88529|4.4|
|restart|||
|A#4| 87140|3.8|
|B#4| 87434|4.2|

With changes
||Client|Throughput|99.9%tile||
|A#1| 87476|4.0|
|B#1| 87347|3.9|
|restart|||
|A#2| 87688|4.0|
|B#2| 87849|4.4|
|A#3| 87905|4.0|
|B#3| 88834|3.9|
|A#4| 88132|4.3|
|B#4| 88187|3.8|


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-16 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249242#comment-14249242
 ] 

Aleksey Yeschenko commented on CASSANDRA-8457:
--

Were we expecting any difference at all with just 3 nodes?

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243924#comment-14243924
 ] 

Benedict commented on CASSANDRA-8457:
-

bq. cstar doesn't support multiple stress clients

Stress could be modified to support simulating true multiple client access; 
I've filed CASSANDRA-8466.

What we really need is to be able to fire up a (much) larger cluster, though. 
With our current hardware that probably necessitates multiple VMs per node - say 
4, giving a viable cluster of 24, which is probably about the bare minimum for these 
kinds of tests. This necessarily pollutes the results somewhat, since each VM 
will have only half a CPU and will incur extra thread signalling penalties, but 
it's better than nothing. Either that or we get a bunch of cheapo nodes, or we 
add EC2 integration.

[~enigmacurry] any plans in the works to introduce support for large clusters?

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-12 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244239#comment-14244239
 ] 

Ryan McGuire commented on CASSANDRA-8457:
-

I've just tried to bring up a 25 node EC2 cluster on cstar_perf, and it came up 
without a hiccup. We have an automatic tool for installing cstar_perf.tool on 
EC2; it's still lacking integration with the frontend though (meaning you can 
run tests against it with the frontend if we attach the cluster, you just 
can't create the cluster from the frontend yet).



 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-12 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244258#comment-14244258
 ] 

Ryan McGuire commented on CASSANDRA-8457:
-

@Benedict modifying cstar_perf to run multiple instances per node is a larger 
task, and I'm wondering how useful that will be (seems like a lot of resource 
contention / non-real-world variables.) Assuming we had the alternative of 100% 
automated EC2 cluster bootstrapping/teardown, how often would we want to run 
these larger tests for it to be worth it? 

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244297#comment-14244297
 ] 

Benedict commented on CASSANDRA-8457:
-

[~enigmacurry] if we could fire up EC2 instances from cstar, that would seem to 
me to be plenty sufficient for adhoc testing. We can figure something out 
internally for a more regular CI approach. 

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242859#comment-14242859
 ] 

Benedict commented on CASSANDRA-8457:
-

FTR, I strongly doubt _context switching_ is actually as much of a problem as 
we think, although constraining it is never a bad thing. The big hit we have is 
_thread signalling_ costs, which is a different but related beast. Certainly 
the talking point that raised this was discussing system time spent serving 
context switches which would definitely be referring to signalling, not the 
switching itself.

Now, we do use a BlockingQueue for OutboundTcpConnection which will incur these 
costs, however I strongly suspect the impact will be much lower than predicted 
- especially as the testing done to flag this up was on small clusters with 
RF=1, where these threads would not be being exercised at all. The costs of 
going to the network itself are likely to exceed the context switching costs, 
and naturally permit messages to accumulate in the queue, reducing the number 
of signals actually needed. 

There are then the negative performance implications we have found from small 
numbers of connections under NIO to consider, so this change could have 
significant downsides for the majority of deployed clusters (although if we get 
batching in the client driver we may see these penalties disappear).

To establish if there's likely a benefit to exploit, we could most likely 
refactor this code comparatively minimally (than rewriting to NIO/Netty) to 
make use of the SharedExecutorPool to establish if such a positive effect is 
indeed to be had, as this would reduce the number of threads in flight to those 
actually serving work on the OTCs. This wouldn't affect the ITC, but I am 
dubious of their contribution. We should probably also actually test if this is 
indeed a problem from clusters at scale performing in-memory CL1 reads.


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243008#comment-14243008
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

 bq. To establish if there's likely a benefit to exploit, we could most likely 
refactor this code comparatively minimally (than rewriting to NIO/Netty) to 
make use of the SharedExecutorPool to establish if such a positive effect is 
indeed to be had, as this would reduce the number of threads in flight to those 
actually serving work on the OTCs. This wouldn't affect the ITC, but I am 
dubious of their contribution. We should probably also actually test if this is 
indeed a problem from clusters at scale performing in-memory CL1 reads.
I wonder what there is to be gained by having a single socket for 
inbound/outbound?

Running a representative test will take some doing. cstar doesn't support 
multiple stress clients, and it seems like the clusters only have 3 nodes? This 
is another argument for getting some decent-size performance runs working in CI 
rather than doing one-off manual tests. Having profiling artifacts collected as 
part of this would also make doing performance research and validation easier. 
I feel pretty underinformed when we discuss what to do next due to the lack of 
profiling information and the lack of canonical/repeatable performance data and 
workloads.

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-11 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243095#comment-14243095
 ] 

T Jake Luciani commented on CASSANDRA-8457:
---

bq. Running a representative test will take some doing. cstar doesn't support 
multiple stress clients and it seems like the clusters only have 3 nodes? 

But if you run with RF > 1 you can stress the internal network, which is what we 
are changing in this ticket

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2014-12-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243188#comment-14243188
 ] 

Ariel Weisberg commented on CASSANDRA-8457:
---

Thanks Jake, that is a good point. 3 nodes is still a problem, as that allows 
these threads to get a lot hotter than they normally would in a larger 
cluster.

I will try Benedict's suggestion since that would be easy to put in.


 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)