[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523138#comment-15523138
 ] 

Sylvain Lebresne commented on CASSANDRA-8457:
---------------------------------------------

Here are some more review bits, though it's still incomplete as I have to context 
switch a bit. I'll start with a number of general remarks, followed by some more 
specific comments on some classes, and a few random nits to finish.

The first thing that I think we should discuss is how we want to handle the 
transition to this (that is, what happens to the old implementation). Namely, I 
see 2 possible options:
# we simply switch to Netty, period, and the old implementation goes away;
# we keep both implementations in, with a configuration flag to pick which one 
is used (as implemented in the patch), and this probably until 5.0.
I think the main advantage of keeping both implementations for a while is that 
it's somewhat safer and easier for users to evaluate outside of the 4.0 upgrade 
itself. But the downsides are obviously that it's a lot more code to maintain, 
it's more messy right off the bat, and given only the new implementation will 
stay eventually, it's only pushing the switch to later. Personally, since we'll 
ship this in a new major (4.0) and since the cost of maintaining 2 
implementations is pretty unpleasant and we don't have infinite dev resources, 
I'd have a slight preference for option 1, but I'm not going to argue 
strongly either way. Mostly, I want us to decide, because I think this 
influences the final product here.

The second thing I want to mention is streaming. I could be missing something 
here but it appears streaming is just not supported at all by the new code, 
meaning that the patch as is is unusable outside of toy testing (by that I mean 
that if you start a node with the Netty MessagingService, it seems you can't 
receive any streaming, so you're not a fully functioning node). So I'm kind of 
wondering what the plan is here. I'm happy to tinker with it on another ticket, as 
streaming is a beast on its own, but I'm not sure I'm ok committing anything 
until it's a fully working product.

So anyway, with those talking points out of the way, a few more "technical" 
general remarks on the patch:
* Netty's openssl: what's the fundamental difference between the new 
{{keyfile}} and the old {{keystore}}? It sounds confusing to have to configure 
new things when there are already pre-existing parameters for SSL.
* Some "old" code is made public (many constants in particular) and reused by 
the "new" code, while other bits are copy-pasted, but it's not very consistent 
and makes things sometimes a bit hard to follow ({{QueuedMessage}} (used by new 
code too) being an internal class of {{OutboundTcpConnection}} (old code) is a 
particularly bad offender imo). That's typically where I want us to make a 
decision on the point I raise above, as whatever decision we take I feel there 
are some cleanups to be done.
* It would be really nice to have a package javadoc for the new "async" package 
that explains the main classes involved and how they interact (see the rough 
sketch after this list).
* There are a few TODOs that need to be handled.
* I feel some of the naming could be made more consistent. In particular, the 
patch uses Client/Server for some classes (e.g. 
{{ClientHandshakeHandler}}/{{ClientConnector}}) but In/Out for others (e.g. 
{{MessageInHandler}}). And things like {{InternodeMessagingPool}} are imo less 
good than the old {{OutboundTcpConnectionPool}} in that the name doesn't indicate 
it's only for outbound connections. I'd prefer avoiding Client/Server for 
internode stuff and reserving it for native protocol classes, sticking to In/Out 
(or Inbound/Outbound when that reads better) for internode classes, like we did 
before. I understand this may have been to avoid name clashes with existing 
classes, but maybe that's where we should also decide how long the old 
implementation is to stay in.
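
To make the package javadoc point concrete, here is a rough sketch of what a 
{{package-info.java}} could look like. To be clear, the package name and the 
class roles described below are my assumptions based on the classes referenced 
in this review, not the patch itself, so the actual content would have to match 
the final class layout:
{code:java}
// Hypothetical sketch only: package name and class roles are assumptions
// inferred from the class names discussed in this review.
/**
 * Netty-based internode messaging.
 * <p>
 * Outbound: {@code InternodeMessagingConnection} owns the connection to a peer and uses
 * {@code ClientConnector}/{@code ClientHandshakeHandler} to establish and handshake it;
 * once established, {@code MessageOutHandler} (and, when coalescing is enabled,
 * {@code CoalescingMessageOutHandler}) serialize and flush {@code QueuedMessage}s.
 * <p>
 * Inbound: the server side accepts connections and installs {@code MessageInHandler}
 * to deserialize incoming messages and hand them off to the regular verb handlers.
 */
package org.apache.cassandra.net.async;
{code}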

Then a few more specific comments:
* In {{MessagingService}}:
** We should probably move everything related to {{ServerSocket}} inside the 
new {{BlockingIOSender}} (though again, if we decide to just ditch the old 
implementation, it simply goes away).
** Related to the previous point, the {{getSocketThreads()}} is used by 
{{StreamingTransferTest}} to apparently wait for the streaming connections to 
close. Returning an empty list seems to kind of break that. I think we could 
replace the method by some {{awaitAllStreamingConnectionsClosed()}} (that would 
still just be for testing). Of course, that's kind of streaming related.
** I would probably rename {{switchIpAddress()}} to {{reconnectWithNewIp()}} as 
it makes it more explicit what actually happens (or at least have some javadoc). 
Similarly, {{getCurrentEndpoint()}} is imo not very clearly named (not this 
patch's fault though), maybe {{getPreferredAddress()}}?
** In {{createConnection()}}, it appears the old implementation was starting 
the connections and waiting on them to be properly started before returning, 
but the new implementation doesn't. Not sure how important that is in practice, 
but it feels unintuitive that a {{createConnection()}} method wouldn't actually 
do anything.
** Seems {{NettySender.listen()}} always starts a non-secure connection, even 
if we have {{ServerEncryptionOptions.InternodeEncryption.all}}, which sounds 
wrong.
* In {{ClientConnect}}, in {{connectComplete}}, checking that the future is 
done is somewhat nonsensical as it shouldn't happen by definition of the 
method. Maybe an {{assert}} instead (see the sketch after this list)? Also, 
the method should be {{@VisibleForTesting}}.
* In {{ClientHandshakeHandler}}:
** There seems to be a typo in the javadoc of {{remoteAddr}} (at least I'm having a 
hard time parsing it).
** Could abstract the message size (in the call to {{ioBuffer}}) into a 
constant, if only for symmetry with {{SECOND_MESSAGE_LENGTH}}.
** {{handshakeTimeout}} feels misnamed (with the current name, the call in 
{{exceptionCaught}} is confusing: "why are we timeouting on an exception?!"). 
It's really failing the handshake, so I'd call it {{failHandshake}} or 
{{abortHandshake}}. Somewhat related, the field {{handshakeResponse}} is really 
a future on the timeout, so I'd call it {{timeoutFuture}}.
** In {{decode}}, there are 2 different cases of version mismatch: one where we 
used to do a hard disconnect, and one where we did a "soft" one. The new 
implementation still pretends to do something different in the message it logs, 
but doesn't appear to make a difference in practice, which is confusing.
** It's a tad unexpected to have {{ClientHandshakeResult}} as a subclass of 
{{InternodeMessagingConnection}} rather than of this class 
({{ClientHandshakeHandler}}), and the fact that some final parts of the handshake 
actually happen in {{InternodeMessagingConnection}} makes things a tad 
harder to follow.
** If the handshake times out, it doesn't seem we log anything special. Might be 
worth at least having some debug message?
* In {{CoalescingMessageOutHandler}}:
** It would feel simpler (and possibly more efficient) to not add the handler 
to the pipeline at all if the strategy isn't doing coalescing (and assert it 
does in the handler).
** In {{write}}, we can avoid adding the listener at all if {{coalesceCallback == 
null}} rather than checking inside the listener (see the sketch after this list).
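
For the {{connectComplete}} point, what I have in mind is simply something like 
the following (the method body and names here are guesses to illustrate the 
idea, not the actual patch code):
{code:java}
import io.netty.channel.ChannelFuture;

public class ClientConnectSketch
{
    // Hypothetical illustration: connectComplete is only ever invoked as the
    // future's completion listener, so the future is done by construction.
    void connectComplete(ChannelFuture future)
    {
        assert future.isDone();

        if (future.isSuccess())
        {
            // proceed with the handshake on the established channel
        }
        else
        {
            // handle the failed connection attempt (reconnect, propagate, ...)
        }
    }
}
{code}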
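
And for the {{CoalescingMessageOutHandler.write}} point, something along those 
lines (again, the field and callback names are made up for the example, not 
taken from the patch):
{code:java}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

// Hypothetical sketch: only meant to show skipping the listener entirely when
// there is no coalesce callback, rather than checking for null inside it.
class CoalescingWriteSketch extends ChannelOutboundHandlerAdapter
{
    private final Runnable coalesceCallback; // null when no coalescing is wanted

    CoalescingWriteSketch(Runnable coalesceCallback)
    {
        this.coalesceCallback = coalesceCallback;
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        // only attach the listener when there is actually a callback to invoke
        if (coalesceCallback != null)
            promise.addListener(f -> coalesceCallback.run());
        ctx.write(msg, promise);
    }
}
{code}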

And that's mostly it for now, but more comments on other classes should come at 
a later time. Just a few small random
nits for the road:
* In {{IncomingTcpConnection}}, a new {{from}} parameter is added to 
{{receiveMessage}} but it seems unused.
* Your editor seems to expand {{.*}} imports, which adds unnecessary noise to the 
patch (and is something we generally don't do).
* {{MessageOutHandler}} creates its {{Logger}} using 
{{CoalescingMessageOutHandler}} rather than itself (copy-paste typo I assume); 
see the one-liner below.
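
For that last nit, the fix is of course just the usual one-liner (shown here only 
for clarity, and assuming the handler uses slf4j like the rest of the code base):
{code:java}
// in MessageOutHandler (sketch; only the class literal changes)
private static final Logger logger = LoggerFactory.getLogger(MessageOutHandler.class);
{code}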


> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.


