[ https://issues.apache.org/jira/browse/HDFS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549959#comment-13549959 ]
Colin Patrick McCabe commented on HDFS-4353:
--------------------------------------------

This patch refactors some code in the {{DFSClient}} and the DataNode's {{DataXceiver}}. The refactor encapsulates connections to peers into a single class named {{Peer}}.

Suresh, please excuse me if I'm covering things you already know, but I want to give some context to other people reading this JIRA.

Java has no standard mechanism for setting write timeouts on blocking sockets. So we usually wrap our sockets in {{org.apache.hadoop.net.SocketOutputStream}}. This class sets the {{Socket}} to nonblocking and simulates blocking I/O with a timeout. (There is also a parallel {{org.apache.hadoop.net.SocketInputStream}}.) However, we can't *always* do this, since some Sockets cannot be used in non-blocking mode; for example, the SOCKS socket classes don't support this.

The other thing that we do a lot of is wrapping output and input streams in encrypted streams.

The end result is that we pass around a lot of objects just to represent a single connection to a peer. {{IOStreamPair}} is a good example of this. We also end up using {{instanceof}} a lot because we're dealing with types that don't have a common ancestor.

This refactor encapsulates all of those objects in a single object, the {{Peer}}. This avoids the need to use {{instanceof}} to set socket timeouts and other properties.

The main reason for doing this refactor now is that {{DomainSocket}}, which is introduced by HDFS-4354, doesn't inherit from {{Socket}}. We made the decision not to inherit from {{Socket}} because inheriting would require us to rely on non-public JVM classes. There is more discussion on HDFS-347 about this issue, if you're curious.

Specific changes:
{{PeerServer}}: a class that creates {{Peers}}. {{TcpPeerServer}} is basically a wrapper around {{ServerSocket}}. The next patch introduces another subclass, {{DomainPeerServer}}.
{{BlockReader#close}}: now returns the Peer to the PeerCache directly. This replaces the multi-step process involving {{hasSentStatusCode}}, {{takeSocket}}, and {{getStreams}}.
{{SocketCache}}: was renamed to {{PeerCache}}. It now caches based on {{DatanodeID}} rather than socket address. This is needed to prepare the way for putting DomainSockets into the cache. Aside from that, it should be very similar.

> Encapsulate connections to peers in Peer and PeerServer classes
> ---------------------------------------------------------------
>
>                 Key: HDFS-4353
>                 URL: https://issues.apache.org/jira/browse/HDFS-4353
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: _02a.patch, 02b-cumulative.patch, 02c.patch, 02c.patch,
> 02-cumulative.patch, 02d.patch, 02e.patch, 02f.patch
>
>
> Encapsulate connections to peers into the {{Peer}} and {{PeerServer}}
> classes. Since many Java classes may be involved with these connections, it
> makes sense to create a container for them. For example, a connection to a
> peer may have an input stream, output stream, ReadableByteChannel, encrypted
> output stream, and encrypted input stream associated with it.
> This makes us less dependent on the {{NetUtils}} methods which use
> {{instanceof}} to manipulate socket and stream states based on the runtime
> type. It also paves the way to introduce UNIX domain sockets which don't
> inherit from {{java.net.Socket}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
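
For readers who want a concrete picture of the encapsulation described in the comment above, here is a minimal sketch of the idea. It is not the code from the patch: {{SketchPeer}}, {{SocketSketchPeer}}, {{InMemorySketchPeer}}, and {{PeerSketchDemo}} are hypothetical names invented for illustration, and the real {{Peer}} class may expose different methods. The point is only that once every transport sits behind one common type, the caller can set a timeout and use the streams without any {{instanceof}} checks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Caller code: works with any SketchPeer, with no instanceof checks.
final class PeerSketchDemo {
    static String exchange(SketchPeer peer, String request) throws IOException {
        peer.setReadTimeout(1000); // same call regardless of the transport
        peer.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
        byte[] buf = new byte[64];
        int n = peer.getInputStream().read(buf);
        return n < 0 ? "" : new String(buf, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (InMemorySketchPeer peer = new InMemorySketchPeer("pong")) {
            System.out.println(exchange(peer, "ping")); // prints "pong"
        }
    }
}

// Hypothetical interface for illustration; not the Peer API from the patch.
interface SketchPeer extends Closeable {
    InputStream getInputStream() throws IOException;
    OutputStream getOutputStream() throws IOException;
    void setReadTimeout(int timeoutMs) throws IOException;
}

// Peer backed by an ordinary java.net.Socket.
final class SocketSketchPeer implements SketchPeer {
    private final Socket socket;

    SocketSketchPeer(Socket socket) { this.socket = socket; }

    @Override public InputStream getInputStream() throws IOException {
        return socket.getInputStream();
    }
    @Override public OutputStream getOutputStream() throws IOException {
        return socket.getOutputStream();
    }
    @Override public void setReadTimeout(int timeoutMs) throws IOException {
        socket.setSoTimeout(timeoutMs);
    }
    @Override public void close() throws IOException { socket.close(); }
}

// In-memory peer standing in for a transport that does not extend Socket.
final class InMemorySketchPeer implements SketchPeer {
    private final ByteArrayInputStream in;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int readTimeoutMs;

    InMemorySketchPeer(String cannedReply) {
        this.in = new ByteArrayInputStream(cannedReply.getBytes(StandardCharsets.UTF_8));
    }

    @Override public InputStream getInputStream() { return in; }
    @Override public OutputStream getOutputStream() { return out; }
    @Override public void setReadTimeout(int timeoutMs) { this.readTimeoutMs = timeoutMs; }
    @Override public void close() { }

    // Accessors used only to inspect what the caller did.
    int readTimeoutMs() { return readTimeoutMs; }
    String sent() { return new String(out.toByteArray(), StandardCharsets.UTF_8); }
}
```

In this sketch the in-memory peer plays the role that a non-{{Socket}} transport such as {{DomainSocket}} would play: {{exchange}} never needs to know which implementation it was handed.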