[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866485#action_12866485 ]
bc Wong commented on HDFS-941:
------------------------------

Thanks for looking at the patch, Sam.

bq. avoid making the cache implementation tied to the class ReaderSocketCache.

Do you mean separating out a more generic cache class? I've thought about that quite a bit. But in the end, the socket cache I need is too closely tied to the {{BlockReader}} (i.e. the concept of "owner"), because the {{BlockReader}} doesn't multiplex on the same socket. Detaching the {{BlockReader}} from the {{CacheEntry}} would mean adding an extra interface, which seems unnecessary until another usage scenario arrives. We can factor it out later when needed, since the caching implementation shouldn't affect the API/protocol.

bq. Don't make the cache a static member of the same class. Let the cache be an instantiable object. Let DFSClient store the cache either as an instance or static var

Sounds good. I'll store it in the DFSClient.

bq. not clear why sockets are always in the cache even if not usable

Only usable sockets are put into the cache. However, sockets can become unusable while they sit in the cache: the connection could drop, or the DN side could close it after a short wait. So when a client wants to retrieve a socket, it might be bad. Here, we could:
# Handle that from within the cache and do our best to return a good socket to the client, or
# Let the client check the socket and retry.
I went for (1) because it's less wasteful. Sorry about the misunderstanding. I'll put in more comments. (A rough sketch of the retrieval path is included at the end of this message.)

bq. shouldn't there be a cap on the # of sockets there can be in the cache?

There is. The code checks against {{cacheLimit}} when adding sockets to the free map.

bq. the freemap has to be traversed every time to get a socket in a sync block.

True. I think it only sounds bad (but actually isn't). The reader would have to read a lot of blocks simultaneously to see contention. First, the {{cacheLimit}} should likely be small (default to 16) for performance reasons, to avoid hitting too many unusable sockets. Second, it only traverses *a portion* of the {{freeMap}}: the list of sockets to the same target IP. Plus, I want it to check for unusable sockets before returning to the caller. I feel that finer locking granularity here could actually make it slower.

bq. do we have real performance benchmarks from actual clusters that show a significant benefit?

I'll ask Todd to provide more details. He was testing a backported version of the patch with HBase. *Preliminary* results showed a 20% gain.

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>         Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch
>
>
> Right now each connection into the datanode xceiver only processes one operation.
>
> In the case that an operation leaves the stream in a well-defined state (eg a client reads to the end of a block successfully) the same connection could be reused for a second operation. This should improve random read performance significantly.
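For readers following the discussion without the patch applied, here is a minimal, hypothetical sketch of the behaviour described above: option (1), where the cache itself weeds out sockets that went bad while cached, only walks the entries for the requested target, and caps its total size at {{cacheLimit}}. The class and method names ({{CachedSocketPool}}, {{get}}, {{put}}, {{isUsable}}) are illustrative only and not taken from the attached patches; the real implementation additionally tracks the owning {{BlockReader}} per entry.

{code:java}
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only: a per-client socket cache keyed by datanode
 * address. Retrieval drops entries that went bad while sitting in the
 * cache; insertion respects a small overall cap (cacheLimit).
 */
class CachedSocketPool {
  private final int cacheLimit;   // small by default, e.g. 16
  private final Map<SocketAddress, Deque<Socket>> freeMap = new HashMap<>();
  private int size = 0;

  CachedSocketPool(int cacheLimit) {
    this.cacheLimit = cacheLimit;
  }

  /** Return a usable cached socket to {@code target}, or null if none. */
  synchronized Socket get(SocketAddress target) {
    Deque<Socket> list = freeMap.get(target);
    if (list == null) {
      return null;
    }
    // Only the sockets for this target are traversed, not the whole cache.
    Socket sock;
    while ((sock = list.poll()) != null) {
      size--;
      if (isUsable(sock)) {
        return sock;            // best effort: hand back a good socket
      }
      closeQuietly(sock);       // went bad while cached; discard it
    }
    freeMap.remove(target);     // nothing usable left for this target
    return null;
  }

  /** Offer a socket for reuse; close it instead if the cap is reached. */
  synchronized void put(SocketAddress target, Socket sock) {
    if (size >= cacheLimit || !isUsable(sock)) {
      closeQuietly(sock);
      return;
    }
    freeMap.computeIfAbsent(target, k -> new ArrayDeque<>()).add(sock);
    size++;
  }

  private static boolean isUsable(Socket sock) {
    // Cheap local check; the remote end may still have closed the
    // connection, which the caller discovers on first read/write.
    return !sock.isClosed() && sock.isConnected();
  }

  private static void closeQuietly(Socket sock) {
    try {
      sock.close();
    } catch (IOException ignored) {
    }
  }
}
{code}

This is only meant to show the control flow being debated (cap on insertion, per-target traversal, best-effort filtering of dead sockets inside the synchronized methods); the actual locking, usability check, and owner bookkeeping live in the attached patches.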