[ https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745122#action_12745122 ]
Mahadev konar commented on HADOOP-6097:
---------------------------------------

Ben, Koji is right. The caching here is just FileSystem caching. The FileSystem cache keeps a separate entry per scheme, so for the har filesystem it caches on the scheme plus the archive path, meaning that a har filesystem is uniquely identified by har:///archivepath. Connection caching has nothing to do with this FileSystem cache: connections are cached by the RPC layer, and you would not be able to cache them at the har filesystem layer.

> Multiple bugs w/ Hadoop archives
> --------------------------------
>
>                 Key: HADOOP-6097
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6097
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>            Reporter: Ben Slusky
>             Fix For: 0.20.1
>
>         Attachments: HADOOP-6097.patch
>
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles the path if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one segment of the index. The LineReader and count of bytes read need to be reset for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- the path is significant as well. Caching them in this way limits a hadoop client to opening one archive per filesystem. It seems to be safe not to cache them, since they wrap another connection that does the actual networking.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
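
The cache-key distinction discussed above can be sketched as follows. This is a hypothetical illustration, not Hadoop's actual FileSystem.Cache code; the class and method names are invented. It shows why a key of (scheme, authority) alone makes two different har archives collide in the cache, while including the path keeps them distinct:

```java
import java.net.URI;

// Hypothetical sketch of the two cache-key schemes under discussion.
// Hadoop's real cache key is more involved (it includes the UGI, etc.);
// this only demonstrates the collision described in the comment.
public class FsCacheKeySketch {

    // Key by (scheme, authority) only -- the pre-fix behavior for har://.
    static String keySchemeAuthority(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // Key that also includes the archive path, matching "a har filesystem
    // is uniquely identified by har:///archivepath".
    static String keyWithPath(URI uri) {
        return keySchemeAuthority(uri) + uri.getPath();
    }

    public static void main(String[] args) {
        // Two different archives on the same underlying namenode (made-up URIs).
        URI a = URI.create("har://hdfs-nn:8020/user/a.har");
        URI b = URI.create("har://hdfs-nn:8020/user/b.har");

        // Scheme+authority alone: both archives map to the same cache entry,
        // so a client could effectively open only one archive per filesystem.
        System.out.println(keySchemeAuthority(a).equals(keySchemeAuthority(b))); // true: collision

        // With the path included, each archive gets its own entry.
        System.out.println(keyWithPath(a).equals(keyWithPath(b))); // false: distinct
    }
}
```

The patch attached to this issue takes the simpler route the description suggests: since HarFileSystem wraps another FileSystem that does the actual networking (and holds the real RPC connections), har filesystems can safely be left out of the cache altogether.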