[ https://issues.apache.org/jira/browse/MAPREDUCE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated MAPREDUCE-865:
-----------------------------------

    Attachment: mapreduce-865-0.patch

Primitive patch for discussion.

bq. So instead of open->read->close _index for each part file, thinking of keeping the index file open when possible.

Instead of keeping an open handle, this patch simply reads 'Stores' (range of caches) and keeps the last 5 of them (configurable) in memory. If the files are typical mapreduce outputs with many part-* files, the number of open calls to _index will be significantly reduced.

> harchive: Reduce the number of open calls to _index and _masterindex
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-865
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-865
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: mapreduce-865-0.patch
>
>
> When I have a har file with 1000 files in it,
> % hadoop dfs -lsr har:///user/knoguchi/myhar.har/
> would open/read/close the _index/_masterindex files 1000 times.
> This makes the client slow and adds some load to the namenode as well.
> Is there any way to reduce this number?
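
For discussion, the "keep the last 5 Stores in memory" idea can be sketched as a small least-recently-used cache. This is not the code in mapreduce-865-0.patch. The class name StoreCache, the integer store id, and the loader callback are all hypothetical, and the loader only stands in for whatever actually opens and reads a range of _index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of an LRU cache for index "stores".
 * Not taken from the patch; names and types are assumptions.
 */
final class StoreCache<V> {
    private final Map<Integer, V> cache;
    private final IntFunction<V> loader; // stands in for open->read->close of _index
    private int loads = 0;               // counts how often the loader was called

    StoreCache(final int capacity, IntFunction<V> loader) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used first,
        // so the "eldest" entry is the one to evict.
        this.cache = new LinkedHashMap<Integer, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached store, loading it only on a miss. */
    V get(int storeId) {
        V store = cache.get(storeId);
        if (store == null) {
            store = loader.apply(storeId);
            loads++;
            cache.put(storeId, store);
        }
        return store;
    }

    int loadCount() {
        return loads;
    }
}
```

Under the assumption that consecutive part-* files fall into the same store, a listing of 1000 files spread over 10 stores would call the loader 10 times instead of 1000 with a capacity of 5. A capacity that is too small for the access pattern falls back to one load per miss, so the limit is worth keeping configurable.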