[ 
https://issues.apache.org/jira/browse/MAPREDUCE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated MAPREDUCE-865:
-----------------------------------

    Attachment: mapreduce-865-0.patch

Primitive patch for discussion.

bq. So instead of open->read->close _index for each part file, thinking of 
keeping the index file open when possible.

Instead of keeping an open handle, this patch simply reads 'Stores' (range of 
caches) and keeps the last 5 of them (configurable) in memory.
If the files are typical mapreduce outputs with many part-* files, the number of 
open calls to _index will be significantly reduced.
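
To illustrate the idea (this is not the code in the attached patch), here is a 
minimal sketch of a bounded, least-recently-used cache of parsed index ranges. 
The class name StoreCache, the loader callback, and the loads counter are all 
hypothetical; the loader stands in for one open->read->close of _index, and the 
capacity of 5 mirrors the default mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache of index ranges; an assumption, not the patch's actual class. */
class StoreCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stands in for open->read->close of _index
    private int loads = 0;               // number of simulated open calls so far

    StoreCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }
}
```

With a cache like this, consecutive lookups that fall into the same range hit 
memory instead of reopening _index, which is the access pattern expected when 
listing many part-* files in order.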




> harchive: Reduce the number of open calls  to _index and _masterindex 
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-865
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-865
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: mapreduce-865-0.patch
>
>
> When I have a har file with 1000 files in it, 
>    % hadoop dfs -lsr har:///user/knoguchi/myhar.har/
> would open/read/close the _index/_masterindex files 1000 times.
> This makes the client slow and adds some load to the namenode as well.
> Any ways to reduce this number?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
