Hongbing Wang created HDFS-15987:
------------------------------------
Summary: Improve oiv tool to parse fsimage file in parallel with
delimited format
Key: HDFS-15987
URL: https://issues.apache.org/jira/browse/HDFS-15987
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Hongbing Wang
The purpose of this Jira is to improve the oiv tool so that it parses fsimage
files with sub-sections (see -HDFS-14617-) in parallel when using the delimited
format.
1. Serial parsing is time-consuming
The time spent serially parsing a large fsimage in delimited format (e.g. `hdfs
oiv -p Delimited -t <tmp> ...`) breaks down as follows:
{code:java}
1) Loading string table: -> Not time consuming
2) Loading inode references: -> Not time consuming
3) Loading directories in INode section: -> Slightly time consuming (3%)
4) Loading INode directory section: -> A bit time consuming (11%)
5) Output: -> Very time consuming (86%){code}
Therefore, the output stage is the one most worth parallelizing.
2. How to output in parallel
The sub-sections are split, in order, into groups. Each thread processes one
group and writes its output to its own file. Finally, the per-thread files are
merged into a single output file.
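The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the class and method names (ParallelDelimitedOutputSketch, groupInOrder, writeInParallel) are hypothetical, each sub-section is modelled as a list of already formatted delimited lines, and the real oiv code reads sub-sections from the fsimage instead.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names and data model are assumptions, not oiv internals.
final class ParallelDelimitedOutputSketch {

  /** Splits indices [0, subSections) into at most `threads` contiguous, ordered groups. */
  static List<int[]> groupInOrder(int subSections, int threads) {
    List<int[]> groups = new ArrayList<>();
    int base = subSections / threads;
    int extra = subSections % threads;
    int start = 0;
    for (int i = 0; i < threads && start < subSections; i++) {
      int size = base + (i < extra ? 1 : 0);
      groups.add(new int[] {start, start + size}); // half-open range [start, end)
      start += size;
    }
    return groups;
  }

  /** Each thread writes its group to its own temp file; the files are then merged in order. */
  static Path writeInParallel(List<List<String>> subSections, int threads, Path outFile)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Path>> parts = new ArrayList<>();
      for (int[] g : groupInOrder(subSections.size(), threads)) {
        parts.add(pool.submit(() -> {
          Path part = Files.createTempFile("oiv-part-", ".txt");
          try (BufferedWriter w = Files.newBufferedWriter(part, StandardCharsets.UTF_8)) {
            for (int s = g[0]; s < g[1]; s++) {
              for (String line : subSections.get(s)) {
                w.write(line);
                w.write("\n");
              }
            }
          }
          return part;
        }));
      }
      // Merge step: append the per-thread files in group order, then clean up.
      try (OutputStream out = Files.newOutputStream(outFile)) {
        for (Future<Path> f : parts) {
          Path part = f.get();
          Files.copy(part, out);
          Files.delete(part);
        }
      }
    } finally {
      pool.shutdown();
    }
    return outFile;
  }
}
```

Calling writeInParallel(subSections, 4, outFile) with 12 sub-sections gives four groups of three. Because the groups are contiguous and merged in submission order, the merged file has the same line order as a serial run would produce in this model.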
3. Test result
{code:java}
Input fsimage file info:
3.4G, 12 sub-sections, 55976500 INodes
-----------------------------------------
Threads  TotalTime  OutputTime  MergeTime
1        18m37s     16m18s      –
4        8m7s       4m49s       41s{code}
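Going from 1 to 4 threads, the figures in the table work out to roughly a 2.3x speedup in total time and a 3.4x speedup in the output stage alone. A minimal sketch of that arithmetic, where the class and method names (OivSpeedup, toSeconds, speedup) are hypothetical helpers used only for illustration:

```java
// Hypothetical helper for reproducing the speedup arithmetic from the table.
final class OivSpeedup {

  /** Converts a duration given as minutes and seconds (e.g. 18m37s) to seconds. */
  static int toSeconds(int minutes, int seconds) {
    return minutes * 60 + seconds;
  }

  /** Ratio of the serial time to the parallel time. */
  static double speedup(int serialSeconds, int parallelSeconds) {
    return (double) serialSeconds / parallelSeconds;
  }

  public static void main(String[] args) {
    // TotalTime: 18m37s with 1 thread vs. 8m7s with 4 threads.
    double total = speedup(toSeconds(18, 37), toSeconds(8, 7));
    // OutputTime: 16m18s with 1 thread vs. 4m49s with 4 threads.
    double output = speedup(toSeconds(16, 18), toSeconds(4, 49));
    System.out.println("total speedup:  " + total);
    System.out.println("output speedup: " + output);
  }
}
```

The total speedup is lower than the output speedup because the loading stages still run serially and the 4-thread run adds a 41s merge step.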