[ https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402581#comment-17402581 ]
Hongbing Wang commented on HDFS-15987:
--------------------------------------

[~mofei] The PR works well in our cluster. I will give an online report in the next few days. Thank you for your attention.

> Improve oiv tool to parse fsimage file in parallel with delimited format
> ------------------------------------------------------------------------
>
>                 Key: HDFS-15987
>                 URL: https://issues.apache.org/jira/browse/HDFS-15987
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The purpose of this Jira is to improve the oiv tool so that it parses an fsimage file with
> sub-sections (see -HDFS-14617-) in parallel when using the delimited format.
> 1. Serial parsing is time-consuming
> The time to serially parse a large fsimage with the delimited format (e.g. `hdfs
> oiv -p Delimited -t <tmp> ...`) breaks down as follows:
> {code:java}
> 1) Loading string table: -> Not time consuming
> 2) Loading inode references: -> Not time consuming
> 3) Loading directories in INode section: -> Slightly time consuming (3%)
> 4) Loading INode directory section: -> A bit time consuming (11%)
> 5) Output: -> Very time consuming (86%){code}
> Therefore, output is the stage that benefits most from parallelization.
> 2. How to output in parallel
> The sub-sections are grouped in order. Each thread processes one group and
> writes it to its own output file, and the per-thread output files are finally
> merged.
> 3. The result of a test
> {code:java}
> input fsimage file info:
> 3.4G, 12 sub-sections, 55976500 INodes
> -----------------------------------------
> Threads  TotalTime  OutputTime  MergeTime
> 1        18m37s     16m18s      -
> 4        8m7s       4m49s       41s{code}
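
The scheme in point 2 (group the sub-sections in order, one output file per thread, merge at the end) can be illustrated with a minimal Java sketch. This is not the code from the PR: the class name `ParallelOutputSketch`, the methods `groupInOrder` and `writeInParallel`, and the representation of a sub-section as a list of already formatted lines are all assumptions made for illustration. The real tool reads sub-sections from the fsimage rather than from memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names and data model are not taken from Hadoop.
public class ParallelOutputSketch {

  /** Splits indices 0..count-1 into at most `threads` contiguous [start, end) ranges. */
  public static List<int[]> groupInOrder(int count, int threads) {
    List<int[]> groups = new ArrayList<>();
    int base = count / threads;
    int extra = count % threads;
    int start = 0;
    for (int i = 0; i < threads && start < count; i++) {
      int size = base + (i < extra ? 1 : 0);
      groups.add(new int[] {start, start + size});
      start += size;
    }
    return groups;
  }

  /** Writes each group to its own temp file in parallel, then merges the files in group order. */
  public static void writeInParallel(List<List<String>> subSections, int threads, Path output)
      throws Exception {
    List<int[]> groups = groupInOrder(subSections.size(), threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Path> parts = new ArrayList<>();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (int[] g : groups) {
        Path part = Files.createTempFile("oiv-part-", ".txt");
        parts.add(part);
        List<List<String>> slice = subSections.subList(g[0], g[1]);
        pending.add(pool.submit(() -> {
          // Assumption: each sub-section is already a list of delimited lines.
          List<String> lines = new ArrayList<>();
          slice.forEach(lines::addAll);
          try {
            Files.write(part, lines, StandardCharsets.UTF_8);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for every writer and surface any failure
      }
      // Merge step: concatenate the per-thread files in their original order.
      try (OutputStream out = Files.newOutputStream(output)) {
        for (Path part : parts) {
          Files.copy(part, out);
        }
      }
    } finally {
      pool.shutdown();
      for (Path part : parts) {
        Files.deleteIfExists(part);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    List<List<String>> subSections = List.of(
        List.of("/a\t1", "/a/b\t2"), List.of("/c\t3"), List.of("/d\t4"));
    Path out = Files.createTempFile("oiv-merged-", ".txt");
    writeInParallel(subSections, 2, out);
    System.out.println(Files.readAllLines(out));
  }
}
```

Because the groups are contiguous and merged in group order, the merged file keeps the same line order as a serial run. The merge is a sequential copy, which is consistent with the separate MergeTime column in the test result above.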