[ https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605 ]
liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:14 AM:
------------------------------------------------------------

1. HDFS-16147.002.patch fixes the failing unit test org.apache.hadoop.hdfs.server.namenode.TestFSImage#testNoParallelSectionsWithCompressionEnabled.

2. Upon careful examination, oiv does work normally, although I cannot explain why. You can verify this easily: in class TestOfflineImageViewer, method createOriginalFSImage, add and then remove the code below and compare the two runs. Note: apply my patch first!
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
    "org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
Run the unit test testPBDelimitedWriter and you will see the answer.

3. If I create a parallel, compressed image with this patch and then try to load it on a NameNode without this patch and with parallel loading disabled, that NameNode is still able to load it.
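As background on what the DFS_IMAGE_COMPRESS_KEY / GzipCodec pair does: on save the image bytes pass through a gzip stream, and on load they pass back through it. The round trip can be sketched with the JDK's own gzip classes (plain java.util.zip here, not Hadoop's GzipCodec, so this is only an illustration of the stream behaviour, not Hadoop code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
  // Compress a byte array through a gzip stream.
  static byte[] compress(byte[] data) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.toByteArray();
  }

  // Decompress a gzip byte array back to the original bytes.
  static byte[] decompress(byte[] data) throws Exception {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
      return gz.readAllBytes();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] original = "fsimage section bytes".getBytes(StandardCharsets.UTF_8);
    byte[] restored = decompress(compress(original));
    System.out.println(Arrays.equals(original, restored)); // prints "true"
  }
}
```

The point of the sketch: a single compressed stream must normally be read sequentially from the start, which is why combining compression with independently seekable parallel sub-sections is the tricky part this issue addresses.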
You can verify this easily in TestFSImageWithSnapshot:
{code:java}
public void setUp() throws Exception {
  conf = new Configuration();
  // *************add**************
  conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
  conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
      "org.apache.hadoop.io.compress.GzipCodec");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
  // *************add**************
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
      .build();
  cluster.waitActive();
  fsn = cluster.getNamesystem();
  hdfs = cluster.getFileSystem();
}
{code}
Then, in class FSImageFormatProtobuf, method loadInternal, change the INODE case as follows (forcing the serial path):
{code:java}
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
      subSections, SectionName.INODE_SUB);
  // if (loadInParallel && (stageSubSections.size() > 0)) {
  //   inodeLoader.loadINodeSectionInParallel(executorService,
  //       stageSubSections, summary.getCodec(), prog, currentStep);
  // } else {
  //   inodeLoader.loadINodeSection(in, prog, currentStep);
  // }
  inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
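For readers unfamiliar with the pattern behind loadINodeSectionInParallel (each sub-section handed to a thread pool, then all tasks joined before the stage completes), here is a minimal, self-contained sketch in plain Java. The SubSection record and loadSubSection method are illustrative stand-ins, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSectionLoadSketch {
  // Illustrative stand-in for an fsimage sub-section (offset + length).
  record SubSection(long offset, long length) {}

  // Stand-in for loading one sub-section; here we just count its bytes.
  static long loadSubSection(SubSection s) {
    return s.length();
  }

  public static void main(String[] args) throws Exception {
    List<SubSection> subSections = List.of(
        new SubSection(0, 100), new SubSection(100, 250), new SubSection(350, 50));

    ExecutorService executor = Executors.newFixedThreadPool(3);
    AtomicLong loaded = new AtomicLong();
    List<Future<?>> futures = new ArrayList<>();
    // Dispatch each sub-section to the pool.
    for (SubSection s : subSections) {
      futures.add(executor.submit(() -> loaded.addAndGet(loadSubSection(s))));
    }
    // Join all sub-section tasks before the stage is considered complete.
    for (Future<?> f : futures) {
      f.get();
    }
    executor.shutdown();
    System.out.println("bytes loaded: " + loaded.get()); // prints "bytes loaded: 400"
  }
}
```

The serial fallback in the patched loadInternal corresponds to skipping the executor entirely and reading the one INODE section from the input stream in order.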
> load fsimage with parallelization and compression
> -------------------------------------------------
>
>                 Key: HDFS-16147
>                 URL: https://issues.apache.org/jira/browse/HDFS-16147
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: liuyongpan
>            Priority: Minor
>         Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, subsection.svg
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)