[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393310#comment-17393310
 ] 

Stephen O'Donnell edited comment on HDFS-16147 at 8/4/21, 4:44 PM:
---

When the image is saved, there is a single stream written out serially. To 
enable parallel load on the image, index entries are added for the sub-sections 
as they are written out.

This means we have a single stream, with the position of the sub-sections saved.
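
A minimal sketch of that bookkeeping, assuming a hypothetical recordSubSection() helper around the real FileSummary.Section index entries (the actual accounting lives in FSImageFormatProtobuf.Saver):

{code:java}
// Hypothetical sketch only. While the single image stream is written out
// serially, remember the byte offset where each sub-section starts and add
// an index entry to the FileSummary, so a loader can later seek straight
// back to that position.
private void recordSubSection(FileSummary.Builder summary, String name,
    long subSectionStart, long currentOffset) {
  summary.addSections(FileSummary.Section.newBuilder()
      .setName(name)                              // e.g. "INODE_SUB"
      .setOffset(subSectionStart)                 // where the sub-section begins
      .setLength(currentOffset - subSectionStart) // how many bytes it spans
      .build());
}
{code}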

That means, when we load the image, there are two choices:

1. We start at the beginning of a section and open a stream and read the entire 
section.

2. We open several streams, reading each sub-section in parallel by jumping to 
the indexed sub-section position and reading the given length.

When you enable compression too, the entire stream is compressed from end to 
end as a single compressed stream. I wrongly thought there would be many 
compressed streams within the image file, and that is why I thought OIV etc 
would have trouble reading this.

So it makes sense that OIV can read the image serially, and the namenode can 
also read the image with parallel loading disabled when compression is on. The 
surprise to me is that we can load the image in parallel, as that involves 
jumping into the middle of the compressed stream and starting to read, which 
most compression codecs do not support. It was my belief that gzip does not 
support this.

However, looking at the existing code, before this change, I see that we jump 
around in the stream already:

{code}
for (FileSummary.Section s : sections) {
  // Seek to the start of this section within the single image file.
  channel.position(s.getOffset());
  // Bound the raw stream to this section's byte length.
  InputStream in = new BufferedInputStream(new LimitInputStream(fin,
      s.getLength()));
  // Wrap with the codec named in the image summary (no-op when uncompressed).
  in = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), in);
{code}

So that must mean the compression codecs are splittable somehow, and they can 
start decompressing from a random position in the stream. Due to this, if the 
image is compressed, the existing parallel code can be mostly reused to load 
the sub-sections within the compressed stream.
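
If so, the existing serial pattern above suggests what a per-sub-section parallel load could look like. A minimal sketch, assuming a hypothetical loadSubSection() helper and plain java.util.concurrent wiring (the real logic lives in the HDFS-14617 loader code):

{code:java}
// Hypothetical sketch only: each indexed sub-section gets its own stream,
// positioned at the recorded offset, bounded to the recorded length, and
// wrapped with the image's codec before a worker thread consumes it.
static void loadSubSectionsInParallel(File imageFile, Configuration conf,
    FileSummary summary, List<FileSummary.Section> subSections,
    int numThreads) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(numThreads);
  List<Future<?>> futures = new ArrayList<>();
  for (FileSummary.Section sub : subSections) {
    futures.add(pool.submit(() -> {
      try (FileInputStream fin = new FileInputStream(imageFile)) {
        fin.getChannel().position(sub.getOffset());      // jump mid-stream
        InputStream in = new BufferedInputStream(
            new LimitInputStream(fin, sub.getLength())); // only this sub-section
        in = FSImageUtil.wrapInputStreamForCompression(
            conf, summary.getCodec(), in);               // decompress from here
        loadSubSection(in); // assumed per-sub-section loader, not a real API
      }
      return null;
    }));
  }
  for (Future<?> f : futures) {
    f.get(); // propagate any worker failure
  }
  pool.shutdown();
}
{code}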

From the above, could we allow parallel loading of compressed images by simply 
removing the code which disallows it?

{code}
-    if (loadInParallel) {
-      if (compressionEnabled) {
-        LOG.warn("Parallel Image loading and saving is not supported when {}" +
-            " is set to true. Parallel will be disabled.",
-            DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
-        loadInParallel = false;
-      }
-    }
{code}

Then let the image be saved compressed with the sub-sections indexed, and try 
to load it?


was (Author: sodonnell):
When the image is saved, there is a single stream written out serially. To 
enable parallel load on the image, index entries are added for the sub-sections 
as they are written out.

This means we have a single stream, with the position of the sub-sections saved.

That means, when we load the image, there are two choices:

1. We start at the beginning of a section and open a stream and read the entire 
section.

2. We open several streams by jumping to that position and reading the given 
length.

When you enable compression too, the entire stream is compressed from end to 
end as a single compressed stream. I wrongly thought there would be many 
compressed streams within the image file, and that is why I thought OIV etc 
would have trouble reading this.

So it makes sense that OIV can read the image serially, and the namenode can 
also read the image with parallel loading disabled when compression is on. The 
surprise to me is that we can load the image in parallel, as that involves 
jumping into the middle of the compressed stream and starting to read, which 
most compression codecs do not support. It was my belief that gzip does not 
support this.


However, looking at the existing code, before this change, I see that we jump 
around in the stream already:

{code}
for (FileSummary.Section s : sections) {
  channel.position(s.getOffset());
  InputStream in = new BufferedInputStream(new LimitInputStream(fin,
      s.getLength()));

  in = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), in);
{code}

So that must mean the compression codecs are splittable somehow, and they can 
start decompressing from a random position in the stream. Due to this, if the 
image is compressed, the existing parallel code can be mostly reused to load 
the sub-sections within the compressed stream.

From the above, could we allow parallel loading of compressed images by simply 
removing the code:

{code}
-    if (loadInParallel) {
-      if (compressionEnabled) {
-        LOG.warn("Parallel Image loading and saving is not supported when {}" +
-            " is set to true. Parallel will be disabled.",
-            DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
-        loadInParallel = false;
-      }
-    }
{code}

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-03 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 11:26 AM:
-

[~sodonnell], I have tested your question carefully, and here is my answer.
1. Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer, method createOriginalFSImage, add and remove 
the following code to make a comparison:

Note: first apply my patch HDFS-16147.002.patch!
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
    "org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
Run the test unit testPBDelimitedWriter and you will get the answer.
2. If I create a parallel compressed image with this patch and then try to 
load it in a NN without this patch and with parallel loading disabled, the NN 
is still able to load it.

Note: first you must merge the patch HDFS-14617.

You can simply verify as follows:

In class TestFSImageWithSnapshot, method setUp, add the following code so the 
fsimage is saved in parallel and compressed:
{code:java}
  public void setUp() throws Exception {
    conf = new Configuration();
    // added: save the fsimage compressed and with parallel sub-sections
    conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
    conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
        "org.apache.hadoop.io.compress.GzipCodec");
    conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
    conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
    conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
    conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
    // end of added lines
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
        .build();
    cluster.waitActive();
    fsn = cluster.getNamesystem();
    hdfs = cluster.getFileSystem();
  }
{code}
In class FSImageFormatProtobuf, method loadInternal, change as follows to make 
it run single-threaded:
{code:java}
// in class FSImageFormatProtobuf, method loadInternal
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
      subSections, SectionName.INODE_SUB);
  // parallel branch commented out so the INODE section always loads serially:
  // if (loadInParallel && (stageSubSections.size() > 0)) {
  //   inodeLoader.loadINodeSectionInParallel(executorService,
  //       stageSubSections, summary.getCodec(), prog, currentStep);
  // } else {
  //   inodeLoader.loadINodeSection(in, prog, currentStep);
  // }
  inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
Then run the test unit testSaveLoadImage and you will get the answer.

3. On the 50% improvement of parallel compressed loading measured against a 
compressed single-threaded load:

An FsImage was generated; it is 128M before compression and 27.18M after 
compression. A simple comparison is shown in the table below.
||state||average loading time||
|compressed and parallel|7.5 sec|
|compressed and not parallel|9.5 sec|
|uncompressed and parallel|6.5 sec|
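
Going by the table, parallel compressed loading gives a speedup of 9.5 sec / 7.5 sec ≈ 1.27x over the single-threaded compressed load, i.e. roughly a 21% reduction in load time rather than 50%.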
In fact, loading the fsimage uncompressed and in parallel is faster than 
compressed and parallel. But as discussed in HDFS-1435, a compressed fsimage 
is necessary.
  
4. HDFS-16147.002.patch fixes the failing test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1. HDFS-16147.002.patch fixes the failing test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
2. Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer, method createOriginalFSImage, add and remove 
the following code to make a comparison:

Note: first apply my patch HDFS-16147.002.patch!
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
    "org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-03 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 11:21 AM:
-

[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}

{color:#172b4d}3、50% improvement measured against a compressed single threaded 
load verses parallel compressed loading. {color}

{color:#172b4d}An FsImage is generated, before compression it is  27.18M{color} 
, after compression it is {color:#172b4d}128M{color}. A  simple comparisons 
were made in table.

 
||state||ave loading time||
|compress and parallel|7.5sec|
|compress and unparallel|9.5sec|
|uncompress and parallel|6.5sec|

 
{color:#313131}In fact loading fsimage with uncompress and parallel will be 
faster than compress and parallel. {color}
{color:#313131}As disscussed in HDFS-1435 , compressed fsimage is 
necessary.{color}
 
 
 


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfi

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-03 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 11:13 AM:
-

[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}

{color:#172b4d}3、50% improvement measured against a compressed single threaded 
load verses parallel compressed loading. {color}

{color:#172b4d}An FsImage is generated, before compression it is 
{color:#172b4d} 27.18M{color} , after compression it is 
{color:#172b4d}128M{color}. A  simple comparisons were made in table.
{color}

 
||state||ave loading time||
|{color:#33} compress and parallel{color}| 7.5 sec|
|{color:#33} compress and unparallel{color}| 9.5sec|
|{color:#33} uncompress + parallel{color}| 6.5sec|
 * {color:#313131}In fact*,In fact, loading fsimage with uncompress and 
parallel will be faster.*{color}
 * **

 


  

 


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSCo

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-03 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 11:00 AM:
-

[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}

{color:#172b4d}3、50% improvement measured against a compressed single threaded 
load verses parallel compressed loading. {color}
 


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-1461

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-03 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 10:56 AM:
-

[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}

{color:#ffc66d}{color:#172b4d}3、{color}
{color}


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : 

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:21 AM:


[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

{color:#de350b}note:{color} first you must merge the patch HDFS-14617

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}


was (Author: mofei):
[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:20 AM:


[~sodonnell], I have tested your question carefully, and here is my answer.

1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , you can get the 
answer.{color}


was (Author: mofei):
1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:18 AM:


1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage , {color:#172b4d}you can 
get the answer.{color}{color}


was (Author: mofei):
1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:18 AM:


1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage {color:#172b4d}, you can 
get the answer.{color}{color}


was (Author: mofei):
1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:17 AM:


1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , y{color}ou can get the 
answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  class TestFSImageWithSnapshot , method : setUp , add such code:
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 then run test unit  {color:#ffc66d}testSaveLoadImage {color:#172b4d}, you can 
get the answer.{color}
{color}


was (Author: mofei):
1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , 
{color:#172b4d}y{color}{color}ou can get the answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  TestFSImageWithSnapshot
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFS

[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:16 AM:


1、HDFS-16147.002.patch fix the error of test unit at 
org.apache.hadoop.hdfs.server.namenode.TestFSImage.testNoParallelSectionsWithCompressionEnabled
 2、Upon careful examination, oiv can indeed work normally, and I can't explain 
why it works.

You can simply verify as follows:

In class TestOfflineImageViewer , method{color:#172b4d} createOriginalFSImage, 
add and remove such code , make a contrast:{color}

{color:#de350b}note:{color} first get my patch  HDFS-16147.002.patch !
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
 conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
 "org.apache.hadoop.io.compress.GzipCodec"); 
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
run test unit   {color:#ffc66d}testPBDelimitedWriter , 
{color:#172b4d}y{color}{color}ou can get the answer.
 3、If I create a parallel compressed image with this patch, and then try to 
load it in a NN without this patch and parallel loading disabled, the NN still 
able to load it.

You can simply verify as follows:

in  TestFSImageWithSnapshot
{code:java}
  public void setUp() throws Exception {
conf = new Configuration();
//*add**
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
"org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
   //**add*
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
.build();
cluster.waitActive();
fsn = cluster.getNamesystem();
hdfs = cluster.getFileSystem();
  }
{code}
In class TestOfflineImageViewer , method createOriginalFSImage, change as 
follow:

 
{code:java}
class FSImageFormatProtobuf, method loadInternal 
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
  subSections, SectionName.INODE_SUB);
//  if (loadInParallel && (stageSubSections.size() > 0)) {
//inodeLoader.loadINodeSectionInParallel(executorService,
//stageSubSections, summary.getCodec(), prog, currentStep);
//  } else {
//inodeLoader.loadINodeSection(in, prog, currentStep);
//  }
   inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
 


[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-02 Thread liuyongpan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389905#comment-17389905
 ] 

liuyongpan edited comment on HDFS-16147 at 8/3/21, 2:21 AM:


Sorry, OIV can't work.

Supplement: OIV can work! My JVM heap was so small that I mistakenly thought 
it wouldn't work.
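(My assumption from standard Hadoop tooling, not something verified in this 
patch: if OIV runs out of heap on a large image, raising the client JVM heap, 
for example via HADOOP_OPTS, should avoid the false failure I hit here.)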


was (Author: mofei):
Sorry, OIV can't work.

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


