[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924286#action_12924286
 ] 

Hudson commented on HDFS-1435:
--

Integrated in Hadoop-Hdfs-trunk-Commit #415 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/415/])
HDFS-1435. Provide an option to store fsimage compressed. Contributed by 
Hairong Kuang.


 Provide an option to store fsimage compressed
 -

 Key: HDFS-1435
 URL: https://issues.apache.org/jira/browse/HDFS-1435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: checkpoint-limitandcompress.patch, 
 trunkImageCompress.patch, trunkImageCompress1.patch, 
 trunkImageCompress2.patch, trunkImageCompress3.patch, 
 trunkImageCompress4.patch


 Our HDFS has an fsimage as large as 20 GB. Uploading a new fsimage from the 
 secondary NN to the primary NN consumes a lot of network bandwidth.
 If we could store the fsimage compressed, the problem would be greatly alleviated.
 I plan to add a new configuration option, hdfs.image.compressed, with a default 
 value of false. If it is set to true, the fsimage is stored compressed.
 The fsimage will have a new layout with a new header field, compressed, 
 indicating whether the namespace is stored compressed.
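A minimal sketch of the proposed layout, for illustration only. It assumes a header made of the layout version followed by the new boolean compressed field, and only the namespace body goes through a codec. java.util.zip's Deflater streams stand in for a Hadoop compression codec, and the class and method names are hypothetical, not HDFS code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical image writer/reader with a "compressed" flag in the header. */
class CompressedImageSketch {

    /** Header (version + flag) is written as-is; the namespace body is deflated if requested. */
    static byte[] writeImage(int layoutVersion, boolean compressed, byte[] namespace) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream header = new DataOutputStream(bytes);
            header.writeInt(layoutVersion);
            header.writeBoolean(compressed); // the proposed new header field
            header.flush();

            OutputStream body = compressed ? new DeflaterOutputStream(bytes) : bytes;
            body.write(namespace);
            body.close(); // finishes the deflate stream; a no-op for the byte array itself
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the header, then inflates the body only when the flag is set. */
    static byte[] readNamespace(byte[] image) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
            in.readInt(); // layout version (not used further in this sketch)
            boolean compressed = in.readBoolean();
            InputStream body = compressed ? new InflaterInputStream(in) : in;
            return body.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the proposal, the compressed argument would come from the hdfs.image.compressed setting (default false). A reader never needs that setting, because the flag in the header tells it how to read the body.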

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922987#action_12922987
 ] 

Aaron T. Myers commented on HDFS-1435:
--

Thanks for the clarification, Hairong.

Konstantin commented in the original OIV JIRA (HADOOP-5467) that it would be 
nice to eliminate the code duplication that comes from effectively having two 
distinct FS image loaders. Had we done that, you wouldn't have needed to 
remember to make this change in a second place. This work probably shouldn't be 
done as part of this JIRA; the problem you hit just reminded me of it.

I've filed HDFS-1465 to address this problem.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-20 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923213#action_12923213
 ] 

dhruba borthakur commented on HDFS-1435:


+1, code looks good to me. 



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-19 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922695#action_12922695
 ] 

Aaron T. Myers commented on HDFS-1435:
--

Hi Hairong, what bug in the OIV did this fix? Did the bug predate this JIRA, 
or was it introduced by one of the earlier patches?



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-19 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922702#action_12922702
 ] 

Hairong Kuang commented on HDFS-1435:
-

Sorry for the confusion. The bug was introduced by my patch 
trunkImageCompress2.patch, where I made numFiles and genStamp in the header 
uncompressed, but I forgot to make the same change in the OIV.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-15 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921494#action_12921494
 ] 

Hairong Kuang commented on HDFS-1435:
-

 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 7 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 system tests framework.  The patch passed system tests framework compile.




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread Lu Yilei (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920929#action_12920929
 ] 

Lu Yilei commented on HDFS-1435:


To Kuang: if the fsimage is very large, the network is saturated for a while 
when the SecondaryNameNode does a checkpoint, which causes the JobTracker to 
fail when it accesses the NameNode for file data during job initialization. So 
we limit the transmission speed and compress the transfer to resolve the 
problem. We have completed development and testing. However, we have not yet 
added the code that checks whether the NameNode's and the SecondaryNameNode's 
fsimages are the same when downloading checkpoint files, because it may 
introduce other risks.
Please see the attached patch that limits the transmission speed and 
compresses the transfer.
Next I will contribute another patch that compares the NameNode's and the 
SecondaryNameNode's fsimages when downloading checkpoint files. Thanks.
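Lu's patch is not reproduced in this thread. As a rough illustration of "limiting transmission speed", here is a sketch of an average-rate throttle, assuming the transfer is written through an OutputStream: after each write it sleeps just long enough that the bytes sent so far do not exceed a bytes-per-second budget. The class and method names are hypothetical, not Hadoop APIs.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical average-rate throttle for an image transfer stream. */
class ThrottledOutputStream extends FilterOutputStream {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long bytesSent = 0;

    ThrottledOutputStream(OutputStream out, long bytesPerSecond) {
        super(out);
        if (bytesPerSecond <= 0) throw new IllegalArgumentException("rate must be positive");
        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Milliseconds to pause so that bytesSent / elapsed stays within the budget.
     * Returns 0 when the sender is already at or below the allowed rate.
     */
    static long requiredSleepMillis(long bytesSent, long elapsedMillis, long bytesPerSecond) {
        long earliestAllowedMillis = bytesSent * 1000 / bytesPerSecond;
        return Math.max(0, earliestAllowedMillis - elapsedMillis);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // pass the whole chunk through, then account for it
        bytesSent += len;
        throttle();
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesSent++;
        throttle();
    }

    private void throttle() throws IOException {
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        long sleep = requiredSleepMillis(bytesSent, elapsedMillis, bytesPerSecond);
        if (sleep > 0) {
            try {
                Thread.sleep(sleep);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }
}
```

An average-rate limiter like this allows short bursts as long as the overall rate stays under the budget, which is usually enough to keep a checkpoint transfer from saturating a link for long stretches.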



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921060#action_12921060
 ] 

Hairong Kuang commented on HDFS-1435:
-

I did experiments with a secondary namenode using our internal 0.20 branch. I 
used LzoCodec to compress the image. Here are the results:

|| ||uncompressed||LZO compressed||
|image size|13G|2.9G|
|loading image from disk|5 mins|8 mins|
|save image to disk|2 mins|4.5 mins|
|download image from primary NN|16.5 mins|6.5 mins|
|upload image to primary NN|16.5 mins|6.5 mins|
|whole checkpoint|40 mins|25 mins|

The results show that a compressed image greatly reduces the image download and 
upload overhead, although it adds 5.5 minutes of overhead to loading/saving the 
image. Overall, this gives us a 15-minute reduction for checkpointing a 13G 
image.
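The totals in the table can be checked phase by phase. The sketch below copies the timings from the table (in minutes) and computes compressed minus uncompressed for each phase. It is only arithmetic on the reported numbers, not a new measurement, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Per-phase checkpoint timings copied from the table, in minutes. */
class CheckpointDeltas {
    // Each value is {uncompressed, LZO compressed}.
    static final Map<String, double[]> PHASES = new LinkedHashMap<>();
    static {
        PHASES.put("load image from disk", new double[] {5.0, 8.0});
        PHASES.put("save image to disk", new double[] {2.0, 4.5});
        PHASES.put("download from primary NN", new double[] {16.5, 6.5});
        PHASES.put("upload to primary NN", new double[] {16.5, 6.5});
    }

    /** Sum of (compressed - uncompressed) over all phases; negative means time saved. */
    static double netDeltaMinutes() {
        double total = 0;
        for (double[] t : PHASES.values()) total += t[1] - t[0];
        return total;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : PHASES.entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-26s %+.1f min%n", e.getKey(), t[1] - t[0]);
        }
        System.out.printf("net change: %+.1f min%n", netDeltaMinutes());
        // 13G shrinking to 2.9G is roughly a 4.5x compression ratio.
        System.out.printf("compression ratio: %.1fx%n", 13.0 / 2.9);
    }
}
```

Loading and saving together add 3 + 2.5 = 5.5 minutes, download and upload each save 10 minutes, and the per-phase deltas sum to -14.5 minutes, consistent with the roughly 15-minute reduction reported for the whole checkpoint.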

As Lu pointed out, another obvious optimization we could easily do is to skip 
downloading the image from the primary NameNode if the secondary already has the 
same one. This would give us an additional 6.5-minute reduction.
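One way to tell whether the secondary already "has the same one" is to compare digests of the two images before transferring anything. A minimal sketch, assuming an MD5 digest over the image bytes via java.security.MessageDigest and assuming the primary can report its own digest. How the two NameNodes would exchange digests is not covered, and the names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical check a secondary NN could run before fetching the primary's image. */
class ImageDigest {
    /** Streams the image through MD5 so a multi-GB file is never held in memory. */
    static byte[] md5(InputStream image) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            try (DigestInputStream in = new DigestInputStream(image, md)) {
                byte[] buf = new byte[64 * 1024];
                while (in.read(buf) != -1) {
                    // reading drives the digest; the bytes themselves are discarded
                }
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available on this JVM", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Download only when the local copy is missing or differs from the primary's. */
    static boolean needsDownload(byte[] localDigest, byte[] primaryDigest) {
        return localDigest == null || !MessageDigest.isEqual(localDigest, primaryDigest);
    }
}
```

Computing the digest still reads the whole image on each side, but that is local disk I/O rather than a 13G network transfer.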



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921068#action_12921068
 ] 

Hairong Kuang commented on HDFS-1435:
-

Oops! The header of the table above should shift right one column.

@Lu, I really like your idea and thanks a lot for your patch. I created 
HDFS-1457 to track this.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921170#action_12921170
 ] 

Todd Lipcon commented on HDFS-1435:
---

Hey Hairong. Another idea which you may want to experiment with at some point 
is to write a BufferedInputStream equivalent that does readahead or buffer 
filling in a second thread. That way the extra CPU caused by compression goes 
onto another core. Given that the actual application of the image data to the 
namespace is single-threaded due to the FSN lock, I bet compressed reading 
could actually get faster than uncompressed.
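A rough sketch of Todd's idea, assuming a single background thread that reads fixed-size chunks into a small bounded queue while the caller consumes them. The class name and parameters are hypothetical, and error handling is deliberately simple.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical read-ahead stream: a daemon thread fills chunks ahead of the reader. */
class ReadAheadInputStream extends InputStream {
    private static final byte[] EOF = new byte[0];  // end-of-stream marker (compared by identity)
    private final BlockingQueue<Object> chunks;     // holds byte[] chunks or an IOException
    private final Thread filler;
    private byte[] current = new byte[0];
    private int pos = 0;
    private boolean done = false;

    ReadAheadInputStream(InputStream source, int chunkSize, int queueDepth) {
        chunks = new ArrayBlockingQueue<>(queueDepth);
        filler = new Thread(() -> {
            try (InputStream in = source) {
                byte[] buf = new byte[chunkSize];
                int n;
                while ((n = in.read(buf)) != -1) {
                    if (n > 0) chunks.put(Arrays.copyOf(buf, n)); // blocks when the queue is full
                }
                chunks.put(EOF);
            } catch (IOException e) {
                putQuietly(e); // hand the failure to the consumer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called; stop reading
            }
        }, "image-readahead");
        filler.setDaemon(true);
        filler.start();
    }

    private void putQuietly(Object item) {
        try {
            chunks.put(item);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits for the next chunk; returns false once the end of the source is reached. */
    private boolean nextChunk() throws IOException {
        if (done) return false;
        Object next;
        try {
            next = chunks.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for read-ahead", e);
        }
        if (next instanceof IOException) {
            done = true;
            throw (IOException) next;
        }
        if (next == EOF) {
            done = true;
            return false;
        }
        current = (byte[]) next;
        pos = 0;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (!nextChunk()) return -1;
        }
        return current[pos++] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (pos >= current.length) {
            if (!nextChunk()) return -1;
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        filler.interrupt(); // unblocks the filler if it is waiting on a full queue
    }
}
```

For a compressed image, the wrapper would go around the decompressing stream (for example new ReadAheadInputStream(new InflaterInputStream(fileIn), 1 << 20, 4)), so that decompression, not just the disk reads, runs on the background thread. Whether this actually beats uncompressed loading is the open question Todd raises; the sketch does not measure it.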



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921178#action_12921178
 ] 

Hairong Kuang commented on HDFS-1435:
-

Todd, this is a wonderful idea! When I discussed this with Dmytro, he also 
brought up this optimization. Could you please file a JIRA for it? Thanks a lot.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-14 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921193#action_12921193
 ] 

dhruba borthakur commented on HDFS-1435:


I looked at the code; it looks good. I have two comments:

1. I would prefer to have compression start after the header record, so that 
imgVersion, numFiles, genStamp, defaultReplication, etc. are not compressed. 
This allows easier debugging, such as dumping the header of a file (via od -x) 
to find out its contents.

2. The new image version is 25, but the code refers to it as -19:

{code}
  if (imgVersion <= -19) {  // -19: 1st version providing compression option
    isCompressed = in.readBoolean();
    if (isCompressed) {
      String codecClassName = Text.readString(in);
      // ... (remainder of the excerpt not quoted)
    }
  }
{code}
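Layout versions here are negative and get smaller as the format evolves, so "this version or newer" is written as imgVersion <= N. The following is a minimal sketch of a version-gated header read that also follows point 1 (everything up to and including the codec name stays uncompressed). Several details are assumptions. The cutoff is a named constant set to -25 only for illustration, presumably the negative form of the version 25 mentioned above. The field order (imgVersion, numFiles, genStamp, then the flag) is also assumed. DataInputStream.readUTF stands in for Hadoop's Text.readString, and all names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical view of the uncompressed image header. */
class ImageHeader {
    // Illustrative cutoff: first layout version that carries the compression fields.
    static final int FIRST_COMPRESSION_VERSION = -25;

    final int imgVersion;
    final long numFiles;
    final long genStamp;
    final boolean isCompressed;
    final String codecClassName; // null unless isCompressed

    private ImageHeader(int v, long files, long gs, boolean compressed, String codec) {
        imgVersion = v;
        numFiles = files;
        genStamp = gs;
        isCompressed = compressed;
        codecClassName = codec;
    }

    /** Everything read here is uncompressed, so a tool like od -x can still show it. */
    static ImageHeader read(DataInputStream in) throws IOException {
        int imgVersion = in.readInt();
        long numFiles = in.readLong();
        long genStamp = in.readLong();
        boolean isCompressed = false;
        String codecClassName = null;
        // Versions are negative and newer ones are smaller, so "this version or newer" is <=.
        // Older images have no compression fields, so nothing more is consumed for them.
        if (imgVersion <= FIRST_COMPRESSION_VERSION) {
            isCompressed = in.readBoolean();
            if (isCompressed) {
                codecClassName = in.readUTF(); // stand-in for Text.readString(in)
            }
        }
        return new ImageHeader(imgVersion, numFiles, genStamp, isCompressed, codecClassName);
    }
}
```

Keeping this parsing in one shared class is also one way to avoid the kind of drift between the NameNode loader and the OIV that caused the bug discussed elsewhere in this thread.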



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-13 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920513#action_12920513
 ] 

Doug Cutting commented on HDFS-1435:


Hairong, I don't think that using Avro is critical here. Avro is primarily 
intended for user data. Using Avro here could simplify long-term maintenance, 
but in the short term it might add a significant amount of work. So I would not 
file another JIRA unless you intend to implement it soon. Thanks!



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-13 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920770#action_12920770
 ] 

Hairong Kuang commented on HDFS-1435:
-

@Lu, compressing the fsimage has the additional advantage of reducing disk I/O 
as well as network bandwidth when writing to a remote copy. I like your proposed 
optimizations, such as limiting the transmission speed and not downloading an 
fsimage if the one at the primary NameNode is the same as the one at the 
secondary NameNode. Could you please contribute those back to the community?

@Doug, thanks for your feedback. I hope we will get some time to work on the 
Avro format soon.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-12 Thread Lu Yilei (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920237#action_12920237
 ] 

Lu Yilei commented on HDFS-1435:


We also encountered the same problem; our fsimage is 12 GB. We will limit the 
transmission speed and compress the transfer to resolve the problem, without 
changing the format of the fsimage. We plan to compare the NameNode's and the 
SecondaryNameNode's fsimages when downloading checkpoint files; if they are the 
same, the SecondaryNameNode will not download the fsimage from the NameNode.



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920008#action_12920008
 ] 

Hairong Kuang commented on HDFS-1435:
-

Sorry, I have not had a chance to work on Avro's file format. Doug, shall we 
create a separate JIRA for adopting Avro?



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-07 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918800#action_12918800
 ] 

Jeff Hammerbacher commented on HDFS-1435:
-

bq. Jeff, I'd like to take a look at the Avro file format. Do you know if the 
Avro file format has any more overhead than the current fsimage format?

I don't know about the current fsimage format. The Avro format, however, is 
detailed in the Avro spec: 
http://avro.apache.org/docs/current/spec.html#Object+Container+Files



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-07 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919079#action_12919079
 ] 

Doug Cutting commented on HDFS-1435:


Hairong, Avro's file format has little overhead, and it supports compression. 
However, it assumes that a file is composed of a sequence of entries with the 
same schema. The fsimage has various sections. The header information could 
be added as Avro file metadata. Files and directories, datanodes, and files 
under construction are currently written as separate blocks. Instead, the 
schema for every item might be something like a union of [File, Directory, 
Symlink, DataNode, FileUnderConstruction].
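To make the union idea concrete, here is a plain-Java sketch. It is not Avro and does not use the Avro library. It writes one stream in which every item is preceded by the index of its branch, which is roughly how Avro's binary encoding represents a union value (a branch position followed by the value). It uses Java 17 records and a sealed interface, shows only two branches, and the types and fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical union of namespace items; only two branches are shown. */
sealed interface NamespaceItem permits FileItem, DirItem {}
record FileItem(String path, short replication) implements NamespaceItem {}
record DirItem(String path) implements NamespaceItem {}

class UnionItemStream {
    static final int FILE = 0, DIR = 1; // branch positions within the union

    static byte[] write(List<NamespaceItem> items) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (NamespaceItem item : items) {
            if (item instanceof FileItem f) {
                out.writeInt(FILE); // branch tag first, then the branch's fields
                out.writeUTF(f.path());
                out.writeShort(f.replication());
            } else if (item instanceof DirItem d) {
                out.writeInt(DIR);
                out.writeUTF(d.path());
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<NamespaceItem> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<NamespaceItem> items = new ArrayList<>();
        while (in.available() > 0) { // exact for a ByteArrayInputStream
            int branch = in.readInt();
            switch (branch) {
                case FILE -> items.add(new FileItem(in.readUTF(), in.readShort()));
                case DIR -> items.add(new DirItem(in.readUTF()));
                default -> throw new IOException("unknown union branch " + branch);
            }
        }
        return items;
    }
}
```

With this shape, files, directories, and the other sections become items in a single sequence instead of separately framed blocks, which is the property Avro's container format expects.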



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-05 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918222#action_12918222
 ] 

Hairong Kuang commented on HDFS-1435:
-

LZO compression codec is not supported in Hadoop standard package. So the 
compression algorithm has to be configurable.

If we compress the entire image file, the challenge is to decide where to put 
the compression algorithm information.
Dhruba suggested to store this information in file VERSION. This idea is neat. 
The only problem is that now saving the fsimage needs to touch two files and 
its hard to guarantee atomicity.

Another solution is to use a suffix to the image file name to indicate the 
compression algorithm. The problem with this is that now the image file no 
longer has a unique name so it is possible one storage directory has multiple 
fsimages. How do we handle this?

After going back and forth, I am leaning toward the approach I originally 
proposed: changing the binary format so that the compression algorithm 
information is stored in the fsimage header. That way, we don't need to deal 
with any of the complexity that compressing the entire image file introduces.

What does the community think?
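To make the header-based proposal concrete, here is a minimal Java sketch. The layout (an int layout version, a boolean compressed flag, the codec class name only when compressed, then the namespace bytes) and the class name HeaderCompressedImage are hypothetical, chosen for illustration; gzip from java.util.zip stands in for whatever codec would be configured. This is not the layout HDFS actually committed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical layout: [int layoutVersion][boolean compressed][UTF codec, only if compressed][namespace]
final class HeaderCompressedImage {
    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    static byte[] save(int layoutVersion, byte[] namespace, boolean compress) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream header = new DataOutputStream(buf);
            header.writeInt(layoutVersion);
            header.writeBoolean(compress);
            if (compress) {
                header.writeUTF(GZIP_CODEC);
            }
            header.flush();
            // The header itself stays uncompressed so a loader can read it first.
            OutputStream body = compress ? new GZIPOutputStream(buf) : buf;
            body.write(namespace);
            body.close(); // writes the gzip trailer when compressing
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] load(byte[] image) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
            in.readInt(); // layout version, not interpreted in this sketch
            boolean compressed = in.readBoolean();
            InputStream body = in;
            if (compressed) {
                String codec = in.readUTF();
                if (!GZIP_CODEC.equals(codec)) {
                    throw new IOException("Unsupported codec: " + codec);
                }
                body = new GZIPInputStream(in);
            }
            return body.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of this design is that the loader reads the flag and codec name before deciding how to decode the rest, so nothing outside the image file has to change; the cost is a new layout version.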




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-04 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917768#action_12917768
 ] 

Hairong Kuang commented on HDFS-1435:
-

I thought more about Philip's suggestion. Instead of changing the fsimage 
format, we could simply compress the whole image file; when loading the 
fsimage, the NN decompresses the image if the image file name ends with a 
compression suffix.

This has a couple of advantages over my original idea:
1. No need to change the layout version;
2. It gives admins more flexibility to use existing tools to compress the fsimage, 
even if HDFS is not configured to compress it.
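A minimal Java sketch of the suffix-based idea, assuming a ".gz" suffix means gzip; the suffix convention and the class name SuffixImageIO are illustrative assumptions, not the committed implementation. Because the compressed file is an ordinary gzip file, standard tools can inspect or produce it, which is the admin flexibility in point 2.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: a ".gz" file-name suffix marks a gzip-compressed image.
final class SuffixImageIO {
    static boolean isCompressed(Path image) {
        return image.getFileName().toString().endsWith(".gz");
    }

    // Opens an image, decompressing transparently when the name carries the suffix.
    static InputStream open(Path image) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(image));
        if (!isCompressed(image)) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }

    static byte[] readImage(Path image) {
        try (InputStream in = open(image)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes an image; a ".gz" name yields a standard gzip file that gunzip can also read.
    static void writeImage(Path image, byte[] namespace) {
        try (OutputStream file = Files.newOutputStream(image);
             OutputStream out = isCompressed(image) ? new GZIPOutputStream(file) : file) {
            out.write(namespace);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```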

I also did a few experiments with different compression algorithms, trying 
both gzip and LZO on a 13G fsimage, each at its default compression level.
Gzip took 13 minutes to compress the 13G fsimage to 2.3G bytes, and 
decompression took 2 minutes 47 seconds.
LZO took only 3 minutes to compress the 13G fsimage to 3G bytes, and 
decompression took 2 minutes 51 seconds.

These are very promising results. I think the fsimage has a lot of duplicate 
bytes, so it compresses really well. It is also clear that LZO provides good 
compression speed with a good-enough compression ratio.
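The reported figures can be turned into ratios and throughputs with a few lines of arithmetic. The sketch below only reuses the numbers from this comment; reading "G" as GiB (1024 MiB) is an assumption, and the class name CodecTradeoff is illustrative.

```java
// Figures from the gzip/LZO runs above; "G" is read as GiB here (an assumption).
final class CodecTradeoff {
    static final class Run {
        final String codec;
        final double inputGb;
        final double outputGb;
        final int compressSec;
        final int decompressSec;

        Run(String codec, double inputGb, double outputGb, int compressSec, int decompressSec) {
            this.codec = codec;
            this.inputGb = inputGb;
            this.outputGb = outputGb;
            this.compressSec = compressSec;
            this.decompressSec = decompressSec;
        }

        double ratio() { return inputGb / outputGb; }                        // original / compressed
        double compressMiBps() { return inputGb * 1024 / compressSec; }      // uncompressed MiB per second
        double decompressMiBps() { return inputGb * 1024 / decompressSec; }  // uncompressed MiB per second
    }

    static final Run GZIP = new Run("gzip", 13, 2.3, 13 * 60, 2 * 60 + 47);
    static final Run LZO = new Run("LZO", 13, 3.0, 3 * 60, 2 * 60 + 51);

    public static void main(String[] args) {
        for (Run r : new Run[] {GZIP, LZO}) {
            System.out.printf("%-4s ratio %.2fx, compress %.1f MiB/s, decompress %.1f MiB/s%n",
                    r.codec, r.ratio(), r.compressMiBps(), r.decompressMiBps());
        }
    }
}
```

On these numbers gzip reaches roughly a 5.7x ratio versus roughly 4.3x for LZO, but LZO compresses about 4.3 times faster (180 s versus 780 s), while decompression throughput is nearly the same for both (about 80 and 78 MiB/s of uncompressed data).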




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-04 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917876#action_12917876
 ] 

dhruba borthakur commented on HDFS-1435:


+1 on compressing the entire file.

The VERSION file should have an entry of the form:
codec=org.apache.hadoop.io.compress.GzipCodec
if the fsimage has been compressed using gzip.

1. At startup, the NN reads the VERSION file to determine how the fsimage is 
compressed. If the VERSION file does not have a codec=xxx entry, the NN 
assumes that the image is not compressed.

2. While saving the fsimage, the NN checks its own configuration for a 
parameter named io.compression.codec. If it is defined, the NN uses that 
codec to compress the fsimage and also updates the VERSION file.

This approach would be fully backward compatible and supports different 
compression algorithms for the fsimage.
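A minimal Java sketch of both steps of the VERSION-file proposal. Assumptions: the VERSION entries are held in a java.util.Properties object (the file is key=value text), the class name VersionFileCodec is hypothetical, and only GzipCodec is recognized, mapped onto java.util.zip instead of instantiating the named Hadoop codec class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for the VERSION-file proposal; only GzipCodec is recognized.
final class VersionFileCodec {
    static final String CODEC_KEY = "codec";
    static final String GZIP = "org.apache.hadoop.io.compress.GzipCodec";

    // Step 1 (startup): no codec=... entry means the image is not compressed.
    static InputStream wrapForLoad(Properties version, InputStream image) throws IOException {
        String codec = version.getProperty(CODEC_KEY);
        if (codec == null) {
            return image;
        }
        if (GZIP.equals(codec)) {
            return new GZIPInputStream(image);
        }
        throw new IOException("Unknown fsimage codec in VERSION: " + codec);
    }

    // Step 2 (save): the configured codec (null if unset) picks the stream
    // and is recorded in, or cleared from, the VERSION entries.
    static OutputStream wrapForSave(Properties version, String configuredCodec, OutputStream image)
            throws IOException {
        if (configuredCodec == null) {
            version.remove(CODEC_KEY);
            return image;
        }
        if (!GZIP.equals(configuredCodec)) {
            throw new IOException("Unsupported fsimage codec: " + configuredCodec);
        }
        version.setProperty(CODEC_KEY, configuredCodec);
        return new GZIPOutputStream(image);
    }
}
```

Backward compatibility falls out of step 1: an existing VERSION file has no codec entry, so wrapForLoad returns the image stream untouched.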




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-02 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917164#action_12917164
 ] 

dhruba borthakur commented on HDFS-1435:


  Do you expect the time to load the image and the time to do saveNamespace to go
  up by a lot with this change?
It might go up a little; we can measure it and provide details here.





[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-01 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916793#action_12916793
 ] 

Philip Zeyliger commented on HDFS-1435:
---

Instead of changing the binary format, could you update the code to read either 
fsimage or fsimage.gz, whichever is available? Obviously, one could start 
compressing at any point, but there's significant value in being able to use 
existing tools to decompress if anything goes awry.
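A sketch of the "whichever is available" lookup, assuming both candidate names sit side by side in one storage directory. Failing when both exist is a choice made for this sketch, in response to the multiple-fsimage ambiguity discussed in this thread; the class name ImageLocator is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical lookup: prefer whichever of fsimage / fsimage.gz exists, refuse to guess if both do.
final class ImageLocator {
    static Path locate(Path storageDir) throws IOException {
        Path plain = storageDir.resolve("fsimage");
        Path gz = storageDir.resolve("fsimage.gz");
        boolean hasPlain = Files.isRegularFile(plain);
        boolean hasGz = Files.isRegularFile(gz);
        if (hasPlain && hasGz) {
            // Two candidate images: loading either one silently could pick a stale copy.
            throw new IOException("Both fsimage and fsimage.gz exist in " + storageDir);
        }
        if (hasPlain) {
            return plain;
        }
        if (hasGz) {
            return gz;
        }
        throw new IOException("No fsimage found in " + storageDir);
    }
}
```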




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-01 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916795#action_12916795
 ] 

Jeff Hammerbacher commented on HDFS-1435:
-

Could we use the Avro file format to store the fsimage? We've designed 
configurable compression into the format, and tools will automatically be 
available for inspection of the file.




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-09-30 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916737#action_12916737
 ] 

Dmytro Molkov commented on HDFS-1435:
-

Do you expect the time to load the image and the time to do saveNamespace to go 
up by a lot with this change?




[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-09-30 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916747#action_12916747
 ] 

Hairong Kuang commented on HDFS-1435:
-

This depends on the compression algorithm used. We need to choose an algorithm 
that strikes a balance between compression ratio and speed. I will do some 
experiments to provide more data.

The image-loading overhead is a concern, since it adds to NN restart time.
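One way to gather such data is a small timing harness. The sketch below substitutes the JDK's Deflater compression levels for the codecs under discussion (LZO is not in the JDK), and its input is synthetic path-like text rather than a real fsimage, so its numbers only illustrate the speed-versus-ratio tradeoff. The class name LevelProbe is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical harness: Deflater levels stand in for the real codec candidates.
final class LevelProbe {
    // Synthetic, highly repetitive input loosely resembling path names.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("/user/hadoop/logs/part-").append(i % 1000).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compressed size of data at the given level; the output bytes are discarded.
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[64 * 1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    public static void main(String[] args) {
        byte[] data = sample();
        // DEFAULT_COMPRESSION is -1 in java.util.zip and maps to zlib's default level.
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("level %2d: %d -> %d bytes (%.1fx) in %.1f ms%n",
                    level, data.length, size, (double) data.length / size, ms);
        }
    }
}
```

A single timed pass includes JIT warm-up, so a real experiment would repeat the runs on an actual image.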
