[jira] [Updated] (HADOOP-11666) Revert the format change of du output introduced by HADOOP-6857

2015-03-03 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-11666:

Status: Patch Available  (was: Open)

 Revert the format change of du output introduced by HADOOP-6857
 ---

 Key: HADOOP-11666
 URL: https://issues.apache.org/jira/browse/HADOOP-11666
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Byron Wong
 Attachments: HADOOP-6857-revert.patch


 HADOOP-6857 made two changes to `du` at the same time:
 * Fixed a bug when querying a snapshottable directory
 * Changed the output format (incompatible change)
 This issue reverts the latter on branch-2 to preserve compatibility. 
 The bug fix stays in place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2015-03-03 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-6857:
---
Attachment: HADOOP-6857-revert.patch

Attached HADOOP-6857-revert.patch.
Reverted only the changes that involved command line output.

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Fix For: 2.7.0

 Attachments: HADOOP-6857-revert.patch, HADOOP-6857.patch, 
 HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, 
 show-space-consumed.txt


 Currently FsShell reports HDFS usage with the hadoop fs -dus path command.  
 Since the replication level is set per file, it would be nice to add raw disk 
 usage including the replication factor (maybe hadoop fs -dus -raw path?). 
  This would allow resource usage to be assessed more accurately.  -- Alex K





[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2015-03-03 Thread Byron Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345898#comment-14345898
 ] 

Byron Wong commented on HADOOP-6857:


Yea, I agree. The bug fix should remain, but the output changes can be reverted.

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Fix For: 2.7.0

 Attachments: HADOOP-6857.patch, HADOOP-6857.patch, HADOOP-6857.patch, 
 revert-HADOOP-6857-from-branch-2.patch, show-space-consumed.txt




[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-24 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-6857:
---
Attachment: HADOOP-6857.patch

Added new patch to fix TestHDFSCLI.
I don't think the failure in TestWebHDFSForHA is related to my changes. It 
passed locally for me.

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, HADOOP-6857.patch, HADOOP-6857.patch, 
 show-space-consumed.txt




[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-23 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-6857:
---
Target Version/s: 2.7.0
Hadoop Flags:   (was: Incompatible change)
  Status: Patch Available  (was: Reopened)

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, HADOOP-6857.patch, 
 show-space-consumed.txt




[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-22 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-6857:
---
Attachment: HADOOP-6857.patch

Attached new patch.
This patch addresses Scenario 2. Snapshot$Root should run computeContentSummary 
based on its snapshotId rather than on the current state. Added a unit test to 
verify this case.

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, HADOOP-6857.patch, 
 show-space-consumed.txt




[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-15 Thread Byron Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172771#comment-14172771
 ] 

Byron Wong commented on HADOOP-6857:


*Scenario 2*: we still have the snapshottable directory /test with the same file 
a. We then create a fresh snapshot ss1 and run {{hadoop fs -rm 
-skipTrash /test/a}}.
{{hadoop fs -du /test}} gives an empty output, as expected.
{{hadoop fs -du -s /test}} outputs:
{code}
41  123  /test
{code}
which makes sense, given that we know about the existence of the snapshot.
However, when we run {{hadoop fs -du -s /test/.snapshot/ss1}}, we get:
{code}
0  0  /test/.snapshot/ss1
{code}
This is inconsistent with the numbers we get when we run {{hadoop fs -du 
/test/.snapshot/ss1}}:
{code}
41  123  /test/.snapshot/ss1/a
{code}
Upon further investigation, we see that running {{hadoop fs -du -s 
/test/.snapshot/anySnapshot}} gives us the information about the current state 
of the real directory. This means that {{hadoop fs -du -s 
/test/.snapshot/anySnapshot}} is equivalent to running {{hadoop fs -du /test/}} 
and adding the numbers up, which is non-intuitive.
For example, let's add a 2-byte file /test/1 with replication factor 3 (/test/a is still 
deleted). Now {{hadoop fs -du -s /test/.snapshot/ss1}} gives us:
{code}
2  6  /test/.snapshot/ss1
{code}
whereas the result of {{hadoop fs -du /test/.snapshot/ss1}} remains the same:
{code}
41  123  /test/.snapshot/ss1/a
{code}
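
The mismatch above can be reproduced with a small model. This is only a sketch, not the HDFS implementation: the FileEntry class and the du / du_summary helpers are hypothetical names, and the only behavior modeled is that each row is (length, length * replication). The file sizes and replication factors are the ones from the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    length: int        # file length in bytes
    replication: int   # replication factor of the file


def du(files):
    """Per-file rows as (length, length * replication, name)."""
    return [(f.length, f.length * f.replication, f.name) for f in files]


def du_summary(files):
    """Totals over the given files: (sum of lengths, sum of disk consumed)."""
    rows = du(files)
    return (sum(r[0] for r in rows), sum(r[1] for r in rows))


# Contents captured by snapshot ss1: only file a (41 bytes, replication 3).
snapshot_ss1 = [FileEntry("a", 41, 3)]
# Current state of /test: a was deleted, file 1 (2 bytes, replication 3) was added.
current_test = [FileEntry("1", 2, 3)]

# Totals reported for the snapshot path in the scenario: they follow the current state.
reported = du_summary(current_test)
# Totals one would expect: they follow the snapshot's own contents.
expected = du_summary(snapshot_ss1)

print(reported, expected)  # (2, 6) (41, 123)
```

In this model the expected totals match the per-file listing of the snapshot, while the reported totals match the live directory, which is the inconsistency described above.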

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, show-space-consumed.txt




[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-14 Thread Byron Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171831#comment-14171831
 ] 

Byron Wong commented on HADOOP-6857:


In the case when a directory /D and snapshot S are in the exact same state 
(e.g. a fresh snapshot has been made), everything works fine, meaning the sum 
of the disk consumed numbers reported by -du /D equals the disk consumed number 
reported by -du -s /D.
When /D and S start deviating (files getting renamed, deleted, etc.), the disk 
consumed calculation takes the lastFileSize within the snapshots, finds the 
maximum replication factor for that file within the snapshots, multiplies the two 
together, and increments disk consumed by that number. This inflates the total 
disk consumed, so -du -s /D > the sum of the numbers in -du /D.
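
A minimal sketch of that calculation as described in this comment, not the actual HDFS code. The function name snapshot_charge and the example file (41 bytes, replication 3, deleted from /D after a snapshot was taken) are assumptions for illustration.

```python
def snapshot_charge(last_file_size, replications_in_snapshots):
    """Disk consumed added for a file that only lives on in snapshots:
    its last recorded size times the highest replication factor
    it had in any snapshot."""
    return last_file_size * max(replications_in_snapshots)


# Hypothetical /D: one 41-byte file with replication 3 was captured
# in a snapshot and then deleted from /D.
listing_rows = []                    # -du /D lists nothing after the delete
listing_total = sum(listing_rows)    # sum of the per-file disk consumed numbers
summary_total = listing_total + snapshot_charge(41, [3])

print(listing_total, summary_total)  # 0 123
```

Under these assumptions the summary total exceeds the sum of the listing because the snapshot-only file is still charged.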

I'd also like to point out that this implementation only takes the replication 
factor of a file into account, even if that replication factor is greater than 
the number of datanodes, which further inflates the -du calculation. For example, 
if we setrep 10 a file when we only have 3 datanodes, -du will still multiply 
fileLength * 10, and report that number.
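
The arithmetic for the setrep example, as a sketch. The 1024-byte file length and both helper names are hypothetical, and the physical bound assumes a datanode stores at most one replica of a given block.

```python
def reported_disk_consumed(file_length, replication):
    """What -du reports per this comment: length times the configured replication."""
    return file_length * replication


def physical_upper_bound(file_length, replication, datanodes):
    """Upper bound on bytes actually stored, assuming at most one
    replica of a block per datanode."""
    return file_length * min(replication, datanodes)


file_length = 1024  # hypothetical file length in bytes
reported = reported_disk_consumed(file_length, 10)
bound = physical_upper_bound(file_length, 10, 3)

print(reported, bound)  # 10240 3072
```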

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, show-space-consumed.txt




[jira] [Assigned] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-13 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong reassigned HADOOP-6857:
--

Assignee: Byron Wong

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Attachments: HADOOP-6857.patch, show-space-consumed.txt




[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2014-10-08 Thread Byron Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Wong updated HADOOP-6857:
---
Attachment: HADOOP-6857.patch

Updated [~atm]'s patch to patch on top of trunk.
I'd like to point out that creating a snapshot of a directory will increase the 
disk usage number reported by -du (-s). Is this what we want?

 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
 Attachments: HADOOP-6857.patch, show-space-consumed.txt

