[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246394#comment-16246394
 ] 

Wellington Chevreuil commented on HDFS-12618:
---------------------------------------------

Thanks [~xiaochen]. You're right, the last patch was indeed not handling renames. 
I did some further analysis on this; here is what I found:

1) *iip.getLastINode()* always returns an instance of *INodeFile* if the given 
file under a snapshot-related folder has not been renamed.
2) If a file within a snapshot gets renamed, the snapshot entry returned by 
*iip.getLastINode()* will be an instance of *INodeReference$WithName*. When 
counting blocks, I believe we can simply skip these and not count blocks for 
*INodeReference$WithName*, as the blocks will be accounted either outside of the 
snapshot check, or under the new file name.
3) If a snapshot is taken after the rename, the snapshot entry for the new file 
name returned by *iip.getLastINode()* will be an instance of 
*INodeReference$DstReference*.
4) The main problem is renamed files that are subsequently deleted, because then 
the related *INodeReference$DstReference* entries should have their blocks 
accounted. That requires an additional check on the referred inode path: if 
*getINodesInPath().getLastINode()* returns null, the 
*INodeReference$DstReference* is a renamed file that was later deleted and needs 
to have its blocks counted.
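The decision rules above could be sketched roughly as below. Note this uses 
hypothetical stand-in classes to model the inode hierarchy (the real HDFS types 
are *INodeFile*, *INodeReference$WithName* and *INodeReference$DstReference*, 
and the real null check goes through *getINodesInPath().getLastINode()* on the 
referred path); it is only an illustration of the counting logic, not the actual 
patch:

```java
// Sketch of the block-counting decision for fsck -includeSnapshots.
// Stand-in classes model the HDFS INode hierarchy; they are NOT the real API.
public class SnapshotBlockCountSketch {
    static class INode {}
    static class INodeFile extends INode {}          // plain snapshotted file
    static class WithName extends INode {}           // renamed source under a snapshot
    static class DstReference extends INode {        // rename target under a snapshot
        final INode referred;                        // what the reference resolves to now
        DstReference(INode referred) { this.referred = referred; }
    }

    /** Should fsck count blocks for this snapshot entry? */
    static boolean shouldCountBlocks(INode last) {
        if (last instanceof INodeFile) {
            return true;                             // case 1: count the file's blocks
        }
        if (last instanceof WithName) {
            return false;                            // case 2: accounted under the new name
        }
        if (last instanceof DstReference) {
            // case 4: count only if the referred file no longer exists
            // (i.e. the file was renamed and then deleted).
            return ((DstReference) last).referred == null;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldCountBlocks(new INodeFile()));                   // true
        System.out.println(shouldCountBlocks(new WithName()));                    // false
        System.out.println(shouldCountBlocks(new DstReference(null)));            // true
        System.out.println(shouldCountBlocks(new DstReference(new INodeFile()))); // false
    }
}
```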

For example:
{noformat}
1) Directory content:
/snap-test/file1
/snap-test/file2

2) Snapshot is taken:
$ hdfs dfs -createSnapshot /snap-test snap1

3) *iip.getLastINode()* for */snap-test/.snapshot/snap1/file1* entry will be an 
instance of *INodeFile*

4) file1 is renamed to /snap-test/file3. Now *iip.getLastINode()* for 
*/snap-test/.snapshot/snap1/file1* entry will be an instance of 
*INodeReference$WithName*.

5) Another snapshot is taken:
$ hdfs dfs -createSnapshot /snap-test snap2

6) *iip.getLastINode()* for */snap-test/.snapshot/snap2/file3* entry will be an 
instance of *INodeReference$DstReference*.

7) If /snap-test/file3 is deleted, *iip.getLastINode()* for 
*/snap-test/.snapshot/snap2/file3* entry still returns an instance of 
*INodeReference$DstReference*.

{noformat}

I will work on a new patch following this solution, and will add more unit 
tests to cover further scenarios, such as renames and multiple snapshot 
references. Let me know your thoughts, and whether you see any issues with this 
strategy.

> fsck -includeSnapshots reports wrong amount of total blocks
> -----------------------------------------------------------
>
>                 Key: HDFS-12618
>                 URL: https://issues.apache.org/jira/browse/HDFS-12618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-121618.initial, HDFS-12618.001.patch, 
> HDFS-12618.002.patch, HDFS-12618.003.patch
>
>
> When snapshots are enabled, if a file is deleted but is still contained in a 
> snapshot, *fsck* will not report blocks for such a file, showing a different 
> number of *total blocks* than what is exposed in the Web UI. 
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The 
> problem is that *-includeSnapshots* causes *fsck* to count blocks for every 
> occurrence of a file in snapshots, which is wrong because these blocks should 
> be counted only once (for instance, if a 100MB file is present on 3 snapshots, 
> it would still map to one block only in hdfs). This causes fsck to report many 
> more blocks than what actually exist in hdfs and is reported in the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup          0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:        1
>  Number of racks:             1
>  Total dirs:                  6
>  Total symlinks:              0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):    12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:      0 (0.0 %)
>  Under-replicated blocks:     0 (0.0 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:              0
>  Corrupt blocks:              0
>  Missing replicas:            0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution 
> shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
