[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-03-15 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400978#comment-16400978
 ] 

genericqa commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 52s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 3 new + 390 unchanged - 
3 fixed = 393 total (was 393) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 48s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 112 unchanged - 4 fixed = 115 total (was 116) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
24s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904662/HDFS-12618.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e4c39aff985d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5e013d5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23503/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23503/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-03-15 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400670#comment-16400670
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Any comments on the last patch proposed?

> fsck -includeSnapshots reports wrong amount of total blocks
> ---
>
> Key: HDFS-12618
> URL: https://issues.apache.org/jira/browse/HDFS-12618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-121618.initial, HDFS-12618.001.patch, 
> HDFS-12618.002.patch, HDFS-12618.003.patch, HDFS-12618.004.patch, 
> HDFS-12618.005.patch, HDFS-12618.006.patch
>
>
> When snapshots are enabled, if a file is deleted but still referenced by a 
> snapshot, *fsck* will not report blocks for that file, showing a different 
> number of *total blocks* than what is exposed in the Web UI. 
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The 
> problem is that *-includeSnapshots* causes *fsck* to count blocks for 
> every occurrence of a file in snapshots, which is wrong because these blocks 
> should be counted only once (for instance, if a 100MB file is present in 3 
> snapshots, it would still map to only one block in HDFS). This causes fsck to 
> report many more blocks than actually exist in HDFS and are reported in 
> the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup  0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:1
>  Number of racks: 1
>  Total dirs:  6
>  Total symlinks:  0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:  0
>  Corrupt blocks:  0
>  Missing replicas:0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution 
> shortly.






[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-01-05 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312887#comment-16312887
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

I'm not sure about the *javac* WARN: it's complaining about the TestFsck class, 
but there are no changes to that class in the latest patch. All tests were 
moved to the TestFsckIncludeSnapshots class, so TestFsck is not even present in 
the latest patch.




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-01-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312131#comment-16312131
 ] 

genericqa commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 54s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 3 new + 391 unchanged - 
3 fixed = 394 total (was 394) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 112 unchanged - 4 fixed = 115 total (was 116) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 34s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.TestMaintenanceState |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904662/HDFS-12618.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 182b243c1d8b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc735b2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22561/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| checkstyle | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-01-04 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311620#comment-16311620
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

bq. validate() then catch AssertionError should be changed, for the reasons 
Daryn mentioned, plus the fact that assertions can be disabled at run time. 
See 
https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html#enable-disable
 .
The latest submitted patch already includes refactorings that remove the usage 
of validate and the reliance on AssertionError handling.

bq. I'm not sure the current {{getLastINode()==null}} check is enough for 
{{INodeReference}}s. What if the block changed in the middle of the snapshots? 
For example, say file 1 has block 1&2. Then the following happened: snapshot 
s1, truncate so the file has block 1 only, snapshot s2, append so the file has 
block 1&3, snapshot s3. Would we be able to tell the difference when {{fsck 
-includeSnapshots}} now?
I added a unit test for that scenario; it will appear in the next patch to be 
submitted, with additional observations. The current condition seems to cover 
this situation, and the test passes. A sketch of the scenario setup is below.
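
For illustration, a minimal sketch of how that scenario could be driven in a 
test, assuming the usual {{MiniDFSCluster}}/{{DFSTestUtil}} helpers ({{cluster}} 
and {{BLOCK_SIZE}} are assumed to be defined by the test):

{code}
DistributedFileSystem fs = cluster.getFileSystem();
Path dir = new Path("/snap-test");
Path file = new Path(dir, "file1");
// file1 starts with blocks 1 & 2.
DFSTestUtil.createFile(fs, file, 2 * BLOCK_SIZE, (short) 1, 0xBEEF);
fs.allowSnapshot(dir);
fs.createSnapshot(dir, "s1");   // s1 references blocks 1 & 2
// Truncate on a block boundary completes immediately; the live file keeps
// block 1 only, while s1 still references block 2.
fs.truncate(file, BLOCK_SIZE);
fs.createSnapshot(dir, "s2");   // s2 references block 1
DFSTestUtil.appendFile(fs, file, (int) BLOCK_SIZE);  // append allocates block 3
fs.createSnapshot(dir, "s3");   // s3 references blocks 1 & 3
// A correct fsck -includeSnapshots should count blocks 1, 2 and 3 once each.
{code}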

bq. Because locks are reacquired during fsck, it's theoretically possible that 
snapshots are created / deleted during the scan. I think the current behavior 
is that we're not aware of new snapshots, and skip the deleted snapshots (since 
snapshottableDirs is populated before the check call). Possible to add a 
fault-injected test to make sure we don't NPE on deleted snapshots?
I'm not sure I follow this. Is it that we need to make sure snapshottableDir != 
null? If so, we already have a null check in the *checkDir* method, at line #506.

bq. Speechlessly NamenodeFsck also has other block counts like 
numMinReplicatedBlocks. Current code only takes care of total blocks, which IMO 
is the most important. This also seems to be the goal of this jira as suggested 
by the title and description, so it's okay to split that into another jira.
So there's another metric that is currently broken? I may open another jira for 
that, but I would like to get this one sorted first.

bq. I see the variable name of checkDir is changed to filePath, which is not 
accurate. Prefer to keep the old name path.
That was changed to fix a checkstyle warning.

bq. checkFilesInSnapshotOnly: suggest to handle inode==null in its own block, 
so we don't have to worry about that for non-INodeFile code paths. (FYI null is 
not instanceof anything, so patch 4 code didn't have to check. Need to be 
careful after changing to isFile, as (correctly) suggested by Daryn.)
The last patch applied Daryn's suggestion to use the *isFile* helper method, so 
now we do need to make sure the inode is not null first.

bq. lastSnapshotId = -1 should use Snapshot.NO_SNAPSHOT_ID rather than -1.
Applied on last patch.

bq. inodeFile.getFileWithSnapshotFeature().getDiffs() can never be null 
judging from FileWithSnapshotFeature, so no need for a nullity check
Fixed on last patch.

bq. Please format the code you changed. There are many space inconsistencies 
around brackets.
Formatted on last patch.

bq. Test should add timeouts. Perhaps better to just use a Rule on the class, 
to safeguard cases by default with something like 3 minutes.
Will be available on next patch.

bq. Feels to me the "HEALTHY" check in the beginning of each test case is not 
necessary.
Will be available on next patch.

bq. Could use GenericTestUtils.waitFor() for the waits.
Will be available on next patch.

bq. Optional - TestFsck is already 2.4k+ lines long. Maybe better to create a 
new test class for snapshot blockcount specifically. In that class the name of 
each test would be shorter and more readable.
Will be available on next patch.


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2018-01-02 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308915#comment-16308915
 ] 

genericqa commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 114 unchanged - 4 fixed = 116 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 53s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}121m 37s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904286/HDFS-12618.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fa80c8c0ef60 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b4d1133 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22532/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-12-29 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306556#comment-16306556
 ] 

Xiao Chen commented on HDFS-12618:
--

Sorry for my month-long delay on reviewing, finally locked myself to the chair 
and reviewed the latest patch and comments before the end of the year. 

Good to see we're improving, and happy to see the many added test cases. Thanks 
for the continued work [~wchevreuil], and [~daryn] for the reviews.

The problem is pretty hard, but the direction looks good. Some comments on patch 4:
- {{validate()}} then catch {{AssertionError}} should be changed, for the 
reasons Daryn mentioned, plus the fact that assertions can be disabled at run 
time. See 
https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html#enable-disable
 . 
- I'm not sure the current {{getLastINode()==null}} check is enough for 
{{INodeReference}}s. What if the block changed in the middle of the snapshots? 
For example, say file 1 has block 1&2. Then the following happened: snapshot 
s1, truncate so the file has block 1 only, snapshot s2, append so the file has block 
1&3, snapshot s3. Would we be able to tell the difference when {{fsck 
-includeSnapshots}} now?
- Because locks are reacquired during fsck, it's theoretically possible that 
snapshots are created / deleted during the scan. I think the current behavior 
is that we're not aware of new snapshots, and skip the deleted snapshots (since 
{{snapshottableDirs}} is populated before the {{check}} call). Possible to add a 
fault-injected test to make sure we don't NPE on deleted snapshots?
- Speechlessly {{NamenodeFsck}} also has other block counts like 
{{numMinReplicatedBlocks}}. Current code only takes care of total blocks, which 
IMO is the most important. This also seems to be the goal of this jira as 
suggested by the title and description, so it's okay to split that into another jira.

Trivial ones:
- I see the variable name of {{checkDir}} is changed to {{filePath}}, which is 
not accurate. Prefer to keep the old name {{path}}.
- {{checkFilesInSnapshotOnly}}: suggest to handle {{inode==null}} in its own 
block, so we don't have to worry about that for non {{INodeFile}} code paths. 
(FYI null is not instanceof anything, so patch 4 code didn't have to check. 
Need to be careful after changing to {{isFile}}, as (correctly) suggested by 
Daryn.)
- {{lastSnapshotId = -1}} should use {{Snapshot.NO_SNAPSHOT_ID}} rather than -1.
- {{inodeFile.getFileWithSnapshotFeature().getDiffs()}} can never be null 
judging from {{FileWithSnapshotFeature}}, so no need for a nullity check
- Tests should add timeouts. Perhaps better to just use a {{Rule}} on the 
class, to safeguard cases by default with something like 3 minutes (see the 
sketch after this list).
- Feels to me the "HEALTHY" check in the beginning of each test case is not 
necessary.
- Could use {{GenericTestUtils.waitFor()}} for the waits.
- Optional - {{TestFsck}} is already 2.4k+ lines long. Maybe better to create a 
new test class for snapshot blockcount specifically. In that class the name of 
each test would be shorter and more readable.
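
For reference, a minimal sketch of the timeout {{Rule}} and {{waitFor}} 
suggestions above, assuming JUnit 4 and Hadoop's {{GenericTestUtils}} (the 
condition being polled is illustrative, not from the patch):

{code}
// Class-level safeguard: any test method still running after 3 minutes fails.
@Rule
public Timeout globalTimeout = new Timeout(180000);

// Polling wait instead of hand-rolled sleep loops.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // Illustrative condition; a real test would poll the state it cares about.
    return !cluster.getNameNode().isInSafeMode();
  }
}, 100, 60000); // check every 100 ms, give up after 60 s
{code}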

Also a curiosity question to [~daryn]:
bq. try - lock - finally unlock vs. lock - try - finally unlock
Understood, and I completely agree with the advice. My curiosity comes from the 
fact that in current HDFS, it looks like only {{FSN#writeLockInterruptibly}} 
can throw a checked exception. Of course there could also be unchecked 
exceptions. Is this general coding advice, or something you have run into in 
practice? Care to share the fun details? :)


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-12-07 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281811#comment-16281811
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

bq. How is FSDirectory.getINodesInPath(filePath, FSDirectory.DirOp.READ) broken?
I wouldn't say it's broken, as the "filePath" being passed to it in this case 
does not actually exist. When dealing with appended or truncated files, the 
original file path may still exist (if the file has never been deleted), but 
the file version in the snapshot folders may have different blocks. That adds 
the need to check whether the original file path still exists outside of 
snapshots. That's the reason behind 
*dir.getINodesInPath(inodeFile.getName(), FSDirectory.DirOp.READ).validate();*: 
*inodeFile.getName()* returns the original file path, so if the file has been 
deleted, *inodeFile.getName()* refers to a path that no longer resolves. Since 
usage of *validate()* is discouraged, an alternative I considered was to go 
through the INodes array from this IIP, comparing it with 
*inodeFile.getName()*.
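
A sketch of that alternative, using the {{FSDirectory}}/{{INodesInPath}} calls 
already discussed here (the inode-id comparison is my illustration, not the 
patch's actual code):

{code}
// Resolve the file's original path and check whether it still points at the
// same inode in the live namespace, instead of relying on IIP#validate()
// throwing AssertionError.
INodesInPath iip =
    dir.getINodesInPath(inodeFile.getName(), FSDirectory.DirOp.READ);
INode last = iip.getLastINode();
boolean existsOutsideSnapshots =
    last != null && last.getId() == inodeFile.getId();
{code}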



[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-12-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279130#comment-16279130
 ] 

Daryn Sharp commented on HDFS-12618:


This is more of a lament, but HDFS-12811 would make the problem simpler at 
least for a fast fsck of the whole namespace via the inode map.  Determining if 
the inode was visited goes away...

How is {{FSDirectory.getINodesInPath(filePath, FSDirectory.DirOp.READ)}} 
broken?  With reserved inode paths?  Otherwise how can you resolve a path 
(incorrectly) to something that doesn't exist in the live namespace?




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-28 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269038#comment-16269038
 ] 

Daryn Sharp commented on HDFS-12618:


bq. Admittedly snapshot is very complicated and requires a lot of efforts (at 
least for me) to get things right.
Unfortunately it takes a lot of effort for everyone, which is why it's so hard 
to fix issues involving them.

Never do "try - lock - finally unlock".  If the lock attempt throws for any 
reason, the finally performs an unbalanced unlock.  Always use the pattern 
"lock - try - finally unlock".

In general, re-constituting and re-resolving paths is very expensive and should 
be avoided. This patch does a lot of it.

Catching {{AssertionError}}, or any unchecked exception, and assuming its 
semantics is a terrible idea.  I see you are relying on {{IIP#validate}} to 
throw it but it's an implementation detail.  Past experience indicates that 
method is broken.  When I've tiptoed around snapshots and added logging of 
{{IIP#toString}} during permission checking, it had many false positives.  
Don't use it.

Abbreviated:
{code}
if (inodeFile.getFileWithSnapshotFeature() != null
    && inodeFile.getFileWithSnapshotFeature().getDiffs() != null) {
  lastSnapshotId =
      inodeFile.getFileWithSnapshotFeature().getDiffs().getLastSnapshotId();
}
{code}
(I'm not sure the snapshot code is correct.) From a style/performance 
perspective, you don't want to keep repeatedly calling the same method, notably 
the feature lookup, rather than using a local variable.
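
A sketch of that refactor, caching the feature lookup in a local variable 
(intended to behave identically to the abbreviated code above):

{code}
FileWithSnapshotFeature sf = inodeFile.getFileWithSnapshotFeature();
if (sf != null && sf.getDiffs() != null) {
  lastSnapshotId = sf.getDiffs().getLastSnapshotId();
}
{code}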

No need to use {{instanceof}} on the inodes and explicitly cast. I.e., rather 
than {{inode instanceof INodeFile}}, use {{inode#isFile}} followed by 
{{inode#asFile}}. Same for references.
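
For example (a sketch of the suggested style; the surrounding logic is 
illustrative):

{code}
// Instead of: if (inode instanceof INodeFile) { ... (INodeFile) inode ... }
if (inode != null && inode.isFile()) {  // unlike instanceof, isFile() needs
  INodeFile f = inode.asFile();         // a non-null receiver
  // ... work with f ...
}
{code}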

Catching up from vacation, but I need to find more time to grok the snapshot 
code.  Like most snapshot code it feels overly complex and brittle.


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-14 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251747#comment-16251747
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

I'm having problems with append and truncate. In both cases, 
*iip.getLastINode()* returns an instance of *INodeFile*. For append, the 
original block for the file is always kept. Thus, let's say I had a file 
*file1* originally with only 1 block, and took a snapshot *snap1* of its 
folder. Then I performed an append, so that *file1* now has 2 blocks. The 
block for the *snap1* version would be the 1st block of the real file, so the 
snap1 file should not have its block counted again. I thought this would be 
feasible by performing the following check:

{noformat}
if (inodeFile.isWithSnapshot()
    && inodeFile.getFileWithSnapshotFeature().getDiffs().getLastSnapshotId()
        == iip.getPathSnapshotId()
    && inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()) {
  replRes.totalBlocks += inodeFile.getBlocks().length;
}
{noformat} 

I guess this would work, as we only have to count blocks for files inside 
snapshots if the original file has been removed and the file is in the last 
snapshot. The problem here is that 
*inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()* returns *true* 
only as long as the *DELETE* operation that removed the original file is 
younger than the last fsimage file. Once a checkpoint happens and the NN is 
restarted, *inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()* 
returns *false* for all snapshot files, breaking the logic above. I believe 
it's possible to set that flag from info available in the fsimage during image 
loading, but I'm not sure that's the expected behaviour.

The truncate problem is harder. Truncate generates new blocks (because it 
shrinks the file), so we would now need to pass the above checks, but with 
*truncate*, *inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()* 
will never return *true* (because there was no deletion). We could obviously 
have truncate update this flag, but that does not seem right.

These problems actually make me feel it would be simpler if we had a way to 
get all file paths related to a given block from the block manager, so that 
during the snapshot check we could count blocks only for those paths that are 
actually in snapshots. I don't know whether we have any way to do that now; I 
found *BlockManager.getStoredBlock(Block)*, but the returned *BlockInfo* 
instance only has the id of the last inode in the file path, which in this 
case is the same for both the real path and the snapshot path. 


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-10 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248306#comment-16248306
 ] 

Xiao Chen commented on HDFS-12618:
--

Yup, append is definitely a valid scenario. We can also do a truncate. 




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-10 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247872#comment-16247872
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

bq. Let's please make sure the test also covers the case that different inode 
references contain different blocks, and verify there is no over/under counting.
My understanding is that this would be the case when a file already present in 
snapshot(s) is then appended to. Do you know of any other condition that causes 
this? I'm working on tests and a solution to cover this scenario as well.



> fsck -includeSnapshots reports wrong amount of total blocks
> ---
>
> Key: HDFS-12618
> URL: https://issues.apache.org/jira/browse/HDFS-12618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-121618.initial, HDFS-12618.001.patch, 
> HDFS-12618.002.patch, HDFS-12618.003.patch
>
>
> When snapshot is enabled, if a file is deleted but is contained by a 
> snapshot, *fsck* will not report blocks for such a file, showing a different 
> number of *total blocks* than what is exposed in the Web UI. 
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The 
> problem is that *-includeSnapshots* causes *fsck* to count blocks for 
> every occurrence of a file in snapshots, which is wrong because these blocks 
> should be counted only once (for instance, if a 100MB file is present in 3 
> snapshots, it still maps to only one block in hdfs). This causes fsck to 
> report many more blocks than actually exist in hdfs and are reported in 
> the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup  0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:1
>  Number of racks: 1
>  Total dirs:  6
>  Total symlinks:  0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:  0
>  Corrupt blocks:  0
>  Missing replicas:0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution 
> shortly.






[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-09 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246526#comment-16246526
 ] 

Xiao Chen commented on HDFS-12618:
--

Yup, rename is what replaces a child inode with an inode reference. What 
you described sounds fine to me. Let's please make sure the test also covers 
the case where different inode references contain different blocks, and verify 
there is no over/under counting.




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-09 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246394#comment-16246394
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Thanks [~xiaochen]. Indeed, the last patch was not handling renames. I did some 
further analysis on this; here is what I found:

1) *iip.getLastINode()* always returns an instance of *INodeFile* if the given 
file under a snapshot folder has not been renamed.
2) If a file within a snapshot gets renamed, the snapshot entry returned by 
*iip.getLastINode()* will be an instance of *INodeReference$WithName*. While 
counting blocks, I guess we can simply ignore these and not count blocks for 
*INodeReference$WithName*, as the blocks will either be accounted outside of 
the snapshot check, or by the new file name.
3) If a snapshot is taken after the rename, the snapshot entry for the new file 
name returned by *iip.getLastINode()* will be an instance of 
*INodeReference$DstReference*.
4) The main problem is with renamed files that were then deleted, because the 
related *INodeReference$DstReference* entries should have their blocks 
accounted. That requires an additional check on the referred inode path: if 
*getINodesInPath().getLastINode()* returns null, this 
*INodeReference$DstReference* is a deleted renamed file and needs to have its 
blocks counted.

For example:
{noformat}
1) Directory content:
/snap-test/file1
/snap-test/file2

2) Snapshot is taken:
$ hdfs dfs -createSnapshot /snap-test snap1

3) *iip.getLastINode()* for */snap-test/.snapshot/snap1/file1* entry will be an 
instance of *INodeFile*

4) file1 is renamed to /snap-test/file3. Now *iip.getLastINode()* for 
*/snap-test/.snapshot/snap1/file1* entry will be an instance of 
*INodeReference$WithName*.

5) Another snapshot is taken:
$ hdfs dfs -createSnapshot /snap-test snap2

6) *iip.getLastINode()* for */snap-test/.snapshot/snap2/file3* entry will be an 
instance of *INodeReference$DstReference*.

7) If /snap-test/file3 is deleted, *iip.getLastINode()* for 
*/snap-test/.snapshot/snap2/file3* entry still returns an instance of 
*INodeReference$DstReference*.

{noformat}

I will work on a new patch following this solution, and will add more unit 
tests to cover further scenarios, such as renames and multiple snapshot 
references. Let me know your thoughts if you see any issues with this 
strategy.
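
To make the rule concrete, here is a minimal sketch of the classification 
above. The helper name and its second argument are hypothetical (not the 
actual patch code in NamenodeFsck); only the INodeReference inner classes are 
real HDFS types:
{code}
import org.apache.hadoop.hdfs.server.namenode.INode;
import org.apache.hadoop.hdfs.server.namenode.INodeReference;

/**
 * Hypothetical helper: decides whether a snapshot path entry must have its
 * blocks counted by the snapshot pass. liveINode stands for the result of
 * resolving the referred inode in the current (non-snapshot) namespace.
 */
static boolean countBlocksInSnapshotPass(INode last, INode liveINode) {
  if (last instanceof INodeReference.WithName) {
    // Case 2: a renamed file seen through its old snapshot name; its blocks
    // are accounted by the live file or by its DstReference entry.
    return false;
  }
  if (last instanceof INodeReference.DstReference) {
    // Cases 3/4: count only when the referred file no longer exists in the
    // current namespace, i.e. it was renamed and later deleted.
    return liveINode == null;
  }
  // Case 1: a plain INodeFile; handled by the regular duplicate-block check.
  return false;
}
{code}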


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-08 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245239#comment-16245239
 ] 

Xiao Chen commented on HDFS-12618:
--

Finally managed to review this in more detail, together with Daryn's (great) 
comments. Also helpful to know Yahoo is able to run fsck on / frequently...

Looking again at the patch and comments, I think Daryn's comment regarding 
{{INodeReference.With\[Name|Count\]}} is not yet understood and reflected in 
this patch. It's not a safe assumption that {{iip.getLastINode()}} will be an 
{{INodeFile}} - for snapshots, that's usually an {{INodeReference.WithName}}.
Admittedly, snapshots are very complicated and require a lot of effort (at 
least for me) to get right. One way to look into it is probably with some 
examples, debugging from {{FSDirRenameOp$RenameOperation}} when a rename 
happens.

Related to the above, I suggest unit tests that cover renames (since delete 
doesn't create the same INode reference links as rename does), and multiple 
snapshots (where the snapshots have different but overlapping blocks, so we 
exercise the not-the-last-WithName path).

Thanks.
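
For illustration, a rough outline of such a rename test (class and test names 
are hypothetical, assertions are only sketched; the real test would live in 
TestFsck and check exact block counts):
{code}
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.tools.DFSck;
import org.apache.hadoop.util.ToolRunner;
import org.junit.Assert;
import org.junit.Test;

public class TestFsckSnapshotRename {
  @Test
  public void testFsckWithRenamedSnapshottedFile() throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      DistributedFileSystem dfs = cluster.getFileSystem();
      Path dir = new Path("/snap-test");
      dfs.mkdirs(dir);
      dfs.allowSnapshot(dir);
      // One 2-block file (default block size), then a snapshot of it.
      DFSTestUtil.createFile(dfs, new Path(dir, "file1"),
          2 * dfs.getDefaultBlockSize(), (short) 1, 0xBEEFL);
      dfs.createSnapshot(dir, "snap1");
      // Rename, snapshot again, then delete: the blocks now live only in
      // the snapshots and must be counted exactly once.
      dfs.rename(new Path(dir, "file1"), new Path(dir, "file3"));
      dfs.createSnapshot(dir, "snap2");
      dfs.delete(new Path(dir, "file3"), false);

      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      int rc = ToolRunner.run(new DFSck(conf, new PrintStream(bytes)),
          new String[] {"/", "-includeSnapshots"});
      Assert.assertEquals(0, rc);
      // The report should count the file's 2 blocks once, not once per
      // snapshot; the exact label format depends on the fsck output.
      Assert.assertTrue(bytes.toString().contains("Total blocks"));
    } finally {
      cluster.shutdown();
    }
  }
}
{code}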



[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-04 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238893#comment-16238893
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Thanks [~xiaochen], your last suggestions all make sense to me; I will start 
working on them right away. Meanwhile, feel free to make any further 
observations.




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235007#comment-16235007
 ] 

Xiao Chen commented on HDFS-12618:
--

Thanks for the new patch Wellington.

From a quick look this seems to work, nice job. I'd like to see:
- more thorough unit tests covering the description scenario (2 snapshots 
referring to a deleted file)
- tests covering some combinations of snapshot creation/deletion, verifying 
the reported number is correct
- I'm not a lambda expert, but it seems {{DirTypeCheck}} could be private.
- In general we'd need to acquire the FSDirectory lock as well as the 
FSNamesystem lock: take dir.readLock() after the namesystem read lock, and 
release dir.readUnlock() before the fsn unlock (see the sketch below).
- Looks like you applied a formatter to the entire NamenodeFsck.java (instead 
of just the changed code), which resulted in some unnecessary changes. Let's 
not make those changes.

Will provide a more complete review later this week.
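
A minimal sketch of the requested lock nesting, assuming {{fsn}} is the 
FSNamesystem and {{dir}} its FSDirectory (the surrounding fsck code is 
omitted):
{code}
// Lock ordering only: namesystem read lock first, then the directory lock;
// release in the reverse order.
fsn.readLock();
try {
  dir.readLock();
  try {
    // ... resolve the snapshot path and inspect its inodes here ...
  } finally {
    dir.readUnlock();
  }
} finally {
  fsn.readUnlock();
}
{code}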




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224145#comment-16224145
 ] 

Hadoop QA commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 115 unchanged - 4 fixed = 118 total (was 119) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 59s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
19s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12894634/HDFS-12618.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f65133dcb2b9 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9114d7a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21867/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21867/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21867/testReport/ |
| asflicense | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-26 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220685#comment-16220685
 ] 

Daryn Sharp commented on HDFS-12618:


bq. I did not -1 on this one because this only happens when someone runs 
-includeSnapshots explicitly. Not sure these snapshot problems can be solved 
without doing this; please feel free to share any alternatives you have in mind.
Been awhile since I (attempted to) analyze snapshots. I think snapshotted 
files, even deleted ones, are an {{INodeReference.WithName}} with a parent of 
{{INodeReference.WithCount}}, which maintains a list of all the 
{{INodeReference.WithName}} refs. Perhaps we could detect whether the inode is 
linked into the current namesystem. If yes, it will be picked up in the 
namesystem crawl; if no, count it based on all the {{WithName}} refs. And/or 
maybe count a reference only if it's the last ref 
({{INodeReference.WithCount#getLastWithName}}). Maybe.
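
A hedged sketch of that last idea. {{getReferredINode()}} is a real 
INodeReference accessor and {{getLastWithName}} is named above; the helper 
itself and the cast are a hypothetical illustration, not the patch:
{code}
import org.apache.hadoop.hdfs.server.namenode.INode;
import org.apache.hadoop.hdfs.server.namenode.INodeReference;

/** Count a snapshot-only file at exactly one of its WithName references. */
static boolean isLastWithNameRef(INode inode) {
  if (!(inode instanceof INodeReference.WithName)) {
    return false;
  }
  INodeReference.WithName ref = (INodeReference.WithName) inode;
  // A WithName refers to the shared WithCount, which tracks all the WithName
  // refs; counting only at the last one avoids double counting.
  INodeReference.WithCount withCount =
      (INodeReference.WithCount) ref.getReferredINode();
  return withCount.getLastWithName() == ref;
}
{code}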

bq. For large clusters, doing fsck on / alone is a bad idea.
We do this nightly.   Every day.  Every cluster.

bq. Would it work for you if we put a memory limit on how much each fsck 
-includeSnapshots' block map could consume on the NN?
I'm not sure how that could work in a user-friendly manner.  I run the fsck, it 
fails.  I have to run it again on subdirs?  Some fail again.  I have to run it 
on lower subdirs, then write code to collate all the mini-reports back together 
in a unified report?

Fsck can run for tens of minutes or hours. Keeping excessively large state 
during the operation will put a lot of pressure on the old GC generation and 
risk OOM. It has to stay lightweight (or only as heavy as it already is).

bq. Given this check is just relevant for files that had been deleted and 
reside on snapshots only, would it still be a possibility for these files to be 
truncated/appended?
I'm by no means a truncate/append + snapshot expert. A renamed file appears as 
a delete in a snapshot diff. It can be subsequently modified in later versions 
that may also be snapshotted and "deleted". Correct snapshot handling has been 
problematic, so we need to ensure it works correctly in all cases w/o causing 
significant issues.



[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-26 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220202#comment-16220202
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Thanks for the comments [~daryn].

bq. Ignoring my rejection, I'm not even sure the logic for checking just the 
first block is correct in light of truncate and append.
I was not aware of the implications of truncate/append. Given this check is 
only relevant for files that have been deleted and reside on snapshots only, 
would it still be possible for these files to be truncated/appended?

bq. I'm not a lambda expert, but creating a primitive array that requires 
forced casting of indexed elements appears to be an abuse of the construct. The 
wrapping and unwrapping of IOEs as suppressed exceptions embedded in 
RuntimeExceptions in an apparent attempt to thwart checked exceptions is also 
unnecessary and appears related to the lambda construct.
That was an attempt to use the built-in *java.util.function.Consumer* 
interface, whose accept method defines only one parameter and throws no 
exceptions. Indeed, looking further into the lambda API, I think this can be 
sorted by creating a custom *@FunctionalInterface* that defines the types and 
checked exceptions needed by the *check* and *checkFilesInSnapshotOnly* 
methods.
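
For instance, a minimal sketch of such an interface (the name 
{{CheckedConsumer}} is only illustrative):
{code}
import java.io.IOException;

/** Like java.util.function.Consumer, but allowed to throw IOException. */
@FunctionalInterface
interface CheckedConsumer<T> {
  void accept(T item) throws IOException;
}
{code}
A lambda passed as a {{CheckedConsumer}} can then throw IOException directly, 
with no RuntimeException wrapping and unwrapping.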

 




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219732#comment-16219732
 ] 

Xiao Chen commented on HDFS-12618:
--

Hi [~daryn],

Thanks for the comment. I certainly learned the lesson from HDFS-10797, and had 
exactly the same concerns in my [early 
comment|https://issues.apache.org/jira/browse/HDFS-12618?focusedCommentId=16213269=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16213269]

I did not -1 on this one because this only happens when someone runs 
-includeSnapshots explicitly. Not sure these snapshot problems can be solved 
without doing this; please feel free to share any alternatives you have in 
mind. For large clusters, doing fsck on {{/}} alone is a bad idea. Would it 
work for you if we put a memory limit on how much each {{fsck 
-includeSnapshots}}' block map could consume on the NN?




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219694#comment-16219694
 ] 

Daryn Sharp commented on HDFS-12618:


I understand the problem, but as soon as you update documentation to warn users 
with:
{code}
+  + "\t2. Option -includeSnapshots should be used with caution. For "
+  + "clusters with too many files deleted but available on snapshots "
+  + "only, this would require extra NameNode heap to account its blocks "
+  + "properly.";
{code}

That should set off alarms.  I assumed it would be a repeat of a reverted 
content summary patch that put all visited inodes in a set. It essentially 
is, but worse: it's an array list. I then contemplated whether I cared enough 
to review because I have no intention of using snapshots. Then I realized that 
some fateful day I'll be forced to support them and this will blow up my NN.

For that reason alone: *-1*.

Ignoring my rejection, I'm not even sure the logic for checking just the first 
block is correct in light of truncate and append.

I'm not a lambda expert, but creating a primitive array that requires forced 
casting of indexed elements appears to be an abuse of the construct. The 
wrapping and unwrapping of IOEs as suppressed exceptions embedded in 
RuntimeExceptions in an apparent attempt to thwart checked exceptions is also 
unnecessary and appears related to the lambda construct.


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219479#comment-16219479
 ] 

Hadoop QA commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  8m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 54s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 115 unchanged - 4 fixed = 116 total (was 119) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestMissingBlocksAlert |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893975/HDFS-12618.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 08f1c6d304e4 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 5b98639 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-23 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215919#comment-16215919
 ] 

Xiao Chen commented on HDFS-12618:
--

Thanks for the explanation [~wchevreuil]. The arraylist and lambda exception 
wrapping LGTM. I also looked around this area, and the approach here looks to 
be the best we can get now. Only caching the 1st block is a smart way. :) 
Perhaps we could add a warning message to the fsck usage output, to warn about 
the potential impact of -includeSnapshots and -blocks on the NN.

You're correct about the locking code; that is the block I was talking about. 
Sorry I wasn't explicit.
Looking at that code, there are several improvements I can see (sketched 
below):
- {{getINodesInPath}} should be done within the read lock.
- {{inode}} could be obtained from {{iip.getLastINode}}.
- Need to check for and handle inode == null.

Also, could you format the code to follow the existing style? We usually put a 
space to separate parentheses/brackets. I thought there was an official format 
file but couldn't find it; I just forwarded you the format XML offline.
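
Putting those three points together, roughly (the {{getINodesInPath}} 
signature varies between branches, so treat this as illustrative only):
{code}
fsn.readLock();
try {
  // 1) resolve the path while already holding the read lock
  INodesInPath iip = fsn.getFSDirectory()
      .getINodesInPath(path, FSDirectory.DirOp.READ);
  // 2) reuse the resolved path instead of doing a second lookup
  INode inode = iip.getLastINode();
  // 3) the entry may have been deleted concurrently
  if (inode != null) {
    // ... inspect the inode's blocks ...
  }
} finally {
  fsn.readUnlock();
}
{code}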

> fsck -includeSnapshots reports wrong amount of total blocks
> ---
>
> Key: HDFS-12618
> URL: https://issues.apache.org/jira/browse/HDFS-12618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-121618.initial, HDFS-12618.001.patch
>
>
> When snapshots are enabled, if a file is deleted but is still contained in a 
> snapshot, *fsck* will not report blocks for such a file, showing a different 
> number of *total blocks* than what is exposed in the Web UI. 
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The 
> problem is that the *-includeSnapshots* option causes *fsck* to count blocks 
> for every occurrence of a file in snapshots, which is wrong because these 
> blocks should be counted only once (for instance, if a 100MB file is present 
> in 3 snapshots, it still maps to only one block in hdfs). This causes fsck 
> to report many more blocks than actually exist in hdfs and are reported in 
> the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup  0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:1
>  Number of racks: 1
>  Total dirs:  6
>  Total symlinks:  0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:  0
>  Corrupt blocks:  0
>  Missing replicas:0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution 
> shortly.




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-23 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214955#comment-16214955
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Hi [~xiaochen], thanks for reviewing and sharing your thoughts. Yeah, for the 
Web UI it is just a matter of showing the total number of blocks currently 
mapped in the BlockManager. Regarding the issues you had highlighted 
previously, I have some comments below:

bq. snapshotSeenBlocks could be huge, if the snapshot dir is at the top level, 
and has a lot of blocks under it. In extreme cases, this may put pressure on 
NN. At the minimum we should make sure the extra space is allocated only when 
-includeSnapshots is set.
That's indeed something that concerned me while thinking about this. To 
minimize heap pressure effects, this only adds the 1st block of each snapshot 
file already checked. It also adds only blocks from files that exist only on 
snapshots (files that had already been deleted). So, although the *ArrayList* 
is created in a generic scope, it will only really be populated with blocks in 
cases where "-includeSnapshots" is passed. Unfortunately, because *checkDir* 
is invoked recursively, I needed to initialise the *ArrayList* at the first 
call to *checkDir*, from the *check* method.
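A minimal, hypothetical sketch of the gating being asked for, assuming a 
simplified fsck-like class (field and method names are illustrative, not the 
actual patch):
{noformat}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class FsckSketch {
  private final boolean includeSnapshots;
  // Allocated only when -includeSnapshots is set, so the common fsck
  // path pays no extra heap cost.
  private final Set<String> seenBlocks;

  FsckSketch(boolean includeSnapshots) {
    this.includeSnapshots = includeSnapshots;
    this.seenBlocks = includeSnapshots
        ? new HashSet<>() : Collections.emptySet();
  }

  /** Returns true the first time a file's leading block is seen. */
  boolean markFirstBlockSeen(String firstBlockName) {
    return includeSnapshots && seenBlocks.add(firstBlockName);
  }
}
{noformat}
(A *HashSet* is used here instead of an *ArrayList* purely so the 
contains-check is O(1); that substitution is this sketch's choice, not the 
patch's.)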

bq. where inodes are involved, file system locks are required. This adds burden 
to the NN which should be minimized.
I guess this is related to the block of code below, where I'm not acquiring 
any namesystem lock (apart from the one taken inside *getBlockLocations*), 
which could lead to an inconsistent report of blocks:
{noformat}
  FSDirectory dir = namenode.getNamesystem().getFSDirectory();
  final INodesInPath iip = dir.getINodesInPath(filePath,
  FSDirectory.DirOp.READ);
  final INodeFile inode = INodeFile.valueOf(iip.getLastINode(), filePath);
  LocatedBlocks blocks = getBlockLocations(filePath, snapshotFileInfo);
  printFileStats(filePath, blocks, snapshotFileInfo);
  if(inode.isWithSnapshot()){
if(!seenBlocks.contains(inode.getBlocks()[0].getBlockName())) {
  replRes.totalBlocks += inode.getBlocks().length;
  seenBlocks.add(inode.getBlocks()[0].getBlockName());
}
  }
{noformat}

I guess that should be rewritten as below:

{noformat}
  FSDirectory dir = namenode.getNamesystem().getFSDirectory();
  final INodesInPath iip = dir.getINodesInPath(filePath,
      FSDirectory.DirOp.READ);
  namenode.getNamesystem().readLock();
  try {
    final INodeFile inode = INodeFile.valueOf(iip.getLastINode(), filePath);
    if (inode.isWithSnapshot()) {
      if (!seenBlocks.contains(inode.getBlocks()[0].getBlockName())) {
        replRes.totalBlocks += inode.getBlocks().length;
        seenBlocks.add(inode.getBlocks()[0].getBlockName());
      }
    }
  } finally {
    namenode.getNamesystem().readUnlock();
  }
  LocatedBlocks blocks = getBlockLocations(filePath, snapshotFileInfo);
  printFileStats(filePath, blocks, snapshotFileInfo);
{noformat}
Agreed that this can cause concurrency issues. Still, this code would be 
executed only for snapshot dirs. I can't see another way to get the block info 
details required to avoid over-counting these blocks.

bq. Catching a RuntimeException and re-throwing looks like bad practice. What 
are the RTEs that could be thrown that we need to wrap?
This was a problem I ran into while using Java lambda functions. The problem 
is that I had to declare a *java.util.function.Consumer* param on the 
*checkDir* signature, in order to be able to pass both 
*checkFilesInSnapshotOnly* and *check* calls as lambda functions, but 
*java.util.function.Consumer.accept* does not declare any checked exception. 
That means any *IOException* thrown by *checkFilesInSnapshotOnly* or *check* 
when executed from a lambda context cannot propagate directly. The only way I 
found to keep *IOException* occurrences trackable in the outer method was to 
wrap them in an RTE, catch that in the outer method, and recover the original 
*IOException* by unwrapping it. I still think using lambdas requires less 
refactoring while avoiding code duplication.
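A self-contained, hypothetical sketch of that wrap-and-unwrap pattern (the 
method names are simplified stand-ins, not the actual patch code):
{noformat}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class LambdaIOExceptionSketch {

  // Stand-in for a check method that declares a checked IOException.
  static void checkFile(String path) throws IOException {
    if (path.isEmpty()) {
      throw new IOException("empty path");
    }
  }

  // Consumer.accept declares no checked exceptions, so the IOException
  // has to be smuggled out inside an unchecked wrapper.
  static Consumer<String> checker() {
    return path -> {
      try {
        checkFile(path);
      } catch (IOException e) {
        throw new UncheckedIOException(e); // wrap in an RTE
      }
    };
  }

  // The outer method catches the wrapper and rethrows the original.
  static void checkDir(Consumer<String> check, String... paths)
      throws IOException {
    try {
      for (String p : paths) {
        check.accept(p);
      }
    } catch (UncheckedIOException e) {
      throw e.getCause(); // unwrap: getCause() is typed as IOException
    }
  }

  public static void main(String[] args) {
    try {
      checkDir(checker(), "/snap-test/file1", "");
    } catch (IOException e) {
      System.out.println("surfaced as a checked exception again: "
          + e.getMessage());
    }
  }
}
{noformat}
(*java.io.UncheckedIOException* is used as the RTE wrapper here since its 
*getCause()* is already typed as *IOException*; a custom RuntimeException 
subclass works the same way.)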

Please share any other ideas/concerns you may have.


[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-20 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213269#comment-16213269
 ] 

Xiao Chen commented on HDFS-12618:
--

Thanks for reporting and working on this [~wchevreuil]!

I agree this is an issue, but I'm not sure what the best solution would be - 
this appears to be a difficult problem. Let me research this too, to see if 
any other ideas pop up.

The reason the web UI shows the correct number is that it just shows the total 
number of blocks from the block manager. I'm afraid we don't have information 
such as 'total blocks under a directory', so that's not useful for fsck.
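For reference, a hedged sketch of where that web UI number comes from 
(assuming the usual FSNamesystem accessor):
{noformat}
// The web UI reports the block manager's global count, i.e. roughly:
long totalBlocks = namenode.getNamesystem().getBlocksTotal();
{noformat}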

Some issues I can see with the current patch:
- {{snapshotSeenBlocks}} could be huge if the snapshot dir is at the top 
level and has a lot of blocks under it. In extreme cases, this may put 
pressure on the NN. At the minimum we should make sure the extra space is 
allocated only when {{-includeSnapshots}} is set.
- Where inodes are involved, file system locks are required. This adds burden 
to the NN, which should be minimized.
- Catching a {{RuntimeException}} and re-throwing looks like bad practice. 
What are the RTEs that could be thrown that we need to wrap?




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212858#comment-16212858
 ] 

Hadoop QA commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 91 unchanged - 4 fixed = 92 total (was 95) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 57s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893245/HDFS-12618.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fb6aa782d100 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1f4cdf1 |
| Default Java | 1.8.0_131 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21763/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21763/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21763/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207768#comment-16207768
 ] 

Hadoop QA commented on HDFS-12618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 91 unchanged - 4 fixed = 94 total (was 95) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 28s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0de40f0 |
| JIRA Issue | HDFS-12618 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12892595/HDFS-121618.001.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3f5ab7aa89b6 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 31ebccc |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21726/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21726/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21726/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-14 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204856#comment-16204856
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

Definitely environmental; the tests pass on trunk after restarting my machine. 
I am now working on tests for this change.




[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-10-14 Thread Wellington Chevreuil (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204794#comment-16204794
 ] 

Wellington Chevreuil commented on HDFS-12618:
-

I was going to add some tests to TestFsck, but for some reason, when running 
the tests on trunk (without my changes), I am getting the errors below for 
some already existing tests:

{noformat}
Running org.apache.hadoop.hdfs.server.namenode.TestFsck
Tests run: 32, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 136.329 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFsck
testFsckCorruptECFile(org.apache.hadoop.hdfs.server.namenode.TestFsck)  Time elapsed: 3.022 sec  <<< ERROR!
java.lang.IllegalStateException: failed to create a child event loop
    at sun.nio.ch.KQueueArrayWrapper.init(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.<init>(KQueueArrayWrapper.java:98)
    at sun.nio.ch.KQueueSelectorImpl.<init>(KQueueSelectorImpl.java:88)
    at sun.nio.ch.KQueueSelectorProvider.openSelector(KQueueSelectorProvider.java:42)
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:174)
    at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:150)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:103)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:50)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:70)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:65)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:56)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:48)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:40)
    at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.<init>(DatanodeHttpServer.java:132)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:954)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1402)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:497)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2778)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2681)
    at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1635)
    at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:882)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:494)
    at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:453)
    at org.apache.hadoop.hdfs.server.namenode.TestFsck.testFsckCorruptECFile(TestFsck.java:2304)

testBlockIdCKDecommission(org.apache.hadoop.hdfs.server.namenode.TestFsck)  Time elapsed: 2.702 sec  <<< ERROR!
java.lang.IllegalStateException: failed to create a child event loop
    at sun.nio.ch.IOUtil.makePipe(Native Method)
    at sun.nio.ch.KQueueSelectorImpl.<init>(KQueueSelectorImpl.java:84)
    at sun.nio.ch.KQueueSelectorProvider.openSelector(KQueueSelectorProvider.java:42)
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:174)
    at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:150)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:103)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:50)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:70)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:65)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:56)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:48)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:40)
    at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.<init>(DatanodeHttpServer.java:132)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:954)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1402)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:497)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2778)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2681)
    at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1635)
    at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:882)
    at