[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960263#comment-14960263
 ] 

Hadoop QA commented on HDFS-9129:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  7s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 26s | The applied patch generated  6 
new checkstyle issues (total was 626, now 577). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 34s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  50m 12s | Tests failed in hadoop-hdfs. |
| | |  96m 47s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000 |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.TestReplication |
|   | hadoop.hdfs.util.TestByteArrayManager |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
| Timed out tests | org.apache.hadoop.hdfs.TestFileCreation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766984/HDFS-9129.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf23f2c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13024/console |


This message was automatically generated.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7964) Add support for async edit logging

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960114#comment-14960114
 ] 

Yi Liu edited comment on HDFS-7964 at 10/16/15 6:40 AM:


Thanks [~daryn] for the work.

Further comments:

*1.* In FSEditLogAsync#run
{code}
@Override
  public void run() {
try {
  while (true) {

if (doSync) {
  ...
logSync(getLastWrittenTxId());
  ...
{code}
I think it's better to pass the txid of the current edit to {{logSync}}; there 
is no need to wait for all written txids. That would be more efficient, and the 
client could get a faster response?
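
For illustration, a minimal sketch of this idea (the {{Edit}} type, 
{{dequeueEdit()}}, the boolean return of {{logEdit}}, and {{getTxId()}} are 
placeholders for the sketch, not the patch's actual names):
{code}
@Override
public void run() {
  try {
    while (true) {
      Edit edit = dequeueEdit();       // placeholder: take the next queued edit
      boolean doSync = logEdit(edit);  // write it; decide whether to sync
      if (doSync) {
        // Sync only up to this edit's own txid instead of
        // getLastWrittenTxId(), so the waiting client is released as soon
        // as its own transaction is durable.
        logSync(edit.getTxId());
      }
    }
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
}
{code}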

*2.*
{code}
-log4j.rootLogger=OFF, CONSOLE
+log4j.rootLogger=DEBUG, CONSOLE
{code}
Any reason to change it?

*3.*
{code}
call.abortResponse(syncEx);
{code}
It seems this code is not available yet?


was (Author: hitliuyi):
Thanks [~daryn] for the work.

Further comments:

*1.* In FSEditLogAsync#run
{code}
@Override
  public void run() {
try {
  while (true) {

if (doSync) {
  ...
logSync(getLastWrittenTxId());
  ...
{code}
I think it's better to pass the txid of the current edit to {{logSync}}; there 
is no need to wait for all written txids. That would be more efficient, and the 
client could get a faster response?

*2.*
{code}
+  editsBatchedInSync = txid - synctxid - 1;
{code}
Isn't it "txid - synctxid"?   The txid is the max txid written, and synctxid is 
the max txid already synced, suppose txid = 20, synctxid = 10, then the 
editsBatchedInSync should be (txid - synctxid) = (20 - 10) = 10.   Also you can 
get it from the existing log message:
{code}
final String msg =
"Could not sync enough journals to persistent storage " +
"due to " + e.getMessage() + ". " +
"Unsynced transactions: " + (txid - synctxid);
{code}

*3.*
{code}
-log4j.rootLogger=OFF, CONSOLE
+log4j.rootLogger=DEBUG, CONSOLE
{code}
Any reason to change it?

*4.*
{code}
call.abortResponse(syncEx);
{code}
It seems this code is not available yet?

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write log, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960230#comment-14960230
 ] 

Jing Zhao commented on HDFS-9129:
-

Thanks for the work, Mingliang! Glad to hear that we're finally allowed to 
review your patch now :)))

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960222#comment-14960222
 ] 

Walter Su commented on HDFS-9173:
-

Oh, I get your point. I can do that, but I must be very careful when touching 
the code for contiguous blocks. First, I think I should separate the code 
movement into another jira, as [~rakeshr] suggested, and then try your 
suggestion here.

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960207#comment-14960207
 ] 

Hadoop QA commented on HDFS-9253:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | native |   3m  9s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |   0m 43s | Tests passed in 
hadoop-hdfs-native-client. |
| | |  40m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766982/HDFS-9253.002.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / cf23f2c |
| hadoop-hdfs-native-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13023/artifact/patchprocess/testrun_hadoop-hdfs-native-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13023/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13023/console |


This message was automatically generated.

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch, 
> HDFS-9253.002.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementation of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-15 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9254:

Status: Patch Available  (was: In Progress)

> HDFS Secure Mode Documentation updates
> --
>
> Key: HDFS-9254
> URL: https://issues.apache.org/jira/browse/HDFS-9254
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9254.01.patch
>
>
> Some Kerberos configuration parameters are not documented well enough. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-15 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9254:

Attachment: HDFS-9254.01.patch

v1 patch.
# Document missing settings in hdfs-default.xml.
# Add more detail to SecureMode.html and rewrite some sections; document 
JournalNode settings.
# Update HdfsMultihoming.html to document the new security settings added by 
HADOOP-12437.
# Minor cleanup of HttpAuthentication.html to convert the textual description to 
a table. I think this doc needs more detail, but I don't understand this part of 
the configuration well enough to add content.

> HDFS Secure Mode Documentation updates
> --
>
> Key: HDFS-9254
> URL: https://issues.apache.org/jira/browse/HDFS-9254
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9254.01.patch
>
>
> Some Kerberos configuration parameters are not documented well enough. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.005.patch

The failing test is not related. In the v5 patch, safe blocks are no longer 
tracked after leaving safe mode. Any comments are welcome. I will work on 
reducing the synchronization overhead in {{BlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9253:
-
Attachment: HDFS-9253.002.patch

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch, 
> HDFS-9253.002.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementation of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960162#comment-14960162
 ] 

Zhe Zhang commented on HDFS-9173:
-

bq. #2. about syncBlockFinalized, syncBlockUnfinalized.
bq. Unfortunately, The logic isn't quite the same between contiguous and 
striped. For example,
I meant that {{syncBlockFinalized}} is only for contiguous blocks (in the code 
snippet it only appears in {{RecoveryTaskContiguous}}). 

The logic of syncing unfinalized replicas in {{RecoveryTaskContiguous}} (RWR, 
RBW) is very similar to syncing striped internal blocks. So ideally 
{{RecoveryTaskContiguous}} and {{RecoveryTaskStriped}} should share a 
{{syncBlockUnfinalized}} method (it appears in both {{RecoveryTaskContiguous}} 
and {{RecoveryTaskStriped}} in the code snippet).
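
For illustration, the sharing could look roughly like this (the class shapes and 
the {{ReplicaRecoveryInfo}} parameter are assumptions for the sketch, not the 
patch's actual signatures):
{code}
// Illustrative shape only: a common base owns the shared RWR/RBW sync logic.
abstract class RecoveryTask {
  // Shared by both task types: choose the sync length for unfinalized
  // (RWR/RBW) replicas and roll them forward to it.
  void syncBlockUnfinalized(List<ReplicaRecoveryInfo> replicas) {
    // common RWR/RBW handling
  }
}

class RecoveryTaskContiguous extends RecoveryTask {
  void recover(List<ReplicaRecoveryInfo> replicas) {
    // contiguous-only path: prefer finalized replicas when present
    // (syncBlockFinalized); otherwise fall back to the shared path:
    syncBlockUnfinalized(replicas);
  }
}

class RecoveryTaskStriped extends RecoveryTask {
  void recover(List<ReplicaRecoveryInfo> replicas) {
    // striped path reuses the shared logic for its internal blocks
    syncBlockUnfinalized(replicas);
  }
}
{code}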

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9252) Change TestFileTruncate to FsDatasetTestUtils to get block file size and genstamp.

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960161#comment-14960161
 ] 

Hadoop QA commented on HDFS-9252:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 21s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |  10m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 58s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 57s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   4m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  62m 20s | Tests failed in hadoop-hdfs. |
| | | 123m 52s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.util.TestByteArrayManager |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766899/HDFS-9252.00.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf23f2c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13021/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13021/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13021/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13021/console |


This message was automatically generated.

> Change TestFileTruncate to FsDatasetTestUtils to get block file size and 
> genstamp.
> --
>
> Key: HDFS-9252
> URL: https://issues.apache.org/jira/browse/HDFS-9252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9252.00.patch
>
>
> {{TestFileTruncate}} verifies block size and genstamp by directly accessing 
> the  local filesystem, e.g.:
> {code}
> assertTrue(cluster.getBlockMetadataFile(dn0,
>    newBlock.getBlock()).getName().endsWith(
>    newBlock.getBlock().getGenerationStamp() + ".meta"));
> {code}
> Let's abstract the fsdataset-specific logic behind FsDatasetTestUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9184:

Status: Open  (was: Patch Available)

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper-level job issues it. The upper-level callers may be specific 
> Oozie tasks, MR jobs, and Hive queries. One scenario is that the namenode 
> (NN) is abused/spammed; the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top-level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places, interconnected 
> like a tree structure, which relies on offline analysis across RPC 
> boundaries. For this use case, {{htrace}} has to be enabled at a 100% sampling 
> rate, which introduces significant overhead. Moreover, passing additional 
> information (via annotations) other than the span id from the root of the 
> tree to a leaf is significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> is some related discussion on this topic. The final patch implemented the 
> tracking id as a part of the delegation token. This protects the tracking 
> information from being changed or impersonated. However, Kerberos-authenticated 
> connections and insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat the HDFS 
> audit log as a good place for after-the-fact root cause analysis. We propose 
> to put the caller id (e.g. Hive query id) in threadlocals. Specifically, on 
> the client side the threadlocal object is passed to the NN as a part of the 
> RPC header (optional), while on the server side the NN retrieves it from the 
> header and puts it into {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the caller context for 
> each operation. In this way, the existing code is not affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9184:

Status: Patch Available  (was: Open)

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper-level job issues it. The upper-level callers may be specific 
> Oozie tasks, MR jobs, and Hive queries. One scenario is that the namenode 
> (NN) is abused/spammed; the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top-level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places, interconnected 
> like a tree structure, which relies on offline analysis across RPC 
> boundaries. For this use case, {{htrace}} has to be enabled at a 100% sampling 
> rate, which introduces significant overhead. Moreover, passing additional 
> information (via annotations) other than the span id from the root of the 
> tree to a leaf is significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> is some related discussion on this topic. The final patch implemented the 
> tracking id as a part of the delegation token. This protects the tracking 
> information from being changed or impersonated. However, Kerberos-authenticated 
> connections and insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat the HDFS 
> audit log as a good place for after-the-fact root cause analysis. We propose 
> to put the caller id (e.g. Hive query id) in threadlocals. Specifically, on 
> the client side the threadlocal object is passed to the NN as a part of the 
> RPC header (optional), while on the server side the NN retrieves it from the 
> header and puts it into {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the caller context for 
> each operation. In this way, the existing code is not affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7964) Add support for async edit logging

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960144#comment-14960144
 ] 

Hadoop QA commented on HDFS-7964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 28s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 11 new or modified test files. |
| {color:red}-1{color} | javac |   1m 55s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766854/HDFS-7964.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / cf23f2c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13022/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13022/console |


This message was automatically generated.

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write log, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8425) [umbrella] Performance tuning, investigation and optimization for erasure coding

2015-10-15 Thread Takuya Fukudome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Fukudome updated HDFS-8425:
--
Attachment: testdfsio-read-mbsec.png
testdfsio-write-mbsec.png

Hi. I have run TestDFSIO on both a normal directory and a 
directory with an EC policy set.

I attached two charts which respectively show the write and 
read throughput (MB/sec) of replication files and EC files. 
The throughputs are calculated by dividing the total bytes of 
TestDFSIO's data by the total elapsed time.
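
In code form, the per-run figure is simply (a trivial sketch with made-up 
variable names):
{code}
// throughput in MB/sec = total TestDFSIO bytes / total elapsed seconds
double throughputMbPerSec = (totalBytes / (1024.0 * 1024.0)) / totalElapsedSeconds;
{code}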

In summary, writing EC files achieves better throughput than writing 
replication files, and reading EC files performs the same as reading 
replication files. However, the DataNodes' average CPU usage when writing 
EC files rose by 5.5 percentage points compared to writing replication 
files (from 9.8% to 15.3%).


The specification of our test cluster is below:

|| Number of DataNodes | 20 |

server info:
|| CPU | Xeon E5-2630L 2.00GHz/2CPU |
|| RAM | 64GB |
|| Disk | SATA 300 |

Our test cluster was built from trunk code. Its commit 
revision id is r30e2f836a26490a24c7ddea754dd19f95b24bbc8.

These are initial performance test results; we are still working 
on further tests. Please let me know if the initial results make 
sense to you. Any advice is welcome! Thank you.

> [umbrella] Performance tuning, investigation and optimization for erasure 
> coding
> 
>
> Key: HDFS-8425
> URL: https://issues.apache.org/jira/browse/HDFS-8425
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: GAO Rui
> Attachments: testClientWriteReadFile_v1.pdf, 
> testdfsio-read-mbsec.png, testdfsio-write-mbsec.png
>
>
> This {{umbrella}} jira aims to track performance tuning, investigation and 
> optimization for erasure coding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960115#comment-14960115
 ] 

Walter Su commented on HDFS-9173:
-

bq. #2. about {{syncBlockFinalized}}, {{syncBlockUnfinalized}}.
Unfortunately, the logic isn't quite the same between contiguous and striped. 
For example,
{noformat}
blk_0  blk_1  blk_2  blk_3  blk_4  blk_5  blk_6  blk_7  blk_8
 64k64k64k64k64k64k64k64k64k
 64k____________64k64k64k64k
{noformat}
blk_0 and blk_5~8 are finalized; blk_1~4 are RBW. In this case, the last cell of 
blk_5 is garbage and should be truncated, because the data in the last stripe 
isn't sequential and can't be decoded. So I have to keep blk_0, truncate blk_5, 
and re-encode blk_6~8 (I'm talking about the last cells). (This is done in step 
4.) So I don't care too much whether a replica is finalized or not.

For contiguous blocks, if recovery finds a finalized replica, it keeps the 
finalized one.

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7964) Add support for async edit logging

2015-10-15 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-7964:
-
Status: Patch Available  (was: Open)

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write log, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7964) Add support for async edit logging

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960114#comment-14960114
 ] 

Yi Liu commented on HDFS-7964:
--

Thanks [~daryn] for the work.

Further comments:

*1.* In FSEditLogAsync#run
{code}
@Override
  public void run() {
try {
  while (true) {

if (doSync) {
  ...
logSync(getLastWrittenTxId());
  ...
{code}
I think it's better to pass the txid of the current edit to {{logSync}}; there 
is no need to wait for all written txids. That would be more efficient, and the 
client could get a faster response?

*2.*
{code}
+  editsBatchedInSync = txid - synctxid - 1;
{code}
Isn't it "txid - synctxid"?   The txid is the max txid written, and synctxid is 
the max txid already synced, suppose txid = 20, synctxid = 10, then the 
editsBatchedInSync should be (txid - synctxid) = (20 - 10) = 10.   Also you can 
get it from the existing log message:
{code}
final String msg =
"Could not sync enough journals to persistent storage " +
"due to " + e.getMessage() + ". " +
"Unsynced transactions: " + (txid - synctxid);
{code}
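
Spelled out (a quick sketch of the arithmetic in this comment):
{code}
long txid = 20;      // max txid written
long synctxid = 10;  // max txid already synced
// txids 11..20 all go out in this sync:
long editsBatchedInSync = txid - synctxid;  // 10, not (txid - synctxid - 1)
{code}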

*3.*
{code}
-log4j.rootLogger=OFF, CONSOLE
+log4j.rootLogger=DEBUG, CONSOLE
{code}
Any reason to change it?

*4.*
{code}
call.abortResponse(syncEx);
{code}
It seems this code is not available yet?

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write log, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960100#comment-14960100
 ] 

Hadoop QA commented on HDFS-9253:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   8m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | native |   3m  9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |   0m 35s | Tests failed in 
hadoop-hdfs-native-client. |
| | |  40m 57s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-hdfs-native-client |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766943/HDFS-9253.001.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / cf23f2c |
| hadoop-hdfs-native-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13020/artifact/patchprocess/testrun_hadoop-hdfs-native-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13020/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13020/console |


This message was automatically generated.

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementation of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960092#comment-14960092
 ] 

Walter Su commented on HDFS-9173:
-

bq. #3, In StripedRecoveryTask#recover we are calling callInitReplicaRecovery 
twice. Is the second call necessary?
It's about block state changes.
For RecoveryTaskContiguous, it's 3 RBW --> 3 RUR --> 3 Finalized.
For RecoveryTaskStriped, it's 9 RBW --> 6 RUR + 3 RBW --> 6 Finalized + 3 RBW. 
The total number of RPC calls is 9+6+6=21.
The 2nd option is 9 RBW --> 9 RUR --> 9 Finalized. The total number of RPC 
calls is 9+9=18.
The 3rd option is 9 RBW --> 9 RUR --> 6 Finalized + 3 RUR, leaving the 3 RUR 
unfinalized. The total number of RPC calls is 9+6=15. The RURs will be removed 
once the block is completed (the 6 finalized replicas must be reported).

I still choose the 1st option.
1. The most important reason is step 3: a RUR can't be appended with more data.
2. We need to remove the dead/stale ones. We must make sure we have 6 healthy 
RBWs that can be converted to RUR; if not, we shouldn't convert them to RUR too 
early.
3. In the 2nd option, it's unnecessary to update the lengths of the 3 smallest 
RBWs.
4. In the 3rd option, if the 6 finalized replicas aren't all reported, we can't 
start a new recovery.
5. I haven't thought through how to address the issue of 2 recovery tasks 
running at the same time; the 1st option is a good start.

I must say reason #1 beats the rest. You can check the 01 patch, which includes 
step 3.
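
To recap the three options side by side (same numbers as above):
{noformat}
Option  State transitions                               RPC calls
1st     9 RBW -> 6 RUR + 3 RBW -> 6 Finalized + 3 RBW   9+6+6 = 21
2nd     9 RBW -> 9 RUR -> 9 Finalized                   9+9   = 18
3rd     9 RBW -> 9 RUR -> 6 Finalized + 3 RUR           9+6   = 15
{noformat}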

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8972) EINVAL Invalid argument when RAM_DISK usage 90%+

2015-10-15 Thread Xu Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Chen updated HDFS-8972:
--
Description: 
For a directory with the LAZY_PERSIST policy set, when the "df" command shows 
tmpfs usage >= 90% and a Spark, Hive, or MapReduce application is running, the 
DataNode throws the following exception:

{code}
2015-08-26 17:37:34,123 WARN org.apache.hadoop.io.ReadaheadPool: Failed 
readahead on null
EINVAL: Invalid argument
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at 
org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

And the application runs about 25% slower than when the exception does not occur.

Regards

  was:
the directory  which is use LAZY_PERSIST policy , so use "df" command look up 
tmpfs is usage >=90% , run spark,hive or mapreduce application , Datanode come 
out  following exception 

{code}
2015-08-26 17:37:34,123 WARN org.apache.hadoop.io.ReadaheadPool: Failed 
readahead on null
EINVAL: Invalid argument
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at 
org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

And the application is slowly than without exception  25%

Regards


> EINVAL Invalid argument when RAM_DISK usage 90%+
> 
>
> Key: HDFS-8972
> URL: https://issues.apache.org/jira/browse/HDFS-8972
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xu Chen
>Assignee: Jagadesh Kiran N
>Priority: Critical
>
> For a directory with the LAZY_PERSIST policy set, when the "df" command shows 
> tmpfs usage >= 90% and a Spark, Hive, or MapReduce application is running, the 
> DataNode throws the following exception:
> {code}
> 2015-08-26 17:37:34,123 WARN org.apache.hadoop.io.ReadaheadPool: Failed 
> readahead on null
> EINVAL: Invalid argument
> at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
> at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
> at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
> at 
> org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> And the application runs about 25% slower than when the exception does not occur.
> Regards



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9252) Change TestFileTruncate to FsDatasetTestUtils to get block file size and genstamp.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9252:

Status: Patch Available  (was: Open)

> Change TestFileTruncate to FsDatasetTestUtils to get block file size and 
> genstamp.
> --
>
> Key: HDFS-9252
> URL: https://issues.apache.org/jira/browse/HDFS-9252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9252.00.patch
>
>
> {{TestFileTruncate}} verifies block size and genstamp by directly accessing 
> the  local filesystem, e.g.:
> {code}
> assertTrue(cluster.getBlockMetadataFile(dn0,
>    newBlock.getBlock()).getName().endsWith(
>    newBlock.getBlock().getGenerationStamp() + ".meta"));
> {code}
> Let's abstract the fsdataset-specific logic behind FsDatasetTestUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-10-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960050#comment-14960050
 ] 

Ming Ma commented on HDFS-8647:
---

[~brahmareddy], can you please check again if TestBalancer and maybe other test 
failures are related? Somehow these tests timed out with the patch. Thanks!

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch
>
>
> Sometimes we want to have namenode use alternative block placement policy 
> such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumption about rack policy in functions such as 
> useDelHint, blockHasEnoughRacks. That means when we have new block placement 
> policy, we need to modify BlockManager to account for the new policy. Ideally 
> BlockManager should ask BlockPlacementPolicy object instead. That will allow 
> us to provide new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960044#comment-14960044
 ] 

Xiao Chen commented on HDFS-9250:
-

Hi Andrew, thanks for the comments and additional information. I will further 
investigate it.

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}} is called:
> - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is 
> added to {{cachedList}}
> - {{cachedList}} was assigned {{EMPTY_LOCS}}, which is of type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-15 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-9254:
---

 Summary: HDFS Secure Mode Documentation updates
 Key: HDFS-9254
 URL: https://issues.apache.org/jira/browse/HDFS-9254
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Some Kerberos configuration parameters are not documented well enough. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-9254) HDFS Secure Mode Documentation updates

2015-10-15 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9254 started by Arpit Agarwal.
---
> HDFS Secure Mode Documentation updates
> --
>
> Key: HDFS-9254
> URL: https://issues.apache.org/jira/browse/HDFS-9254
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> Some Kerberos configuration parameters are not documented well enough. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960039#comment-14960039
 ] 

Andrew Wang commented on HDFS-9250:
---

It may also be good to add a Precondition check somewhere in addCachedLoc so we 
can more easily debug this in the future, and as a form of documentation about 
this assumption.
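
For example, something along these lines (just a sketch; the exact check, 
message, and placement are open, and it assumes {{locs}} is the block's replica 
location array):
{code}
// Sketch: fail fast and document the assumption that a cached location
// must already be one of the block's backing replica locations.
Preconditions.checkArgument(Arrays.asList(locs).contains(loc),
    "Cached location %s is not one of the block's replica locations", loc);
{code}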

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}} is called:
> - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is 
> added to {{cachedList}}
> - {{cachedList}} was assigned {{EMPTY_LOCS}}, which is of type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960038#comment-14960038
 ] 

Andrew Wang commented on HDFS-9250:
---

Hi Xiao, thanks for working on this,

So one question about this: we're not supposed to add cached locations that do 
not also have a backing disk replica. So in your test case, {{dn}} would already 
be present in locs. If I edit your test case to do this, it passes without 
the change.

This is probably related to HDFS-8646, which I worked on before; we missed some 
places where cache state could get out of sync with replica state. I thought I 
added enough pruning to safeguard against this, but maybe I missed a place. 
Could you investigate?

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}} is called:
> - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is 
> added to {{cachedList}}
> - {{cachedList}} was assigned {{EMPTY_LOCS}}, which is of type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960033#comment-14960033
 ] 

Hadoop QA commented on HDFS-9245:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  6s | Pre-patch trunk has 2 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 22s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings, and fixes 2 pre-existing warnings. |
| {color:green}+1{color} | native |   3m 40s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |   1m 53s | Tests passed in 
hadoop-hdfs-nfs. |
| | |  47m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766930/HDFS-9245.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf23f2c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13019/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-nfs.html
 |
| hadoop-hdfs-nfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13019/artifact/patchprocess/testrun_hadoop-hdfs-nfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13019/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13019/console |


This message was automatically generated.

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch
>
>
> There are findbugs warnings as follows, brought by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}
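For reference, a hypothetical exclude rule along these lines could go into 
{{findbugsExcludeFile.xml}} (the exact scoping is up to the patch):
{code:xml}
<!-- Hypothetical sketch of a filter rule; the actual patch may scope it
     differently. -->
<Match>
  <Class name="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" />
  <Or>
    <Field name="offset" />
    <Field name="originalCount" />
  </Or>
  <Bug pattern="IS2_INCONSISTENT_SYNC" />
</Match>
{code}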



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Status: Open  (was: Patch Available)

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch
>
>
> For snapshot files, fsck shows corrupt blocks with the original file dir 
> instead of the snapshot dir.
> This can be confusing since even when the original file is deleted, a new 
> fsck run will still show that file as corrupted although what's actually 
> corrupted is the snapshot. 
> This is true even when given the -includeSnapshots option.
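For reference, the option in question is passed to fsck like this (the path is 
a placeholder):
{noformat}
hdfs fsck /path/to/dir -includeSnapshots
{noformat}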



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960018#comment-14960018
 ] 

Hadoop QA commented on HDFS-9129:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 50s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 27s | The applied patch generated  6 
new checkstyle issues (total was 626, now 577). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 38s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 19s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  49m 13s | Tests failed in hadoop-hdfs. |
| | |  96m 55s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | 
hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766923/HDFS-9129.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a121fa1 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13015/console |


This message was automatically generated.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9205) Do not schedule corrupt blocks for replication

2015-10-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9205:
--
Fix Version/s: (was: 3.0.0)
   2.8.0

Merged this to branch-2.

> Do not schedule corrupt blocks for replication
> --
>
> Key: HDFS-9205
> URL: https://issues.apache.org/jira/browse/HDFS-9205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: h9205_20151007.patch, h9205_20151007b.patch, 
> h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, 
> h9205_20151013.patch, h9205_20151015.patch
>
>
> Corrupt blocks by definition are blocks that cannot be read. As a 
> consequence, they cannot be replicated.  In UnderReplicatedBlocks, there is a 
> QUEUE_WITH_CORRUPT_BLOCKS queue, and chooseUnderReplicatedBlocks may choose 
> blocks from it.  Scheduling corrupt blocks for replication wastes resources 
> and potentially slows down replication of the higher-priority blocks.
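A minimal sketch of the idea (structure assumed; not the actual patch) would 
be to skip the corrupt queue inside {{chooseUnderReplicatedBlocks}}:
{code}
// Sketch only: never hand out replication work from the corrupt-block queue.
for (int priority = 0; priority < UnderReplicatedBlocks.LEVEL; priority++) {
  if (priority == UnderReplicatedBlocks.QUEUE_WITH_CORRUPT_BLOCKS) {
    continue; // corrupt replicas cannot be read, hence cannot be replicated
  }
  chooseBlocksFromQueue(priority, blocksToProcess); // hypothetical helper
}
{code}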



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3059) ssl-server.xml causes NullPointer

2015-10-15 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960014#comment-14960014
 ] 

Xiao Chen commented on HDFS-3059:
-

Thanks Yongjun for the comments. Attached patch 06 addressed your suggestions.
{quote}
3. Would you please explain why the following comments? maybe add the 
explanation as an addition to the comment.
{quote}
This is added because I met the same NPE as described when running the 
secondarynamenode (2NN). Running a command like {{hdfs secondarynamenode 
-checkpoint}} with kerberos enabled fails with the same NPE. The cause is that 
the 2NN web server is needed when starting as a daemon, to show status/metrics 
etc., and starting it requires getting credentials. When running from the 
shell, the environment doesn't have the credentials and prompts for a 
password. When the password is not correct, {{getPassword}} sets the password 
to null, causing the NPE. Note that clients aren't supposed to know the 
password, but we should definitely allow them to checkpoint. Since the metrics 
etc. are not needed when running the 2NN from the shell, I think it makes 
sense to not start the web server at all.
I have updated the comments as below, to give more information.
{code}
  // The web server is only needed when starting SNN as a daemon,
  // and not needed if called from a shell command. Starting the web server
  // from the shell may fail when getting credentials, if the environment is
  // not set up for it, which is usually the case.
{code}
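For illustration, the resulting guard could look roughly like this (method 
names assumed; not the actual patch):
{code}
// Sketch only: start the 2NN web server only when running as a daemon.
if (opts.getCommand() == null) {
  // Daemon startup: the web server serves status/metrics and needs
  // credentials, which are available in this environment.
  startInfoServer();
}
{code}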

> ssl-server.xml causes NullPointer
> -
>
> Key: HDFS-3059
> URL: https://issues.apache.org/jira/browse/HDFS-3059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 2.7.1
> Environment: in core-site.xml:
> {code:xml}
>   
> hadoop.security.authentication
> kerberos
>   
>   
> hadoop.security.authorization
> true
>   
> {code}
> in hdfs-site.xml:
> {code:xml}
>   
> dfs.https.server.keystore.resource
> /etc/hadoop/conf/ssl-server.xml
>   
>   
> dfs.https.enable
> true
>   
>   
> ...other security props
>   
> {code}
>Reporter: Evert Lammerts
>Assignee: Xiao Chen
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3059.02.patch, HDFS-3059.03.patch, 
> HDFS-3059.04.patch, HDFS-3059.05.patch, HDFS-3059.06.patch, HDFS-3059.patch, 
> HDFS-3059.patch.2
>
>
> If ssl is enabled (dfs.https.enable) but ssl-server.xml is not available, a 
> DN will crash during startup while setting up an SSL socket with a 
> NullPointerException:
> {noformat}12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: 
> useKerb = false, useCerts = true
> jetty.ssl.password : jetty.ssl.keypassword : 12/03/07 17:08:36 INFO 
> mortbay.log: jetty-6.1.26.cloudera.1
> 12/03/07 17:08:36 INFO mortbay.log: Started 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: Creating new 
> KrbServerSocket for: 0.0.0.0
> 12/03/07 17:08:36 WARN mortbay.log: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475: java.io.IOException: 
> !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Server@604788d5: 
> java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:37 INFO datanode.DataNode: Waiting for threadgroup to exit, 
> active threads is 0{noformat}
> The same happens if I set an absolute path to an existing 
> dfs.https.server.keystore.resource - in this case the file cannot be found, 
> but not even a WARN is logged.
> Since we know dfs.https.server.keystore.resource needs to have 4 properties 
> specified (ssl.server.truststore.location, ssl.server.keystore.location, 
> ssl.server.keystore.password, and ssl.server.keystore.keypassword), we 
> should check that they are set and throw an IOException if they are not.
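A minimal sketch of such a check (assuming {{sslConf}} is the Configuration 
loaded from ssl-server.xml):
{code}
// Sketch: fail fast with a clear message instead of a NullPointerException
// later during SSL socket setup.
String[] required = {
    "ssl.server.truststore.location", "ssl.server.keystore.location",
    "ssl.server.keystore.password", "ssl.server.keystore.keypassword"};
for (String key : required) {
  if (sslConf.get(key) == null) {
    throw new IOException("Required SSL property " + key + " is not set");
  }
}
{code}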



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960011#comment-14960011
 ] 

Hadoop QA commented on HDFS-9251:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   7m 53s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   9m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 34s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 54s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m 13s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |  55m 13s | Tests passed in hadoop-hdfs. 
|
| | |  80m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766916/HDFS-9251.01.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / cf23f2c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13016/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13016/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13016/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13016/console |


This message was automatically generated.

> Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly 
> creating Files in tests code.
> ---
>
> Key: HDFS-9251
> URL: https://issues.apache.org/jira/browse/HDFS-9251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9251.00.patch, HDFS-9251.01.patch
>
>
> In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
> block and metadata files:
> {code}
> replicaInfo.getBlockFile().createNewFile();
> replicaInfo.getMetaFile().createNewFile();
> {code}
> This leaks the implementation details of {{FsDatasetImpl}}. This JIRA 
> proposes to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 
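A rough usage sketch of the proposed approach (helper names assumed from 
HDFS-9188):
{code}
// Sketch only: create the replica through the test utility instead of
// touching FsDatasetImpl's block/meta files directly.
FsDatasetTestUtils utils = cluster.getFsDatasetTestUtils(0 /* DN index */);
utils.createFinalizedReplica(block); // no raw File I/O in the test itself
{code}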



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960004#comment-14960004
 ] 

Wei-Chiu Chuang commented on HDFS-9249:
---

This is more of a supportability improvement patch, so I deem there is no need 
for a test case. The Findbugs warning is unrelated. I cannot reproduce the 
failed test, and it also looks unrelated.
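For context, a minimal sketch of the suggested logging (assuming the existing 
catch block around the initialization code quoted below):
{code}
} catch (IOException e) {
  // Log before stopping: stop() may itself NPE when the namesystem was
  // never fully initialized, which would otherwise hide the root cause.
  LOG.error("Failed to start namenode.", e);
  this.stop();
  throw e;
}
{code}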

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing the NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace:
> {noformat}
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> {noformat}
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-0

[jira] [Updated] (HDFS-3059) ssl-server.xml causes NullPointer

2015-10-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-3059:

Attachment: HDFS-3059.06.patch

> ssl-server.xml causes NullPointer
> -
>
> Key: HDFS-3059
> URL: https://issues.apache.org/jira/browse/HDFS-3059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 2.7.1
> Environment: in core-site.xml:
> {code:xml}
>   
> hadoop.security.authentication
> kerberos
>   
>   
> hadoop.security.authorization
> true
>   
> {code}
> in hdfs-site.xml:
> {code:xml}
>   
> dfs.https.server.keystore.resource
> /etc/hadoop/conf/ssl-server.xml
>   
>   
> dfs.https.enable
> true
>   
>   
> ...other security props
>   
> {code}
>Reporter: Evert Lammerts
>Assignee: Xiao Chen
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3059.02.patch, HDFS-3059.03.patch, 
> HDFS-3059.04.patch, HDFS-3059.05.patch, HDFS-3059.06.patch, HDFS-3059.patch, 
> HDFS-3059.patch.2
>
>
> If ssl is enabled (dfs.https.enable) but ssl-server.xml is not available, a 
> DN will crash during startup while setting up an SSL socket with a 
> NullPointerException:
> {noformat}12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: 
> useKerb = false, useCerts = true
> jetty.ssl.password : jetty.ssl.keypassword : 12/03/07 17:08:36 INFO 
> mortbay.log: jetty-6.1.26.cloudera.1
> 12/03/07 17:08:36 INFO mortbay.log: Started 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: Creating new 
> KrbServerSocket for: 0.0.0.0
> 12/03/07 17:08:36 WARN mortbay.log: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475: java.io.IOException: 
> !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Server@604788d5: 
> java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:37 INFO datanode.DataNode: Waiting for threadgroup to exit, 
> active threads is 0{noformat}
> The same happens if I set an absolute path to an existing 
> dfs.https.server.keystore.resource - in this case the file cannot be found, 
> but not even a WARN is logged.
> Since we know dfs.https.server.keystore.resource needs to have 4 properties 
> specified (ssl.server.truststore.location, ssl.server.keystore.location, 
> ssl.server.keystore.password, and ssl.server.keystore.keypassword), we 
> should check that they are set and throw an IOException if they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3059) ssl-server.xml causes NullPointer

2015-10-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-3059:

Status: Open  (was: Patch Available)

> ssl-server.xml causes NullPointer
> -
>
> Key: HDFS-3059
> URL: https://issues.apache.org/jira/browse/HDFS-3059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 2.7.1
> Environment: in core-site.xml:
> {code:xml}
>   
> hadoop.security.authentication
> kerberos
>   
>   
> hadoop.security.authorization
> true
>   
> {code}
> in hdfs-site.xml:
> {code:xml}
>   
> dfs.https.server.keystore.resource
> /etc/hadoop/conf/ssl-server.xml
>   
>   
> dfs.https.enable
> true
>   
>   
> ...other security props
>   
> {code}
>Reporter: Evert Lammerts
>Assignee: Xiao Chen
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3059.02.patch, HDFS-3059.03.patch, 
> HDFS-3059.04.patch, HDFS-3059.05.patch, HDFS-3059.06.patch, HDFS-3059.patch, 
> HDFS-3059.patch.2
>
>
> If ssl is enabled (dfs.https.enable) but ssl-server.xml is not available, a 
> DN will crash during startup while setting up an SSL socket with a 
> NullPointerException:
> {noformat}12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: 
> useKerb = false, useCerts = true
> jetty.ssl.password : jetty.ssl.keypassword : 12/03/07 17:08:36 INFO 
> mortbay.log: jetty-6.1.26.cloudera.1
> 12/03/07 17:08:36 INFO mortbay.log: Started 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: Creating new 
> KrbServerSocket for: 0.0.0.0
> 12/03/07 17:08:36 WARN mortbay.log: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475: java.io.IOException: 
> !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Server@604788d5: 
> java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:37 INFO datanode.DataNode: Waiting for threadgroup to exit, 
> active threads is 0{noformat}
> The same happens if I set an absolute path to an existing 
> dfs.https.server.keystore.resource - in this case the file cannot be found, 
> but not even a WARN is logged.
> Since we know dfs.https.server.keystore.resource needs to have 4 properties 
> specified (ssl.server.truststore.location, ssl.server.keystore.location, 
> ssl.server.keystore.password, and ssl.server.keystore.keypassword), we 
> should check that they are set and throw an IOException if they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9253:
-
Attachment: HDFS-9253.001.patch

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactoring opens up the opportunity to reuse tests across 
> libhdfs, libwebhdfs, and libhdfspp in HDFS-8707, and also allows cross 
> validation of different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959988#comment-14959988
 ] 

Yi Liu commented on HDFS-9053:
--

Hi Nicholas, sorry, I may have skipped some of the description here.  
Regarding 2047, I meant that it is just an example threshold for the small 
elements size. Currently the small elements size is an assumed value; we can 
actually set the degree of the B-Tree to any value we want.  If we want 4K as 
the threshold of the small elements size, we can set the degree of the B-Tree 
to 2K, and then the max degree is (4K - 1).   (I should have made the 
description clearer.)
Thanks.
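In standard B-Tree terms, a minimal sketch of that relation:
{code}
// For a B-Tree of minimum degree t, a node holds at most (2 * t - 1) elements,
// so a single root node suffices until the element count exceeds that bound.
int t = 2048;                // degree set to 2K, as discussed above
int maxElements = 2 * t - 1; // 4095, i.e. the ~4K small-size threshold
{code}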

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete the time complexity 
> is O\(n) (the search is O(log n), but insertion/deletion causes 
> re-allocations and copies of arrays), so for a large directory the 
> operations are expensive.  If the children grow to 1M entries, the ArrayList 
> will resize to > 1M capacity, and so needs > 1M * 8 bytes = 8M of contiguous 
> heap memory (the reference size is 8 on a 64-bit system/JVM); this easily 
> causes full GC in an HDFS cluster where namenode heap memory is already 
> highly used.  To recap, the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> of re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree, 
> as suggested by [~shv], to solve the problem. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array for the 
> elements. If the size grows large enough, nodes split automatically, and if 
> elements are removed, B-Tree nodes can merge automatically (see more: 
> https://en.wikipedia.org/wiki/B-tree).  This solves the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959972#comment-14959972
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> For small elements size (assume # < max degree which is 2047), ... Do I miss 
> something?

According to [your 
comment|https://issues.apache.org/jira/browse/HDFS-9053?focusedCommentId=14950498&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14950498]
 (also copied below), you were saying that B-Tree only increased 8 bytes when 
#children < 4K, i.e. when 2047 < #children < 4K.  Is it still true?  If not, 
how much memory is needed when 2047 < #children < 4K?
{quote}
I find a good approach to improve B-Tree memory overhead to make it only 
increase 8 bytes memory usage comparing with using ArrayList for small elements 
size.
So we don't need to use ArrayList when #children is small (< 4K), and we can 
always use the BTree.
{quote}

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete the time complexity 
> is O\(n) (the search is O(log n), but insertion/deletion causes 
> re-allocations and copies of arrays), so for a large directory the 
> operations are expensive.  If the children grow to 1M entries, the ArrayList 
> will resize to > 1M capacity, and so needs > 1M * 8 bytes = 8M of contiguous 
> heap memory (the reference size is 8 on a 64-bit system/JVM); this easily 
> causes full GC in an HDFS cluster where namenode heap memory is already 
> highly used.  To recap, the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> of re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree, 
> as suggested by [~shv], to solve the problem. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array for the 
> elements. If the size grows large enough, nodes split automatically, and if 
> elements are removed, B-Tree nodes can merge automatically (see more: 
> https://en.wikipedia.org/wiki/B-tree).  This solves the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959955#comment-14959955
 ] 

Yi Liu edited comment on HDFS-9053 at 10/16/15 12:56 AM:
-

Thanks [~szetszwo] for the comments.

{quote}
>> 24 8 Object[] Node.elements  N/A
>> 32 8 Object[] Node.children  N/A
It only counts the reference but array objects are not counted. So the BTree 
overhead is still a lot more than ArrayList.
{quote}

For a small elements size (assume # < max degree, which is 2047), {{children}} 
is a null reference, so there is no {{children}} array object here, just 8 
bytes for the null reference. And as for {{elements}}, {{ArrayList}} has it 
too. So, as described above, {{BTree}} increases memory by only 8 bytes 
compared with {{ArrayList}} for a small elements size.  Am I missing something?


was (Author: hitliuyi):
Thanks [~szetszwo] for the comments.

{quote}
>> 24 8 Object[] Node.elements  N/A
>> 32 8 Object[] Node.children  N/A
It only counts the reference but array objects are not counted. So the BTree 
overhead is still a lot more than ArrayList.
{quote}

For small elements size (assume # < max degree which is 2047), the {{children}} 
is null reference, so there is no array object of {{children}} here,  and for 
{{elements}}, {{ArrayList}} also has it. So as described above, {{BTree}} 
increases 8 bytes compared with {{ArrayList}} for small size elements.  Do I 
miss something?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete the time complexity 
> is O\(n) (the search is O(log n), but insertion/deletion causes 
> re-allocations and copies of arrays), so for a large directory the 
> operations are expensive.  If the children grow to 1M entries, the ArrayList 
> will resize to > 1M capacity, and so needs > 1M * 8 bytes = 8M of contiguous 
> heap memory (the reference size is 8 on a 64-bit system/JVM); this easily 
> causes full GC in an HDFS cluster where namenode heap memory is already 
> highly used.  To recap, the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> of re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree, 
> as suggested by [~shv], to solve the problem. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array for the 
> elements. If the size grows large enough, nodes split automatically, and if 
> elements are removed, B-Tree nodes can merge automatically (see more: 
> https://en.wikipedia.org/wiki/B-tree).  This solves the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-10-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959955#comment-14959955
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks [~szetszwo] for the comments.

{quote}
>> 24 8 Object[] Node.elements  N/A
>> 32 8 Object[] Node.children  N/A
It only counts the reference but array objects are not counted. So the BTree 
overhead is still a lot more than ArrayList.
{quote}

For a small elements size (assume # < max degree, which is 2047), {{children}} 
is a null reference, so there is no {{children}} array object here. And as for 
{{elements}}, {{ArrayList}} has it too. So, as described above, {{BTree}} 
increases memory by only 8 bytes compared with {{ArrayList}} for a small 
elements size.  Am I missing something?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete the time complexity 
> is O\(n) (the search is O(log n), but insertion/deletion causes 
> re-allocations and copies of arrays), so for a large directory the 
> operations are expensive.  If the children grow to 1M entries, the ArrayList 
> will resize to > 1M capacity, and so needs > 1M * 8 bytes = 8M of contiguous 
> heap memory (the reference size is 8 on a 64-bit system/JVM); this easily 
> causes full GC in an HDFS cluster where namenode heap memory is already 
> highly used.  To recap, the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> of re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree, 
> as suggested by [~shv], to solve the problem. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array for the 
> elements. If the size grows large enough, nodes split automatically, and if 
> elements are removed, B-Tree nodes can merge automatically (see more: 
> https://en.wikipedia.org/wiki/B-tree).  This solves the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959950#comment-14959950
 ] 

Hadoop QA commented on HDFS-9253:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 52s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:red}-1{color} | javac |   2m 22s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766917/HDFS-9253.000.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / cf23f2c |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13017/console |


This message was automatically generated.

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactoring opens up the opportunity to reuse tests across 
> libhdfs, libwebhdfs, and libhdfspp in HDFS-8707, and also allows cross 
> validation of different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959929#comment-14959929
 ] 

Hadoop QA commented on HDFS-9184:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  35m 12s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |  15m 18s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  22m 22s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   1m 11s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 24s | The applied patch generated  3 
new checkstyle issues (total was 403, now 405). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   3m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   1m 13s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   8m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  18m 12s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  62m  1s | Tests failed in hadoop-hdfs. |
| | | 171m 10s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.shell.find.TestIname |
|   | hadoop.fs.shell.find.TestFind |
|   | hadoop.ipc.TestIPC |
|   | hadoop.security.token.delegation.TestZKDelegationTokenSecretManager |
|   | hadoop.fs.shell.find.TestPrint0 |
|   | hadoop.fs.shell.find.TestPrint |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
| Timed out tests | 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | org.apache.hadoop.hdfs.TestConnCache |
|   | org.apache.hadoop.hdfs.TestSetrepDecreasing |
|   | org.apache.hadoop.hdfs.TestEncryptedTransfer |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766871/HDFS-9184.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13012/console |


This message was automatically generated.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper-level job issued it. The upper-level callers may be specific 
> Oozie tasks, MR jobs, or hive queries. One scenario is that when the namenode 
> (NN) is abused/spammed, the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> should contain at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which is 
> obviously not enough. It's common that the same user issues multiple jobs at 
> the same time. Even for a single top-level task, tracking back to a specific 
> caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support fo

[jira] [Updated] (HDFS-9214) Reconfigure DN concurrent move on the fly

2015-10-15 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9214:

Attachment: HDFS-9214.002.patch

Patch 002 with more tests added.

> Reconfigure DN concurrent move on the fly
> -
>
> Key: HDFS-9214
> URL: https://issues.apache.org/jira/browse/HDFS-9214
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9214.001.patch, HDFS-9214.002.patch
>
>
> This is to reconfigure
> {code}
> dfs.datanode.balance.max.concurrent.moves
> {code}
> without restarting the DN.
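For context, datanode reconfiguration is typically driven via dfsadmin, 
roughly along these lines (host and port are placeholders):
{noformat}
hdfs dfsadmin -reconfig datanode dn-host:ipc-port start
hdfs dfsadmin -reconfig datanode dn-host:ipc-port status
{noformat}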



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9245:

Attachment: HDFS-9245.000.patch

Hi [~yzhangal], please review the patch v0. Thanks.

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch
>
>
> There are findbugs warnings as follows, brought by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9245:

Status: Patch Available  (was: Open)

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch
>
>
> There are findbugs warnings as follows, brought by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9220) Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum

2015-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959884#comment-14959884
 ] 

Hudson commented on HDFS-9220:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2440 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2440/])
HDFS-9220. Reading small file (< 512 bytes) that is open for append (kihwal: 
rev c7c36cbd6218f46c33d7fb2f60cd52cb29e6d720)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Reading small file (< 512 bytes) that is open for append fails due to 
> incorrect checksum
> 
>
> Key: HDFS-9220
> URL: https://issues.apache.org/jira/browse/HDFS-9220
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9220.000.patch, HDFS-9220.001.patch, 
> HDFS-9220.002.patch, test2.java
>
>
> Exception:
> 2015-10-09 14:59:40 WARN  DFSClient:1150 - fetchBlockByteRange(). Got a 
> checksum exception for /tmp/file0.05355529331575182 at 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882:0 from 
> DatanodeInfoWithStorage[10.10.10.10]:5001
> All 3 replicas cause this exception and the read fails entirely with:
> BlockMissingException: Could not obtain block: 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882 
> file=/tmp/file0.05355529331575182
> Code to reproduce is attached.
> Does not happen in 2.7.0.
> Data is read correctly if checksum verification is disabled.
> More generally, the failure happens when reading from the last block of a 
> file and the last block has <= 512 bytes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959859#comment-14959859
 ] 

Zhe Zhang commented on HDFS-9173:
-

Thanks Walter for the updates. Some additional comments:

# In the current patch {{StripedRecoveryTask}} only shares a few lines of 
trivial code with {{RecoveryTask}}. It's a little odd to subclass it.
# Fundamentally the 2 {{RecoveryTask}} types do share a lot of logic. They 
both go through steps 1, 2, and 5 as described in the Javadoc. So here's an 
alternative way to structure the 2 classes:
{code}
public class RecoveryTaskContiguous {
  protected void recover() {
    // Step 1.1: callInitReplicaRecovery and get all block lengths from
    // DataNodes, generate list of BlockRecord
    // Step 1.2: check if there's any FINALIZED replica
  }
  void syncBlockFinalized(List<BlockRecord> syncList) {
  }
  void syncBlockUnfinalized(List<BlockRecord> syncList) {
  }
}

public class RecoveryTaskStriped {
  protected void recover() {
    // Step 1.1: callInitReplicaRecovery and get all block lengths from
    // DataNodes, generate list of BlockRecord
  }
  void syncBlockUnfinalized(List<BlockRecord> syncList) {
  }
}
{code}
Step 1.1 is identical in both classes, so we can use a shared static method to 
do it (see the sketch after these notes). The logic of synchronizing striped 
internal blocks is very similar to handling {{RBW}} and {{RWR}} replicas. We 
can use a shared {{syncBlockUnfinalized}} method. This suggested consolidation 
can be done separately.
# In {{StripedRecoveryTask#recover}} we are calling {{callInitReplicaRecovery}} 
twice. Is the second call necessary?
# {{StripedRecoveryTask#ecPolicy}} is unnecessary.

Nits:
# We can take this chance to fix the {{if}} statements without parentheses
# Can also update "Convenience" / "convenient" to "helper" / "util" when 
describing classes and methods
# Steps 3 and 4 in the Javadoc should be marked as TODO or removed for now
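
To make the consolidation suggested in item 2 concrete, here is a sketch of the 
shared "step 1.1" helper. The types and names below are illustrative stand-ins 
for the real {{BlockRecord}} and RPC plumbing, not code from the patch:

{code:java}
import java.util.ArrayList;
import java.util.List;

class ReplicaRecord {
  final String datanodeId;
  final long numBytes;
  ReplicaRecord(String datanodeId, long numBytes) {
    this.datanodeId = datanodeId;
    this.numBytes = numBytes;
  }
}

class RecoveryTasks {
  // Shared "step 1.1": ask every DataNode for its replica length and build
  // the sync list. Both RecoveryTaskContiguous and RecoveryTaskStriped would
  // call this instead of duplicating the loop.
  static List<ReplicaRecord> buildSyncList(List<String> datanodes) {
    List<ReplicaRecord> syncList = new ArrayList<>();
    for (String dn : datanodes) {
      long len = queryReplicaLength(dn); // stands in for callInitReplicaRecovery
      if (len >= 0) {                    // skip nodes that failed to answer
        syncList.add(new ReplicaRecord(dn, len));
      }
    }
    return syncList;
  }

  private static long queryReplicaLength(String dn) {
    return 0; // placeholder; the real code issues an RPC to the DataNode
  }
}
{code}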

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.004.patch

The v4 patch addresses the failing tests; further refactoring is possible. 
Please hold off on reviewing this.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

2015-10-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959826#comment-14959826
 ] 

Daryn Sharp commented on HDFS-9239:
---

It seems like a good idea at first, but I don't think the proposal solves the 
stated issues:
* This prevents the NameNode from spuriously marking healthy 
DataNodes as stale or dead.
* ... delayed DataNodes may be flagged as stale, and applications may 
erroneously choose to avoid accessing those nodes
* ... DataNodes may be flagged as dead. In extreme cases, this can cause a 
NameNode to schedule wasteful re-replication activity.

Let's say the NN can't service heartbeats fast enough to avoid false staleness (stale 
defaults to 30s). That means it definitely can't process IBRs either. Would a 
lifeline to prevent the stale flag matter at this point?  At this level of 
congestion, nearly all of the nodes are going stale.  The staleness is probably 
the least of your worries.  If nodes are marked dead from inability to keep up 
with heartbeats (defaults to ~10min), the cluster itself is already in serious 
trouble.  Worrying 
about wasted replications is dubious because the NN can't issue replications if 
it can't process the heartbeats.

That is not a heavy-load scenario.  From personal experience, it sounds like 
the fallout of a 120GB+ heap stop-the-world GC.  The NN wakes up, and the 
heartbeat monitor starts marking everything dead.  This sparks a replication 
storm, followed by an invalidation storm, which the NN recovers from... unless 
it goes into another full GC.  The lifeline might help slow the rise of 
false-dead nodes.  However, I recently patched the heartbeat monitor to detect 
long GCs and be very gracious before marking nodes dead.

If I've misinterpreted anything, please describe the incident that prompted 
this approach so we can see if it would have helped.

> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode 
> liveness
> ---
>
> Key: HDFS-9239
> URL: https://issues.apache.org/jira/browse/HDFS-9239
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: DataNode-Lifeline-Protocol.pdf
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline 
> Protocol.  This is an RPC protocol that is responsible for reporting liveness 
> and basic health information about a DataNode to a NameNode.  Compared to the 
> existing heartbeat messages, it is lightweight and not prone to resource 
> contention problems that can harm accurate tracking of DataNode liveness 
> currently.  The attached design document contains more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959819#comment-14959819
 ] 

Haohui Mai commented on HDFS-9253:
--

The v0 patch moves the tests of libhdfs into the {{libhdfs-tests}} directory. 
It also moves {{hdfs.h}} (which is the only public include file) to 
{{libhdfs/include/hdfs}} so that other modules only need to put {{hdfs.h}} on 
their include path.

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9253:
-
Attachment: HDFS-9253.000.patch

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9253:
-
Status: Patch Available  (was: Open)

> Refactor tests of libhdfs into a directory
> --
>
> Key: HDFS-9253
> URL: https://issues.apache.org/jira/browse/HDFS-9253
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9253.000.patch
>
>
> This jira proposes to refactor the current tests in libhdfs into a separate 
> directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
> libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
> different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9253) Refactor tests of libhdfs into a directory

2015-10-15 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-9253:


 Summary: Refactor tests of libhdfs into a directory
 Key: HDFS-9253
 URL: https://issues.apache.org/jira/browse/HDFS-9253
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


This jira proposes to refactor the current tests in libhdfs into a separate 
directory. The refactor opens up the opportunity to reuse tests in libhdfs, 
libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of 
different implementations of the libhdfs API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9251:

Attachment: HDFS-9251.01.patch

Addressed whitespace warnings.

The findbugs warnings and test failures are not relevant. 

> Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly 
> creating Files in tests code.
> ---
>
> Key: HDFS-9251
> URL: https://issues.apache.org/jira/browse/HDFS-9251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9251.00.patch, HDFS-9251.01.patch
>
>
> In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
> block and metadata files:
> {code}
> replicaInfo.getBlockFile().createNewFile();
> replicaInfo.getMetaFile().createNewFile();
> {code}
> It leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes 
> to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8831) Trash Support for files in HDFS encryption zone

2015-10-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959792#comment-14959792
 ] 

Mingliang Liu commented on HDFS-8831:
-

Thanks for working on this. The design doc is very helpful.

> Trash Support for files in HDFS encryption zone
> ---
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files within the zone with the trash feature enabled, you 
> will get an error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> the file being deleted inside the same encryption zone. 
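
A minimal sketch of the idea, assuming the move stays within the zone so the 
rename is legal (paths and the trash layout are illustrative; the attached PDF 
is authoritative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ZoneTrashSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path zoneRoot = new Path("/z1_1");            // encryption zone root
    Path file = new Path(zoneRoot, "startnn.sh");
    Path trashDir = new Path(zoneRoot, ".Trash/alice/Current");

    // The trash directory lives inside the zone, so the rename never
    // crosses an encryption zone boundary.
    fs.mkdirs(trashDir);
    fs.rename(file, new Path(trashDir, file.getName()));
  }
}
{code}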



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959786#comment-14959786
 ] 

Hadoop QA commented on HDFS-9251:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   7m 58s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 25s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m  2s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  50m 16s | Tests failed in hadoop-hdfs. |
| | |  73m 28s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766872/HDFS-9251.00.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13011/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13011/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13011/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13011/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13011/console |


This message was automatically generated.

> Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly 
> creating Files in tests code.
> ---
>
> Key: HDFS-9251
> URL: https://issues.apache.org/jira/browse/HDFS-9251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9251.00.patch
>
>
> In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
> block and metadata files:
> {code}
> replicaInfo.getBlockFile().createNewFile();
> replicaInfo.getMetaFile().createNewFile();
> {code}
> It leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes 
> to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8766) Implement a libhdfs(3) compatible API

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959780#comment-14959780
 ] 

Hadoop QA commented on HDFS-8766:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766903/HDFS-8766.HDFS-8707.006.patch
 |
| Optional Tests | javac unit |
| git revision | HDFS-8707 / 4cd3b99 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13014/console |


This message was automatically generated.

> Implement a libhdfs(3) compatible API
> -
>
> Key: HDFS-8766
> URL: https://issues.apache.org/jira/browse/HDFS-8766
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-8766.HDFS-8707.000.patch, 
> HDFS-8766.HDFS-8707.001.patch, HDFS-8766.HDFS-8707.002.patch, 
> HDFS-8766.HDFS-8707.003.patch, HDFS-8766.HDFS-8707.004.patch, 
> HDFS-8766.HDFS-8707.005.patch, HDFS-8766.HDFS-8707.006.patch
>
>
> Add a synchronous API that is compatible with the hdfs.h header used in 
> libhdfs and libhdfs3.  This will make it possible for projects using 
> libhdfs/libhdfs3 to relink against libhdfspp with minimal changes.
> This also provides a pure C interface that can be linked into projects 
> that aren't built in C++11 mode for various reasons but use the same 
> compiler.  It also allows many other programming languages to access 
> libhdfspp through built-in FFI interfaces.
> The libhdfs API is very similar to the POSIX file API, which makes it easier 
> for programs built using POSIX filesystem calls to be modified to access HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8831) Trash Support for files in HDFS encryption zone

2015-10-15 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8831:
-
Attachment: HDFS-8831-10152015.pdf

Attached a design document for review and discussion. 

> Trash Support for files in HDFS encryption zone
> ---
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files within the zone with the trash feature enabled, you 
> will get an error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> the file being deleted inside the same encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8766) Implement a libhdfs(3) compatible API

2015-10-15 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-8766:
--
Attachment: HDFS-8766.HDFS-8707.006.patch

Stripped stuff out.

> Implement a libhdfs(3) compatible API
> -
>
> Key: HDFS-8766
> URL: https://issues.apache.org/jira/browse/HDFS-8766
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-8766.HDFS-8707.000.patch, 
> HDFS-8766.HDFS-8707.001.patch, HDFS-8766.HDFS-8707.002.patch, 
> HDFS-8766.HDFS-8707.003.patch, HDFS-8766.HDFS-8707.004.patch, 
> HDFS-8766.HDFS-8707.005.patch, HDFS-8766.HDFS-8707.006.patch
>
>
> Add a synchronous API that is compatible with the hdfs.h header used in 
> libhdfs and libhdfs3.  This will make it possible for projects using 
> libhdfs/libhdfs3 to relink against libhdfspp with minimal changes.
> This also provides a pure C interface that can be linked into projects 
> that aren't built in C++11 mode for various reasons but use the same 
> compiler.  It also allows many other programming languages to access 
> libhdfspp through built-in FFI interfaces.
> The libhdfs API is very similar to the POSIX file API, which makes it easier 
> for programs built using POSIX filesystem calls to be modified to access HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959752#comment-14959752
 ] 

Hadoop QA commented on HDFS-9198:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766881/HDFS-9198-trunk.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13013/console |


This message was automatically generated.

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9252) Change TestFileTruncate to FsDatasetTestUtils to get block file size and genstamp.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9252:

Attachment: HDFS-9252.00.patch

Adds {{getDataLength()}} and {{getPersistentGenStamp()}} to 
{{FsDatasetTestUtils}}.

> Change TestFileTruncate to FsDatasetTestUtils to get block file size and 
> genstamp.
> --
>
> Key: HDFS-9252
> URL: https://issues.apache.org/jira/browse/HDFS-9252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9252.00.patch
>
>
> {{TestFileTruncate}} verifies block size and genstamp by directly accessing 
> the  local filesystem, e.g.:
> {code}
> assertTrue(cluster.getBlockMetadataFile(dn0,
>newBlock.getBlock()).getName().endsWith(
>newBlock.getBlock().getGenerationStamp() + ".meta"));
> {code}
> Let's abstract the fsdataset-specific logic behind FsDatasetTestUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9252) Change TestFileTruncate to FsDatasetTestUtils to get block file size and genstamp.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-9252:
---

 Summary: Change TestFileTruncate to FsDatasetTestUtils to get 
block file size and genstamp.
 Key: HDFS-9252
 URL: https://issues.apache.org/jira/browse/HDFS-9252
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu


{{TestFileTruncate}} verifies block size and genstamp by directly accessing the 
 local filesystem, e.g.:

{code}
assertTrue(cluster.getBlockMetadataFile(dn0,
   newBlock.getBlock()).getName().endsWith(
   newBlock.getBlock().getGenerationStamp() + ".meta"));
{code}

Let's abstract the fsdataset-specific logic behind FsDatasetTestUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959726#comment-14959726
 ] 

Hadoop QA commented on HDFS-9079:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m 43s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  7s | The applied patch generated  
96 new checkstyle issues (total was 91, now 181). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m  5s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 37s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  68m 15s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 34s | Tests passed in 
hadoop-hdfs-client. |
| | | 126m 40s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-client |
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.TestSecureNameNode |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766857/HDFS-9079.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-client.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13010/console |


This message was automatically generated.

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> 
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we 
> shouldn't try to further recover anyway.
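
A minimal sketch of the reservation bookkeeping described above (an 
illustrative class, not code from the patch):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

class GenStampRange {
  private final long last;        // highest reserved generation stamp
  private final AtomicLong next;  // next stamp to hand to a streamer

  GenStampRange(long first, int numReserved) {
    this.next = new AtomicLong(first);
    this.last = first + numReserved - 1;
  }

  // Returns the next reserved stamp, or -1 when the range is exhausted,
  // i.e. more than numReserved failures: stop recovering, as noted above.
  long nextStamp() {
    long gs = next.getAndIncrement();
    return gs <= last ? gs : -1;
  }
}
{code}

With {{NUM_PARITY_BLOCKS}} stamps reserved at block group creation, a streamer 
hitting an error can take the next stamp locally instead of doing the 
ask-NN/receive round trip in steps 1~3.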



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959715#comment-14959715
 ] 

Hadoop QA commented on HDFS-9249:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m 50s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 32s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  1s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   4m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  62m  8s | Tests failed in hadoop-hdfs. |
| | | 116m 59s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766847/HDFS-9249.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13009/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13009/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13009/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13009/console |


This message was automatically generated.

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log the 
> exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNo

[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959676#comment-14959676
 ] 

Hadoop QA commented on HDFS-9236:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 43s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 28s | The applied patch generated  2 
new checkstyle issues (total was 142, now 142). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |  49m 37s | Tests passed in hadoop-hdfs. 
|
| | |  96m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766845/HDFS-9236.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13008/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13008/console |


This message was automatically generated.

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch
>
>
> Ran into an issue while running a test against faulty DataNode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>                  List<BlockRecord> syncList) throws IOException {
> …
>     // Calculate the best available replica state.
>     ReplicaState bestState = ReplicaState.RWR;
> …
>     // Calculate list of nodes that will participate in the recovery
>     // and the new block size
>     List<BlockRecord> participatingList = new ArrayList<BlockRecord>();
>     final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
>         -1, recoveryId);
>     switch(bestState) {
> …
>     case RBW:
>     case RWR:
>       long minLength = Long.MAX_VALUE;
>       for(BlockRecord r : syncList) {
>         ReplicaState rState = r.rInfo.getOriginalReplicaState();
>         if(rState == bestState) {
>           minLength = Math.min(minLength, r.rInfo.getNumBytes());
>           participatingList.add(r);
>         }
>       }
>       newBlock.setNumBytes(minLength);
>       break;
> …
>     }
> …
>     nn.commitBlockSynchronization(block,
>         newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
>         datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on the DN. When this 
> happens the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,

[jira] [Updated] (HDFS-9198) Coalesce IBR processing in the NN

2015-10-15 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9198:
--
Attachment: HDFS-9198-trunk.patch

Took care of the minor findbugs warning and cleaned up most of the silly style 
stuff.  Some of the complaints about metrics I don't think are valid due to the 
annotation magic that occurs.

Updated the tests to flush the block ops queue to prevent races.

Changed the queue offer/add to offer/put.
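
For readers following along, a minimal sketch of the coalescing idea and the 
offer/put distinction (illustrative names, not the actual patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IbrProcessor implements Runnable {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(1024);
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  void enqueue(Runnable ibr) throws InterruptedException {
    // put() blocks when the queue is full instead of failing silently,
    // which is the offer -> put change mentioned above.
    queue.put(ibr);
  }

  @Override
  public void run() {
    List<Runnable> batch = new ArrayList<>();
    while (!Thread.currentThread().isInterrupted()) {
      try {
        batch.add(queue.take());  // wait for at least one IBR
      } catch (InterruptedException e) {
        return;
      }
      queue.drainTo(batch);       // grab whatever else has queued up
      nsLock.writeLock().lock();  // one lock acquisition for the whole batch
      try {
        for (Runnable ibr : batch) {
          ibr.run();
        }
      } finally {
        nsLock.writeLock().unlock();
      }
      batch.clear();
    }
  }
}
{code}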

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9083) Replication violates block placement policy.

2015-10-15 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-9083:
---
Priority: Blocker  (was: Major)

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS, namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
>
> Recently we have been noticing many cases in which all the replicas of a block 
> are residing on the same rack.
> During block creation, the block placement policy was honored.
> But after node failure events in some specific order, the block ends up in 
> such a state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks 
> depends on the config (net.topology.script.file.name)
> {noformat}
> if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9220) Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum

2015-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959577#comment-14959577
 ] 

Hudson commented on HDFS-9220:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #502 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/502/])
HDFS-9220. Reading small file (< 512 bytes) that is open for append (kihwal: 
rev c7c36cbd6218f46c33d7fb2f60cd52cb29e6d720)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java


> Reading small file (< 512 bytes) that is open for append fails due to 
> incorrect checksum
> 
>
> Key: HDFS-9220
> URL: https://issues.apache.org/jira/browse/HDFS-9220
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9220.000.patch, HDFS-9220.001.patch, 
> HDFS-9220.002.patch, test2.java
>
>
> Exception:
> 2015-10-09 14:59:40 WARN  DFSClient:1150 - fetchBlockByteRange(). Got a 
> checksum exception for /tmp/file0.05355529331575182 at 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882:0 from 
> DatanodeInfoWithStorage[10.10.10.10]:5001
> All 3 replicas cause this exception and the read fails entirely with:
> BlockMissingException: Could not obtain block: 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882 
> file=/tmp/file0.05355529331575182
> Code to reproduce is attached.
> Does not happen in 2.7.0.
> Data is read correctly if checksum verification is disabled.
> More generally, the failure happens when reading from the last block of a 
> file and the last block has <= 512 bytes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959574#comment-14959574
 ] 

Xiao Chen commented on HDFS-9250:
-

The Findbugs warning and test failure are not relevant.
Please review. Thanks.

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}} is called:
> - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is 
> added to {{cachedList}}
> - {{cachedList}} was initialized to {{EMPTY_LOCS}}, which is of type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.
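
A standalone illustration of the same failure mode (toy classes standing in 
for the real DataNode types):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ArrayStoreDemo {
  static class Animal {}              // stands in for DatanodeInfo
  static class Cat extends Animal {}  // stands in for DatanodeDescriptor
  static class Dog extends Animal {}  // stands in for DatanodeInfoWithStorage

  public static void main(String[] args) {
    List<Animal> list = new ArrayList<>();
    list.add(new Cat());
    // toArray(T[]) copies elements into an array whose runtime component
    // type comes from the argument; a Cat cannot be stored into a Dog[],
    // so this line throws java.lang.ArrayStoreException -- exactly what
    // happens when a DatanodeDescriptor lands in the
    // DatanodeInfoWithStorage[] derived from EMPTY_LOCS.
    Animal[] arr = list.toArray(new Dog[0]);
  }
}
{code}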



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9251:

Status: Patch Available  (was: Open)

> Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly 
> creating Files in tests code.
> ---
>
> Key: HDFS-9251
> URL: https://issues.apache.org/jira/browse/HDFS-9251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9251.00.patch
>
>
> In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
> block and metadata files:
> {code}
> replicaInfo.getBlockFile().createNewFile();
> replicaInfo.getMetaFile().createNewFile();
> {code}
> It leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes 
> to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9251:

Attachment: HDFS-9251.00.patch

This patch:
* Adds {{CreateFinalizedReplica}}, {{CreateRbw}}, {{CreateReplicaInPipeline}}, 
{{CreateRBW}} and {{CreateReplicaWaitingToBeRecovered}} to 
{{FsDatasetTestUtils}}.
* Refactors {{TestFsDatasetImpl}} and {{TestWriteToReplica}} to use them.


> Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly 
> creating Files in tests code.
> ---
>
> Key: HDFS-9251
> URL: https://issues.apache.org/jira/browse/HDFS-9251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-9251.00.patch
>
>
> In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
> block and metadata files:
> {code}
> replicaInfo.getBlockFile().createNewFile();
> replicaInfo.getMetaFile().createNewFile();
> {code}
> It leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes 
> to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.

2015-10-15 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-9251:
---

 Summary: Refactor TestWriteToReplica and TestFsDatasetImpl to 
avoid explicitly creating Files in tests code.
 Key: HDFS-9251
 URL: https://issues.apache.org/jira/browse/HDFS-9251
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu


In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create 
block and metadata files:

{code}
replicaInfo.getBlockFile().createNewFile();
replicaInfo.getMetaFile().createNewFile();
{code}

It leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes to 
use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9184:

Attachment: HDFS-9184.007.patch

Thanks for your review, [~jnp]. The v7 patch addresses the latest comments.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper level job issues it. The upper level callers may be specific 
> Oozie tasks, MR jobs, and Hive queries. One scenario is that the namenode 
> (NN) is abused/spammed; the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places interconnected 
> like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which 
> introduces significant overhead. Moreover, passing additional information 
> (via annotations) other than the span id from the root of the tree to a leaf 
> is significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> are some related discussion on this topic. The final patch implemented the 
> tracking id as a part of delegation token. This protects the tracking 
> information from being changed or impersonated. However, kerberos 
> authenticated connections or insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all the scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat the HDFS 
> audit log as a good place for after-the-fact root cause analysis. We propose 
> to put the caller id (e.g. Hive query id) in threadlocals. Specifically, on 
> the client side the threadlocal object is passed to the NN as a part of the 
> RPC header (optional), while on the server side the NN retrieves it from the 
> header and puts it into the {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the caller context for 
> each operation. In this way, the existing code is not affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.
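
A minimal sketch of the threadlocal pattern described above. {{CallerContext}} 
here is an illustrative class with made-up methods, not the actual Hadoop API:

{code:java}
public final class CallerContext {
  private static final ThreadLocal<CallerContext> CURRENT = new ThreadLocal<>();

  private final String id;        // e.g. a Hive query id
  private final byte[] signature; // optional; validated offline, not by the NN

  public CallerContext(String id, byte[] signature) {
    this.id = id;
    this.signature = signature;
  }

  public String getId() { return id; }
  public byte[] getSignature() { return signature; }

  // Client side: set before issuing the RPC so the RPC layer can copy the
  // context into the (optional) request header.
  public static void setCurrent(CallerContext ctx) { CURRENT.set(ctx); }

  // Server side: the Handler restores the context from the header, and the
  // audit logger in FSNamesystem reads it for each operation.
  public static CallerContext getCurrent() { return CURRENT.get(); }
}
{code}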



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959561#comment-14959561
 ] 

Hadoop QA commented on HDFS-9250:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 24s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 27s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 57s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  50m 26s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 32s | Tests passed in 
hadoop-hdfs-client. |
| | | 103m 17s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestBlockStoragePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766835/HDFS-9250.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13006/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13006/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13006/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13006/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13006/console |


This message was automatically generated.

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}} is called:
> - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is 
> added to {{cachedList}}
> - {{cachedList}} was initialized to {{EMPTY_LOCS}}, which is of type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959556#comment-14959556
 ] 

Hadoop QA commented on HDFS-9129:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 35s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   9m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 32s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 21s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 39s | The applied patch generated  
30 new checkstyle issues (total was 626, now 602). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 43s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 24s | Post-patch findbugs 
hadoop-hdfs-project/hadoop-hdfs compilation is broken. |
| {color:green}+1{color} | findbugs |   0m 24s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:green}+1{color} | native |   0m 31s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  53m 29s | Tests failed in hadoop-hdfs. |
| | | 100m  1s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.tools.TestDFSHAAdminMiniCluster |
|   | hadoop.fs.TestHdfsNativeCodeLoader |
|   | hadoop.hdfs.security.TestDelegationToken |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766673/HDFS-9129.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13007/console |


This message was automatically generated.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.
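
For background, a rough sketch of the safemode bookkeeping being moved (names
are illustrative, not the actual patch):

{code:java}
// Hypothetical sketch: the counters the NN consults to decide whether enough
// blocks have been reported to leave safemode.
class SafeModeBlockTracker {
  private long blockTotal;  // blocks expected from the namespace
  private long blockSafe;   // blocks that reached their safe replica count
  private final double threshold = 0.999; // dfs.namenode.safemode.threshold-pct

  void setBlockTotal(long total) { this.blockTotal = total; }
  void incrementSafeBlockCount() { blockSafe++; }
  void decrementSafeBlockCount() { blockSafe--; }

  boolean canLeaveSafeMode() {
    return blockSafe >= (long) (threshold * blockTotal);
  }
}
{code}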



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9208) Disabling atime may fail clients like distCp

2015-10-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959496#comment-14959496
 ] 

Mingliang Liu commented on HDFS-9208:
-

Hi [~kihwal], this is on my list this week but I'm not actively working on 
it. Sorry for the delay. I need more context for the options.

I'm assigning it back to you in case you're blocked.

> Disabling atime may fail clients like distCp
> 
>
> Key: HDFS-9208
> URL: https://issues.apache.org/jira/browse/HDFS-9208
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Mingliang Liu
>
> When atime is disabled, {{setTimes()}} throws an exception if the passed-in 
> atime is not -1. Since distCp passes the actual atime rather than -1, it 
> fails when it tries to set the mtime and atime. 
> There are several options:
> 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not 
> very enthusiastic about it.
> 2) make NN also accept 0 atime in addition to -1, when the atime support is 
> disabled.
> 3) support setting mtime & atime regardless of the atime support.  The main 
> reason why atime is disabled is to avoid edit logging/syncing during 
> {{getBlockLocations()}} read calls. Explicit setting can be allowed.
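
For illustration, a sketch of option 2 (names and the error message are
illustrative; {{accessTimePrecision == 0}} models "atime support is disabled"):

{code:java}
// Option 2: when atime support is disabled, treat an atime of 0 like -1
// ("leave atime alone") instead of rejecting the call.
void setTimes(String src, long mtime, long atime) throws IOException {
  boolean atimeDisabled = (accessTimePrecision == 0);
  if (atimeDisabled && atime != -1 && atime != 0) {
    throw new IOException("Access time is disabled for " + src);
  }
  // proceed to set mtime; skip the atime update when atime is -1 (or 0)
}
{code}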



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9208) Disabling atime may fail clients like distCp

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9208:

Assignee: Kihwal Lee  (was: Mingliang Liu)

> Disabling atime may fail clients like distCp
> 
>
> Key: HDFS-9208
> URL: https://issues.apache.org/jira/browse/HDFS-9208
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> When atime is disabled, {{setTimes()}} throws an exception if the passed-in 
> atime is not -1. Since distCp passes the actual atime rather than -1, it 
> fails when it tries to set the mtime and atime. 
> There are several options:
> 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not 
> very enthusiastic about it.
> 2) make NN also accept 0 atime in addition to -1, when the atime support is 
> disabled.
> 3) support setting mtime & atime regardless of the atime support.  The main 
> reason why atime is disabled is to avoid edit logging/syncing during 
> {{getBlockLocations()}} read calls. Explicit setting can be allowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9220) Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum

2015-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959475#comment-14959475
 ] 

Hudson commented on HDFS-9220:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #538 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/538/])
HDFS-9220. Reading small file (< 512 bytes) that is open for append (kihwal: 
rev c7c36cbd6218f46c33d7fb2f60cd52cb29e6d720)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java


> Reading small file (< 512 bytes) that is open for append fails due to 
> incorrect checksum
> 
>
> Key: HDFS-9220
> URL: https://issues.apache.org/jira/browse/HDFS-9220
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9220.000.patch, HDFS-9220.001.patch, 
> HDFS-9220.002.patch, test2.java
>
>
> Exception:
> 2015-10-09 14:59:40 WARN  DFSClient:1150 - fetchBlockByteRange(). Got a 
> checksum exception for /tmp/file0.05355529331575182 at 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882:0 from 
> DatanodeInfoWithStorage[10.10.10.10]:5001
> All 3 replicas cause this exception and the read fails entirely with:
> BlockMissingException: Could not obtain block: 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882 
> file=/tmp/file0.05355529331575182
> Code to reproduce is attached.
> Does not happen in 2.7.0.
> Data is read correctly if checksum verification is disabled.
> More generally, the failure happens when reading from the last block of a 
> file and the last block has <= 512 bytes.
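
A minimal sketch of the same pattern (assuming a test {{FileSystem}} handle,
e.g. from a MiniDFSCluster; the attached test2.java is the authoritative
reproducer):

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

static void reproduce(FileSystem fs) throws Exception {
  Path p = new Path("/tmp/small");
  try (FSDataOutputStream out = fs.create(p)) {
    out.write(new byte[100]);  // well under one 512-byte checksum chunk
    out.hflush();
  }
  FSDataOutputStream appender = fs.append(p);  // file is now open for append
  try (FSDataInputStream in = fs.open(p)) {
    in.readFully(0, new byte[100]);  // checksum exception reported on 2.7.1
  } finally {
    appender.close();
  }
}
{code}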



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers

2015-10-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9079:

Attachment: HDFS-9079.03.patch

Updating the patch to fix all reported test failures. The main change is to 
add logic to get a new block token when the current one expires.

I'm currently working on adding Javadocs and fixing exception handling.

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> 
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when the NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be skipped. If more than {{NUM_PARITY_BLOCKS}} errors have happened, we 
> shouldn't try to recover further anyway.
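
A rough sketch of the preallocation idea (all names are illustrative, not the
patch):

{code:java}
// When the NN creates a striped block group, skip ahead by NUM_PARITY_BLOCKS
// so streamers can bump the GS locally on failure without a NN round trip
// (removing steps 1~3 above).
long allocateStripedBlockGroupGS() {
  long gs = nextGenerationStamp();
  skipGenerationStamps(NUM_PARITY_BLOCKS);  // reserve (gs, gs + parity]
  // More than NUM_PARITY_BLOCKS failures means the group is unrecoverable,
  // so the reservation is never exceeded.
  return gs;
}
{code}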



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2015-10-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959448#comment-14959448
 ] 

Daryn Sharp commented on HDFS-9198:
---

Most of the test failures are a race condition from IBRs no longer being 
synchronous. Will update shortly. All you watchers, any comments on the 
approach?

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.
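
A minimal sketch of the coalescing idea ({{IbrTask}} and the lock object are
illustrative stand-ins, not the actual patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class IbrProcessor {
  interface IbrTask { void process(); }

  private final BlockingQueue<IbrTask> queue = new LinkedBlockingQueue<>();
  private final Object writeLock = new Object(); // stands in for the FSN lock

  void enqueue(IbrTask task) { queue.add(task); } // IPC handler returns fast

  void drainAndProcess() throws InterruptedException {
    List<IbrTask> batch = new ArrayList<>();
    batch.add(queue.take());   // wait for at least one report
    queue.drainTo(batch);      // coalesce everything else that is pending
    synchronized (writeLock) { // one lock acquisition for the whole batch
      for (IbrTask t : batch) {
        t.process();
      }
    }
  }
}
{code}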



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7964) Add support for async edit logging

2015-10-15 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-7964:
--
Attachment: HDFS-7964.patch

Updated, simplified.

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write lock, while logSync is called outside of 
> the lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.
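
A rough sketch of the combination being described (illustrative names, not the
patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Handlers enqueue edits plus their deferred RPC responses; a background
// thread batches the sync and releases responses only once durable.
class AsyncEditLogger {
  interface PendingEdit { void sendDeferredResponse(); }
  interface Journal { void logSync(); }

  private final BlockingQueue<PendingEdit> edits = new LinkedBlockingQueue<>();
  private final Journal journal;

  AsyncEditLogger(Journal journal) { this.journal = journal; }

  void logEdit(PendingEdit e) { edits.add(e); } // frees the handler thread

  void syncLoop() throws InterruptedException {
    List<PendingEdit> batch = new ArrayList<>();
    while (true) {
      batch.add(edits.take());
      edits.drainTo(batch);
      journal.logSync();            // one sync covers the whole batch
      for (PendingEdit e : batch) {
        e.sendDeferredResponse();   // same durability guarantee as before
      }
      batch.clear();
    }
  }
}
{code}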



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9208) Disabling atime may fail clients like distCp

2015-10-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959431#comment-14959431
 ] 

Kihwal Lee commented on HDFS-9208:
--

[~liuml07] Are you actively working on this?

> Disabling atime may fail clients like distCp
> 
>
> Key: HDFS-9208
> URL: https://issues.apache.org/jira/browse/HDFS-9208
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Mingliang Liu
>
> When atime is disabled, {{setTimes()}} throws an exception if the passed-in 
> atime is not -1. Since distCp passes the actual atime rather than -1, it 
> fails when it tries to set the mtime and atime. 
> There are several options:
> 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not 
> very enthusiastic about it.
> 2) make NN also accept 0 atime in addition to -1, when the atime support is 
> disabled.
> 3) support setting mtime & atime regardless of the atime support.  The main 
> reason why atime is disabled is to avoid edit logging/syncing during 
> {{getBlockLocations()}} read calls. Explicit setting can be allowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9249:
--
Status: Patch Available  (was: Open)

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code:java}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
> 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor 
> (CacheReplicationMonitor.java:run(169)) - Shutting down 
> CacheReplicationMonitor
> 2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopp

[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959417#comment-14959417
 ] 

Yongjun Zhang commented on HDFS-9236:
-

Hi [~twu],

Thanks for the updated rev 3, which looks reasonable to me.

Hi [~kihwal], would you please help take a look? Really appreciate it.

Thanks.


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on DN. When this 
> happens the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with correct length) will cause the block to be marked as 
> corrupted. Since this block could be the last block of the file, if this 
> happens and the client goes away, the NN won't be able to recover the lease 
> and close the file because the last block is under-replicated.
> I believe we need to have a sanity check for block size on both DN and NN to 
> prevent such a case from happening.
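
A sketch of the kind of NN-side check being proposed (illustrative, not the
patch):

{code:java}
// In commitBlockSynchronization: reject a length that no healthy replica
// could plausibly have reported.
if (newlength < 0 || newlength == Long.MAX_VALUE) {
  throw new IOException("commitBlockSynchronization rejected for " + oldBlock
      + ": implausible new length " + newlength);
}
{code}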



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9249:
--
Attachment: HDFS-9249.001.patch

Check for null pointer, and add more verbose log info when IOException is 
thrown.

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code:java}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
> 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor 
> (CacheReplicationMonitor.java:run(169)) - Shutting down 
> CacheReplicationMon

[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959412#comment-14959412
 ] 

Wei-Chiu Chuang commented on HDFS-9249:
---

Hadoop JIRA does not allow me to edit comments. But what I am saying is that 
instead of wildly guessing that Kerberos is to blame, we should add more log 
info to expose the problem when it happens. After all, this is a bug that 
rarely happens.

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code:java}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
> 2015-10-14 19:45:07,

[jira] [Updated] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-15 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9236:
--
Attachment: HDFS-9236.003.patch

Addressed [~yzhangal]'s review comments.

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on DN. When this 
> happens the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with correct length) will cause the block to be marked as 
> corrupted. Since this block could be the last block of the file, if this 
> happens and the client goes away, the NN won't be able to recover the lease 
> and close the file because the last block is under-replicated.
> I believe we need to have a sanity check for block size on both DN and NN to 
> prevent such a case from happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959400#comment-14959400
 ] 

Wei-Chiu Chuang commented on HDFS-9249:
---

[~ste...@apache.org] Thanks for the suggestion.
The exception was thrown when auth is the default (i.e. SIMPLE). I did what 
you suggested, and instead of an NPE at BackupNode, an IOException is thrown 
by NameNode; but unlike BackupNode.stop(), NameNode.stop() checks whether the 
namesystem is null. Additionally, I looked further and found other places 
where an IOException can be thrown.

So I think in addition to logging the exception, BackupNode should also check 
for the null pointer.
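
A sketch of the guard under discussion (illustrative; the real fix may differ):

{code:java}
// BackupNode.stop() can run while construction is still in progress, so the
// namesystem (and hence the FSImage) may not exist yet.
@Override
public void stop() {
  if (namesystem != null && getFSImage() != null) {
    // ... existing checkpoint cleanup that dereferences the FSImage ...
  }
  super.stop();
}
{code}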

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code:java}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop, but the namesystem was not yet properly 
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/dat

[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-10-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959398#comment-14959398
 ] 

Rakesh R commented on HDFS-9173:


Awesome work [~walter.k.su], it's really interesting. I have a few comments:

# {{getSafeLength}} -> Could you please tell me the behavior for a file that 
doesn't have any block with the full cell size of 64k? For example, all the 
blocks have less than CELL_SIZE bytes. Say CELL_SIZE=64 * 1024, and the blocks 
are blk1=1024 blk2=2*1024 blk3=3*1024 blk4=4*1024 blk5=5*1024 blk6=6*1024 
bytes, etc. Please add a test if not included.
# The patch is quite large, so just a suggestion to simplify the review & 
patch rework effort. I see you have created the {{BlockRecoveryWorker}} and 
{{RecoveryTask}} classes to segregate the logic and made a few improvements. 
It's a nice idea. If everyone agrees, it would be good to create another jira 
to do these pre-requisite changes separately, which can be safely pushed in 
quickly. Later in this jira, I will do a focused review/test of the 
{{StripedRecoveryTask#recover()}} related logic. Any thoughts?

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-15 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959385#comment-14959385
 ] 

Jitendra Nath Pandey commented on HDFS-9184:


[~liuml07], I think it will be a good idea to move the new configurations to 
common instead of having them in hdfs, because CallerContext is defined in 
common.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper level job issues it. The upper level callers may be specific 
> Oozie tasks, MR jobs, and hive queries. One scenario is that when the 
> namenode (NN) is abused/spammed, the operator may want to know immediately 
> which MR job should be blamed so that she can kill it. To this end, the 
> caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places interconnected 
> like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which 
> introduces significant overhead. Moreover, passing additional information 
> (via annotations) other than span id from root of the tree to leaf is a 
> significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> are some related discussion on this topic. The final patch implemented the 
> tracking id as a part of delegation token. This protects the tracking 
> information from being changed or impersonated. However, kerberos 
> authenticated connections or insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all the scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat the HDFS 
> audit log as a good place for after-the-fact root cause analysis. We propose 
> to put the caller id (e.g. Hive query id) in threadlocals. Specifically, on 
> the client side the threadlocal object is passed to the NN as a part of the 
> RPC header (optional), while on the server side the NN retrieves it from the 
> header and puts it into the {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the caller context for 
> each operation. In this way, the existing code is not affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.
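
A simplified sketch of the proposed mechanism (illustrative names, not the
final API):

{code:java}
// The caller id lives in a thread-local on the client, travels in the RPC
// header, and is restored into the handler's thread-local on the NN, where
// the audit logger reads it.
public final class CallerContext {
  private static final ThreadLocal<CallerContext> CURRENT = new ThreadLocal<>();

  private final String context;    // e.g. a Hive query id
  private final byte[] signature;  // optional; validated offline, not by NN

  public CallerContext(String context, byte[] signature) {
    this.context = context;
    this.signature = signature;
  }

  public String getContext() { return context; }

  public static void set(CallerContext ctx) { CURRENT.set(ctx); }
  public static CallerContext get() { return CURRENT.get(); }
}
{code}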



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9207) Move the implementation to the hdfs-native-client module

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-9207.
--
Resolution: Fixed

Committed to the HDFS-8707 branch. Thanks James and Bob for the reviews!

> Move the implementation to the hdfs-native-client module
> 
>
> Key: HDFS-9207
> URL: https://issues.apache.org/jira/browse/HDFS-9207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9207.000.patch
>
>
> The implementation of libhdfspp should be moved to the new hdfs-native-client 
> module as HDFS-9170 has landed in trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9223) Code cleanup for DatanodeDescriptor and HeartbeatManager

2015-10-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9223:
-
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-8966

> Code cleanup for DatanodeDescriptor and HeartbeatManager
> 
>
> Key: HDFS-9223
> URL: https://issues.apache.org/jira/browse/HDFS-9223
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9223.000.patch, HDFS-9223.001.patch, 
> HDFS-9223.002.patch, HDFS-9223.003.patch
>
>
> Some code cleanup for {{DatanodeDescriptor}} and {{HeartbeatManager}}. The 
> changes include:
> # Change {{DatanodeDescriptor#isAlive}} and 
> {{DatanodeDescriptor#needKeyUpdate}} from public to private
> # Use EnumMap for {{HeartbeatManager#storageTypeStatesMap}}
> # Move the {{isInStartupSafeMode}} out of the namesystem lock in 
> {{heartbeatCheck}}
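
For item 2, a sketch of the suggested change (assuming {{StorageTypeStats}} as
the per-type value class):

{code:java}
// EnumMap stores enum-keyed entries in an array indexed by ordinal, so it is
// smaller and faster than a HashMap for StorageType keys.
Map<StorageType, StorageTypeStats> storageTypeStatesMap =
    new EnumMap<>(StorageType.class);
{code}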



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-15 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959367#comment-14959367
 ] 

Anu Engineer commented on HDFS-4015:


[~liuml07] Thanks for looking at the Hadoop QA results.  I did look at the 
test results just to double check. 
2 of them are failures related to a globbing change that has already been 
reverted. The other failures are mostly timing related and not related to 
this patch.


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9220) Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum

2015-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959361#comment-14959361
 ] 

Hudson commented on HDFS-9220:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1274 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1274/])
HDFS-9220. Reading small file (< 512 bytes) that is open for append (kihwal: 
rev c7c36cbd6218f46c33d7fb2f60cd52cb29e6d720)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java


> Reading small file (< 512 bytes) that is open for append fails due to 
> incorrect checksum
> 
>
> Key: HDFS-9220
> URL: https://issues.apache.org/jira/browse/HDFS-9220
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9220.000.patch, HDFS-9220.001.patch, 
> HDFS-9220.002.patch, test2.java
>
>
> Exception:
> 2015-10-09 14:59:40 WARN  DFSClient:1150 - fetchBlockByteRange(). Got a 
> checksum exception for /tmp/file0.05355529331575182 at 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882:0 from 
> DatanodeInfoWithStorage[10.10.10.10]:5001
> All 3 replicas cause this exception and the read fails entirely with:
> BlockMissingException: Could not obtain block: 
> BP-353681639-10.10.10.10-1437493596883:blk_1075692769_9244882 
> file=/tmp/file0.05355529331575182
> Code to reproduce is attached.
> Does not happen in 2.7.0.
> Data is read correctly if checksum verification is disabled.
> More generally, the failure happens when reading from the last block of a 
> file and the last block has <= 512 bytes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-10-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959355#comment-14959355
 ] 

Mingliang Liu commented on HDFS-9092:
-

s/Do/Does/

> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using 'sync' option, the NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of data copied via NFS will stay at 1248752400.
> What happened is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache;
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of an overlapping write request happens in step 2, but 
> it only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are both beyond the current offset and they overlap, the overlap 
> is not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.
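
A sketch of the missing check described above (illustrative, not the patch):

{code:java}
import java.util.Map;
import java.util.TreeMap;

// The overlap check must consider writes still sitting in the cache, not
// just the nextOffset already written to HDFS.
class PendingWriteChecker {
  // write offset -> length, for requests accepted but not yet written
  private final TreeMap<Long, Integer> pending = new TreeMap<>();

  boolean overlapsPending(long off, int len) {
    for (Map.Entry<Long, Integer> e : pending.entrySet()) {
      long start = e.getKey();
      long end = start + e.getValue();
      if (off < end && start < off + len) {
        return true;  // the new request overlaps a cached one
      }
    }
    return false;
  }
}
{code}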



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3059) ssl-server.xml causes NullPointer

2015-10-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959332#comment-14959332
 ] 

Yongjun Zhang commented on HDFS-3059:
-

Hi [~xiaochen],

Thanks for working on this issue. I browsed it and have some 
comments/questions:

1. Change:
{code}
 LOG.warn("IOException caught when getting password, setting password "
  + "to null. Exception:\"" + ioe.getMessage() + "\".");
{code}
to:
{code}
 LOG.warn("Setting password to null since IOException is caught when 
getting password", ioe);
{code}

2. Add comma to " is specified make sure it is a relative path" as "is 
specified, make sure it is a relative path"

3. Would you please explain the reason for the following comment? Maybe add 
the explanation as an addition to the comment.
{code}
   // This is only needed when starting SNN as a daemon,
   // and no need to run it if called from shell command.
{code}

Thanks.


> ssl-server.xml causes NullPointer
> -
>
> Key: HDFS-3059
> URL: https://issues.apache.org/jira/browse/HDFS-3059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 2.7.1
> Environment: in core-site.xml:
> {code:xml}
>   <property>
>     <name>hadoop.security.authentication</name>
>     <value>kerberos</value>
>   </property>
>   <property>
>     <name>hadoop.security.authorization</name>
>     <value>true</value>
>   </property>
> {code}
> in hdfs-site.xml:
> {code:xml}
>   <property>
>     <name>dfs.https.server.keystore.resource</name>
>     <value>/etc/hadoop/conf/ssl-server.xml</value>
>   </property>
>   <property>
>     <name>dfs.https.enable</name>
>     <value>true</value>
>   </property>
>   <!-- ...other security props -->
> {code}
>Reporter: Evert Lammerts
>Assignee: Xiao Chen
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3059.02.patch, HDFS-3059.03.patch, 
> HDFS-3059.04.patch, HDFS-3059.05.patch, HDFS-3059.patch, HDFS-3059.patch.2
>
>
> If ssl is enabled (dfs.https.enable) but ssl-server.xml is not available, a 
> DN will crash during startup while setting up an SSL socket with a 
> NullPointerException:
> {noformat}12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: 
> useKerb = false, useCerts = true
> jetty.ssl.password : jetty.ssl.keypassword : 12/03/07 17:08:36 INFO 
> mortbay.log: jetty-6.1.26.cloudera.1
> 12/03/07 17:08:36 INFO mortbay.log: Started 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: Creating new 
> KrbServerSocket for: 0.0.0.0
> 12/03/07 17:08:36 WARN mortbay.log: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475: java.io.IOException: 
> !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Server@604788d5: 
> java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> Krb5AndCertsSslSocketConnector@0.0.0.0:50475
> 12/03/07 17:08:36 INFO mortbay.log: Stopped 
> selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:37 INFO datanode.DataNode: Waiting for threadgroup to exit, 
> active threads is 0{noformat}
> The same happens if I set an absolute path to an existing 
> dfs.https.server.keystore.resource - in this case the file cannot be found 
> but not even a WARN is given.
> Since we know the resource referenced by dfs.https.server.keystore.resource 
> must specify 4 properties (ssl.server.truststore.location, 
> ssl.server.keystore.location, ssl.server.keystore.password, and 
> ssl.server.keystore.keypassword), we should check that they are set and 
> throw an IOException if they are not.
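A minimal sketch of the proposed check, assuming the resource loads into a 
Hadoop Configuration (the helper name and message wording are assumptions, 
not the committed fix):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class SslConfCheck {
  // Hypothetical helper: fail fast with a clear message instead of a
  // NullPointerException deep inside the SSL socket setup.
  static void validateSslServerConf(Configuration sslConf)
      throws IOException {
    String[] required = {
        "ssl.server.truststore.location",
        "ssl.server.keystore.location",
        "ssl.server.keystore.password",
        "ssl.server.keystore.keypassword"};
    for (String key : required) {
      if (sslConf.get(key) == null) {
        throw new IOException("Required property " + key
            + " is not set in ssl-server.xml");
      }
    }
  }
}
{code}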



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-10-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959329#comment-14959329
 ] 

Mingliang Liu commented on HDFS-9092:
-

It sometimes happens to me that Jenkins does not report findbugs, in which 
case I have to run it locally to double-check. It would be nice if we could 
find out why.

As to the warning itself, I think the unsynchronized access is read-only, 
and for LOG/toString purposes. Do it make sense to make the {{offset}} and 
{{originalCount}} volatile?
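A minimal sketch of that suggestion (hypothetical class and method names, not 
the actual WriteCtx code): volatile gives an unsynchronized, read-only caller 
such as toString() visibility of the latest written values without locking.

{code}
class WriteRequest {
  // volatile: safe to read without holding the lock, e.g. in toString()
  private volatile long offset;
  private volatile int originalCount;

  synchronized void trim(long newOffset, int newCount) {
    offset = newOffset;
    originalCount = newCount;
  }

  @Override
  public String toString() { // read-only, called without the lock
    return "offset=" + offset + ", originalCount=" + originalCount;
  }
}
{code}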

> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using the 'sync' option, the NFS writes may issue the following 
> warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of data copied via NFS will stay at 1248752400.
> What happened is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache;
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of an overlapping write request happens in step 2, but 
> it only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are both beyond the current offset and they overlap, the 
> overlap is not detected and both are put into the cache. This causes the 
> symptom reported in this case at step 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty

2015-10-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9250:

Status: Patch Available  (was: Open)

> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
> ---
>
> Key: HDFS-9250
> URL: https://issues.apache.org/jira/browse/HDFS-9250
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9250.001.patch
>
>
> We may see the following exception:
> {noformat}
> java.lang.ArrayStoreException
> at java.util.ArrayList.toArray(ArrayList.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> {noformat}
> The cause is that in LocatedBlock.java, when {{addCachedLoc}}:
> - Passed in parameter {{loc}}, which is type {{DatanodeDescriptor}}, is added 
> to {{cachedList}}
> - {{cachedList}} was assigned to {{EMPTY_LOCS}}, which is type 
> {{DatanodeInfoWithStorage}}.
> Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of 
> {{DatanodeInfo}} but do not inherit from each other, resulting in the 
> ArrayStoreException.
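A minimal standalone repro of this failure mode (stand-in classes, not the 
Hadoop types):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins for DatanodeInfo and its two unrelated subclasses.
class Info {}
class InfoWithStorage extends Info {}  // plays DatanodeInfoWithStorage
class Descriptor extends Info {}       // plays DatanodeDescriptor

public class ArrayStoreRepro {
  public static void main(String[] args) {
    // The cache is empty, so the list is built from an InfoWithStorage[].
    List<Info> cached = new ArrayList<>(
        Arrays.asList(new InfoWithStorage[0]));
    cached.add(new Descriptor());
    // toArray(T[]) stores into an InfoWithStorage[]; writing a
    // Descriptor into that array throws ArrayStoreException.
    Info[] out = cached.toArray(new InfoWithStorage[cached.size()]);
    System.out.println(Arrays.toString(out)); // never reached
  }
}
{code}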



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

