[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652579#comment-14652579 ] Haohui Mai commented on HDFS-7966: -- bq. What's the upside of this new implementation? Performance is definitely one important factor. One of the motivations is to improve the efficiency of the DN when there are hundreds of thousands of reads, by reducing the overhead of context switches. [~Apache9], do you have any performance numbers for this scenario? The HTTP/2-based DTP also serves as a building block for the next level of innovation; to quote the description in the jira: {quote} This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging the existing HTTP/2 library, it should simplify the implementation of both HDFS clients and servers. {quote} bq. If it were the same performance but had other redeeming qualities (e.g. less code) then it's still worth consideration. This is designed to be a new code path so that it remains compatible with older releases. You can still rely on the old DTP, depending on the application scenario. 
New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, TestHttp2ReadBlockInsideEventLoop.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined only by the implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging the existing HTTP/2 library, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652695#comment-14652695 ] Zhe Zhang commented on HDFS-8849: - Thanks, Allen, for the advice. I think we can report the *number of missing blocks with min replication* instead. fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 added reporting of the number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652698#comment-14652698 ] Duo Zhang commented on HDFS-7966: - I do not have enough machines to test the scenario... What I see when I create lots of threads to read from a datanode concurrently is that HTTP/2 starts the requests almost at the same time, but TCP starts them one by one (or perhaps in batches the size of the CPU count). So there is never a situation where the DN really handles lots of concurrent reads from a client, and the context-switch overhead may be smaller than in the HTTP/2 implementation, since we also have a ThreadPool besides the EventLoopGroup in the HTTP/2 connection. What makes things worse is that our client is not event-driven, so we cannot reduce the client's thread count... Let me see if I can construct a scenario where HTTP/2 is faster than TCP... Thanks. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, TestHttp2ReadBlockInsideEventLoop.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined only by the implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. 
By leveraging the existing HTTP/2 library, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
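The threading contrast Duo describes (a small, fixed EventLoopGroup servicing many in-flight streams, versus one blocked thread per read) can be sketched outside Hadoop. This is a hypothetical stand-in, not the actual DataNode or Netty code path: each "read" is a stand-in task, and the class and method names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiplexedReads {
    // Event-driven model: a fixed pool of "event loops" services many
    // in-flight block reads without pinning one thread per read, which is
    // where the context-switch savings would come from.
    static int serviceReads(int inFlightReads, int eventLoops) throws Exception {
        ExecutorService loopGroup = Executors.newFixedThreadPool(eventLoops);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(inFlightReads);
        for (int i = 0; i < inFlightReads; i++) {
            loopGroup.execute(() -> {
                completed.incrementAndGet(); // the simulated "read"
                done.countDown();
            });
        }
        done.await();        // all reads complete using only eventLoops threads
        loopGroup.shutdown();
        return completed.get();
    }
}
```

A thread-per-read client, by contrast, would need one thread per in-flight read, which is Duo's point about not being able to reduce the client's thread count without an event-driven client.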
[jira] [Commented] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class
[ https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652684#comment-14652684 ] Tsz Wo Nicholas Sze commented on HDFS-8499: --- Not yet. Should be able to try it on Wednesday. Refactor BlockInfo class hierarchy with static helper class --- Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, HDFS-8499.UCFeature.patch, HDFS-bistriped.patch In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652808#comment-14652808 ] Tsz Wo Nicholas Sze commented on HDFS-8804: --- +1, the new patch looks good. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch Currently we directly allocate direct ByteBuffers in DFSStripedInputStream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffers from a DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-8804. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 I've committed this to the feature branch. Thank you guys for the review! bq. we can at least assert alignedStripe.range.spanInBlock is no larger than cellSize This is guaranteed by the logic in {{readOneStripe}}, so my feeling here is that the assertion is unnecessary. Also, we don't have this assertion for the data block buffer. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Fix For: HDFS-7285 Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch Currently we directly allocate direct ByteBuffers in DFSStripedInputStream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffers from a DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
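For readers unfamiliar with the pattern, a buffer pool in the spirit of Hadoop's {{DirectBufferPool}} (get a direct buffer, return it for reuse, recycle per capacity) can be sketched as below. This is a simplified, hypothetical stand-in, not the actual org.apache.hadoop.util.DirectBufferPool implementation:

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class SimpleDirectBufferPool {
    // Buffers are recycled per capacity, so repeated stripe reads reuse
    // direct memory instead of paying the allocation cost every time.
    private final ConcurrentMap<Integer, Queue<ByteBuffer>> pool =
        new ConcurrentHashMap<>();

    public ByteBuffer getBuffer(int size) {
        Queue<ByteBuffer> q = pool.get(size);
        ByteBuffer buf = (q == null) ? null : q.poll();
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(size); // slow path: fresh allocation
        }
        buf.clear();
        return buf;
    }

    public void returnBuffer(ByteBuffer buf) {
        pool.computeIfAbsent(buf.capacity(), k -> new ConcurrentLinkedQueue<>())
            .add(buf);
    }
}
```

Direct buffers are expensive to allocate and are not freed promptly by GC, which is why pooling them (rather than allocating per read, as the pre-patch code did) is the better choice here.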
[jira] [Created] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
Colin Patrick McCabe created HDFS-8850: -- Summary: VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652932#comment-14652932 ] Yi Liu commented on HDFS-8850: -- Yes, you are right. +1 pending Jenkins. VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks - Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8850.001.patch The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8850: --- Status: Patch Available (was: Open) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks - Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8850.001.patch The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Capo updated HDFS-488: - Hadoop Flags: Reviewed Fix Version/s: 2.7.1 Affects Version/s: 2.7.1 Target Version/s: 2.7.1 Tags: MoveToLocal Status: Patch Available (was: Open) Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Labels: newbie Fix For: 2.7.1 Attachments: Screen Shot 2014-07-23 at 12.28.23 PM 1.png Surprisingly, executing the HDFS FsShell command -moveToLocal outputs: Option '-moveToLocal' is not implemented yet. {code} statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t Option '-moveToLocal' is not implemented yet. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Capo updated HDFS-488: - Attachment: HDFS-488.patch Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Labels: newbie Fix For: 2.7.1 Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 1.png Surprisingly, executing the HDFS FsShell command -moveToLocal outputs: Option '-moveToLocal' is not implemented yet. {code} statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t Option '-moveToLocal' is not implemented yet. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8761) Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) XXX is not a prefix of YYY
[ https://issues.apache.org/jira/browse/HDFS-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652581#comment-14652581 ] Chris Nauroth commented on HDFS-8761: - [~odelalleau], glad to hear this helped! bq. I wonder how this is not a bug though, even if there exists a workaround. But not a big deal. I agree that the configuration file can end up looking non-intuitive on Windows. Unfortunately, I don't see a way to do any better while maintaining the feature that everything defaults to using {{hadoop.tmp.dir}} for quick dev deployments. This is a side effect of the fact that a Windows file system path is not always valid as a URL. On Linux, a file system path will always be a valid URL (assuming the individual path names stick to the characters that don't require escaping). I typically advise using a full {{file:}} URL in production configurations to make everything clearer for operators. Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) XXX is not a prefix of YYY Key: HDFS-8761 URL: https://issues.apache.org/jira/browse/HDFS-8761 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Environment: Windows 7, Java SDK 1.8.0_45 Reporter: Olivier Delalleau Priority: Minor I'm periodically seeing errors like the one below output by the HDFS daemon (started with start-dfs.cmd). This is with the default settings for data location (i.e., not specified in my hdfs-site.xml). I assume it may be fixable by specifying a path with the drive letter in the config file; however, I haven't been able to do that (see http://stackoverflow.com/questions/31353226/setting-hadoop-tmp-dir-on-windows-gives-error-uri-has-an-authority-component). 
15/07/11 17:29:57 ERROR datanode.DirectoryScanner: Error compiling report java.util.concurrent.ExecutionException: java.lang.RuntimeException: \tmp\hadoop-odelalleau\dfs\data is not a prefix of D:\tmp\hadoop-odelalleau\dfs\data\current\BP-1474392971-10.128.22.110-1436634926842\current\finalized\subdir0\subdir0\blk_1073741825 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:566) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:425) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:406) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:362) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
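Chris's point that a Windows file system path is not always valid as a URL can be seen directly with java.net.URI: a drive-qualified path like D:/tmp/... parses with the drive letter captured as the URI *scheme*, which is why a full file: URL is the safer form. The helper name below is hypothetical, for illustration only:

```java
import java.net.URI;

public class WindowsPathUri {
    // On Linux, "/tmp/hadoop" parses as a scheme-less URI with the path
    // intact. On Windows, "D:/tmp/hadoop" parses with the drive letter
    // mistaken for the URI scheme, which is one way drive-qualified paths
    // go wrong in hadoop.tmp.dir unless written as full file: URLs.
    public static String schemeOf(String raw) {
        return URI.create(raw).getScheme(); // null when no scheme is present
    }
}
```

(Backslash-separated paths like \tmp\hadoop are not even parseable as URIs, since backslash is an illegal URI character.)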
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652645#comment-14652645 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 36s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 28s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 21s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 177m 12s | Tests failed in hadoop-hdfs. 
| | | | 220m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestWriteStripedFileWithFailure | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748509/HDFS-8220-HDFS-7285-09.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / ba90c02 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11889/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11889/console | This message was automatically generated. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at
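The root NullPointerException above matches the BlockingQueue contract: LinkedBlockingQueue rejects null elements, so an unvalidated null block location reaching offer() surfaces exactly this way. A minimal demonstration (class name hypothetical):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class OfferNullDemo {
    // BlockingQueue implementations reject null by contract; offer(null)
    // throws NullPointerException rather than enqueueing a null element.
    public static boolean offerRejectsNull() {
        LinkedBlockingQueue<Object> q = new LinkedBlockingQueue<>();
        try {
            q.offer(null);
            return false; // unreachable if the contract holds
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

This is why the fix belongs in the validation of located blocks against BlockGroupSize, before anything is offered to the queue.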
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652789#comment-14652789 ] Jing Zhao commented on HDFS-8828: - Thanks for the explanation, Yufei! Yes, you're right that our current code uses the file list to check if a file is in the source. In that sense, excluding -delete may be our only option here. But we may need to provide more details in the documentation about the behavior, as also suggested by Yongjun. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch Some users have reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file-copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch only puts created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
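The copy-list construction the description outlines (keep creations and modifications; leave deletions and renames to the synchronization step from HDFS-7535) can be sketched as follows. The types and names are hypothetical, not the actual distcp classes:

```java
import java.util.ArrayList;
import java.util.List;

public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, RENAME, DELETE }

    static final class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    // Only creations and modifications need to be copied; renames and
    // deletions are applied to the target separately, so they never enter
    // the copy list. This is what keeps the list small relative to a full
    // source listing.
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> toCopy = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                toCopy.add(e.path);
            }
        }
        return toCopy;
    }
}
```

For 1.6M files with only a few thousand changed between snapshots, the copy list shrinks by the same ratio, which is where the reported 30-hour build time would be recovered.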
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652970#comment-14652970 ] Allen Wittenauer commented on HDFS-8849: This is one of those times where I feel that no matter what I say, it's pretty clear the dev is hell bent on putting in some useless feature that doesn't actually benefit anyone. That said, I'll also remind you that putting this into 2.x is a breaking change by the compatibility requirements, since changing the output of fsck isn't allowed. fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 added reporting of the number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-488: -- Priority: Minor (was: Major) Hadoop Flags: (was: Reviewed) Fix Version/s: (was: 2.7.1) Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Priority: Minor Labels: newbie Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 1.png Surprisingly, executing the HDFS FsShell command -moveToLocal outputs: Option '-moveToLocal' is not implemented yet. {code} statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t Option '-moveToLocal' is not implemented yet. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652639#comment-14652639 ] Hadoop QA commented on HDFS-8823: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 22s | The applied patch generated 5 new checkstyle issues (total was 577, now 573). | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 34s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 12s | Tests failed in hadoop-hdfs. 
| | | | 202m 55s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748510/HDFS-8823.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 469cfcd | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11890/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11890/console | This message was automatically generated. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks have to be the same. The replication factors of these blocks are equal to the highest replication factor across all snapshots. 
The changes will allow blocks in a file to have different replication factors, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
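A minimal sketch of the proposed direction, using a hypothetical stand-in for {{BlockInfo}} that carries its own replication factor rather than inheriting a single file-wide value from the INode (names invented for illustration):

```java
import java.util.Arrays;

public class PerBlockReplicationSketch {
    // Hypothetical BlockInfo stand-in: each block records its own
    // replication factor, decoupling block management from the namespace.
    static final class BlockInfo {
        final long blockId;
        final short replication;
        BlockInfo(long blockId, short replication) {
            this.blockId = blockId;
            this.replication = replication;
        }
    }

    // With per-block factors, the total replica count can drop below
    // fileReplication * numBlocks, which is the space saving mentioned in
    // the description.
    static int totalReplicas(BlockInfo[] blocks) {
        return Arrays.stream(blocks).mapToInt(b -> b.replication).sum();
    }
}
```

Under the current scheme, a file with one snapshot at replication 3 forces every block to 3 replicas; per-block factors would let later blocks carry a lower factor.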
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652657#comment-14652657 ] Andrew Wang commented on HDFS-7966: --- I agree there are potentially performance advantages, but it looks like all the benchmarks thus far show worse performance. I'd be very happy to see positive results, since erasure coding will lead to a lot more remote reads and thus possibly hit this code path. There has to be some upside, though, for this to be merged. The existing DTP already implements a number of the features mentioned, so I'm not sure how much we gain there. And if perf isn't as good or better, then we're increasing our maintenance burden for something that won't get used. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, TestHttp2ReadBlockInsideEventLoop.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined only by the implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging the existing HTTP/2 library, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652736#comment-14652736 ] Yufei Gu commented on HDFS-8828: Hi Jing Zhao, Thank you for reviewing the code. We changed the option here for the following reason: this patch builds the diff file list instead of the complete file list. In other words, only files/directories that were changed or created will be in the copy file list. With the -delete option on, the MR jobs will delete every file/directory in the target that is not in the copy file list, so it would delete files we intend to keep. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch Some users have reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file-copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch only puts created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652728#comment-14652728 ] Hadoop QA commented on HDFS-8828: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 22s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 37s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 6m 19s | Tests passed in hadoop-distcp. 
| | | | 42m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748535/HDFS-8828.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 469cfcd | | javadoc | https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-distcp test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/testrun_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11891/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11891/console | This message was automatically generated. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8850: --- Attachment: HDFS-8850.001.patch VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks - Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8850.001.patch The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-488: -- Tags: (was: MoveToLocal) Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Priority: Minor Labels: newbie Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 1.png Surprisingly, executing the HDFS FsShell command -moveToLocal outputs: Option '-moveToLocal' is not implemented yet. {code} statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t Option '-moveToLocal' is not implemented yet. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-488: -- Target Version/s: (was: 2.7.1) Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Priority: Minor Labels: newbie Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
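Semantically, moveToLocal is copyToLocal followed by deleting the source once the copy has succeeded. A sketch of that shape on the local filesystem with java.nio (in HDFS proper it would go through FileSystem#copyToLocalFile and FileSystem#delete; the method names in this sketch itself are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative moveToLocal semantics: copy, then delete the source.
public class MoveToLocal {
    static void moveToLocal(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING); // copy first
        Files.delete(src); // remove the source only after the copy succeeded
    }

    // Self-contained demo: returns true when the move behaved as expected.
    static boolean demo() {
        try {
            Path src = Files.createTempFile("moveToLocal-src", ".txt");
            Files.write(src, "hello".getBytes());
            Path dst = src.resolveSibling(src.getFileName() + ".moved");
            moveToLocal(src, dst);
            boolean ok = !Files.exists(src)
                && new String(Files.readAllBytes(dst)).equals("hello");
            Files.deleteIfExists(dst);
            return ok;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

Doing the delete after the copy is the safe ordering: a failed copy leaves the source intact, which is the behavior a user would expect from a "move".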
[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
[ https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653015#comment-14653015 ] Ajith S commented on HDFS-8808: --- Hi [~ggop] Why not bootstrap the standby without that property, and once the bootstrap is complete, add dfs.image.transfer.bandwidthPerSec back before starting the standby? dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby Key: HDFS-8808 URL: https://issues.apache.org/jira/browse/HDFS-8808 Project: Hadoop HDFS Issue Type: Bug Reporter: Gautam Gopalakrishnan The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the speed with which the fsimage is copied between the namenodes during regular use. However, as a side effect, this also limits transfers when the {{-bootstrapStandby}} option is used. This option is often used during upgrades and could potentially slow down the entire workflow. The request here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
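For context on why the setting bites here: a bandwidth throttle simply forces the average transfer rate down to bytesPerSec by sleeping, so a bootstrapStandby image download inherits the same waits. A toy model of that arithmetic (not the actual Hadoop throttler code) shows the cost:

```java
// Minimal bandwidth-throttle arithmetic in the spirit of image transfer
// throttling; the method name and numbers are illustrative, not a real API.
public class Throttle {
    // Given bytes sent so far and elapsed time, how much longer must we wait
    // so that the average rate does not exceed bytesPerSec?
    static long extraWaitMillis(long bytes, long elapsedMillis, long bytesPerSec) {
        long minMillis = bytes * 1000L / bytesPerSec; // time the transfer *should* take
        return Math.max(0, minMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        // 10 MB at a 1 MB/s limit should take at least 10 s; after 4 s of real
        // transfer time the throttle still owes 6 s of sleeping.
        System.out.println(extraWaitMillis(10_000_000, 4_000, 1_000_000)); // 6000
    }
}
```

Scaled up to a multi-GB fsimage, those owed sleeps are exactly the upgrade slowdown the report describes, which is why exempting -bootstrapStandby (or following Ajith's workaround of unsetting the property during bootstrap) makes sense.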
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653034#comment-14653034 ] Walter Su commented on HDFS-8220: - When I ran tests, I ran into some NPEs. Could you add {{si.isFailed()}} guard to {{updateBlockForPipeline}} and {{updatePipeline}} as well? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
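The stack trace bottoms out in {{LinkedBlockingQueue.offer}}, which by contract throws NullPointerException for a null element; here a null block was offered when the located group fell short of BlockGroupSize. A small sketch of the failure and the guard direction (the {{safeOffer}} helper is illustrative, not HDFS code):

```java
import java.util.concurrent.LinkedBlockingQueue;

// LinkedBlockingQueue rejects null elements by specification, which matches
// the NPE in the stack trace above. Validating before offering turns the
// crash into a handleable failure.
public class OfferNullDemo {
    static boolean safeOffer(LinkedBlockingQueue<String> q, String block) {
        if (block == null) {
            return false; // e.g. mark the streamer failed instead of crashing
        }
        return q.offer(block);
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        System.out.println(safeOffer(q, null));    // false, no NPE
        System.out.println(safeOffer(q, "blk_1")); // true
    }
}
```

This is also why Walter's suggestion of {{si.isFailed()}} guards in the other call sites fits: the streamer should check state before handing possibly-null block locations to the queue.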
[jira] [Created] (HDFS-8851) datanode fails to start due to a bad disk
Wang Hao created HDFS-8851: -- Summary: datanode fails to start due to a bad disk Key: HDFS-8851 URL: https://issues.apache.org/jira/browse/HDFS-8851 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.1 Reporter: Wang Hao The datanode cannot start due to a bad disk. I found that a similar issue, HDFS-6245, was reported, but our situation is different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8851) datanode fails to start due to a bad disk
[ https://issues.apache.org/jira/browse/HDFS-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653048#comment-14653048 ] Wang Hao commented on HDFS-8851: {code}
15/08/04 12:01:24 INFO common.Storage: Analyzing storage directories for bpid BP-454299492-10.84.100.171-1416301904728
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 FATAL datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to hadoop001.dx.momo.com/10.84.100.171:8022. Exiting.
java.io.IOException: Input/output error
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:243)
    at java.util.Properties$LineReader.readLine(Properties.java:434)
    at java.util.Properties.load0(Properties.java:353)
    at java.util.Properties.load(Properties.java:341)
    at org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:247)
    at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:227)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:256)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:155)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:269)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:975)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:946)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:812)
    at java.lang.Thread.run(Thread.java:745)
15/08/04 12:01:24 WARN datanode.DataNode: Ending block pool service for: Block pool registering (Datanode Uuid unassigned) service to hadoop001.dx.momo.com/10.84.100.171:8022
15/08/04 12:01:24 INFO datanode.DataNode: Removed Block pool registering (Datanode Uuid unassigned)
15/08/04 12:01:26 WARN datanode.DataNode: Exiting Datanode
15/08/04 12:01:26 INFO util.ExitUtil: Exiting with status 0
15/08/04 12:01:26 INFO datanode.DataNode: SHUTDOWN_MSG:
{code}
datanode fails to start due to a
bad disk - Key: HDFS-8851 URL: https://issues.apache.org/jira/browse/HDFS-8851 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.1 Reporter: Wang Hao The datanode cannot start due to a bad disk. I found that a similar issue, HDFS-6245, was reported, but our situation is different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
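The log shows the fatal path: {{Properties#load}} on one disk's VERSION file hit an I/O error and the whole block pool initialization aborted. A sketch of the handling direction the report asks for, treating the IOException as a failed volume to be skipped rather than a fatal error (method names here are illustrative, not the StorageInfo API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Tolerant VERSION read: a bad disk yields null (volume marked failed)
// instead of propagating IOException out of storage initialization.
public class TolerantVersionRead {
    static Properties readVersionOrNull(Path versionFile) {
        try (InputStream in = Files.newInputStream(versionFile)) {
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            // Bad disk: log and skip this storage directory rather than exit.
            return null;
        }
    }

    // Round-trip demo on a healthy temp file; returns true on success.
    static boolean demoRoundTrip() {
        try {
            Path f = Files.createTempFile("VERSION", "");
            Files.write(f, "layoutVersion=-56\n".getBytes());
            Properties p = readVersionOrNull(f);
            Files.deleteIfExists(f);
            return p != null && "-56".equals(p.getProperty("layoutVersion"));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readVersionOrNull(Path.of("/nonexistent/VERSION"))); // null
        System.out.println(demoRoundTrip()); // true
    }
}
```

Whether the datanode should then continue with the remaining volumes is a policy question (cf. dfs.datanode.failed.volumes.tolerated), but catching the per-directory IOException is the prerequisite.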
[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8704: Attachment: HDFS-8704-HDFS-7285-004.patch Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8827) Erasure Coding: When namenode processes over replicated striped block, NPE will occur in ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Fukudome updated HDFS-8827: -- Attachment: HDFS-8827.1.patch Thanks for the comment, [~zhz]! I attached an initial patch which adds a unit test that reproduces this issue. It processes a small EC file which doesn't have full internal blocks and whose internal blocks are over-replicated. If I understood correctly, when some indices of internal blocks are missing and the internal blocks are over-replicated, {{BlockPlacementPolicyDefault#chooseReplicaToDelete}} will return null. I think the cause is that the {{excessTypes}} in {{chooseExcessReplicasStriped}} is empty during the processing of such blocks. Erasure Coding: When namenode processes over replicated striped block, NPE will occur in ReplicationMonitor -- Key: HDFS-8827 URL: https://issues.apache.org/jira/browse/HDFS-8827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takuya Fukudome Assignee: Takuya Fukudome Attachments: HDFS-8827.1.patch, processing-over-replica-npe.log In our test cluster, when the namenode processed over-replicated striped blocks, a null pointer exception (NPE) occurred. This happened under the following situation: 1) some datanodes shut down. 2) the namenode recovers block groups which lost internal blocks. 3) the stopped datanodes are restarted. 4) the namenode processes over-replicated striped blocks. 5) NPE occurs. I think BlockPlacementPolicyDefault#chooseReplicaToDelete will return null in this situation, which causes this NPE problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
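The failure mode Takuya describes fits a common pattern: a chooser method returns null when its candidate set ({{excessTypes}}) is empty, and the replication loop dereferences the result. A small stand-in sketch of the bug and the guard (the names model the HDFS code but this is not its API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the chooseReplicaToDelete / ReplicationMonitor interaction:
// the chooser legitimately returns null when nothing is deletable, so the
// caller must guard instead of dereferencing unconditionally.
public class ChooseReplicaGuard {
    static String chooseReplicaToDelete(List<String> excessTypes) {
        Iterator<String> it = excessTypes.iterator();
        return it.hasNext() ? it.next() : null; // null when nothing is deletable
    }

    static int deleteExcess(List<String> excessTypes, int targetDeletes) {
        int deleted = 0;
        while (deleted < targetDeletes) {
            String victim = chooseReplicaToDelete(excessTypes);
            if (victim == null) {
                break; // guard: nothing left to delete, avoid the NPE
            }
            excessTypes.remove(victim);
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> excess = new ArrayList<>(Arrays.asList("DISK"));
        System.out.println(deleteExcess(excess, 3)); // 1: stops safely when empty
    }
}
```

Whether the real fix is the caller guard or keeping {{excessTypes}} populated for striped groups is exactly what the patch review has to settle; the sketch only shows why the null surfaces as an NPE in ReplicationMonitor.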
[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653076#comment-14653076 ] Hadoop QA commented on HDFS-8850: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 20s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 2s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 158m 28s | Tests failed in hadoop-hdfs. 
| | | | 202m 14s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748576/HDFS-8850.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c3364ca | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11892/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11892/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11892/console | This message was automatically generated. VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks - Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8850.001.patch The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8851) datanode fails to start due to a bad disk
[ https://issues.apache.org/jira/browse/HDFS-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653056#comment-14653056 ] Wang Hao commented on HDFS-8851: There is an IOException when reading VERSION because the disk is bad, which causes the datanode to fail to start. I think we should handle the exception during storage initialization. datanode fails to start due to a bad disk - Key: HDFS-8851 URL: https://issues.apache.org/jira/browse/HDFS-8851 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.1 Reporter: Wang Hao The datanode cannot start due to a bad disk. I found that a similar issue, HDFS-6245, was reported, but our situation is different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8852) Documentation of Hadoop 2.x is outdated about append write support
Hong Dai Thanh created HDFS-8852: Summary: Documentation of Hadoop 2.x is outdated about append write support Key: HDFS-8852 URL: https://issues.apache.org/jira/browse/HDFS-8852 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Hong Dai Thanh In the [latest version of the documentation|http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model], and also in the documentation for all 2.x releases, it’s mentioned that “A file once created, written, and closed need not be changed” and “There is a plan to support appending-writes to files in the future.” However, as far as I know, HDFS has supported append writes since 0.21, based on [HDFS-265|https://issues.apache.org/jira/browse/HDFS-265] and [the old version of the documentation in 2012|https://web.archive.org/web/20121221171824/http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html#Appending-Writes+and+File+Syncs]. Various posts on the Internet also suggest that append write has been available in HDFS, and will always be available in the Hadoop version 2 branch. Can we update the documentation to reflect the current status? (Please also review whether the documentation should be updated for version 0.21 and above, and the version 1.x branch.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-488) Implement moveToLocal HDFS command
[ https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652995#comment-14652995 ] Hadoop QA commented on HDFS-488: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 0m 32s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748586/HDFS-488.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c3364ca | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11893/console | This message was automatically generated. Implement moveToLocal HDFS command --- Key: HDFS-488 URL: https://issues.apache.org/jira/browse/HDFS-488 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Ravi Phulari Assignee: Steven Capo Priority: Minor Labels: newbie Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 1.png Surprisingly executing HDFS FsShell command -moveToLocal outputs - Option '-moveToLocal' is not implemented yet. {code} statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t Option '-moveToLocal' is not implemented yet. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8704: Status: Patch Available (was: Open) Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8852) HDFS architecture documentation of version 2.x is outdated about append write support
[ https://issues.apache.org/jira/browse/HDFS-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Dai Thanh updated HDFS-8852: - Summary: HDFS architecture documentation of version 2.x is outdated about append write support (was: Documentation of Hadoop 2.x is outdated about append write support) HDFS architecture documentation of version 2.x is outdated about append write support - Key: HDFS-8852 URL: https://issues.apache.org/jira/browse/HDFS-8852 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Hong Dai Thanh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652534#comment-14652534 ] Andrew Wang commented on HDFS-8747: --- From our side, we have some customers using encryption who want Trash as a safety mechanism, so simply using -skipTrash means they lose this safety. My advice has been to use snapshots, since snapshots provide similar (if not superior) properties to trash. That's also why I'm willing to accept some of the compromises in the proposed design; while not perfect, it's better than what we've got now. I do think, though, that nested encryption zones would make this better yet (for reasons besides trash), and would not be too difficult to implement. Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow creating an encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone are encrypted/decrypted transparently upon HDFS client write or read operations. Generally, rename (without data copying) is not supported across encryption zones, or between an encryption zone and a non-encryption zone, because of the different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA proposes better support for two such use cases, “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash), with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support.
Temporary files from MR jobs are usually stored in a staging area outside the encryption zone, such as the “/tmp” directory, and then renamed to the target directories once the data is ready to be further processed. Below is a summary of supported/unsupported cases in the latest Hadoop: * Rename within the encryption zone is supported. * Renaming the entire encryption zone by moving the root directory of the zone is allowed. * Renaming a sub-directory/file from an encryption zone to a non-encryption zone is not allowed. * Renaming a sub-directory/file from encryption zone A to encryption zone B is not allowed. * Renaming from a non-encryption zone to an encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with the original path preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, deleting a sub-directory/file within an encryption zone with the trash feature enabled is not allowed; the client has to use the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but did not completely solve the problem. We propose to solve the problem by generalizing the mapping between an encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapping directories, such as scratch space or soft-delete trash locations, to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
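The 1:N proposal reduces to one invariant: rename stays legal exactly when source and destination resolve to the same encryption zone, and a zone may now own several directory prefixes (root, scratch, trash). A toy model of that check (all names illustrative; this is not the HDFS implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a 1:N encryption-zone mapping: several directory prefixes map to
// one zone, and rename is allowed only within a single zone (or entirely
// outside all zones).
public class ZoneRenameCheck {
    private final Map<String, String> prefixToZone = new HashMap<>();

    void addZoneDir(String zone, String dirPrefix) { prefixToZone.put(dirPrefix, zone); }

    private String zoneOf(String path) {
        String best = null;           // longest matching prefix wins
        String bestPrefix = "";
        for (Map.Entry<String, String> e : prefixToZone.entrySet()) {
            if (path.startsWith(e.getKey()) && e.getKey().length() > bestPrefix.length()) {
                bestPrefix = e.getKey();
                best = e.getValue();
            }
        }
        return best; // null = outside any encryption zone
    }

    boolean renameAllowed(String src, String dst) {
        String sz = zoneOf(src), dz = zoneOf(dst);
        return sz == null ? dz == null : sz.equals(dz);
    }

    public static void main(String[] args) {
        ZoneRenameCheck fs = new ZoneRenameCheck();
        fs.addZoneDir("zoneA", "/user/alice/");
        fs.addZoneDir("zoneA", "/user/alice-trash/"); // trash mapped into the same zone
        System.out.println(fs.renameAllowed("/user/alice/f", "/user/alice-trash/f")); // true
        System.out.println(fs.renameAllowed("/user/alice/f", "/tmp/f"));              // false
    }
}
```

Mapping the trash (or a scratch directory) into the same zone as the data is what makes move-to-trash a same-zone rename, which is the whole point of the proposal.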
[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652536#comment-14652536 ] Zhe Zhang commented on HDFS-8804: - Thanks Jing for the work! The patch looks good to me. The only minor comment is that the below section could use some assertions to avoid overlapped allocation in the {{parityBuf}}: {code} ByteBuffer buf = getParityBuffer().duplicate(); buf.position(cellSize * decodeIndex); buf.limit(cellSize * decodeIndex + (int) alignedStripe.range.spanInBlock); decodeInputs[decodeIndex] = buf.slice(); {code} For example, since this is stateful read, we can at least assert {{alignedStripe.range.spanInBlock}} is no larger than {{cellSize}}. Ideally we should assert {{decodeIndex}} has not been allocated yet but it doesn't seem easy. As follow-on we can think about how to do it for pread. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch Currently we directly allocate direct ByteBuffer in DFSStripedInputstream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffer from DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
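What the quoted snippet does is carve a view out of one shared parity buffer via duplicate()/position()/limit()/slice(); the assertion Zhe suggests (spanInBlock no larger than cellSize) is what keeps two decodeIndex slices from overlapping. A runnable illustration of the pattern (cellSize and the indices below are made-up numbers, not HDFS constants):

```java
import java.nio.ByteBuffer;

// duplicate() gives an independent position/limit over the same backing
// memory; slice() then yields a zero-based view of just [position, limit).
public class ParitySlices {
    static ByteBuffer sliceFor(ByteBuffer parityBuf, int cellSize, int decodeIndex, int span) {
        assert span <= cellSize : "slice would spill into the next cell";
        ByteBuffer buf = parityBuf.duplicate();     // independent position/limit
        buf.position(cellSize * decodeIndex);
        buf.limit(cellSize * decodeIndex + span);
        return buf.slice();                         // zero-based view of the cell
    }

    public static void main(String[] args) {
        ByteBuffer parity = ByteBuffer.allocate(3 * 64); // 3 cells of 64 bytes
        ByteBuffer s0 = sliceFor(parity, 64, 0, 64);
        ByteBuffer s2 = sliceFor(parity, 64, 2, 16);
        s0.put(0, (byte) 1);                             // visible through parity
        System.out.println(parity.get(0) + " " + s2.remaining()); // 1 16
    }
}
```

Because every slice shares the backing memory, a span larger than cellSize really would let decodeIndex i scribble into decodeIndex i+1's cell, which is exactly the overlap the proposed assertion rules out.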
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652549#comment-14652549 ] Andrew Wang commented on HDFS-7966: --- I guess my question here is similar to what [~stack] and [~tlipcon] posed at the beginning. What's the upside of this new implementation? It seems to be 10 to 30% slower than the current implementation, which is not good. If it were the same performance but had other redeeming qualities (e.g. less code) then it would still be worth consideration. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, TestHttp2ReadBlockInsideEventLoop.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined by implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging an existing HTTP/2 library, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652559#comment-14652559 ] Jing Zhao commented on HDFS-8828: - Thanks for working on this, Yufei! One quick comment is about the following change: {code} -if ((!syncFolder || !deleteMissing) && useDiff) { +if ((!syncFolder || deleteMissing) && useDiff) { throw new IllegalArgumentException( - "Diff is valid only with update and delete options"); + "Diff is valid only with update options"); } {code} Currently we delete files/directories according to the DELETE diff already. This looks consistent with the deleteMissing option to me. Any specific reason we want to change the semantics here? Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch Some users reported huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a file copy list including only the files/dirs changed between two snapshots (or a snapshot and a normal dir). It speeds up the process in two ways: 1. less copy-list building time. 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp. So it still relies on the default distcp to build a complete list of files under the source dir. This patch only puts created and modified files into the copy list based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
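The option check under discussion can be isolated as a small validation routine. This is a sketch of the patched semantics (the -diff option requiring -update and rejecting -delete), with invented names, not distcp's actual API:

```java
// Hypothetical sketch of the validation in the patch above: useDiff is
// accepted only together with syncFolder (-update) and without
// deleteMissing (-delete).
class DistCpOptionCheck {
    static void validate(boolean syncFolder, boolean deleteMissing,
                         boolean useDiff) {
        if ((!syncFolder || deleteMissing) && useDiff) {
            throw new IllegalArgumentException(
                "Diff is valid only with update options");
        }
    }
}
```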
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651638#comment-14651638 ] Jagadesh Kiran N commented on HDFS-8784: Pre-Patch failure is not related to the changes done in the patch BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch, HDFS-8784-01.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8839) Erasure Coding: client occasionally gets less block locations when some datanodes fail
[ https://issues.apache.org/jira/browse/HDFS-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651524#comment-14651524 ] Walter Su commented on HDFS-8839: - bq. Otherwise, the client writing can't go on. Yes, it hangs. It's a problem. bq. the namenode should still allocate 9 locations even if it knows one of them is invalid. It's not the best solution. Please check my last comment at HDFS-8220. We can continue the discussion there. Erasure Coding: client occasionally gets less block locations when some datanodes fail --- Key: HDFS-8839 URL: https://issues.apache.org/jira/browse/HDFS-8839 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo 9 datanodes, write two block groups. A datanode dies while writing the first block group. When the client retrieves the second block group from the namenode, the returned block group occasionally contains only 8 locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651595#comment-14651595 ] Hadoop QA commented on HDFS-8784: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 15s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 4s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 7s | Tests passed in hadoop-hdfs. 
| | | | 203m 16s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748391/HDFS-8784-01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 90b5104 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11887/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11887/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11887/console | This message was automatically generated. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch, HDFS-8784-01.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8841) Catch throwable return null
[ https://issues.apache.org/jira/browse/HDFS-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651694#comment-14651694 ] Jagadesh Kiran N commented on HDFS-8841: I failed to see the chance of an Error (like ClassNotFoundError) in the following code. Could you please point me to it? Thanks! try { final Path tmp = new Path(job.get(TMP_DIR_LABEL), relativedst); if (destFileSys.delete(tmp, true)) break; } catch (Throwable ex) { // ignore, we are just cleaning up LOG.debug("Ignoring cleanup exception", ex); } Catch throwable return null --- Key: HDFS-8841 URL: https://issues.apache.org/jira/browse/HDFS-8841 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor In the map method of class \hadoop-2.7.1-src\hadoop-tools\hadoop-extras\src\main\java\org\apache\hadoop\tools\DistCpV1.java there is this code: public void map(LongWritable key, FilePair value, OutputCollector<WritableComparable<?>, Text> out, Reporter reporter) throws IOException { ... } catch (Throwable ex) { // ignore, we are just cleaning up LOG.debug("Ignoring cleanup exception", ex); } } } ... } Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you could recover from (like IOException); an Error is something more serious that you usually couldn't recover from easily (like ClassNotFoundError), so it doesn't make much sense to catch an Error. We should catch Exception instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
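The distinction being argued can be shown in a few lines: catch (Exception) lets Errors propagate, while catch (Throwable) swallows them too. A self-contained demo (AssertionError stands in here for any Error subclass):

```java
// Demonstrates why catch (Throwable) is broader than catch (Exception):
// an Error is not an Exception, so only the Throwable handler matches it.
class CatchDemo {
    static String tryCatchException() {
        try {
            throw new AssertionError("simulated Error");
        } catch (Exception e) {
            return "caught";                  // never reached: Error is not an Exception
        } catch (Error e) {
            return "escaped catch(Exception)";
        }
    }

    static String tryCatchThrowable() {
        try {
            throw new AssertionError("simulated Error");
        } catch (Throwable t) {
            return "caught";                  // Throwable matches Errors too
        }
    }
}
```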
[jira] [Updated] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing
[ https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Gu updated HDFS-7601: --- Attachment: 0001-for-hdfs-7601.patch Operations(e.g. balance) failed due to deficient configuration parsing -- Key: HDFS-7601 URL: https://issues.apache.org/jira/browse/HDFS-7601 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.3.0, 2.6.0 Reporter: Doris Gu Assignee: Doris Gu Priority: Minor Labels: BB2015-05-TBR Attachments: 0001-for-hdfs-7601.patch Some operations, for example balance, parse the configuration (from core-site.xml, hdfs-site.xml) to get the NameServiceUris to connect to. The current method considers URIs that end with or without a trailing / as two different URIs, so subsequent operations may fail. bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different URIs which are actually the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
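The trailing-slash mismatch is easy to reproduce with java.net.URI, which compares components and so treats hdfs://haCluster and hdfs://haCluster/ as distinct. A sketch of a normalization step that would make them compare equal (illustrative only, not the balancer's code):

```java
// java.net.URI gives "hdfs://haCluster/" a path of "/" while
// "hdfs://haCluster" has an empty path, so equals() returns false.
// Normalizing away the bare trailing slash makes them one key.
import java.net.URI;

class NameServiceUris {
    // Sketch: only a lone "/" path is dropped; real paths are untouched.
    static URI normalize(URI uri) {
        if ("/".equals(uri.getPath())) {
            return URI.create(uri.getScheme() + "://" + uri.getAuthority());
        }
        return uri;
    }
}
```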
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651555#comment-14651555 ] Li Bo commented on HDFS-8838: - The number of datanodes is set to 9 in the unit test. Due to the problem of HDFS-8220 or HDFS-8839, I think we should use at least 10 datanodes for testing one datanode failure. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing
[ https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651561#comment-14651561 ] Hadoop QA commented on HDFS-7601: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 1m 44s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748405/0001-for-hdfs-7601.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 90b5104 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11888/console | This message was automatically generated. Operations(e.g. balance) failed due to deficient configuration parsing -- Key: HDFS-7601 URL: https://issues.apache.org/jira/browse/HDFS-7601 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.3.0, 2.6.0 Reporter: Doris Gu Assignee: Doris Gu Priority: Minor Labels: BB2015-05-TBR Some operations, for example balance, parse the configuration (from core-site.xml, hdfs-site.xml) to get the NameServiceUris to connect to. The current method considers URIs that end with or without a trailing / as two different URIs, so subsequent operations may fail. bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different URIs which are actually the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651523#comment-14651523 ] Walter Su commented on HDFS-8220: - bq. If number of datanodes < NUM_DATA_BLOCKS then throw IOException(Failed to get datablocks number of datanodes!) Yes. I saw that if (numOfDNs >= NUM_DATA_BLOCKS && numOfDNs < GROUP_SIZE), the OutputStream hangs and stops writing, even if the file is smaller than a cellSize. We should fix that. The writing should succeed because the user could add more DN nodes later. ECWorker can recover the missing blocks. The reason is that some streamers can't get the {{followingBlock}}, so they keep polling from the {{followingBlocks}} queue. We should stop these streamers and mark them {{failed}}, so other streamers don't have to wait for them. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
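The fix idea in the comment above — stop a streamer from blocking forever on a queue that will never be fed, and mark it failed so its peers need not wait — can be sketched with a timed poll. The names (followingBlocks, failed) mirror the discussion; this is not the real StripedDataStreamer:

```java
// Sketch: a streamer that polls its block queue with a timeout and
// marks itself failed on timeout, instead of hanging the whole write.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class StreamerSketch {
    final BlockingQueue<String> followingBlocks = new LinkedBlockingQueue<>();
    volatile boolean failed = false;

    String takeFollowingBlock(long timeoutMillis) {
        String block = null;
        try {
            block = followingBlocks.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        if (block == null) {
            failed = true;   // give up so other streamers need not wait
        }
        return block;
    }
}
```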
[jira] [Updated] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing
[ https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Gu updated HDFS-7601: --- Attachment: (was: 0001-for-hdfs-7601.patch) Operations(e.g. balance) failed due to deficient configuration parsing -- Key: HDFS-7601 URL: https://issues.apache.org/jira/browse/HDFS-7601 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.3.0, 2.6.0 Reporter: Doris Gu Assignee: Doris Gu Priority: Minor Labels: BB2015-05-TBR Some operations, for example balance, parse the configuration (from core-site.xml, hdfs-site.xml) to get the NameServiceUris to connect to. The current method considers URIs that end with or without a trailing / as two different URIs, so subsequent operations may fail. bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different URIs which are actually the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8848) Support OAuth2 in libwebhdfs
Puneeth P created HDFS-8848: --- Summary: Support OAuth2 in libwebhdfs Key: HDFS-8848 URL: https://issues.apache.org/jira/browse/HDFS-8848 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Reporter: Puneeth P Assignee: Puneeth P As per Jira [https://issues.apache.org/jira/browse/HDFS-8155] there is a patch for WebHDFS java client. It would be good to bring libwebhdfs on par as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8823: - Attachment: HDFS-8823.001.patch Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility in replicating blocks. Currently the replication factors of all blocks in a file have to be the same, equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factors, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
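The data-structure change proposed above amounts to moving one field. A hypothetical sketch (names invented, not the HDFS-8823 patch) of a per-block replication factor decoupled from the file:

```java
// Sketch: keep the replication factor on each block rather than on the
// owning file, so two blocks of one file can carry different factors
// (e.g. driven by different snapshot requirements).
class BlockInfoSketch {
    final long blockId;
    private short replication;   // per-block, decoupled from the namespace

    BlockInfoSketch(long blockId, short replication) {
        this.blockId = blockId;
        this.replication = replication;
    }

    short getReplication() { return replication; }

    void setReplication(short replication) { this.replication = replication; }
}
```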
[jira] [Commented] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class
[ https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652325#comment-14652325 ] Zhe Zhang commented on HDFS-8499: - [~szetszwo] I wonder if you've had a chance to work on reverting or reworking this change? Thanks. Refactor BlockInfo class hierarchy with static helper class --- Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, HDFS-8499.UCFeature.patch, HDFS-bistriped.patch In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: HDFS-8220-HDFS-7285-09.patch Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: (was: HDFS-8220-HDFS-7285-09.patch) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at 
org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652228#comment-14652228 ] Rakesh R commented on HDFS-8220: bq. I saw that if (numOfDNs >= NUM_DATA_BLOCKS && numOfDNs < GROUP_SIZE), the OutputStream hangs and stops writing, even if the file is smaller than a cellSize. We should fix that. Good catch! I've added a testcase to simulate the same. Attached a patch where I'm closing the streamers which don't have block locations available. After the execution of {{StripedDataStreamer.super.locateFollowingBlock()}}, it validates the data blocks length. Secondly, it does the check for {{(blocks == null)}}. I could see that a {{LocatedBlock}} will be null when there is no sufficient datanode available for that index. Since we are checking for a sufficient number of DNs for the data blocks, those {{LocatedBlock}} will never be empty. If there are no block locations available for parity blocks then those blocks will become null. I've tried an approach of closing the respective parity streamers; any thoughts? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8848) Support OAuth2 in libwebhdfs
[ https://issues.apache.org/jira/browse/HDFS-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Puneeth P updated HDFS-8848: Issue Type: New Feature (was: Improvement) Support OAuth2 in libwebhdfs Key: HDFS-8848 URL: https://issues.apache.org/jira/browse/HDFS-8848 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Puneeth P Assignee: Puneeth P As per Jira [https://issues.apache.org/jira/browse/HDFS-8155] there is a patch for WebHDFS java client. It would be good to bring libwebhdfs on par as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: HDFS-8220-HDFS-7285-09.patch Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more:
{code}
2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception
java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
java.io.IOException: DataStreamer Exception:
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
Caused by: java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	... 1 more
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
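The NPE in the trace comes from {{BlockingQueue.offer}}, which is specified to reject null arguments: when the block allocation does not satisfy {{BlockGroupSize}}, a null located block ends up being offered to the queue. A minimal stdlib-only illustration of that failure mode and the kind of guard that is missing (class and variable names are hypothetical, not the actual patch):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class OfferNullDemo {
    public static void main(String[] args) {
        LinkedBlockingQueue<String> followingBlocks = new LinkedBlockingQueue<>();
        // Simulate an allocation that returned fewer datanodes than BlockGroupSize:
        String locatedBlock = null;

        // LinkedBlockingQueue.offer(null) throws NullPointerException by
        // specification, which is exactly the crash in the trace above.
        // Validating first turns the crash into a handleable condition:
        if (locatedBlock == null) {
            System.out.println("allocation did not satisfy BlockGroupSize; handle instead of offering null");
        } else {
            followingBlocks.offer(locatedBlock);
        }
    }
}
```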
[jira] [Updated] (HDFS-7929) inotify unable fetch pre-upgrade edit log segments once upgrade starts
[ https://issues.apache.org/jira/browse/HDFS-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7929: -- Labels: 2.6.1-candidate (was: ) inotify unable fetch pre-upgrade edit log segments once upgrade starts -- Key: HDFS-7929 URL: https://issues.apache.org/jira/browse/HDFS-7929 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7929-000.patch, HDFS-7929-001.patch, HDFS-7929-002.patch, HDFS-7929-003.patch inotify is often used to periodically poll HDFS events. However, once an HDFS upgrade has started, edit logs are moved to /previous on the NN, which is not accessible. Moreover, once the upgrade is finalized, /previous is currently lost forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8480: -- Labels: 2.6.1-candidate (was: ) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
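The mechanics here are ordinary filesystem hard links: linking makes both paths refer to the same data, so preserving an edit log segment costs O(1) regardless of its size, which is what removes the per-op copy overhead. A minimal java.nio sketch of the idea (directory and segment names are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkEditLogs {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("editlogs");
        Path original = dir.resolve("edits_0000001-0000100");
        Files.write(original, "edit ops".getBytes());

        // Hard-link instead of copying: both names now point at the same
        // inode, so no bytes are duplicated and the "copy" is instant.
        Path preserved = dir.resolve("previous-edits_0000001-0000100");
        Files.createLink(preserved, original);

        System.out.println(new String(Files.readAllBytes(preserved)));
    }
}
```

Note that hard links require both paths to live on the same filesystem, which holds for the NameNode storage directory case discussed here.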
[jira] [Updated] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8804: Attachment: HDFS-8804.001.patch Thanks Nicholas for the review! Update the patch to address the comments. I did not add synchronized to {{getParityBuffer}} because it is only used in StatefulStripeReader which is already protected by the lock. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch Currently we directly allocate direct ByteBuffer in DFSStripedInputstream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffer from DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
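For reference, the pooling idea can be sketched with stdlib types only: reuse direct ByteBuffers keyed by capacity so repeated (expensive) {{allocateDirect}} calls are avoided. This is a simplified sketch, not Hadoop's implementation — the real DirectBufferPool additionally holds pooled buffers via weak references so idle buffers can be garbage-collected:

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SimpleDirectBufferPool {
    // One free-list of buffers per requested capacity.
    private final Map<Integer, Queue<ByteBuffer>> pool = new ConcurrentHashMap<>();

    public ByteBuffer getBuffer(int size) {
        Queue<ByteBuffer> q = pool.get(size);
        ByteBuffer b = (q == null) ? null : q.poll();
        if (b == null) {
            return ByteBuffer.allocateDirect(size);  // pool miss: allocate fresh
        }
        b.clear();  // pool hit: reset position/limit before handing it out
        return b;
    }

    public void returnBuffer(ByteBuffer buf) {
        buf.clear();
        pool.computeIfAbsent(buf.capacity(), k -> new ConcurrentLinkedQueue<>())
            .offer(buf);
    }
}
```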
[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652515#comment-14652515 ] Zhe Zhang commented on HDFS-8846: - Thanks Ming for the feedback! I was planning to only add edit log files. But I think creating an entire NN dir structure with old layout version is a good idea. It could support a broader range of upgrade tests. Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Attachment: HDFS-8828.003.patch Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a file copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). It speeds up the process in two ways: 1. less copy-list building time. 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on default distcp to build the complete list of files under the source dir. This patch only puts created and modified files into the copy list based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
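A rough sketch of the copy-list construction described above, with simplified stand-in types (the real patch works from the entries of HDFS's SnapshotDiffReport; only creations and modifications need to be copied, since deletions and renames are synchronized on the target first per HDFS-7535):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    // Keep only entries that require data transfer; DELETE/RENAME are
    // applied directly on the target before copying begins.
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                copyList.add(e.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = Arrays.asList(
            new DiffEntry(DiffType.CREATE, "/data/new.txt"),
            new DiffEntry(DiffType.DELETE, "/data/old.txt"),
            new DiffEntry(DiffType.MODIFY, "/data/updated.txt"));
        System.out.println(buildCopyList(diff));
    }
}
```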
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652525#comment-14652525 ] Yufei Gu commented on HDFS-8828: Hi Yongjun, Thank you very much for the detailed code review and all the nice suggestions. I've uploaded a new patch (HDFS-8828.003.patch) addressing the above comments. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a file copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). It speeds up the process in two ways: 1. less copy-list building time. 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on default distcp to build the complete list of files under the source dir. This patch only puts created and modified files into the copy list based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
Zhe Zhang created HDFS-8849: --- Summary: fsck should report number of missing blocks with replication factor 1 Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8046) Allow better control of getContentSummary
[ https://issues.apache.org/jira/browse/HDFS-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8046: -- Labels: 2.6.1-candidate 2.7.2-candidate (was: 2.6.1-candidate) Allow better control of getContentSummary - Key: HDFS-8046 URL: https://issues.apache.org/jira/browse/HDFS-8046 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Labels: 2.6.1-candidate, 2.7.2-candidate Fix For: 2.8.0 Attachments: HDFS-8046.v1.patch On busy clusters, users performing quota checks against a big directory structure can affect the namenode performance. It has become a lot better after HDFS-4995, but as clusters get bigger and busier, it is apparent that we need finer grain control to avoid long read lock causing throughput drop. Even with unfair namesystem lock setting, a long read lock (10s of milliseconds) can starve many readers and especially writers. So the locking duration should be reduced, which can be done by imposing a lower count-per-iteration limit in the existing implementation. But HDFS-4995 came with a fixed amount of sleep between locks. This needs to be made configurable, so that {{getContentSummary()}} doesn't get exceedingly slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
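The locking pattern under discussion can be sketched as follows: hold the read lock for at most a bounded number of items per iteration, release it, and sleep so writers can make progress. {{countLimit}} and {{sleepMs}} stand in for the configurable knobs this JIRA proposes; the names are illustrative, not the actual HDFS configuration keys:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class YieldingCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Count totalItems entries, but never hold the read lock for more than
    // countLimit entries at a time; sleep between lock acquisitions so a
    // long traversal cannot starve writers.
    long count(int totalItems, int countLimit, long sleepMs) throws InterruptedException {
        long counted = 0;
        while (counted < totalItems) {
            lock.readLock().lock();
            try {
                long batchEnd = Math.min(counted + countLimit, totalItems);
                for (; counted < batchEnd; counted++) {
                    // inspect one inode / quota entry here
                }
            } finally {
                lock.readLock().unlock();
            }
            Thread.sleep(sleepMs);  // the fixed sleep that HDFS-8046 makes configurable
        }
        return counted;
    }
}
```

Lower {{countLimit}} shortens each lock hold (less writer starvation) at the cost of more lock/sleep cycles, which is exactly the trade-off the description says needs to be tunable.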
[jira] [Updated] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.
[ https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7894: -- Labels: 2.6.1-candidate (was: ) Rolling upgrade readiness is not updated in jmx until query command is issued. -- Key: HDFS-7894 URL: https://issues.apache.org/jira/browse/HDFS-7894 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Brahma Reddy Battula Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-7894-002.patch, HDFS-7894-003.patch, HDFS-7894.patch When a hdfs rolling upgrade is started and a rollback image is created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} until it receives a query command via RPC. This results in inconsistent info being showing up in the web UI and its jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7446: -- Labels: 2.6.1-candidate (was: ) HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, HDFS-7446.003.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
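The resume pattern this feature enables might look like the following simplified sketch: the reader records the transaction id of the last batch it processed, persists it, and after a restart skips anything at or below that id. The types here are stand-ins; the real client exposes the txid on each event batch:

```java
import java.util.Arrays;
import java.util.List;

public class InotifyResume {
    static final class EventBatch {
        final long txid;
        final String event;
        EventBatch(long txid, String event) { this.txid = txid; this.event = event; }
    }

    // Process a stream of batches, skipping anything already handled
    // before a restart; returns the new high-water mark to persist.
    static long process(List<EventBatch> stream, long lastSeenTxid) {
        for (EventBatch b : stream) {
            if (b.txid <= lastSeenTxid) continue;  // already handled
            // handle b.event here ...
            lastSeenTxid = b.txid;  // record so no event is missed or repeated
        }
        return lastSeenTxid;
    }

    public static void main(String[] args) {
        List<EventBatch> stream = Arrays.asList(
            new EventBatch(10, "create"),
            new EventBatch(11, "rename"),
            new EventBatch(12, "close"));
        // Resume after having processed up to txid 11 in a previous run:
        System.out.println(process(stream, 11));
    }
}
```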
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652403#comment-14652403 ] Tsz Wo Nicholas Sze commented on HDFS-8838: --- [~walter.k.su], thanks for showing a detailed failure case. It is a multiple-failure case. I need to think about how to handle it. Will work on it in HDFS-8383. Or are you interested in working on HDFS-8383? [~libo-intel], thanks for the suggestion. A datanode is started in each test, so we already have 10 datanodes. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7182) JMX metrics aren't accessible when NN is busy
[ https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7182: -- Labels: 2.6.1-candidate (was: ) JMX metrics aren't accessible when NN is busy - Key: HDFS-7182 URL: https://issues.apache.org/jira/browse/HDFS-7182 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7182-2.patch, HDFS-7182-3.patch, HDFS-7182.patch HDFS-5693 addressed all NN JMX metrics as of hadoop 2.0.5. Since then, a couple of new metrics have been added. It turns out RollingUpgradeStatus requires the FSNamesystem read lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7314) When the DFSClient lease cannot be renewed, abort open-for-write files rather than the entire DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7314: -- Labels: 2.6.1-candidate 2.7.2-candidate BB2015-05-TBR (was: BB2015-05-TBR) When the DFSClient lease cannot be renewed, abort open-for-write files rather than the entire DFSClient --- Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Labels: 2.6.1-candidate, 2.7.2-candidate, BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, HDFS-7314-5.patch, HDFS-7314-6.patch, HDFS-7314-7.patch, HDFS-7314-8.patch, HDFS-7314-9.patch, HDFS-7314.patch It happened in a YARN NodeManager scenario, but it could happen to any long-running service that uses a cached instance of DistributedFileSystem. 1. The active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get a ConnectTimeoutException. 2. YARN NodeManager uses DFSClient for certain write operations such as the log aggregator or the shared cache in YARN-1492. The renewLease RPC of the DFSClient used by the YARN NM got a ConnectTimeoutException.
{noformat}
2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ...
{noformat}
3. After the DFSClient is in the Aborted state, the YARN NM can't use that cached instance of DistributedFileSystem.
{noformat}
2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well-defined exception; the next HDFS call will then create a new instance of DistributedFileSystem. We have to fix all the places in YARN, and other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places where DistributedFileSystem calls DFSClient. * After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry, and if the NN becomes available again it can transition back to a healthy state. Comments?
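The first option above could be sketched roughly as below. The {{Client}} interface and the message check are hypothetical stand-ins for DistributedFileSystem and the "Filesystem closed" failure; this only illustrates the recreate-on-abort idea, not the eventual fix:

```java
import java.io.IOException;
import java.util.function.Supplier;

public class RecreatingClient {
    // Hypothetical stand-in for DistributedFileSystem / DFSClient.
    interface Client {
        String getFileInfo(String path) throws IOException;
        void close();
    }

    private final Supplier<Client> factory;
    private Client current;

    RecreatingClient(Supplier<Client> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    String getFileInfo(String path) throws IOException {
        try {
            return current.getFileInfo(path);
        } catch (IOException e) {
            // On the well-defined "aborted client" failure, discard the
            // cached instance and retry once with a fresh one.
            if ("Filesystem closed".equals(e.getMessage())) {
                current.close();
                current = factory.get();
                return current.getFileInfo(path);
            }
            throw e;
        }
    }
}
```

As the discussion notes, doing this at the application layer means every caller must be fixed; pushing the detection into DistributedFileSystem or DFSClient centralizes it.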
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652421#comment-14652421 ] Allen Wittenauer commented on HDFS-8849: That's pretty much covered already. fsck will already report the number of blocks that don't have the minimum replication (whether that be 1 or some higher number). fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7916: -- Labels: (was: 2.6.1-candidate) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch If any bad block is found, the BPServiceActor for the standby NameNode will retry reporting it indefinitely.
{noformat}
2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
	at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
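One possible shape of a fix is to cap the retry loop instead of re-queueing a failed action forever. The sketch below is only an illustration of that bounding idea with hypothetical names, not the actual patch:

```java
public class BoundedRetry {
    // Hypothetical stand-in for a BPServiceActor queued action.
    interface Action {
        void reportTo() throws Exception;
    }

    // Attempt the action at most maxRetries times; on persistent failure,
    // drop it instead of looping forever against a standby NameNode.
    static boolean runWithRetries(Action a, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                a.reportTo();
                return true;
            } catch (Exception e) {
                // log and retry; a real implementation would also back off here
            }
        }
        return false;  // give up: the action is discarded, not re-queued
    }
}
```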
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652427#comment-14652427 ] Zhe Zhang commented on HDFS-8849: - Thanks for the input Allen. I guess there's still a small gap. Even when we know 1) the number of missing blocks; 2) number of blocks below min replication, it's not always possible to calculate the number of blocks meeting both conditions. So agreed that it's partially covered. This JIRA will just fill in the small gap. fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652439#comment-14652439 ] Allen Wittenauer commented on HDFS-8849: I'm not sure what benefit that number provides. If I'm missing a block below min rep, I'm still going through the full fsck output to try and find it. fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652471#comment-14652471 ] Allen Wittenauer commented on HDFS-8849: bq. A replication factor of 1 indicates the data is disposable. So when checking fsck on a directory the user might want to separately consider this metric (e.g., less alarmed about the number of disposable data that's missing). Meanwhile, back in real life, users set a repl factor of 1 to avoid quota problems. I've seen it over and over and over. It's why a lot of us are starting to use a min repl of 2. Special-casing 1 is a dangerous capitulation to a bad practice that should be outlawed on production systems. fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8046) Allow better control of getContentSummary
[ https://issues.apache.org/jira/browse/HDFS-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8046: -- Labels: 2.6.1-candidate (was: ) Allow better control of getContentSummary - Key: HDFS-8046 URL: https://issues.apache.org/jira/browse/HDFS-8046 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Labels: 2.6.1-candidate Fix For: 2.8.0 Attachments: HDFS-8046.v1.patch On busy clusters, users performing quota checks against a big directory structure can affect the namenode performance. It has become a lot better after HDFS-4995, but as clusters get bigger and busier, it is apparent that we need finer grain control to avoid long read lock causing throughput drop. Even with unfair namesystem lock setting, a long read lock (10s of milliseconds) can starve many readers and especially writers. So the locking duration should be reduced, which can be done by imposing a lower count-per-iteration limit in the existing implementation. But HDFS-4995 came with a fixed amount of sleep between locks. This needs to be made configurable, so that {{getContentSummary()}} doesn't get exceedingly slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7916: -- Labels: 2.6.1-candidate (was: ) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch If any bad block is found, the BPServiceActor for the standby NameNode will retry reporting it indefinitely.
{noformat}
2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
	at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652430#comment-14652430 ] Tsz Wo Nicholas Sze commented on HDFS-8220: --- Some minor comments:
{code}
 if (!coordinator.getStripedDataStreamer(i).isFailed()) {
+StripedDataStreamer curStreamer = coordinator
+    .getStripedDataStreamer(i);
{code}
Let's call getStripedDataStreamer before the if. How about renaming curStreamer to si? currentStreamer has a different meaning in DFSStripedOutputStream. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}.
Please see the exception to understand more:
{code}
2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception
java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
java.io.IOException: DataStreamer Exception:
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
Caused by: java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	... 1 more
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
[ https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652451#comment-14652451 ] Zhe Zhang commented on HDFS-8849: - I guess the motivation is the same as HDFS-7165. A replication factor of 1 indicates the data is disposable. So when checking {{fsck}} on a directory the user might want to separately consider this metric (e.g., less alarmed about the number of disposable data that's missing). fsck should report number of missing blocks with replication factor 1 - Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 supports reporting number of blocks with replication factor 1 in {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same support, which is the aim of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)