[jira] [Updated] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HDFS-7966: Attachment: TestHttp2ReadBlockInsideEventLoop.svg The flame graph of a {{TestHttp2ReadBlockInsideEventLoop}} run. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, TestHttp2ReadBlockInsideEventLoop.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined by the implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging existing HTTP/2 libraries, this should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633089#comment-14633089 ] Duo Zhang commented on HDFS-7966: - Wrote a single-threaded testcase that does all the test work inside the event loop. https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/web/dtp/TestHttp2ReadBlockInsideEventLoop.java On the server side, I removed the thread pool in {{ReadBlockHandler}}. The result is {noformat} *** time based on tcp 17734ms *** time based on http2 20019ms *** time based on tcp 18878ms *** time based on http2 21422ms *** time based on tcp 17562ms *** time based on http2 20568ms *** time based on tcp 18726ms *** time based on http2 20251ms *** time based on tcp 18632ms *** time based on http2 21227ms {noformat} The average time of the original tcp is 18306.4ms, and of HTTP/2 is 20697.4ms. 20697.4 / 18306.4 = 1.13, so HTTP/2 is 13% slower than tcp here. In the previous test it was 30% slower, so I think context switching may be one of the reasons why HTTP/2 is much slower than tcp. Will do this test on a real cluster to get more data. As for the one-{{EventLoop}}-per-datanode problem, I think it is an issue on a small cluster, so we should allow creating multiple HTTP/2 connections to one datanode. I will modify {{Http2ConnectionPool}} and do the test again. Thanks. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined by the implementation. As a result, it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging existing HTTP/2 libraries, this should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
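A minimal sketch of the multiple-connections-per-datanode pooling proposed above, assuming a generic connection type and a connector callback; none of these names are from the POC branch, and the actual {{Http2ConnectionPool}} change may look quite different:
{code}
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Sketch: keep several connections per datanode and round-robin streams
 * across them, so one datanode's traffic is not pinned to a single
 * EventLoop. C stands in for the real HTTP/2 connection type.
 */
public class MultiConnectionPool<C> {
  private final int maxConnsPerDatanode;
  private final AtomicInteger next = new AtomicInteger();
  private final Map<InetSocketAddress, List<C>> pool = new ConcurrentHashMap<>();

  public MultiConnectionPool(int maxConnsPerDatanode) {
    this.maxConnsPerDatanode = maxConnsPerDatanode;
  }

  public C connect(InetSocketAddress dn, Function<InetSocketAddress, C> connector) {
    List<C> conns = pool.computeIfAbsent(dn, k -> new CopyOnWriteArrayList<>());
    synchronized (conns) {
      if (conns.size() < maxConnsPerDatanode) {
        // Each new connection can be bound to a different EventLoop.
        conns.add(connector.apply(dn));
      }
    }
    // Spread streams over the existing connections round-robin.
    return conns.get(Math.floorMod(next.getAndIncrement(), conns.size()));
  }
}
{code}
Round-robin selection is just one choice here; picking the least-loaded connection (fewest active streams) would likely balance the EventLoops better.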
[jira] [Commented] (HDFS-8750) FileSystem does not honor Configuration.getClassLoader() while loading FileSystem implementations
[ https://issues.apache.org/jira/browse/HDFS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633385#comment-14633385 ] Steve Loughran commented on HDFS-8750: -- Don't worry about it: it's what the review process is for. If there is one lesson all of us working on Hadoop eventually learn, it is that there are no simple changes - a one-liner may not add a major new feature, but it is at as much risk of breaking things as the bigger patches. Hence the obsession with tests. FileSystem does not honor Configuration.getClassLoader() while loading FileSystem implementations - Key: HDFS-8750 URL: https://issues.apache.org/jira/browse/HDFS-8750 Project: Hadoop HDFS Issue Type: Bug Components: fs, HDFS Reporter: Himanshu Assignee: Himanshu Attachments: HDFS-8750.001.patch, HDFS-8750.002.patch In FileSystem.loadFileSystems(), at https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2652 a scheme -> FileSystem implementation map is created from the jars available on the classpath. It uses Thread.currentThread().getClassLoader() via ServiceLoader.load(FileSystem.class). Instead, loadFileSystems() should take a Configuration as an argument and first check whether a classloader is configured via configuration.getClassLoader(); if so, ServiceLoader.load(FileSystem.class, configuration.getClassLoader()) should be used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
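A minimal sketch of the proposed fix, assuming the method keeps roughly its current shape (the actual patch may differ):
{code}
import java.util.ServiceLoader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public final class FileSystemLoadingSketch {
  // Prefer the classloader configured on the Configuration; otherwise fall
  // back to today's behavior, where ServiceLoader.load(Class) uses the
  // thread context classloader internally.
  static ServiceLoader<FileSystem> loadFileSystems(Configuration conf) {
    ClassLoader cl = (conf == null) ? null : conf.getClassLoader();
    return (cl == null)
        ? ServiceLoader.load(FileSystem.class)
        : ServiceLoader.load(FileSystem.class, cl);
  }
}
{code}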
[jira] [Updated] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru updated HDFS-8753: -- Attachment: HDFS-8753-HDFS-7240.00.patch Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to the main resources and ozone-site.xml to the test resources, with the default known parameters as of now. Also, we need to unify {{StorageContainerConfiguration}} to initialize the conf with both files, as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru updated HDFS-8753: -- Status: Patch Available (was: Open) Attached a patch to refer to {{OzoneConfiguration}} and remove the old duplicate {{StorageContainerConfiguration}} files. Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to the main resources and ozone-site.xml to the test resources, with the default known parameters as of now. Also, we need to unify {{StorageContainerConfiguration}} to initialize the conf with both files, as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
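For reference, HdfsConfiguration uses a static-initializer pattern to register default and site resources; a unified OzoneConfiguration might look like the sketch below. This illustrates the approach, not the attached patch:
{code}
import org.apache.hadoop.conf.Configuration;

public class OzoneConfiguration extends Configuration {
  static {
    // Defaults first, site file second: values set in ozone-site.xml
    // override the ones shipped in ozone-default.xml.
    Configuration.addDefaultResource("ozone-default.xml");
    Configuration.addDefaultResource("ozone-site.xml");
  }
}
{code}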
[jira] [Commented] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633471#comment-14633471 ] Hadoop QA commented on HDFS-8753: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 2s | Findbugs (version ) appears to be broken on HDFS-7240. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 11s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 17s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 7s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 0m 22s | Tests failed in hadoop-hdfs. | | | | 43m 5s | | \\ \\ || Reason || Tests || | Failed build | hadoop-hdfs | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746081/HDFS-8753-HDFS-7240.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7240 / 8576861 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11749/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11749/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11749/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11749/console | This message was automatically generated. Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to main resources ozone-site.xml to test resources with default known parameters as of now. Also, need to unify {{StorageContainerConfiguration}} to initialize conf with both the files as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633492#comment-14633492 ] Walter Su commented on HDFS-8760: - LGTM. +1 after addressing one minor issue: (related) updateReadStatistics(..) is called twice, in readCells(..) and ByteBufferStrategy.doRead(..). I found other issues while reviewing the patch: (not related) 1. Some util functions are statically imported from StripedBlockUtil, while others are called as StripedBlockUtil.method(...). 2. DFSStripedInputStream.read(ByteBuffer) is identical to the one in the super class. 3. In StripeReader / readStripe(..), the stripe means an AlignedStripe, which may span many real stripes. This needs some javadoc. 4. Suppose {{buf}} is the buffer given by the user. Pread() makes the blockReader put data directly into {{buf}}. Stateful read() needs the blockReader to put data into curStripeBuf, then copy curStripeBuf into {{buf}}. curStripeBuf is useful when the user calls read()/read(small buf) frequently, especially when there are bad DNs. I think if buf.size >= curStripeBuf.size we can write data directly to buf without curStripeBuf. Maybe the copy is fine. But why is it a DirectByteBuffer? I don't know how it helps decoding, but copying data from heap to native memory and then from native memory back to heap is bad if there is no need to decode. This needs further digging. Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633567#comment-14633567 ] Kai Zheng commented on HDFS-7337: - Thanks [~andrew.wang] for the thoughts on moving this forward a bit. They sound good to me. Some points for further discussion: bq. ...also the need to persist this information in the fsimage/editlog, ... Did you mean the schema? If so, it looks like a point we all agree on. Per the discussion in HDFS-7859 and related issues, we planned to do it in a follow-on, along with support for multiple schemas. For now we only support one system-defined schema, RS(6, 3). bq. Codec enum (e.g. RS, LRC, etc), ...When we get to the point of fully-pluggable codecs, we can add a special wildcard enum value to support this Good to have the enum for built-in codecs for now and the wildcard for customized additional ones in the future. bq. In client's hdfs-site.xml, we can configure a codec implementation for every codec. This would look something like... In the existing code we're using the following format for a similar purpose. Please confirm whether it looks good. {noformat} /** Raw coder factory for the RS codec. */ public static final String IO_ERASURECODE_CODEC_RS_RAWCODER_KEY = "io.erasurecode.codec.rs.rawcoder"; /** Raw coder factory for the XOR codec. */ public static final String IO_ERASURECODE_CODEC_XOR_RAWCODER_KEY = "io.erasurecode.codec.xor.rawcoder"; {noformat} The related code resides in {{CodecUtil}}, which reads the above configurations. Please check it if necessary. Once we're clear on what needs to be done for this phase, I will open an issue to get these done separately. Thanks. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec-v2.pdf, PluggableErasureCodec-v3.pdf, PluggableErasureCodec.pdf According to HDFS-7285 and the design, this considers supporting multiple Erasure Codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via the command tool for different file folders. While designing and implementing such a pluggable framework, we also implement a concrete codec by default (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low-level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level pieces that interact with configuration, schema, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
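For reference, wiring a custom raw coder through a key in this format is just a client-side configuration set, along the lines of the sketch below; the factory class name is a made-up placeholder, not a real implementation:
{code}
import org.apache.hadoop.conf.Configuration;

public class CodecConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Key format quoted in the comment above; the value names a
    // hypothetical raw coder factory implementation on the classpath.
    conf.set("io.erasurecode.codec.rs.rawcoder",
        "com.example.MyRSRawErasureCoderFactory");
    System.out.println(conf.get("io.erasurecode.codec.rs.rawcoder"));
  }
}
{code}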
[jira] [Commented] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633501#comment-14633501 ] Walter Su commented on HDFS-8760: - I just saw HADOOP-12060 fixes #4. Please ignore #4. Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8604) Erasure Coding: update invalidateBlock(..) logic for striped block
[ https://issues.apache.org/jira/browse/HDFS-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su resolved HDFS-8604. - Resolution: Duplicate Already fixed in HDFS-8619. Erasure Coding: update invalidateBlock(..) logic for striped block -- Key: HDFS-8604 URL: https://issues.apache.org/jira/browse/HDFS-8604 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su {code} private boolean invalidateBlock(BlockToMarkCorrupt b, DatanodeInfo dn ) throws IOException { .. } else if (nr.liveReplicas() >= 1) { // If we have at least one copy on a live node, then we can delete it. addToInvalidates(b.corrupted, dn); removeStoredBlock(b.stored, node); {code} We don't delete a corrupted block if all we have left is the corrupted block; we give the user the decision, so the user has a chance to recover it manually. We should not compare liveReplicas() of a striped block with 1. The logic needs updating. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belongs to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633612#comment-14633612 ] Walter Su commented on HDFS-8483: - Sorry, the description says this jira tests DatanodeProtocol#reportBadBlocks, but the patch tests block reports. A DN will report bad blocks using DatanodeProtocol#reportBadBlocks when: 1. The VolumeScanner finds a bad block (see DataNode.reportBadBlocks(..) ). 2. The blockReceiver finds that an upstream DN in the pipeline has a corrupted block (see BlockReceiver.verifyChunks(..) ). 3. The DN gets a DNA_TRANSFER command from the NN and tries to copy a replica from another DN (see DataNode.reportBadBlocks(..) ). EC striping doesn't have situations #2 and #3. And I think it's trivial to test #1 because NamenodeRpcServer.reportBadBlocks(..) has the same implementation for ClientProtocol and DatanodeProtocol. It's more important to write a striping version of {{TestProcessCorruptBlocks}}. Erasure coding: test DataNode reporting bad/corrupted blocks which belongs to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633611#comment-14633611 ] kanaka kumar avvaru commented on HDFS-8753: --- Looks like some .proto files were moved to the {{hadoop-hdfs-client}} project. [~arpitagarwal], can you please check whether we need to update the build config for the branch. Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to the main resources and ozone-site.xml to the test resources, with the default known parameters as of now. Also, we need to unify {{StorageContainerConfiguration}} to initialize the conf with both files, as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8799) Erasure Coding: add tests for process corrupt striped blocks
Walter Su created HDFS-8799: --- Summary: Erasure Coding: add tests for process corrupt striped blocks Key: HDFS-8799 URL: https://issues.apache.org/jira/browse/HDFS-8799 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633669#comment-14633669 ] Nathan Roberts commented on HDFS-8791: -- Hi [~cmccabe]. Thanks for the idea. Yes, I had actually tried something like that: I just kept a loop of du's running on the node (outside of the datanode process for simplicity's sake). I thought this would prevent it from happening, but it turns out it still gets into this situation. I suspect the reason is that when there is memory pressure, it will start to seek a little, and once it starts to seek a little the system quickly degrades because buffers are being thrown away faster than the disks can seek. block ID-based DN storage layout can be very slow for datanode on ext4 -- Key: HDFS-8791 URL: https://issues.apache.org/jira/browse/HDFS-8791 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Priority: Critical We are seeing cases where the new directory layout basically causes the disks to seek for tens of minutes. This can be when the datanode is running du, and it can also be when it is performing a checkDirs(). Both of these operations currently scan all directories in the block pool, and that's very expensive in the new layout. The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K leaf directories where block files are placed. So, what we have on disk is: - 256 inodes for the first level directories - 256 directory blocks for the first level directories - 256*256 inodes for the second level directories - 256*256 directory blocks for the second level directories - Then the inodes and blocks to store the HDFS blocks themselves. The main problem is the 256*256 directory blocks. inodes and dentries will be cached by linux and one can configure how likely the system is to prune those entries (vfs_cache_pressure). However, ext4 relies on the buffer cache to cache the directory blocks and I'm not aware of any way to tell linux to favor buffer cache pages (even if it did, I'm not sure I would want it to in general). Also, ext4 tries hard to spread directories evenly across the entire volume; this basically means the 64K directory blocks are probably randomly spread across the entire disk. A du-type scan will look at directories one at a time, so the I/O scheduler can't optimize the corresponding seeks, meaning the seeks will be random and far. In a system I was using to diagnose this, I had 60K blocks. A du when things are hot is less than 1 second. When things are cold, about 20 minutes. How do things get cold? - A large set of tasks run on the node. This pushes almost all of the buffer cache out, causing the next du to hit this situation. We are seeing cases where a large job can cause a seek storm across the entire cluster. Why didn't the previous layout see this? - It might have, but it wasn't nearly as pronounced. The previous layout would be a few hundred directory blocks. Even when completely cold, these would only take a few hundred seeks, which would mean single-digit seconds. - With only a few hundred directories, the odds of the directory blocks getting modified is quite high; this keeps those blocks hot and much less likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Attachment: HDFS-8760-HDFS-7285.001.patch Thanks for the review, Walter! Updated the patch to address all your comments. In the meantime, {{testWriteReadUsingWebHdfs}} may fail after the change; the failure may be related to HDFS-8797. So for now I have temporarily disabled the pread test in {{testWriteReadUsingWebHdfs}}; we can add it back later. Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760-HDFS-7285.001.patch, HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633730#comment-14633730 ] Chang Li commented on HDFS-6407: [~benoyantony] is there any status on this issue? new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633741#comment-14633741 ] Hadoop QA commented on HDFS-6407: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. | | {color:red}-1{color} | @author | 0m 0s | The patch appears to contain 2 @author tags which the Hadoop community has agreed to not allow in code contributions. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 3 release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 17s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730214/HDFS-6407-003.patch | | Optional Tests | | | git revision | trunk / 98c2bc8 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11750/artifact/patchprocess/patchReleaseAuditProblems.txt | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11750/console | This message was automatically generated. new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2433) TestFileAppend4 fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-2433. -- Resolution: Cannot Reproduce I don't think I've seen this fail in a long, long time. Going to close this out. Please reopen if you disagree. TestFileAppend4 fails intermittently Key: HDFS-2433 URL: https://issues.apache.org/jira/browse/HDFS-2433 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Affects Versions: 0.20.205.0, 1.0.0 Reporter: Robert Joseph Evans Priority: Critical Attachments: failed.tar.bz2 A Jenkins build we have running failed twice in a row with issues from TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the error I ran TestFileAppend4 in a loop overnight, saving the results away. (No clean was done in between test runs.) When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] tests fail about 10% of the time (14 times out of 130 tries). They all fail with something like the following. Often it is only one of the tests that fails, but I have seen as many as two fail in one run. {noformat} Testcase: testAppendSyncReplication2 took 32.198 sec FAILED Should have 2 replicas for that block, not 1 junit.framework.AssertionFailedError: Should have 2 replicas for that block, not 1 at org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425) {noformat} I also saw several other tests that are a part of TestFileAppend4 fail during this experiment. They may all be related to one another, so I am filing them in the same JIRA. If it turns out that they are not related then they can be split up later. testAppendSyncBlockPlusBbw failed 6 out of the 130 times, or about 5% of the time {noformat} Testcase: testAppendSyncBlockPlusBbw took 1.633 sec FAILED unexpected file size! received=0 , expected=1024 junit.framework.AssertionFailedError: unexpected file size! received=0 , expected=1024 at org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401) {noformat} testAppendSyncChecksum[012] failed 2 out of the 130 times, or about 1.5% of the time {noformat} Testcase: testAppendSyncChecksum1 took 32.385 sec FAILED Should have 1 replica for that block, not 2 junit.framework.AssertionFailedError: Should have 1 replica for that block, not 2 at org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500) {noformat} I will attach logs for all of the failures. Be aware that I did change some of the logging messages in this test so I could better see when testAppendSyncReplication started and ended. Other than that, the code is stock 0.20.205 RC2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-3660) TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out
[ https://issues.apache.org/jira/browse/HDFS-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3660. -- Resolution: Cannot Reproduce Target Version/s: (was: ) This is an ancient/stale flaky test JIRA. Resolving. TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out Key: HDFS-3660 URL: https://issues.apache.org/jira/browse/HDFS-3660 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor Saw this on a recent jenkins run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8788) Implement unit tests for remote block reader in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633930#comment-14633930 ] James Clampffer commented on HDFS-8788: --- +1 Implement unit tests for remote block reader in libhdfspp - Key: HDFS-8788 URL: https://issues.apache.org/jira/browse/HDFS-8788 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8788.000.patch This jira proposes to implement unit tests for the remote block reader in gmock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Status: Patch Available (was: Open) Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760-HDFS-7285.001.patch, HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8788) Implement unit tests for remote block reader in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633929#comment-14633929 ] James Clampffer commented on HDFS-8788: --- +1 Implement unit tests for remote block reader in libhdfspp - Key: HDFS-8788 URL: https://issues.apache.org/jira/browse/HDFS-8788 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8788.000.patch This jira proposes to implement unit tests for the remote block reader in gmock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-3811) TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
[ https://issues.apache.org/jira/browse/HDFS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3811. -- Resolution: Cannot Reproduce I don't think I've seen this fail in a very long time. Going to resolve this. Please reopen if you disagree. TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky - Key: HDFS-3811 URL: https://issues.apache.org/jira/browse/HDFS-3811 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Andrew Wang Assignee: Todd Lipcon Attachments: stacktrace, testfail-editlog.log, testfail.log, testpersistblocks.txt This test failed on a recent Jenkins build, but passes for me locally. Seems flaky. See: https://builds.apache.org/job/PreCommit-HDFS-Build/3021//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633946#comment-14633946 ] Allen Wittenauer commented on HDFS-8344: {code} dfs.block.uc.max.recovery.attemps {code} Typo on the configuration entry. Also, should probably be in hdfs-default.xml. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster. # Before you start, it helps if you set the following. This is not necessary, but it simply reduces how long you have to wait: {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it has hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter.) # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1.) I believe the lease should be recovered and the block should be marked missing. However, this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode, even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
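The client code referenced in step 2 did not survive in this digest; below is a hypothetical reconstruction of what such a writer looks like (the path and message are guesses; the essential parts are the hflush and the sleep that keeps the lease held until the process is killed):
{code}
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHadoop {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/lease-repro"));
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
    writer.write("some data, less than one block");
    writer.flush();  // drain the BufferedWriter into the stream
    out.hflush();    // land the data on the datanodes without closing
    System.out.println("Wrote to the bufferedWriter");
    Thread.sleep(Long.MAX_VALUE);  // hold the lease until kill -9
  }
}
{code}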
[jira] [Resolved] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out
[ https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4001. -- Resolution: Fixed Haven't seen this fail in a very long time. Closing this out. Feel free to reopen if you disagree. TestSafeMode#testInitializeReplQueuesEarly may time out --- Key: HDFS-4001 URL: https://issues.apache.org/jira/browse/HDFS-4001 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Attachments: timeout.txt.gz Saw this failure on a recent branch-2 jenkins run, has also been seen on trunk. {noformat} java.util.concurrent.TimeoutException: Timed out waiting for condition at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107) at org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8673) HDFS reports file already exists if there is a file/dir name end with ._COPYING_
[ https://issues.apache.org/jira/browse/HDFS-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633859#comment-14633859 ] Hadoop QA commented on HDFS-8673: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 9s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 39s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 9s | Tests passed in hadoop-common. | | | | 60m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746128/HDFS-8673.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 98c2bc8 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11751/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11751/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11751/console | This message was automatically generated. HDFS reports file already exists if there is a file/dir name end with ._COPYING_ Key: HDFS-8673 URL: https://issues.apache.org/jira/browse/HDFS-8673 Project: Hadoop HDFS Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Attachments: HDFS-8673.000-WIP.patch, HDFS-8673.000.patch, HDFS-8673.001.patch, HDFS-8673.002.patch, HDFS-8673.003.patch, HDFS-8673.003.patch Because the CLI uses CommandWithDestination.java, which adds ._COPYING_ to the tail of the file name when it does the copy, it will cause problems if there is a file/dir already called *._COPYING_ on HDFS.
For file: -bash-4.1$ hadoop fs -put 5M /user/occ/ -bash-4.1$ hadoop fs -mv /user/occ/5M /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items -rw-r--r-- 1 occ supergroup 5242880 2015-06-26 05:16 /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -put 128K /user/occ/5M -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items -rw-r--r-- 1 occ supergroup 131072 2015-06-26 05:19 /user/occ/5M For dir: -bash-4.1$ hadoop fs -mkdir /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items drwxr-xr-x - occ supergroup 0 2015-06-26 05:24 /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -put 128K /user/occ/5M put: /user/occ/5M._COPYING_ already exists as a directory -bash-4.1$ hadoop fs -ls /user/occ/ (/user/occ/5M._COPYING_ is gone) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8673) HDFS reports file already exists if there is a file/dir name end with ._COPYING_
[ https://issues.apache.org/jira/browse/HDFS-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned HDFS-8673: - Assignee: Chen He HDFS reports file already exists if there is a file/dir name end with ._COPYING_ Key: HDFS-8673 URL: https://issues.apache.org/jira/browse/HDFS-8673 Project: Hadoop HDFS Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Attachments: HDFS-8673.000-WIP.patch, HDFS-8673.000.patch, HDFS-8673.001.patch, HDFS-8673.002.patch, HDFS-8673.003.patch Because the CLI uses CommandWithDestination.java, which adds ._COPYING_ to the tail of the file name when it does the copy, it will cause problems if there is a file/dir already called *._COPYING_ on HDFS. For file: -bash-4.1$ hadoop fs -put 5M /user/occ/ -bash-4.1$ hadoop fs -mv /user/occ/5M /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items -rw-r--r-- 1 occ supergroup 5242880 2015-06-26 05:16 /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -put 128K /user/occ/5M -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items -rw-r--r-- 1 occ supergroup 131072 2015-06-26 05:19 /user/occ/5M For dir: -bash-4.1$ hadoop fs -mkdir /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -ls /user/occ/ Found 1 items drwxr-xr-x - occ supergroup 0 2015-06-26 05:24 /user/occ/5M._COPYING_ -bash-4.1$ hadoop fs -put 128K /user/occ/5M put: /user/occ/5M._COPYING_ already exists as a directory -bash-4.1$ hadoop fs -ls /user/occ/ (/user/occ/5M._COPYING_ is gone) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633804#comment-14633804 ] Nathan Roberts commented on HDFS-8791: -- I agree we should optimize all the potential scans (du, checkDirs, directoryScanner, etc.). I also think we need to do something more general, because I feel like people will trip on this in all sorts of ways. Even tools outside of the DN process that do periodic scans will be affected and will in turn adversely affect the datanode's performance. Also, it's hard to see this problem until you're running at scale, so it will be difficult to catch jiras that introduce yet another scan, because they run really fast when everything is in memory. I'm wondering if we shouldn't move to a hashing scheme that is more dynamic and grows/shrinks based on the number of blocks in the volume. A consistent hash to minimize renames, plus some logic that knows how to look in two places (old hash, new hash), seems like it might work (see the sketch after this message). We could set a threshold of avg 100 blocks per directory; when we cross that threshold, we add enough subdirs to bring the avg down to 95. I think ext2 and ext3 will see a similar problem. Are you seeing something different? I'll admit that my understanding of the differences isn't exhaustive, but it sure seems like all of them rely on the buffer cache to maintain directory blocks and all of them try to spread directories across the disk, so they'd all be subject to the same sort of thing. block ID-based DN storage layout can be very slow for datanode on ext4 -- Key: HDFS-8791 URL: https://issues.apache.org/jira/browse/HDFS-8791 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Priority: Critical We are seeing cases where the new directory layout basically causes the disks to seek for tens of minutes. This can be when the datanode is running du, and it can also be when it is performing a checkDirs(). Both of these operations currently scan all directories in the block pool, and that's very expensive in the new layout. The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K leaf directories where block files are placed. So, what we have on disk is: - 256 inodes for the first level directories - 256 directory blocks for the first level directories - 256*256 inodes for the second level directories - 256*256 directory blocks for the second level directories - Then the inodes and blocks to store the HDFS blocks themselves. The main problem is the 256*256 directory blocks. inodes and dentries will be cached by linux and one can configure how likely the system is to prune those entries (vfs_cache_pressure). However, ext4 relies on the buffer cache to cache the directory blocks and I'm not aware of any way to tell linux to favor buffer cache pages (even if it did, I'm not sure I would want it to in general). Also, ext4 tries hard to spread directories evenly across the entire volume; this basically means the 64K directory blocks are probably randomly spread across the entire disk. A du-type scan will look at directories one at a time, so the I/O scheduler can't optimize the corresponding seeks, meaning the seeks will be random and far. In a system I was using to diagnose this, I had 60K blocks. A du when things are hot is less than 1 second. When things are cold, about 20 minutes. How do things get cold? - A large set of tasks run on the node. This pushes almost all of the buffer cache out, causing the next du to hit this situation. We are seeing cases where a large job can cause a seek storm across the entire cluster. Why didn't the previous layout see this? - It might have, but it wasn't nearly as pronounced. The previous layout would be a few hundred directory blocks. Even when completely cold, these would only take a few hundred seeks, which would mean single-digit seconds. - With only a few hundred directories, the odds of the directory blocks getting modified is quite high; this keeps those blocks hot and much less likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
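A toy sketch of the two-place lookup idea from the comment above (all names invented, and plain modulo is used for brevity even though a real version would want a consistent hash to minimize renames on resize; migration and growth policy are omitted):
{code}
import java.io.File;

public class TwoPlaceLayout {
  private final File root;
  private final int oldDirCount;  // directory count before the last resize
  private final int newDirCount;  // current directory count

  public TwoPlaceLayout(File root, int oldDirCount, int newDirCount) {
    this.root = root;
    this.oldDirCount = oldDirCount;
    this.newDirCount = newDirCount;
  }

  private File dirFor(long blockId, int dirCount) {
    // Deterministic placement: the same block ID always maps to the same
    // subdir for a given directory count.
    return new File(root, "subdir" + Long.remainderUnsigned(blockId, dirCount));
  }

  /** Look in the current layout first, then fall back to the old one. */
  public File locate(long blockId) {
    File f = new File(dirFor(blockId, newDirCount), "blk_" + blockId);
    return f.exists() ? f : new File(dirFor(blockId, oldDirCount), "blk_" + blockId);
  }
}
{code}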
[jira] [Commented] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633875#comment-14633875 ] Anu Engineer commented on HDFS-8753: Hi [~kanaka], Thanks for the patch. I am able to build this patch on my local machine; from the build logs it looks like it failed due to not being able to run findbugs on the pre-patch build: {code} Exception in thread main java.io.FileNotFoundException: /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/HDFS-7240FindbugsWarningshadoop-hdfs.xml (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:146) at edu.umd.cs.findbugs.SortedBugCollection.progessMonitoredInputStream(SortedBugCollection.java:1231) at edu.umd.cs.findbugs.SortedBugCollection.readXML(SortedBugCollection.java:308) at edu.umd.cs.findbugs.SortedBugCollection.readXML(SortedBugCollection.java:295) at edu.umd.cs.findbugs.workflow.Filter.main(Filter.java:712) Pre-patch HDFS-7240 findbugs is broken? {code} I have re-submitted the build to see if we can reproduce this issue: https://builds.apache.org/job/PreCommit-HDFS-Build/11752/console Thanks Anu Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to the main resources and ozone-site.xml to the test resources, with the default known parameters as of now. Also, we need to unify {{StorageContainerConfiguration}} to initialize the conf with both files, as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-6407: --- Priority: Critical (was: Minor) new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633675#comment-14633675 ] Nathan Roberts commented on HDFS-8791: -- I forgot to mention that I'm pretty confident it's not the inodes, but rather the directory blocks. inodes have their own cache that I can control with vfs_cache_pressure; directory blocks, however, are just cached via the buffer cache (afaik), and the buffer cache is much more difficult to exert any control over. block ID-based DN storage layout can be very slow for datanode on ext4 -- Key: HDFS-8791 URL: https://issues.apache.org/jira/browse/HDFS-8791 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Priority: Critical We are seeing cases where the new directory layout basically causes the disks to seek for tens of minutes. This can be when the datanode is running du, and it can also be when it is performing a checkDirs(). Both of these operations currently scan all directories in the block pool, and that's very expensive in the new layout. The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K leaf directories where block files are placed. So, what we have on disk is: - 256 inodes for the first level directories - 256 directory blocks for the first level directories - 256*256 inodes for the second level directories - 256*256 directory blocks for the second level directories - Then the inodes and blocks to store the HDFS blocks themselves. The main problem is the 256*256 directory blocks. inodes and dentries will be cached by linux and one can configure how likely the system is to prune those entries (vfs_cache_pressure). However, ext4 relies on the buffer cache to cache the directory blocks and I'm not aware of any way to tell linux to favor buffer cache pages (even if it did, I'm not sure I would want it to in general). Also, ext4 tries hard to spread directories evenly across the entire volume; this basically means the 64K directory blocks are probably randomly spread across the entire disk. A du-type scan will look at directories one at a time, so the I/O scheduler can't optimize the corresponding seeks, meaning the seeks will be random and far. In a system I was using to diagnose this, I had 60K blocks. A du when things are hot is less than 1 second. When things are cold, about 20 minutes. How do things get cold? - A large set of tasks run on the node. This pushes almost all of the buffer cache out, causing the next du to hit this situation. We are seeing cases where a large job can cause a seek storm across the entire cluster. Why didn't the previous layout see this? - It might have, but it wasn't nearly as pronounced. The previous layout would be a few hundred directory blocks. Even when completely cold, these would only take a few hundred seeks, which would mean single-digit seconds. - With only a few hundred directories, the odds of the directory blocks getting modified is quite high; this keeps those blocks hot and much less likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8673) HDFS reports file already exists if there is a file/dir name end with ._COPYING_
[ https://issues.apache.org/jira/browse/HDFS-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-8673: -- Attachment: HDFS-8673.003.patch reattach patch to trigger Hadoop QA HDFS reports file already exists if there is a file/dir name end with ._COPYING_ Key: HDFS-8673 URL: https://issues.apache.org/jira/browse/HDFS-8673 Project: Hadoop HDFS Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Attachments: HDFS-8673.000-WIP.patch, HDFS-8673.000.patch, HDFS-8673.001.patch, HDFS-8673.002.patch, HDFS-8673.003.patch, HDFS-8673.003.patch Because the CLI uses CommandWithDestination.java, which adds ._COPYING_ to the tail of the file name while it does the copy, it will cause problems if there is already a file/dir named *._COPYING_ on HDFS. For file:
-bash-4.1$ hadoop fs -put 5M /user/occ/
-bash-4.1$ hadoop fs -mv /user/occ/5M /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
-rw-r--r-- 1 occ supergroup 5242880 2015-06-26 05:16 /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -put 128K /user/occ/5M
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
-rw-r--r-- 1 occ supergroup 131072 2015-06-26 05:19 /user/occ/5M
For dir:
-bash-4.1$ hadoop fs -mkdir /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
drwxr-xr-x - occ supergroup 0 2015-06-26 05:24 /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -put 128K /user/occ/5M
put: /user/occ/5M._COPYING_ already exists as a directory
-bash-4.1$ hadoop fs -ls /user/occ/
(/user/occ/5M._COPYING_ is gone)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
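For context, a rough sketch of the temporary-name scheme behind this bug; only the {{._COPYING_}} suffix comes from {{CommandWithDestination}}, the surrounding code is illustrative:
{code}
import org.apache.hadoop.fs.Path;

// The shell writes the data to "<dst>._COPYING_" first and renames it over
// dst on success. A pre-existing directory with that name fails the put;
// a pre-existing file is overwritten and then renamed away, as shown in
// the transcripts above.
public class CopyingSuffixSketch {
  public static Path tempName(Path dst) {
    return dst.suffix("._COPYING_");   // e.g. /user/occ/5M -> /user/occ/5M._COPYING_
  }
}
{code}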
[jira] [Commented] (HDFS-8673) HDFS reports file already exists if there is a file/dir name end with ._COPYING_
[ https://issues.apache.org/jira/browse/HDFS-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633767#comment-14633767 ] Chen He commented on HDFS-8673: --- PATCH, :) HDFS reports file already exists if there is a file/dir name end with ._COPYING_ Key: HDFS-8673 URL: https://issues.apache.org/jira/browse/HDFS-8673 Project: Hadoop HDFS Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Attachments: HDFS-8673.000-WIP.patch, HDFS-8673.000.patch, HDFS-8673.001.patch, HDFS-8673.002.patch, HDFS-8673.003.patch, HDFS-8673.003.patch Because the CLI uses CommandWithDestination.java, which adds ._COPYING_ to the tail of the file name while it does the copy, it will cause problems if there is already a file/dir named *._COPYING_ on HDFS. For file:
-bash-4.1$ hadoop fs -put 5M /user/occ/
-bash-4.1$ hadoop fs -mv /user/occ/5M /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
-rw-r--r-- 1 occ supergroup 5242880 2015-06-26 05:16 /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -put 128K /user/occ/5M
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
-rw-r--r-- 1 occ supergroup 131072 2015-06-26 05:19 /user/occ/5M
For dir:
-bash-4.1$ hadoop fs -mkdir /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -ls /user/occ/
Found 1 items
drwxr-xr-x - occ supergroup 0 2015-06-26 05:24 /user/occ/5M._COPYING_
-bash-4.1$ hadoop fs -put 128K /user/occ/5M
put: /user/occ/5M._COPYING_ already exists as a directory
-bash-4.1$ hadoop fs -ls /user/occ/
(/user/occ/5M._COPYING_ is gone)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8486) DN startup may cause severe data loss
[ https://issues.apache.org/jira/browse/HDFS-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633828#comment-14633828 ] Daryn Sharp commented on HDFS-8486: --- Public service notice: * _Every restart of a 2.6.x or 2.7.0 DN incurs a risk of unwanted block deletion_. * Apply this patch if you are running a pre-2.7.1 release. I previously attributed this to an ancient bug, but it's new to 2.6. HDFS-2560 did start the scanner too early, but that race only caused a benign log warning. In 2.6, HDFS-6931 made an unrelated change that introduced the faulty (mass) deletion logic. DN startup may cause severe data loss - Key: HDFS-8486 URL: https://issues.apache.org/jira/browse/HDFS-8486 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.23.1, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8486.patch, HDFS-8486.patch A race condition between block pool initialization and the directory scanner may cause a mass deletion of blocks in multiple storages. If block pool initialization finds a block on disk that is already in the replica map, it deletes one of the blocks based on size, GS, etc. Unfortunately it _always_ deletes one of the blocks even if they are identical; thus the replica map _must_ be empty when the pool is initialized. The directory scanner starts at a random time within its periodic interval (default 6h). If the scanner starts very early, it races to populate the replica map, causing the block pool init to erroneously delete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
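An abstract model of the race described above may help; the class and method names here are illustrative assumptions, not the DataNode's actual code:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StartupRaceSketch {
  static final Map<Long, String> replicaMap = new ConcurrentHashMap<>();

  // Directory scanner: may start very early in its (default 6h) interval
  // and populate the replica map from disk before the pool is initialized.
  static void directoryScan(long blockId, String replicaOnDisk) {
    replicaMap.put(blockId, replicaOnDisk);
  }

  // Block pool init: assumes an empty map; on a "duplicate" it always
  // deletes one replica, even when the two entries are identical.
  static void initBlockPool(long blockId, String replicaOnDisk) {
    String existing = replicaMap.putIfAbsent(blockId, replicaOnDisk);
    if (existing != null) {
      deleteReplica(existing);   // the data-loss path: identity is not checked
    }
  }

  static void deleteReplica(String replica) { /* unlink the block file */ }
}
{code}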
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634056#comment-14634056 ] Haohui Mai commented on HDFS-8344: -- Then the question becomes: what would be a good default value for this configuration? And why does it require retrying UC blocks a number of times instead of just marking the block as missing when the hard limit expires? NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8775) SASL support for data transfer protocol in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8775: - Attachment: HDFS-8775.000.patch SASL support for data transfer protocol in libhdfspp Key: HDFS-8775 URL: https://issues.apache.org/jira/browse/HDFS-8775 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8775.000.patch This jira proposes to implement basic SASL support for the data transfer protocol which allows libhdfspp to talk to secure clusters. Support for encryption is deferred to subsequent jiras. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class
[ https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634120#comment-14634120 ] Tsz Wo Nicholas Sze commented on HDFS-8499: --- ..., but if you could share some thoughts on the comparison it'd be nice. ... BlockInfoContiguous and BlockInfoStriped could be implemented using completely different data structures. They can also be constructed in completely different ways. Indeed, BlockInfoContiguous is constructed by a write pipeline and BlockInfoStriped is constructed by parallel writes. Therefore, BlockInfoContiguousUC and BlockInfoStripedUC may not share a lot of common code. However, Design #1 assumes both BlockInfoContiguous and BlockInfoStriped can be constructed in a similar way. Also, if BlockInfoContiguousUC/BlockInfoStripedUC does not extend BlockInfoContiguous/BlockInfoStriped, their data structures cannot be made private to the classes. HDFS-8499 adds ContiguousBlockStorageOp for BlockInfoContiguous and BlockInfoUnderConstructionContiguous so that the actual logic for contiguous BlockInfo lives in the static methods of ContiguousBlockStorageOp. It is a procedural-language approach, not an OO approach. BlockInfoContiguous/BlockInfoUnderConstructionContiguous become adapter-style classes -- they simply call the methods in ContiguousBlockStorageOp, and there is a lot of code duplication between these two classes. The same thing is going to happen to BlockInfoStriped and BlockInfoStripedUC in Design #1. ... I chose option A to avoid breaking the existing is-a relationship in trunk. ... Do you mean breaking the trunk code before HDFS-8499? If yes, could you explain how Design #2 breaks the existing is-a relationship? Refactor BlockInfo class hierarchy with static helper class --- Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, HDFS-8499.UCFeature.patch, HDFS-bistriped.patch In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
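For readers following the design debate, a minimal sketch of the adapter-plus-static-helper shape being discussed; the method bodies and the {{DatanodeStorageInfo}} stub are illustrative, not the committed code:
{code}
// Stub type so the sketch is self-contained.
class DatanodeStorageInfo { }

class ContiguousBlockStorageOp {
  static void addStorage(BlockInfoContiguous b, DatanodeStorageInfo s) {
    // the actual contiguous-storage logic lives here, procedurally
  }
}

class BlockInfoContiguous {
  void addStorage(DatanodeStorageInfo s) {
    // adapter-style: a thin call into the static helper; an
    // under-construction sibling class must duplicate this same delegation,
    // which is the duplication the comment above objects to
    ContiguousBlockStorageOp.addStorage(this, s);
  }
}
{code}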
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634094#comment-14634094 ] Hudson commented on HDFS-8344: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8186 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8186/]) HDFS-8344. NameNode doesn't recover lease for files with missing blocks (raviprak) (raviprak: rev e4f756260f16156179ba4adad974ec92279c2fac) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8764) Generate Hadoop RPC stubs from protobuf definitions
[ https://issues.apache.org/jira/browse/HDFS-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-8764. -- Resolution: Fixed Fix Version/s: HDFS-8707 Committed to the HDFS-8707 branch. Thanks Jing and James for the reviews. Generate Hadoop RPC stubs from protobuf definitions --- Key: HDFS-8764 URL: https://issues.apache.org/jira/browse/HDFS-8764 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS-8707 Attachments: HDFS-8764.000.patch It would be nice to have the RPC stubs generated from the protobuf definitions, similar to what HADOOP-10388 has achieved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-8344: --- Resolution: Fixed Fix Version/s: 2.8.0 Release Note: Allow a configuration to specify the maximum number of recovery attempts for blocks under construction. Status: Resolved (was: Patch Available) NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634088#comment-14634088 ] Ravi Prakash commented on HDFS-8344: Thanks for the review Allen, Kihwal, Masatake and Haohui. I've committed this to trunk and branch-2. I just saw your comment Haohui. The datanode might be busy and recovery may fail the first time. I thought it best to try recovery a few times before giving up. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
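To make the retry trade-off discussed above concrete, here is a self-contained sketch of a capped recovery loop; all names and the logic shown are illustrative assumptions, not the identifiers the patch actually adds:
{code}
public class LeaseRecoverySketch {
  private final int maxRecoveryAttempts;

  public LeaseRecoverySketch(int maxRecoveryAttempts) {
    this.maxRecoveryAttempts = maxRecoveryAttempts;
  }

  /** Called each time the lease monitor revisits an under-construction block. */
  public void onHardLimitExpired(UcBlock block) {
    if (block.recoveryAttempts >= maxRecoveryAttempts) {
      markMissing(block);       // give up and surface the block as missing
    } else {
      block.recoveryAttempts++;
      scheduleRecovery(block);  // the DN may simply have been busy; retry
    }
  }

  static class UcBlock { int recoveryAttempts; }
  private void markMissing(UcBlock b) { /* mark as missing in the block map */ }
  private void scheduleRecovery(UcBlock b) { /* issue a recovery command */ }
}
{code}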
[jira] [Updated] (HDFS-8306) Generate ACL and Xattr outputs in OIV XML outputs
[ https://issues.apache.org/jira/browse/HDFS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8306: Attachment: HDFS-8306.008.patch Updated the patch to use utf-8 {{InputSource}}. Generate ACL and Xattr outputs in OIV XML outputs - Key: HDFS-8306 URL: https://issues.apache.org/jira/browse/HDFS-8306 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8306.000.patch, HDFS-8306.001.patch, HDFS-8306.002.patch, HDFS-8306.003.patch, HDFS-8306.004.patch, HDFS-8306.005.patch, HDFS-8306.006.patch, HDFS-8306.007.patch, HDFS-8306.008.patch, HDFS-8306.debug0.patch, HDFS-8306.debug1.patch Currently, in the {{hdfs oiv}} XML outputs, not all fields of fsimage are outputs. It makes inspecting {{fsimage}} from XML outputs less practical. Also it prevents recovering a fsimage from XML file. This JIRA is adding ACL and XAttrs in the XML outputs as the first step to achieve the goal described in HDFS-8061. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
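A minimal sketch of what an explicitly UTF-8 {{InputSource}} looks like; the file name is illustrative and the patch's exact usage may differ:
{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class Utf8InputSourceSketch {
  public static InputSource open(String xmlFile) throws IOException {
    // Wrap the stream in a UTF-8 reader so the SAX parser cannot guess a
    // wrong encoding for the OIV XML output.
    InputSource src = new InputSource(
        new InputStreamReader(new FileInputStream(xmlFile),
            StandardCharsets.UTF_8));
    src.setEncoding(StandardCharsets.UTF_8.name());
    return src;
  }
}
{code}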
[jira] [Resolved] (HDFS-8788) Implement unit tests for remote block reader in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-8788. -- Resolution: Fixed Fix Version/s: HDFS-8707 Committed to the HDFS-8707 branch. Thanks James for the reviews. Implement unit tests for remote block reader in libhdfspp - Key: HDFS-8788 URL: https://issues.apache.org/jira/browse/HDFS-8788 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS-8707 Attachments: HDFS-8788.000.patch This jira proposes to implement unit tests for the remote block reader in gmock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3455) Add docs for NameNode initializeSharedEdits and bootstrapStandby commands
[ https://issues.apache.org/jira/browse/HDFS-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Rojas updated HDFS-3455: Assignee: (was: Anthony Rojas) Add docs for NameNode initializeSharedEdits and bootstrapStandby commands - Key: HDFS-3455 URL: https://issues.apache.org/jira/browse/HDFS-3455 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Labels: newbie We've made the HA setup easier by adding new flags to the namenode to automatically set up the standby. But, we didn't document them yet. We should amend the HDFSHighAvailability.apt.vm docs to include this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-8344: --- Attachment: HDFS-8344.07.patch Thanks a lot for the careful review Allen! Here's another with the fixes. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8753) Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634044#comment-14634044 ] Hadoop QA commented on HDFS-8753: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 23s | Findbugs (version ) appears to be broken on HDFS-7240. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 161m 6s | Tests failed in hadoop-hdfs. | | | | 202m 35s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746081/HDFS-8753-HDFS-7240.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7240 / 8576861 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11752/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11752/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11752/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11752/console | This message was automatically generated. Ozone: Unify StorageContainerConfiguration with ozone-default.xml ozone-site.xml --- Key: HDFS-8753 URL: https://issues.apache.org/jira/browse/HDFS-8753 Project: Hadoop HDFS Issue Type: Sub-task Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru Attachments: HDFS-8753-HDFS-7240.00.patch This JIRA proposes adding ozone-default.xml to main resources ozone-site.xml to test resources with default known parameters as of now. Also, need to unify {{StorageContainerConfiguration}} to initialize conf with both the files as at present there are two classes with this name. {code} hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\ozone\StorageContainerConfiguration.java loads only ozone-site.xml hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\storagecontainer\StorageContainerConfiguration.java loads only storage-container-site.xml {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
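A minimal sketch of the proposed unification, following the usual Hadoop configuration pattern of defaults plus site overrides; this is illustrative, not the committed code:
{code}
import org.apache.hadoop.conf.Configuration;

// One configuration class that pulls in both resource files, replacing the
// two same-named classes that each load only one file.
public class StorageContainerConfigurationSketch extends Configuration {
  static {
    // defaults first, then site overrides, matching core/hdfs conventions
    Configuration.addDefaultResource("ozone-default.xml");
    Configuration.addDefaultResource("ozone-site.xml");
  }
}
{code}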
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634047#comment-14634047 ] Allen Wittenauer commented on HDFS-8344: +1 lgtm NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8486) DN startup may cause severe data loss
[ https://issues.apache.org/jira/browse/HDFS-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-8486: -- Release Note: Public service notice: - Every restart of a 2.6.x or 2.7.0 DN incurs a risk of unwanted block deletion. - Apply this patch if you are running a pre-2.7.1 release. (Promoting comment into release-notes area of JIRA just so its better visible) DN startup may cause severe data loss - Key: HDFS-8486 URL: https://issues.apache.org/jira/browse/HDFS-8486 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.23.1, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8486.patch, HDFS-8486.patch A race condition between block pool initialization and the directory scanner may cause a mass deletion of blocks in multiple storages. If block pool initialization finds a block on disk that is already in the replica map, it deletes one of the blocks based on size, GS, etc. Unfortunately it _always_ deletes one of the blocks even if identical, thus the replica map _must_ be empty when the pool is initialized. The directory scanner starts at a random time within its periodic interval (default 6h). If the scanner starts very early it races to populate the replica map, causing the block pool init to erroneously delete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634011#comment-14634011 ] Hadoop QA commented on HDFS-8760: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 41s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 19s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 42s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 45s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 24s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 18m 7s | Tests failed in hadoop-hdfs. | | | | 62m 44s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestReservedRawPaths | | | hadoop.hdfs.server.blockmanagement.TestDatanodeManager | | | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.hdfs.TestModTime | | | hadoop.fs.TestUrlStreamHandler | | | hadoop.hdfs.security.TestDelegationToken | | | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolarent | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.server.namenode.TestFileLimit | | | hadoop.hdfs.TestParallelShortCircuitRead | | | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot | | | hadoop.hdfs.TestDisableConnCache | | | hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped | | | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter | | | hadoop.hdfs.server.namenode.TestEditLogAutoroll | | | hadoop.TestRefreshCallQueue | | | hadoop.hdfs.protocolPB.TestPBHelper | | | hadoop.hdfs.web.TestWebHdfsUrl | | | hadoop.hdfs.TestECSchemas | | | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints | | | hadoop.hdfs.TestConnCache | | | hadoop.cli.TestCryptoAdminCLI | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.TestSetrepDecreasing | | | hadoop.hdfs.server.datanode.TestDiskError | | | hadoop.fs.viewfs.TestViewFsWithAcls | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.namenode.TestAddStripedBlocks | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestHostsFiles | | | hadoop.hdfs.server.datanode.TestTransferRbw | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.fs.contract.hdfs.TestHDFSContractDelete | | | hadoop.hdfs.server.namenode.TestFileContextAcl | | | hadoop.hdfs.TestSafeModeWithStripedFile | | | hadoop.fs.TestFcHdfsSetUMask | | | hadoop.fs.TestUnbuffer | | | hadoop.hdfs.server.namenode.TestClusterId | | | hadoop.hdfs.server.namenode.TestDeleteRace | | | 
hadoop.hdfs.TestPread | | | hadoop.hdfs.server.namenode.TestFSDirectory | | | hadoop.hdfs.server.namenode.TestLeaseManager | | | hadoop.fs.contract.hdfs.TestHDFSContractOpen | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing | | | hadoop.hdfs.server.datanode.TestStorageReport | | | hadoop.hdfs.server.datanode.TestBlockRecovery | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestReadWhileWriting | | | hadoop.fs.contract.hdfs.TestHDFSContractMkdir | | | hadoop.fs.contract.hdfs.TestHDFSContractAppend | | | hadoop.hdfs.server.datanode.TestFsDatasetCache | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation | | | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock | | | hadoop.hdfs.server.namenode.ha.TestQuotasWithHA | | | hadoop.hdfs.server.namenode.ha.TestGetGroupsWithHA | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.TestSecondaryWebUi | | | hadoop.hdfs.server.namenode.TestMalformedURLs | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestRecoverStripedBlocks | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
[jira] [Updated] (HDFS-8799) Erasure Coding: add tests for namenode processing corrupt striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8799: Summary: Erasure Coding: add tests for namenode processing corrupt striped blocks (was: Erasure Coding: add tests for process corrupt striped blocks) Erasure Coding: add tests for namenode processing corrupt striped blocks Key: HDFS-8799 URL: https://issues.apache.org/jira/browse/HDFS-8799 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8799) Erasure Coding: add tests for namenode processing corrupt striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633718#comment-14633718 ] Walter Su commented on HDFS-8799: - {{TestAddStripedBlocks#testCehckStripedReplicaCorrupt()}} tests the count of corrupt replicas. This jira adds tests for whether, and when, the corrupt replicas should be deleted, just like {{TestProcessCorruptBlocks}}. Erasure Coding: add tests for namenode processing corrupt striped blocks Key: HDFS-8799 URL: https://issues.apache.org/jira/browse/HDFS-8799 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8335) FSNamesystem/FSDirStatAndListingOp getFileInfo and getListingInt construct FSPermissionChecker regardless of isPermissionEnabled()
[ https://issues.apache.org/jira/browse/HDFS-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak reassigned HDFS-8335: -- Assignee: Gabor Liptak FSNamesystem/FSDirStatAndListingOp getFileInfo and getListingInt construct FSPermissionChecker regardless of isPermissionEnabled() -- Key: HDFS-8335 URL: https://issues.apache.org/jira/browse/HDFS-8335 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 Reporter: David Bryson Assignee: Gabor Liptak Attachments: HDFS-8335.2.patch, HDFS-8335.patch FSNamesystem (2.5.x)/FSDirStatAndListingOp(current trunk) getFileInfo and getListingInt methods call getPermissionChecker() to construct a FSPermissionChecker regardless of isPermissionEnabled(). When permission checking is disabled, this leads to an unnecessary performance hit constructing a UserGroupInformation object that is never used. For example, from a stack dump when driving concurrent requests, they all end up blocking. Here's the thread holding the lock: IPC Server handler 9 on 9000 daemon prio=10 tid=0x7f78d8b9e800 nid=0x142f3 runnable [0x7f78c2ddc000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:272) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0x0007d9b105c0 (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) - locked 0x0007d9b1a888 (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:154) at java.io.BufferedReader.read1(BufferedReader.java:205) at java.io.BufferedReader.read(BufferedReader.java:279) - locked 0x0007d9b1a888 (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715) at org.apache.hadoop.util.Shell.runCommand(Shell.java:524) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:84) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50) at org.apache.hadoop.security.Groups.getGroups(Groups.java:139) at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1474) - locked 0x0007a6df75f8 (a org.apache.hadoop.security.UserGroupInformation) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.init(FSPermissionChecker.java:82) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3534) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4489) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4478) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:898) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:602) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Here is (one of the many) threads waiting on the lock: IPC Server handler 2 on 9000 daemon prio=10 tid=0x7f78d8c48800 nid=0x142ec waiting for monitor entry [0x7f78c34e3000] java.lang.Thread.State: BLOCKED (on object monitor) at
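A minimal sketch of the fix direction implied by this report: build the expensive permission checker only when permission checking is enabled. The names mirror the stack trace above; the logic shown is illustrative, not the actual patch:
{code}
public class GetListingSketch {
  private boolean isPermissionEnabled;

  Object getListingInt(String src) {
    // Only pay the UserGroupInformation/group-lookup cost when needed.
    FSPermissionChecker pc =
        isPermissionEnabled ? getPermissionChecker() : null;
    if (pc != null) {
      checkPermissions(pc, src);
    }
    return listing(src);
  }

  FSPermissionChecker getPermissionChecker() { return new FSPermissionChecker(); }
  void checkPermissions(FSPermissionChecker pc, String src) { }
  Object listing(String src) { return null; }
  static class FSPermissionChecker { }
}
{code}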
[jira] [Updated] (HDFS-8335) FSNamesystem/FSDirStatAndListingOp getFileInfo and getListingInt construct FSPermissionChecker regardless of isPermissionEnabled()
[ https://issues.apache.org/jira/browse/HDFS-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated HDFS-8335: --- Attachment: HDFS-8335.2.patch FSNamesystem/FSDirStatAndListingOp getFileInfo and getListingInt construct FSPermissionChecker regardless of isPermissionEnabled() -- Key: HDFS-8335 URL: https://issues.apache.org/jira/browse/HDFS-8335 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 Reporter: David Bryson Attachments: HDFS-8335.2.patch, HDFS-8335.patch FSNamesystem (2.5.x)/FSDirStatAndListingOp(current trunk) getFileInfo and getListingInt methods call getPermissionChecker() to construct a FSPermissionChecker regardless of isPermissionEnabled(). When permission checking is disabled, this leads to an unnecessary performance hit constructing a UserGroupInformation object that is never used. For example, from a stack dump when driving concurrent requests, they all end up blocking. Here's the thread holding the lock: IPC Server handler 9 on 9000 daemon prio=10 tid=0x7f78d8b9e800 nid=0x142f3 runnable [0x7f78c2ddc000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:272) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0x0007d9b105c0 (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) - locked 0x0007d9b1a888 (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:154) at java.io.BufferedReader.read1(BufferedReader.java:205) at java.io.BufferedReader.read(BufferedReader.java:279) - locked 0x0007d9b1a888 (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715) at org.apache.hadoop.util.Shell.runCommand(Shell.java:524) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:84) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50) at org.apache.hadoop.security.Groups.getGroups(Groups.java:139) at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1474) - locked 0x0007a6df75f8 (a org.apache.hadoop.security.UserGroupInformation) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.init(FSPermissionChecker.java:82) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3534) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4489) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4478) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:898) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:602) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Here is (one of the many) threads waiting on the lock: IPC Server handler 2 on 9000 daemon prio=10 tid=0x7f78d8c48800 nid=0x142ec waiting for monitor entry [0x7f78c34e3000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1472) -
[jira] [Updated] (HDFS-8794) Improve CorruptReplicasMap#corruptReplicasMap
[ https://issues.apache.org/jira/browse/HDFS-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8794: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 (was: 2.7.1) Status: Resolved (was: Patch Available) Committed to trunk and branch-2; thanks to [~arpitagarwal] for the review. Improve CorruptReplicasMap#corruptReplicasMap - Key: HDFS-8794 URL: https://issues.apache.org/jira/browse/HDFS-8794 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8794.001.patch, HDFS-8794.002.patch Currently we use a {{TreeMap}} for {{corruptReplicasMap}}, but the only place that needs sorted order is {{getCorruptReplicaBlockIds}}, which is only used by tests. So we can use a {{HashMap}}. From a memory and performance point of view, {{HashMap}} is better than {{TreeMap}}; a similar optimization was done in HDFS-7433. Of course we need to make a few changes to {{getCorruptReplicaBlockIds}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
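A minimal sketch of the idea: keep a {{HashMap}} and sort only in {{getCorruptReplicaBlockIds}}, the one caller that needs ordering. The names are illustrative, not the exact committed code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CorruptReplicasSketch {
  // HashMap instead of TreeMap: no per-operation ordering cost.
  private final Map<Long, String> corruptReplicasMap = new HashMap<>();

  /** Test-only path; sort on demand instead of keeping the map sorted. */
  public List<Long> getCorruptReplicaBlockIds() {
    List<Long> ids = new ArrayList<>(corruptReplicasMap.keySet());
    Collections.sort(ids);
    return ids;
  }
}
{code}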
[jira] [Updated] (HDFS-8495) Consolidate append() related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8495: --- Component/s: (was: namenode) Consolidate append() related implementation into a single class --- Key: HDFS-8495 URL: https://issues.apache.org/jira/browse/HDFS-8495 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8495-000.patch, HDFS-8495-001.patch, HDFS-8495-002.patch, HDFS-8495-003.patch, HDFS-8495-003.patch, HDFS-8495-004.patch, HDFS-8495-005.patch, HDFS-8495-006.patch This jira proposes to consolidate {{FSNamesystem#append()}} related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8495) Consolidate append() related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8495: --- Component/s: namenode Consolidate append() related implementation into a single class --- Key: HDFS-8495 URL: https://issues.apache.org/jira/browse/HDFS-8495 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8495-000.patch, HDFS-8495-001.patch, HDFS-8495-002.patch, HDFS-8495-003.patch, HDFS-8495-003.patch, HDFS-8495-004.patch, HDFS-8495-005.patch, HDFS-8495-006.patch This jira proposes to consolidate {{FSNamesystem#append()}} related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer
[ https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634412#comment-14634412 ] mingleizhang commented on HDFS-8562: Thank you, Yanping. I'm trying to solve these problems these days. HDFS Performance is impacted by FileInputStream Finalizer - Key: HDFS-8562 URL: https://issues.apache.org/jira/browse/HDFS-8562 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, HDFS, performance Affects Versions: 2.5.0 Environment: Impacts any application that uses HDFS Reporter: Yanping Wang Assignee: mingleizhang While running HBase on top of HDFS, we noticed excessively high GC pause spikes on the datanodes. For example, with JDK8 update 40 and the G1 collector, we saw datanode GC pauses spike toward 160 milliseconds while they should be around 20 milliseconds. We tracked it down in the GC logs and found those long GC pauses were devoted to processing a high number of final references. For example, this young GC:
2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
2715.572: [SoftReference, 0 refs, 0.0001034 secs]
2715.572: [WeakReference, 0 refs, 0.123 secs]
2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
2715.647: [JNI Weak Reference, 0.140 secs]
[Ref Proc: 122.3 ms]
[Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 951.1M(1536.0M)->40.2M(1536.0M)]
[Times: user=0.47 sys=0.01, real=0.15 secs]
This young GC took a 152.9 millisecond STW pause, of which 122.3 milliseconds were spent in Ref Proc, processing 8292 FinalReferences in 74.8 milliseconds plus some overhead. We used JFR and JMAP with Memory Analyzer to track this down and found those FinalReferences were all from FileInputStream. We checked the HDFS code and saw the use of FileInputStream in the datanode: https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
{code}
public static MappableBlock load(long length,
    FileInputStream blockIn, FileInputStream metaIn,
    String blockFileName) throws IOException {
  MappableBlock mappableBlock = null;
  MappedByteBuffer mmap = null;
  FileChannel blockChannel = null;
  try {
    blockChannel = blockIn.getChannel();
    if (blockChannel == null) {
      throw new IOException("Block InputStream has no FileChannel.");
    }
    mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
    NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
    verifyChecksum(length, metaIn, blockChannel, blockFileName);
    mappableBlock = new MappableBlock(mmap, length);
  } finally {
    IOUtils.closeQuietly(blockChannel);
    if (mappableBlock == null) {
      if (mmap != null) {
        NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
      }
    }
  }
  return mappableBlock;
}
{code}
We looked up https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html and http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java and noticed that FileInputStream relies on the finalizer to release its resources. When an instance of a class that has a finalizer is created, an entry for that instance is put on a queue in the JVM so the JVM knows it has a finalizer that needs to be executed. The current issue is: even when programmers do call close() after using a FileInputStream, its finalize() method will still be called.
In other words, we still get the side effect of the FinalReference being registered at FileInputStream allocation time, and also the reference processing to reclaim the FinalReference during GC (any GC solution has to deal with this). When running HDFS in industry deployments, millions of files can be opened and closed, which results in a very large number of finalizers being registered and subsequently executed. That can cause very long GC pause times. We tried to use Files.newInputStream() to replace FileInputStream, but it was clear we could not replace FileInputStream in hdfs/server/datanode/fsdataset/impl/MappableBlock.java We notified the Oracle JVM team of this performance issue, which impacts all Big Data applications using HDFS. We recommended the proper fix be made in Java SE FileInputStream, because (1) there is really nothing wrong with using FileInputStream in the datanode code above, and (2) since an object with a finalizer is registered on the finalizer list within the JVM at object allocation time, if someone makes an explicit call to close or free the resources that are to be done in the finalizer, then
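As background for the Files.newInputStream() remark above, a minimal sketch of the finalizer-free NIO route; the path and class name are illustrative assumptions, and as noted above this cannot simply be dropped into MappableBlock.load():
{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioChannelSketch {
  // Channels opened via NIO (like Files.newInputStream) register no
  // finalizer, unlike new FileInputStream(...).getChannel().
  public static FileChannel openBlock(String blockFile) throws IOException {
    return FileChannel.open(Paths.get(blockFile), StandardOpenOption.READ);
  }
}
{code}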
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634419#comment-14634419 ] Hadoop QA commented on HDFS-7483: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 6m 0s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 41s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 168m 44s | Tests failed in hadoop-hdfs. | | | | 189m 30s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.TestDistributedFileSystem | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746206/HDFS-7483.003.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / e4f7562 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11757/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11757/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11757/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11757/console | This message was automatically generated. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, HDFS-7483.003.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8794) Improve CorruptReplicasMap#corruptReplicasMap
[ https://issues.apache.org/jira/browse/HDFS-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634433#comment-14634433 ] Hudson commented on HDFS-8794: -- FAILURE: Integrated in Hadoop-trunk-Commit #8188 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8188/]) HDFS-8794. Improve CorruptReplicasMap#corruptReplicasMap. (yliu) (yliu: rev d6d58606b8adf94b208aed5fc2d054b9dd081db1) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve CorruptReplicasMap#corruptReplicasMap - Key: HDFS-8794 URL: https://issues.apache.org/jira/browse/HDFS-8794 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8794.001.patch, HDFS-8794.002.patch Currently we use {{TreeMap}} for {{corruptReplicasMap}}; the only place that needs sorted order is {{getCorruptReplicaBlockIds}}, which is used only by tests, so we can use {{HashMap}}. From a memory and performance view, {{HashMap}} is better than {{TreeMap}}; a similar optimization was done in HDFS-7433. Of course we need to make a few changes to {{getCorruptReplicaBlockIds}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
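The shape of the change is simple; a minimal sketch of the idea (simplified signatures, not the committed patch): keep the hot map unsorted and sort only in the test-facing accessor.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CorruptReplicasMapSketch {
  // HashMap: O(1) lookup/insert and less per-entry memory than TreeMap.
  private final Map<Long, String> corruptReplicas = new HashMap<>();

  void addCorruptReplica(long blockId, String reason) {
    corruptReplicas.put(blockId, reason);
  }

  // Only the test helper needs sorted output, so sort here on demand
  // instead of paying TreeMap's cost on every NameNode operation.
  long[] getCorruptReplicaBlockIds() {
    ArrayList<Long> ids = new ArrayList<>(corruptReplicas.keySet());
    Collections.sort(ids);
    long[] out = new long[ids.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = ids.get(i);
    }
    return out;
  }
}
{code}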
[jira] [Updated] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6407: - Priority: Minor (was: Critical) new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI that display datanode information, directory listings and snapshots. When there are many items in the tables, it is useful to have the ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7582) Enforce maximum number of ACL entries separately per access and default.
[ https://issues.apache.org/jira/browse/HDFS-7582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634219#comment-14634219 ] Chris Nauroth commented on HDFS-7582: - Hi [~vinayrpet]. Thank you again for your patience. The patch looks good. I found just one thing that needs to be corrected. {code} if (defaultEntries.size() > MAX_ENTRIES) { throw new AclException("Invalid ACL: ACL has " + accessEntries.size() + " default entries, which exceeds maximum of " + MAX_ENTRIES + "."); } {code} The text of this exception needs to use {{defaultEntries.size()}} instead of {{accessEntries.size()}}. Enforce maximum number of ACL entries separately per access and default. Key: HDFS-7582 URL: https://issues.apache.org/jira/browse/HDFS-7582 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7582-001.patch, HDFS-7582-01.patch Current ACL limits apply only to the total number of entries. But there can be a situation where the number of default entries for a directory is more than half of the maximum number of entries, i.e. 16. Under such a parent directory, only files can be created, and they will have ACLs inherited from the parent's default entries. But when directories are created, the total number of entries will be more than the maximum allowed, because sub-directories copy both the inherited ACLs and the default entries. Since there is currently no check while copying ACLs from the default entries, directory creation succeeds, but any modification of the same ACL (even changing the permission on a single entry) will fail. It would be better to enforce the maximum of 32 entries separately per access and default. This would be consistent with our observations testing ACLs on other file systems, such as XFS and ext3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
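For reference, the corrected message would read roughly as follows (a sketch of the one-identifier fix, not the final patch):
{code}
if (defaultEntries.size() > MAX_ENTRIES) {
  throw new AclException("Invalid ACL: ACL has "
      + defaultEntries.size() + " default entries, which exceeds maximum of "
      + MAX_ENTRIES + ".");
}
{code}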
[jira] [Updated] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7483: - Attachment: HDFS-7483.003.patch Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, HDFS-7483.003.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634237#comment-14634237 ] Haohui Mai commented on HDFS-7483: -- Sorry for the delay. [~benoyantony], I uploaded the v3 patch to demonstrate the approach. The basic idea is to calculate the percentage using JavaScript and leave the templates to deal with the formatting only. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, HDFS-7483.003.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634300#comment-14634300 ] Haohui Mai commented on HDFS-8779: -- Is it possible to bring in a dedicated library like https://github.com/sidorares/json-bigint instead of putting hacks into the JSON string? It looks much cleaner. bq. I'm guessing that the Java WebHdfsFileSystem implementation somehow already avoids the JS MAX_SAFE_INTEGER issue... I don't see why they are related. As pointed out in the description, in Java MAX_LONG equals 2^63 - 1, while in JavaScript MAX_SAFE_INTEGER is only 2^53 - 1. JavaScript can represent numbers larger than 2^53 - 1, but most likely with a loss of precision. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch, HDFS-8779.03.patch, after-02-patch.png, before.png Old releases used randomly generated block IDs (HDFS-4645). The max value of Long in Java is 2^63-1; the max value of a number in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
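To see the precision point concretely: JavaScript numbers are IEEE-754 doubles, so casting a long to double in Java reproduces the same loss (the value below is illustrative, not an actual block ID):
{code}
public class MaxSafeIntegerDemo {
  public static void main(String[] args) {
    long blockId = (1L << 53) + 1;         // first value past 2^53 - 1
    double asJsNumber = (double) blockId;  // what a JavaScript number stores
    System.out.println(blockId);           // 9007199254740993
    System.out.println((long) asJsNumber); // 9007199254740992 (off by one)
  }
}
{code}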
[jira] [Updated] (HDFS-8800) shutdown has bugs
[ https://issues.apache.org/jira/browse/HDFS-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Smith updated HDFS-8800: - Description: namenode stop creates stack traces and extra gc logs. shutdown has bugs - Key: HDFS-8800 URL: https://issues.apache.org/jira/browse/HDFS-8800 Project: Hadoop HDFS Issue Type: Bug Reporter: John Smith namenode stop creates stack traces and extra gc logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8059) Erasure coding: revisit how to store EC schema and cellSize in NameNode
[ https://issues.apache.org/jira/browse/HDFS-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634323#comment-14634323 ] Andrew Wang commented on HDFS-8059: --- Hey Jing, bq. Could you please provide more details about how putting EC schema in INodeFile can solve the rename problem...? The schema will travel with the file when it gets renamed. We can read the schema information off the file directly rather than going up to the zone, so the zone's schema is only used at write time. bq. Can you direct me to this list? I was just referring to the phase 1 umbrella JIRA at HDFS-7285; this subtask probably should move over there since it's related to persisting schema info to NN on-disk metadata, which I think we should figure out before merging. Erasure coding: revisit how to store EC schema and cellSize in NameNode --- Key: HDFS-8059 URL: https://issues.apache.org/jira/browse/HDFS-8059 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8059.001.patch Move {{dataBlockNum}} and {{parityBlockNum}} from BlockInfoStriped to INodeFile, and store them in {{FileWithStripedBlocksFeature}}. Ideally these two nums are the same for all striped blocks in a file, and storing them in BlockInfoStriped wastes NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
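A hypothetical illustration of the read-path argument above (the method and accessor names here are assumptions for illustration, not branch code): resolve the schema from the file itself, so a rename across zones cannot change how existing blocks are read.
{code}
// Sketch only: prefer the schema carried by the INodeFile; fall back to
// the enclosing zone, whose schema then only matters at write time.
ECSchema getSchemaForRead(INodeFile file, ErasureCodingZone zone) {
  FileWithStripedBlocksFeature feature = file.getStripedBlocksFeature();
  if (feature != null) {
    return feature.getSchema(); // travels with the file across renames
  }
  return zone.getSchema();
}
{code}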
[jira] [Commented] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634340#comment-14634340 ] Hudson commented on HDFS-8657: -- FAILURE: Integrated in Hadoop-trunk-Commit #8187 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8187/]) HDFS-8657. Update docs for mSNN. Contributed by Jesse Yates. (atm: rev ed01dc70b2f4ff4bdcaf71c19acf244da0868a82) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md Update docs for mSNN Key: HDFS-8657 URL: https://issues.apache.org/jira/browse/HDFS-8657 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 3.0.0 Attachments: hdfs-8657-v0.patch, hdfs-8657-v1.patch After the commit of HDFS-6440, some docs need to be updated to reflect the new support for more than 2 NNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8306) Generate ACL and Xattr outputs in OIV XML outputs
[ https://issues.apache.org/jira/browse/HDFS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634347#comment-14634347 ] Hadoop QA commented on HDFS-8306: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 2s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 1 new checkstyle issues (total was 61, now 62). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 6s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 161m 37s | Tests failed in hadoop-hdfs. | | | | 205m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746185/HDFS-8306.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e4f7562 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11756/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11756/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11756/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11756/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11756/console | This message was automatically generated. Generate ACL and Xattr outputs in OIV XML outputs - Key: HDFS-8306 URL: https://issues.apache.org/jira/browse/HDFS-8306 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8306.000.patch, HDFS-8306.001.patch, HDFS-8306.002.patch, HDFS-8306.003.patch, HDFS-8306.004.patch, HDFS-8306.005.patch, HDFS-8306.006.patch, HDFS-8306.007.patch, HDFS-8306.008.patch, HDFS-8306.debug0.patch, HDFS-8306.debug1.patch Currently, in the {{hdfs oiv}} XML outputs, not all fields of fsimage are outputs. It makes inspecting {{fsimage}} from XML outputs less practical. Also it prevents recovering a fsimage from XML file. 
This JIRA adds ACLs and XAttrs to the XML output as the first step toward the goal described in HDFS-8061. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8797) WebHdfsFileSystem creates too many connections for pread
[ https://issues.apache.org/jira/browse/HDFS-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634362#comment-14634362 ] Yi Liu commented on HDFS-8797: -- Thanks Jing for working on this, I think it's a good approach to override {{readFully}} and create a new InputStream for pread. A minor comment: we should also override {{int read(long position, ...}} WebHdfsFileSystem creates too many connections for pread Key: HDFS-8797 URL: https://issues.apache.org/jira/browse/HDFS-8797 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8797.000.patch While running a test we found that WebHdfsFileSystem can create several thousand connections when doing a position read of a 200MB file. For each connection the client will connect to the DataNode again and the DataNode will create a new DFSClient instance to handle the read request. This also leads to several thousand {{getBlockLocations}} calls to the NameNode. The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}}, each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK, so the next read regenerates the connection. {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { synchronized (this) { long oldPos = getPos(); int nread = -1; try { seek(position); nread = read(buffer, offset, length); } finally { seek(oldPos); } return nread; } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8695) OzoneHandler : Add Bucket REST Interface
[ https://issues.apache.org/jira/browse/HDFS-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634373#comment-14634373 ] Hadoop QA commented on HDFS-8695: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 57s | Findbugs (version ) appears to be broken on HDFS-7240. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 10m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 58s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 22s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 10s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 50s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 76m 37s | Tests failed in hadoop-hdfs. | | | | 129m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestFileAppend4 | | | hadoop.hdfs.TestRead | | | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.server.namenode.TestSecondaryWebUi | | | hadoop.hdfs.server.namenode.TestSaveNamespace | | | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional | | | hadoop.hdfs.tools.TestGetGroups | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions | | | hadoop.hdfs.server.datanode.TestRefreshNamenodes | | | hadoop.hdfs.TestHdfsAdmin | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler | | | hadoop.hdfs.TestClientReportBadBlock | | | hadoop.hdfs.TestSafeMode | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.tools.TestDebugAdmin | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.TestDatanodeReport | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles | | | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement | | | hadoop.hdfs.server.namenode.TestNameNodeRpcServer | | | hadoop.hdfs.TestFileAppendRestart | | | 
hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade | | | hadoop.hdfs.TestMultiThreadedHflush | | | hadoop.hdfs.TestParallelRead | | | hadoop.hdfs.server.namenode.snapshot.TestSetQuotaWithSnapshot | | | hadoop.hdfs.server.namenode.ha.TestHAFsck | | | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots | | | hadoop.hdfs.tools.TestStoragePolicyCommands | | | hadoop.hdfs.protocol.TestBlockListAsLongs | | | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant | | | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot | | | hadoop.hdfs.server.namenode.TestHDFSConcat | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.TestRefreshCallQueue | | | hadoop.hdfs.TestSetTimes | | | hadoop.hdfs.TestListFilesInDFS | | | hadoop.hdfs.server.namenode.TestAddBlock | | | hadoop.hdfs.server.namenode.TestMalformedURLs | | | hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.namenode.TestNameEditsConfigs | | | hadoop.hdfs.TestMiniDFSCluster | | | hadoop.hdfs.tools.TestDFSZKFailoverController | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.security.TestPermissionSymlinks | | | hadoop.hdfs.TestDFSRollback | | |
[jira] [Commented] (HDFS-6300) Prevent multiple balancers from running simultaneously
[ https://issues.apache.org/jira/browse/HDFS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634192#comment-14634192 ] Aaron T. Myers commented on HDFS-6300: -- Given these recent fixes, do we think that HDFS-4505 is now obsolete and should therefore be closed? Prevent multiple balancers from running simultaneously -- Key: HDFS-6300 URL: https://issues.apache.org/jira/browse/HDFS-6300 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Rakesh R Assignee: Rakesh R Priority: Critical Fix For: 2.7.1 Attachments: HDFS-6300-001.patch, HDFS-6300-002.patch, HDFS-6300-003.patch, HDFS-6300-004.patch, HDFS-6300-005.patch, HDFS-6300-006.patch, HDFS-6300.patch The Javadoc of Balancer.java says it will not allow a second balancer to run if the first one is in progress. But I've noticed that multiple balancers can run together, and the balancer.id implementation does not safeguard against this. {code} * Another balancer is running. Exiting... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
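For context, the usual way to make such a marker-file guard safe is to create it with overwrite disabled; a minimal sketch under assumed names (not the committed fix):
{code}
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BalancerGuardSketch {
  // Create balancer.id with overwrite=false, so a second balancer fails
  // fast instead of running concurrently with the first.
  static FSDataOutputStream checkAndMarkRunning(FileSystem fs, Path idPath)
      throws IOException {
    try {
      FSDataOutputStream out = fs.create(idPath, false); // fails if present
      out.writeBytes(InetAddress.getLocalHost().getHostName() + "\n");
      out.hflush();
      return out; // keep open for the lifetime of this balancer run
    } catch (FileAlreadyExistsException e) {
      throw new IOException("Another balancer is running. Exiting...", e);
    }
  }
}
{code}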
[jira] [Commented] (HDFS-8059) Erasure coding: revisit how to store EC schema and cellSize in NameNode
[ https://issues.apache.org/jira/browse/HDFS-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634198#comment-14634198 ] Jing Zhao commented on HDFS-8059: - bq. Breaking rename as I stated above is a huge limitation. Could you please provide more details about how putting EC schema in INodeFile can solve the rename problem (assuming the rename is across two EC zones with different EC schema)? Please note with wrong schemas an EC file cannot be correctly read. bq. This JIRA is also on the shortlist of remaining issues for the EC branch Can you direct me to this list? Erasure coding: revisit how to store EC schema and cellSize in NameNode --- Key: HDFS-8059 URL: https://issues.apache.org/jira/browse/HDFS-8059 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8059.001.patch Move {{dataBlockNum}} and {{parityBlockNum}} from BlockInfoStriped to INodeFile, and store them in {{FileWithStripedBlocksFeature}}. Ideally these two nums are the same for all striped blocks in a file, and store them in BlockInfoStriped will waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8695) OzoneHandler : Add Bucket REST Interface
[ https://issues.apache.org/jira/browse/HDFS-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8695: --- Status: Patch Available (was: Open) OzoneHandler : Add Bucket REST Interface Key: HDFS-8695 URL: https://issues.apache.org/jira/browse/HDFS-8695 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8695-HDFS-7240.001.patch Add Bucket REST interface into Ozone server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634269#comment-14634269 ] Hadoop QA commented on HDFS-8760: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 3s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 11m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 45s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 53s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 59s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 26s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 57s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 82m 13s | Tests failed in hadoop-hdfs. | | | | 139m 16s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots | | | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolarent | | | hadoop.hdfs.server.namenode.TestFileLimit | | | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot | | | hadoop.hdfs.server.namenode.TestEditLogAutoroll | | | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints | | | hadoop.hdfs.server.namenode.TestAddStripedBlocks | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestHostsFiles | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.hdfs.server.namenode.TestFileContextAcl | | | hadoop.hdfs.server.namenode.TestClusterId | | | hadoop.hdfs.server.namenode.TestDeleteRace | | | hadoop.hdfs.server.namenode.TestFSDirectory | | | hadoop.hdfs.server.namenode.TestLeaseManager | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.datanode.TestFsDatasetCache | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation | | | hadoop.hdfs.server.namenode.ha.TestQuotasWithHA | | | hadoop.hdfs.server.namenode.ha.TestGetGroupsWithHA | | | hadoop.hdfs.server.namenode.TestSecondaryWebUi | | | hadoop.hdfs.server.namenode.TestMalformedURLs | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestRecoverStripedBlocks | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles | | | hadoop.hdfs.server.namenode.TestHDFSConcat | | | hadoop.hdfs.server.namenode.TestAddBlockRetry | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol | | | hadoop.hdfs.server.namenode.ha.TestHAMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap | | | 
hadoop.hdfs.server.namenode.TestFsLimits | | | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional | | | hadoop.hdfs.server.datanode.TestHSync | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter | | | hadoop.hdfs.server.namenode.TestNameNodeRpcServer | | | hadoop.hdfs.server.namenode.TestFileContextXAttr | | | hadoop.hdfs.server.namenode.TestAclConfigFlag | | | hadoop.hdfs.server.namenode.TestFSImageWithXAttr | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot | | | hadoop.hdfs.server.namenode.TestFSImageWithAcl | | | hadoop.hdfs.server.namenode.TestQuotaWithStripedBlocks | | | hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement | | | hadoop.hdfs.server.namenode.TestListCorruptFileBlocks | | | hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestParallelImageWrite | | | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot | | | hadoop.hdfs.server.namenode.TestNameNodeResourceChecker | | | hadoop.hdfs.server.namenode.TestGenericJournalConf | | | hadoop.hdfs.server.namenode.TestEditLogJournalFailures | | |
[jira] [Commented] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634276#comment-14634276 ] Scott Opell commented on HDFS-8748: --- Hi Chris. I actually was just able to get my hadoop environment setup and got a patch together. However, in my explorations, I found that while the behavior outlined in the design document is mentioned in the POSIX spec, its not actually what they decided to go with. Check out page 272 of the pdf here for more details. http://users.suse.com/~agruen/acl/posix/Posix_1003.1e-990310.pdf So basically, the code matches POSIX and the design document doesn't. I submitted my patch, which should work if the project decides to go with the non-posix behavior. I guess your OS doesn't fully comply with the 1003.1e draft. ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Labels: acl, permission In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, its shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. 
if (!foundMatch) { for (int pos = 0, entry; pos aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) { break; } AclEntryType type = AclEntryStatusFormat.getType(entry); String name = AclEntryStatusFormat.getName(entry); if (type == AclEntryType.USER) { // Use named user entry with mask from permission bits applied if user // matches name. if (getUser().equals(name)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; break; } } else if (type == AclEntryType.GROUP) { // Use group entry (unnamed or named) with mask from permission bits // applied if user is a member and entry grants access. If user is a //
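For comparison, the design document's union semantics can be written compactly; a simplified standalone sketch (with {{groupEntries}}, {{userGroups}}, and {{mask}} as assumed inputs rather than trunk internals):
{code}
import java.util.List;
import java.util.Set;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.FsAction;

public class GroupUnionSketch {
  // Accumulate permissions from the file group entry and every matching
  // named group entry, apply the mask, then test access once against the
  // combined set, instead of exiting on the first granting entry.
  static boolean hasAccess(List<AclEntry> groupEntries, Set<String> userGroups,
      String fileGroup, FsAction mask, FsAction access) {
    FsAction effective = FsAction.NONE;
    boolean member = false;
    for (AclEntry entry : groupEntries) {
      String group = entry.getName() == null ? fileGroup : entry.getName();
      if (userGroups.contains(group)) {
        member = true;
        effective = effective.or(entry.getPermission().and(mask));
      }
    }
    return member && effective.implies(access);
  }
}
{code}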
[jira] [Updated] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-8657: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I've just committed this change to trunk. Thanks very much for the contribution, Jesse. Update docs for mSNN Key: HDFS-8657 URL: https://issues.apache.org/jira/browse/HDFS-8657 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 3.0.0 Attachments: hdfs-8657-v0.patch, hdfs-8657-v1.patch After the commit of HDFS-6440, some docs need to be updated to reflect the new support for more than 2 NNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634288#comment-14634288 ] Chris Nauroth commented on HDFS-8748: - Thank you for the pointer to 1003.1e. That's a very interesting find. The high-level goal always has been POSIX adherence, which makes me inclined to leave the current code as is. ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Labels: acl, permission Attachments: HDFS_8748.patch In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, its shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. if (!foundMatch) { for (int pos = 0, entry; pos aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) { break; } AclEntryType type = AclEntryStatusFormat.getType(entry); String name = AclEntryStatusFormat.getName(entry); if (type == AclEntryType.USER) { // Use named user entry with mask from permission bits applied if user // matches name. 
if (getUser().equals(name)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; break; } } else if (type == AclEntryType.GROUP) { // Use group entry (unnamed or named) with mask from permission bits // applied if user is a member and entry grants access. If user is a // member of multiple groups that have entries that grant access, then // it doesn't matter which is chosen, so exit early after first match. String group = name == null ? inode.getGroupName() : name; if (getGroups().contains(group)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction());
[jira] [Updated] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Opell updated HDFS-8748: -- Flags: Patch ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Labels: acl, permission Attachments: HDFS_8748.patch In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, its shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. if (!foundMatch) { for (int pos = 0, entry; pos aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) { break; } AclEntryType type = AclEntryStatusFormat.getType(entry); String name = AclEntryStatusFormat.getName(entry); if (type == AclEntryType.USER) { // Use named user entry with mask from permission bits applied if user // matches name. 
if (getUser().equals(name)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; break; } } else if (type == AclEntryType.GROUP) { // Use group entry (unnamed or named) with mask from permission bits // applied if user is a member and entry grants access. If user is a // member of multiple groups that have entries that grant access, then // it doesn't matter which is chosen, so exit early after first match. String group = name == null ? inode.getGroupName() : name; if (getGroups().contains(group)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; } } } } {code} As seen in the GROUP section, the permissions check will succeed if and
[jira] [Updated] (HDFS-8797) WebHdfsFileSystem creates too many connections for pread
[ https://issues.apache.org/jira/browse/HDFS-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8797: Attachment: HDFS-8797.000.patch One possible fix is to override the {{readFully}} method in {{ByteRangeInputStream}} and use a newly created InputStream there, so that we do not need to touch the internal state. Uploading a patch to demo the idea. WebHdfsFileSystem creates too many connections for pread Key: HDFS-8797 URL: https://issues.apache.org/jira/browse/HDFS-8797 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8797.000.patch While running a test we found that WebHdfsFileSystem can create several thousand connections when doing a position read of a 200MB file. For each connection the client will connect to the DataNode again and the DataNode will create a new DFSClient instance to handle the read request. This also leads to several thousand {{getBlockLocations}} calls to the NameNode. The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}}, each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK, so the next read regenerates the connection. {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { synchronized (this) { long oldPos = getPos(); int nread = -1; try { seek(position); nread = read(buffer, offset, length); } finally { seek(oldPos); } return nread; } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
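A minimal sketch of that override inside {{ByteRangeInputStream}} (not the attached patch; {{openInputStream}} is a hypothetical helper standing in for whatever opens a one-off ranged connection):
{code}
// Positional read over a dedicated, short-lived stream: the main stream's
// position and connection state are never touched, so sequential reads
// keep reusing their existing connection.
@Override
public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException {
  try (InputStream in = openInputStream(position)) { // hypothetical helper
    int done = 0;
    while (done < length) {
      int nread = in.read(buffer, offset + done, length - done);
      if (nread < 0) {
        throw new EOFException("End of file reached before reading fully.");
      }
      done += nread;
    }
  }
}
{code}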
[jira] [Created] (HDFS-8800) shutdown has bugs
John Smith created HDFS-8800: Summary: shutdown has bugs Key: HDFS-8800 URL: https://issues.apache.org/jira/browse/HDFS-8800 Project: Hadoop HDFS Issue Type: Bug Reporter: John Smith -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8797) WebHdfsFileSystem creates too many connections for pread
[ https://issues.apache.org/jira/browse/HDFS-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8797: Status: Patch Available (was: Open) WebHdfsFileSystem creates too many connections for pread Key: HDFS-8797 URL: https://issues.apache.org/jira/browse/HDFS-8797 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8797.000.patch While running a test we found that WebHdfsFileSystem can create several thousand connections when doing a position read of a 200MB file. For each connection the client will connect to the DataNode again and the DataNode will create a new DFSClient instance to handle the read request. This also leads to several thousand {{getBlockLocations}} call to the NameNode. The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}}, each time the inputstream reads some time, it seeks back to the old position and resets its state to SEEK. Thus the next read will regenerate the connection. {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { synchronized (this) { long oldPos = getPos(); int nread = -1; try { seek(position); nread = read(buffer, offset, length); } finally { seek(oldPos); } return nread; } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634302#comment-14634302 ] Hadoop QA commented on HDFS-8344: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 5 new checkstyle issues (total was 854, now 855). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 35s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 4s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 161m 17s | Tests failed in hadoop-hdfs. | | | | 204m 48s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746170/HDFS-8344.07.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 98c2bc8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11754/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11754/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11754/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11754/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11754/console | This message was automatically generated. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. 
This is not necessary, but it simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".) # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1.) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
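The client jar itself is not attached; a hypothetical reconstruction matching the steps above (class name, path, and payload are guesses, not the reporter's code):
{code}
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHadoop {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/lease-test"));
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
    writer.write("some data");
    writer.flush();               // push through the BufferedWriter
    out.hflush();                 // make the data visible on the datanodes
    System.out.println("Wrote to the bufferedWriter");
    Thread.sleep(Long.MAX_VALUE); // hold the lease open; kill -9 here
  }
}
{code}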
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634172#comment-14634172 ] Haohui Mai commented on HDFS-8344: -- -1. Can you please revert the commit? I'm concerned with the complexity associated with the commit, as well as the difficulty for users to choose the right configuration. It's an internal implementation detail and it should not be exposed to users whenever possible. We intentionally keep the soft and hard limits non-configurable to avoid users shooting themselves in the foot. bq. The datanode might be busy and recovery may fail the first time. That's exactly what the hard limit / retries of leases are designed for. Again, this is only one type of internal implementation toward the solution. The detail should not be exposed to the users. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start, it helps if you set the following. This is not necessary, but it simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".) # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1.) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634178#comment-14634178 ] Haohui Mai commented on HDFS-6407: -- Though it's nice to fix, it is not core HDFS functionality. Changing the priority back to Minor. Please feel free to bump the priority if you feel differently. Contributions are appreciated. new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI that display datanode information, directory listings and snapshots. When there are many items in the tables, it is useful to have the ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8059) Erasure coding: revisit how to store EC schema and cellSize in NameNode
[ https://issues.apache.org/jira/browse/HDFS-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634181#comment-14634181 ] Andrew Wang commented on HDFS-8059: --- [~wheat9] if you could also address the other points I made in my above comment, that would also add to the discussion. Breaking rename as I stated above is a huge limitation. This JIRA is also on the shortlist of remaining issues for the EC branch, so we'd like to make progress on it quickly. Erasure coding: revisit how to store EC schema and cellSize in NameNode --- Key: HDFS-8059 URL: https://issues.apache.org/jira/browse/HDFS-8059 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8059.001.patch Move {{dataBlockNum}} and {{parityBlockNum}} from BlockInfoStriped to INodeFile, and store them in {{FileWithStripedBlocksFeature}}. Ideally these two nums are the same for all striped blocks in a file, and store them in BlockInfoStriped will waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8695) OzoneHandler : Add Bucket REST Interface
[ https://issues.apache.org/jira/browse/HDFS-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8695: --- Attachment: hdfs-8695-HDFS-7240.001.patch * Adds REST interface for buckets and corresponding handler OzoneHandler : Add Bucket REST Interface Key: HDFS-8695 URL: https://issues.apache.org/jira/browse/HDFS-8695 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8695-HDFS-7240.001.patch Add Bucket REST interface into Ozone server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-8748: Assignee: Chris Nauroth Hello [~scott_o]. Thank you for reporting this. I can confirm that this is a bug. (The design doc is correct, and the current code has a bug.) To confirm this, I ran a test case on an ext4 file system with ACLs enabled. See below for a transcript of my test case. Executing a file requires both read and execute permissions (r-x). In my test case, I defined the read permission on one named group entry and the execute permission on a second named group entry. My user was able to execute the file. This proves that on ext4, permissions can be defined on separate named group ACL entries, and the permission checks will treat the union of those entries as the effective permissions. Scott, are you interested in coding a patch? If not, then I'll assign this to myself for the fix. {code} whoami cnauroth groups cnauroth sudo testgroup1 getfacl test_HDFS-8748 # file: test_HDFS-8748 # owner: root # group: root user::rwx group::--- group:sudo:r-- group:testgroup1:--x mask::r-x other::--- ./test_HDFS-8748 echo $? 0 sudo setfacl -m group:sudo:r--,group:testgroup1:--- test_HDFS-8748 ./test_HDFS-8748 -bash: ./test_HDFS-8748: Permission denied echo $? 126 sudo setfacl -m group:sudo:---,group:testgroup1:--x test_HDFS-8748 ./test_HDFS-8748 bash: ./test_HDFS-8748: Permission denied echo $? 126 sudo setfacl -m group:sudo:r--,group:testgroup1:--x test_HDFS-8748 ./test_HDFS-8748 echo $? 0 {code} ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Assignee: Chris Nauroth Labels: acl, permission In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, it's shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior: {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? 
The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. if (!foundMatch) { for (int pos = 0, entry; pos < aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) {
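The union semantics quoted above can be demonstrated with a small, self-contained sketch built on {{FsAction}} (which supports {{or()}}, {{and()}}, and {{implies()}}). The entry names and group memberships below are made-up test data matching the sales/execs example from the design document, not FSPermissionChecker internals.
{code:title=GroupUnionSketch.java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.permission.FsAction;

public class GroupUnionSketch {
  public static void main(String[] args) {
    // Named group entries from the ACL, already masked.
    Map<String, FsAction> namedGroups = new HashMap<String, FsAction>();
    namedGroups.put("sales", FsAction.READ);  // group:sales:r--
    namedGroups.put("execs", FsAction.WRITE); // group:execs:-w-

    List<String> userGroups = Arrays.asList("sales", "execs");

    // Union every matching named group entry instead of stopping at the
    // first entry that happens to grant the requested access.
    FsAction effective = FsAction.NONE;
    for (String group : userGroups) {
      FsAction perm = namedGroups.get(group);
      if (perm != null) {
        effective = effective.or(perm);
      }
    }
    System.out.println(effective.SYMBOL);                   // rw-
    System.out.println(effective.implies(FsAction.WRITE));  // true
  }
}
{code}
Running it prints rw- and true, the effective permissions the design document derives for the sales/execs example, whereas a first-match check against either entry alone would deny rw- access.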
[jira] [Updated] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-8748: Assignee: (was: Chris Nauroth) ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Labels: acl, permission In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, it's shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior: {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. if (!foundMatch) { for (int pos = 0, entry; pos < aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) { break; } AclEntryType type = AclEntryStatusFormat.getType(entry); String name = AclEntryStatusFormat.getName(entry); if (type == AclEntryType.USER) { // Use named user entry with mask from permission bits applied if user // matches name. 
if (getUser().equals(name)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; break; } } else if (type == AclEntryType.GROUP) { // Use group entry (unnamed or named) with mask from permission bits // applied if user is a member and entry grants access. If user is a // member of multiple groups that have entries that grant access, then // it doesn't matter which is chosen, so exit early after first match. String group = name == null ? inode.getGroupName() : name; if (getGroups().contains(group)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; } } } } {code} As seen in the GROUP section, the permissions check will succeed if and only if a
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634243#comment-14634243 ] Benoy Antony commented on HDFS-7483: Nice. Thanks for the snippet. I like this approach. +1 pending jenkins. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, HDFS-7483.003.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If a cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
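Once this lands, the per-storage-type numbers should also be reachable through the NameNode's JMX JSON servlet, which the UI itself consumes. Below is a hedged sketch of polling that endpoint; the host/port and the {{Hadoop:service=NameNode,name=BlockStats}} bean name are assumptions to verify against your own NameNode's /jmx output.
{code:title=StorageTypeStatsProbe.java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StorageTypeStatsProbe {
  public static void main(String[] args) throws Exception {
    // Assumed NameNode HTTP address and bean name; adjust for your cluster.
    URL url = new URL(
        "http://localhost:50070/jmx?qry=Hadoop:service=NameNode,name=BlockStats");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      BufferedReader in = new BufferedReader(
          new InputStreamReader(conn.getInputStream(), "UTF-8"));
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON including per-storage-type stats
      }
      in.close();
    } finally {
      conn.disconnect();
    }
  }
}
{code}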
[jira] [Commented] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634275#comment-14634275 ] Aaron T. Myers commented on HDFS-8657: -- +1, latest patch looks good to me. I'm going to commit this momentarily. Update docs for mSNN Key: HDFS-8657 URL: https://issues.apache.org/jira/browse/HDFS-8657 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: hdfs-8657-v0.patch, hdfs-8657-v1.patch After the commit of HDFS-6440, some docs need to be updated to reflect the new support for more than 2 NNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8748) ACL permission check does not union groups to determine effective permissions
[ https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Opell updated HDFS-8748: -- Attachment: HDFS_8748.patch ACL permission check does not union groups to determine effective permissions - Key: HDFS-8748 URL: https://issues.apache.org/jira/browse/HDFS-8748 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Scott Opell Labels: acl, permission Attachments: HDFS_8748.patch In the ACL permission checking routine, the implemented named group section does not match the design document. In the design document, it's shown in the pseudo-code that if the requester is not the owner or a named user, then the applicable groups are unioned together to form effective permissions for the requester. Instead, the current implementation will search for the first group that grants access and will use that. It will not union the permissions together. Here is the design document's description of the desired behavior: {quote} If the user is a member of the file's group or at least one group for which there is a named group entry in the ACL, then effective permissions are calculated from groups. This is the union of the file group permissions (if the user is a member of the file group) and all named group entries matching the user's groups. For example, consider a user that is a member of 2 groups: sales and execs. The user is not the file owner, and the ACL contains no named user entries. The ACL contains named group entries for both groups as follows: group:sales:r\-\-, group:execs:\-w\-. In this case, the user's effective permissions are rw-. {quote} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 10?? The design document's algorithm matches that description: *Design Document Algorithm* {code:title=DesignDocument} if (user == fileOwner) { effectivePermissions = aclEntries.getOwnerPermissions() } else if (user ∈ aclEntries.getNamedUsers()) { effectivePermissions = aclEntries.getNamedUserPermissions(user) } else if (userGroupsInAcl != ∅) { effectivePermissions = ∅ if (fileGroup ∈ userGroupsInAcl) { effectivePermissions = effectivePermissions ∪ aclEntries.getGroupPermissions() } for ({group | group ∈ userGroupsInAcl}) { effectivePermissions = effectivePermissions ∪ aclEntries.getNamedGroupPermissions(group) } } else { effectivePermissions = aclEntries.getOthersPermissions() } {code} ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf page 9?? The current implementation does NOT match the description. *Current Trunk* {code:title=FSPermissionChecker.java} // Use owner entry from permission bits if user is owner. if (getUser().equals(inode.getUserName())) { if (mode.getUserAction().implies(access)) { return; } foundMatch = true; } // Check named user and group entries if user was not denied by owner entry. if (!foundMatch) { for (int pos = 0, entry; pos < aclFeature.getEntriesSize(); pos++) { entry = aclFeature.getEntryAt(pos); if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) { break; } AclEntryType type = AclEntryStatusFormat.getType(entry); String name = AclEntryStatusFormat.getName(entry); if (type == AclEntryType.USER) { // Use named user entry with mask from permission bits applied if user // matches name. 
if (getUser().equals(name)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; break; } } else if (type == AclEntryType.GROUP) { // Use group entry (unnamed or named) with mask from permission bits // applied if user is a member and entry grants access. If user is a // member of multiple groups that have entries that grant access, then // it doesn't matter which is chosen, so exit early after first match. String group = name == null ? inode.getGroupName() : name; if (getGroups().contains(group)) { FsAction masked = AclEntryStatusFormat.getPermission(entry).and( mode.getGroupAction()); if (masked.implies(access)) { return; } foundMatch = true; } } } } {code} As seen in the GROUP section, the permissions check will
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634318#comment-14634318 ] Andrew Wang commented on HDFS-8779: --- Yeah, so as I said, WebHdfsFileSystem's JSON parser does not have the 2^53-1 limitation... that was an aside to my concern about compatibility, which I think is accurate. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch, HDFS-8779.03.patch, after-02-patch.png, before.png Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a number in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
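The precision loss is easy to demonstrate from the Java side, since a JavaScript number behaves like a Java double with a 53-bit mantissa. The block ID below is an arbitrary sample chosen for illustration, not taken from the attached patches.
{code:title=BlockIdPrecision.java}
public class BlockIdPrecision {
  public static void main(String[] args) {
    long maxSafe = (1L << 53) - 1;        // 2^53 - 1 = 9007199254740991
    long blockId = 4901665280959297972L;  // arbitrary random-style block ID
    double asJsNumber = (double) blockId; // what a JS JSON.parse would hold
    System.out.println(blockId > maxSafe);            // true
    System.out.println((long) asJsNumber == blockId); // false: precision lost
  }
}
{code}
It prints true then false: the ID exceeds MAX_SAFE_INTEGER, and a round trip through double (i.e., through a JavaScript number) does not preserve it.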
[jira] [Commented] (HDFS-6945) BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed
[ https://issues.apache.org/jira/browse/HDFS-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634458#comment-14634458 ] Hudson commented on HDFS-6945: -- FAILURE: Integrated in Hadoop-trunk-Commit #8189 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8189/]) Move HDFS-6945 to 2.7.2 section in CHANGES.txt. (aajisaka: rev a628f675900d2533ddf86fb3d3e601238ecd68c3) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed -- Key: HDFS-6945 URL: https://issues.apache.org/jira/browse/HDFS-6945 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Critical Labels: metrics Fix For: 2.8.0 Attachments: HDFS-6945-003.patch, HDFS-6945-004.patch, HDFS-6945-005.patch, HDFS-6945.2.patch, HDFS-6945.patch I'm seeing the ExcessBlocks metric increase to more than 300K in some clusters; however, there are no over-replicated blocks (confirmed by fsck). After further research, I noticed that when deleting a block, BlockManager does not remove the block from excessReplicateMap or decrement excessBlocksCount. Usually the metric is decremented when processing a block report; however, if the block has been deleted, BlockManager does not remove the block from excessReplicateMap or decrement the metric. That way the metric and excessReplicateMap can grow without bound (i.e., a memory leak can occur). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
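For intuition, here is a hedged sketch of the kind of cleanup the fix calls for; the map, counter, and method below are simplified stand-ins for BlockManager's internals (with blocks reduced to long IDs), not the actual patch.
{code:title=ExcessMapSketch.java}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class ExcessMapSketch {
  // Stand-in for excessReplicateMap: datanode UUID -> excess block IDs.
  private final Map<String, Set<Long>> excessReplicateMap =
      new HashMap<String, Set<Long>>();
  // Stand-in for the ExcessBlocks metric.
  private final AtomicLong excessBlocksCount = new AtomicLong();

  // When a block is removed entirely (e.g. its file is deleted), it must
  // also be dropped from every datanode's excess set and the metric
  // decremented, or both grow without bound.
  void removeBlockFromExcessMap(long blockId) {
    Iterator<Set<Long>> it = excessReplicateMap.values().iterator();
    while (it.hasNext()) {
      Set<Long> excessBlocks = it.next();
      if (excessBlocks.remove(blockId)) {
        excessBlocksCount.decrementAndGet();
      }
      if (excessBlocks.isEmpty()) {
        it.remove(); // drop empty per-datanode sets as well
      }
    }
  }
}
{code}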
[jira] [Updated] (HDFS-6945) BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed
[ https://issues.apache.org/jira/browse/HDFS-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6945: Fix Version/s: 2.7.2 BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed -- Key: HDFS-6945 URL: https://issues.apache.org/jira/browse/HDFS-6945 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Critical Labels: metrics Fix For: 2.8.0, 2.7.2 Attachments: HDFS-6945-003.patch, HDFS-6945-004.patch, HDFS-6945-005.patch, HDFS-6945.2.patch, HDFS-6945.patch I'm seeing the ExcessBlocks metric increase to more than 300K in some clusters; however, there are no over-replicated blocks (confirmed by fsck). After further research, I noticed that when deleting a block, BlockManager does not remove the block from excessReplicateMap or decrement excessBlocksCount. Usually the metric is decremented when processing a block report; however, if the block has been deleted, BlockManager does not remove the block from excessReplicateMap or decrement the metric. That way the metric and excessReplicateMap can grow without bound (i.e., a memory leak can occur). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7483: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~benoyantony] for the contribution. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Fix For: 2.8.0 Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, HDFS-7483.003.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If a cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Attachment: HDFS-8779.04.patch patch-to-json-parse.txt Thanks [~wheat9] for the idea. json-bigint doesn't have a front-end version; the author provides a browserify version [here|http://stackoverflow.com/questions/18755125/node-js-is-there-any-proper-way-to-parse-json-with-large-numbers-long-bigint]. That file is roughly 79 KB. Since both [BigNumber|https://github.com/MikeMcl/bignumber.js] and [JSON-js|https://github.com/douglascrockford/JSON-js] have front-end versions, I re-created the file using the idea of json-bigint: I simply added 2 lines to the JSON-js library (that's what json-bigint does): {code} +if (string.length > 15) + return new BigNumber(string); {code} {{patch-to-json-parse.txt}} shows that change. I didn't change anything in {{BigNumber}}. Uploaded the 04 patch, tested in Chrome/IE11/Firefox. I still prefer the 03 patch because it's simpler. Both work for me. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch, HDFS-8779.03.patch, HDFS-8779.04.patch, after-02-patch.png, before.png, patch-to-json-parse.txt Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a number in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
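For readers who want the same parse-big-integers-specially idea on the Java side, a hedged Jackson-based sketch follows; this is only an analogue for illustration (the actual patch is client-side JavaScript), and it assumes jackson-databind is on the classpath.
{code:title=BigIntJsonParse.java}
import java.math.BigInteger;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigIntJsonParse {
  public static void main(String[] args) throws Exception {
    String json = "{\"blockId\": 4901665280959297972}";
    // Force integral JSON numbers into BigInteger so values above 2^53-1
    // never pass through a double-backed representation.
    ObjectMapper mapper = new ObjectMapper()
        .enable(DeserializationFeature.USE_BIG_INTEGER_FOR_INTS);
    JsonNode node = mapper.readTree(json);
    BigInteger id = node.get("blockId").bigIntegerValue();
    System.out.println(id); // 4901665280959297972, exact
  }
}
{code}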
[jira] [Resolved] (HDFS-8616) Cherry pick HDFS-6495 for excess block leak
[ https://issues.apache.org/jira/browse/HDFS-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HDFS-8616. - Resolution: Done I've backported HDFS-6945 to 2.7.2. Please reopen this issue if you disagree. Cherry pick HDFS-6495 for excess block leak --- Key: HDFS-8616 URL: https://issues.apache.org/jira/browse/HDFS-8616 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Akira AJISAKA Busy clusters quickly leak tens or hundreds of thousands of excess blocks which slow BR processing. HDFS-6495 should be cherry picked into 2.7.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8616) Cherry pick HDFS-6945 for excess block leak
[ https://issues.apache.org/jira/browse/HDFS-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8616: Summary: Cherry pick HDFS-6945 for excess block leak (was: Cherry pick HDFS-6495 for excess block leak) Cherry pick HDFS-6945 for excess block leak --- Key: HDFS-8616 URL: https://issues.apache.org/jira/browse/HDFS-8616 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Akira AJISAKA Busy clusters quickly leak tens or hundreds of thousands of excess blocks which slow BR processing. HDFS-6495 should be cherry picked into 2.7.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Description: Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a -number-(*integer*) in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. was: Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a number in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch, HDFS-8779.03.patch, HDFS-8779.04.patch, after-02-patch.png, before.png, patch-to-json-parse.txt Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a -number-(*integer*) in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634469#comment-14634469 ] Hadoop QA commented on HDFS-8779: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | release audit | 0m 13s | The applied patch generated 2 release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 16s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746254/HDFS-8779.04.patch | | Optional Tests | | | git revision | trunk / df1e8ce | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11761/artifact/patchprocess/patchReleaseAuditProblems.txt | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11761/console | This message was automatically generated. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch, HDFS-8779.03.patch, HDFS-8779.04.patch, after-02-patch.png, before.png, patch-to-json-parse.txt Old releases use randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1; the max value of a -number-(*integer*) in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)