[jira] [Resolved] (HDFS-16891) Avoid the overhead of copy-on-write exception list while loading inodes sub sections in parallel
[ https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-16891. -- Fix Version/s: 3.4.0 3.3.9 Resolution: Fixed > Avoid the overhead of copy-on-write exception list while loading inodes sub > sections in parallel > > > Key: HDFS-16891 > URL: https://issues.apache.org/jira/browse/HDFS-16891 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.4 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > If we enable parallel loading and persisting of inodes from/to the fs image, we > get the benefit of improved performance. However, while loading sub-sections > INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a copy-on-write > list to maintain the list of exceptions. Since our use case is not to iterate > over this list while executor threads are adding new elements to the list, > using copy-on-write is a bit of an overhead for this use case. > It would be better to synchronize adding new elements to the list rather than > having the list copy all elements over every time a new element is added to the > list. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
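The trade-off described in this issue can be sketched in plain Java. The class and field names below are illustrative, not the actual FSImage loader code: CopyOnWriteArrayList copies the whole backing array on every add, which is wasted work when the list is only iterated after all executor threads finish, whereas a synchronized wrapper makes each add an O(1) amortized append under a lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch (not the HDFS code itself): collecting exceptions from
// many loader threads that only append, with iteration deferred until after
// the executor shuts down.
public class ExceptionCollector {
    // O(n) array copy on every add; tuned for read-heavy workloads,
    // which is not this use case.
    static final List<Exception> copyOnWrite = new CopyOnWriteArrayList<>();

    // O(1) amortized add under a lock; safe for concurrent appends,
    // and we only read it once loading is complete.
    static final List<Exception> synced =
        Collections.synchronizedList(new ArrayList<>());

    // Each executor thread reports sub-section load failures here.
    static void record(Exception e) {
        synced.add(e);
    }
}
```

Iterating `synced` afterwards still requires external synchronization per the `Collections.synchronizedList` contract, but that matches the single-threaded post-load error handling the issue describes.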
[jira] [Resolved] (HDFS-16887) Log start and end of phase/step in startup progress
[ https://issues.apache.org/jira/browse/HDFS-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-16887. -- Fix Version/s: 3.4.0 3.2.5 3.3.9 Resolution: Fixed > Log start and end of phase/step in startup progress > --- > > Key: HDFS-16887 > URL: https://issues.apache.org/jira/browse/HDFS-16887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.5, 3.3.9 > > > As part of Namenode startup progress, we have multiple phases and steps > within phase that are instantiated. While the startup progress view can be > instantiated with the current view of phase/step, having at least DEBUG logs > for startup progress would be helpful to identify when a particular step for > LOADING_FSIMAGE/SAVING_CHECKPOINT/LOADING_EDITS was started and ended. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
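The requested behavior — bracketing each startup phase/step with begin and end records — can be sketched with a small timing helper. The class and method names below are hypothetical, not the real StartupProgress API; the real change would emit these as DEBUG log lines.

```java
// Illustrative sketch: wrap a startup step so that its beginning and end are
// observable (logged at DEBUG in the real NameNode) and its duration measured.
public class StartupStepTimer {
    static long timeStep(String phase, String step, Runnable body) {
        long start = System.currentTimeMillis();
        // LOG.debug("Beginning phase {}, step {}", phase, step);
        body.run();
        // LOG.debug("Ending phase {}, step {}", phase, step);
        return System.currentTimeMillis() - start;
    }
}
```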
[jira] [Updated] (HDFS-16887) Log start and end of phase/step in startup progress
[ https://issues.apache.org/jira/browse/HDFS-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-16887: - Component/s: namenode > Log start and end of phase/step in startup progress > --- > > Key: HDFS-16887 > URL: https://issues.apache.org/jira/browse/HDFS-16887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > > As part of Namenode startup progress, we have multiple phases and steps > within phase that are instantiated. While the startup progress view can be > instantiated with the current view of phase/step, having at least DEBUG logs > for startup progress would be helpful to identify when a particular step for > LOADING_FSIMAGE/SAVING_CHECKPOINT/LOADING_EDITS was started and ended. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-16881: - Fix Version/s: 3.4.0 (was: 1.3.0) > Warn if AccessControlEnforcer runs for a long time to check permission > -- > > Key: HDFS-16881 > URL: https://issues.apache.org/jira/browse/HDFS-16881 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > AccessControlEnforcer is configurable. If an external AccessControlEnforcer > runs for a long time to check permission while holding the FSNamesystem lock, it will > significantly slow down the entire Namenode. In this JIRA, we will print a > WARN message when that happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
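The change can be sketched as timing the pluggable check and reporting when it crosses a threshold. The class, method, and threshold parameter below are illustrative, not the actual FSNamesystem/AccessControlEnforcer API:

```java
// Illustrative sketch: measure how long a permission check ran and report
// whether it exceeded the warn threshold. In the real NameNode this would
// wrap the external AccessControlEnforcer call and emit a WARN log line.
public class PermissionCheckTimer {
    static boolean exceededThreshold(Runnable enforcerCheck, long thresholdMs) {
        long start = System.nanoTime();
        enforcerCheck.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        return elapsedMs > thresholdMs;  // true => print the WARN
    }
}
```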
[jira] [Assigned] (HDFS-16868) Audit log duplicate problem when an ACE occurs in FSNamesystem.
[ https://issues.apache.org/jira/browse/HDFS-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned HDFS-16868: Assignee: Beibei Zhao > Audit log duplicate problem when an ACE occurs in FSNamesystem. > --- > > Key: HDFS-16868 > URL: https://issues.apache.org/jira/browse/HDFS-16868 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Beibei Zhao >Assignee: Beibei Zhao >Priority: Major > Labels: pull-request-available > > checkSuperuserPrivilege calls logAuditEvent and throws the ACE when an > AccessControlException occurs. > {code:java} > // This method logs operationName without super user privilege. > // It should be called without holding FSN lock. > void checkSuperuserPrivilege(String operationName, String path) > throws IOException { > if (isPermissionEnabled) { > try { > FSPermissionChecker.setOperationType(operationName); > FSPermissionChecker pc = getPermissionChecker(); > pc.checkSuperuserPrivilege(path); > } catch(AccessControlException ace){ > logAuditEvent(false, operationName, path); > throw ace; > } > } > } > {code} > Its callers, like metaSave, call it like this: > {code:java} > /** >* Dump all metadata into specified file >* @param filename >*/ > void metaSave(String filename) throws IOException { > String operationName = "metaSave"; > checkSuperuserPrivilege(operationName); > .. > try { > .. > metaSave(out); > .. > } > } finally { > readUnlock(operationName, getLockReportInfoSupplier(null)); > } > logAuditEvent(true, operationName, null); > } > {code} > but setQuota, addCachePool, modifyCachePool, removeCachePool, > createEncryptionZone and reencryptEncryptionZone catch the ACE and log the > same message again, which I think is a waste of memory: > {code:java} > /** >* Set the namespace quota and storage space quota for a directory. >* See {@link ClientProtocol#setQuota(String, long, long, StorageType)} for > the >* contract. >* >* Note: This does not support ".inodes" relative path. 
>*/ > void setQuota(String src, long nsQuota, long ssQuota, StorageType type) > throws IOException { > .. > try { > if(!allowOwnerSetQuota) { > checkSuperuserPrivilege(operationName, src); > } > .. > } catch (AccessControlException ace) { > logAuditEvent(false, operationName, src); > throw ace; > } > getEditLog().logSync(); > logAuditEvent(true, operationName, src); > } > {code} > Maybe we should move the checkSuperuserPrivilege out of the try block, as > metaSave and other callers do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
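The suggested fix can be sketched as follows; class, method, and log-entry names are illustrative, not the real FSNamesystem code. Moving the check ahead of the operation's own try/catch means a denied request is audited exactly once, inside checkSuperuserPrivilege, instead of twice.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the proposed restructuring: the privilege check
// audits its own failure and rethrows, and the caller no longer wraps it
// in a catch block that audits the same denial a second time.
public class AuditOnce {
    static final List<String> auditLog = new ArrayList<>();

    static void checkSuperuserPrivilege(String op, String path) {
        // the check audits the denial once, then rethrows (always denies
        // in this sketch, to make the audit count observable)
        auditLog.add("denied " + op + " " + path);
        throw new SecurityException("superuser privilege required");
    }

    static void setQuota(String src) {
        // moved out of the try block, as metaSave already does
        checkSuperuserPrivilege("setQuota", src);
        try {
            // ... perform the quota update ...
            auditLog.add("allowed setQuota " + src);
        } catch (RuntimeException e) {
            // this duplicate audit can no longer fire for the privilege check
            auditLog.add("denied setQuota " + src);
            throw e;
        }
    }
}
```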
[jira] [Resolved] (HDFS-8510) Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.
[ https://issues.apache.org/jira/browse/HDFS-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-8510. - Resolution: Won't Fix This is an old improvement proposal that I'm no longer planning on implementing. I'm going to close the issue. If anyone else would find it useful, please feel free to reopen and reassign. I'd be happy to help with code review. > Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo. > -- > > Key: HDFS-8510 > URL: https://issues.apache.org/jira/browse/HDFS-8510 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Major > > During a rolling upgrade, an administrator runs {{hdfs dfsadmin > -getDatanodeInfo}} to check if a DataNode has stopped. Currently, this > operation is subject to the RPC connection retries defined in > {{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}. > This issue proposes adding separate configuration properties to control the > retries for this operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-4289) FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica byte count on Windows
[ https://issues.apache.org/jira/browse/HDFS-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-4289. - Resolution: Won't Fix I'm no longer actively working on this. I no longer have easy access to a Windows environment to make Windows-specific changes, or even to confirm that this test failure still happens. It's a very old issue with no recent activity, so I'm going to assume it's no longer relevant and close it out. If it's still an ongoing issue that a Windows developer wants to pick up, please feel free to reopen and reassign. > FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica > byte count on Windows > --- > > Key: HDFS-4289 > URL: https://issues.apache.org/jira/browse/HDFS-4289 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: trunk-win >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Major > > {{FsDatasetImpl#updateReplicaUnderRecovery}} throws errors validating replica > byte count on Windows. This can be seen by running > {{TestBalancerWithNodeGroup#testBalancerWithRackLocality}}, which fails on > Windows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-3296) Running libhdfs tests in mac fails
[ https://issues.apache.org/jira/browse/HDFS-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned HDFS-3296: --- Assignee: (was: Chris Nauroth) I'm going to unassign this, because I'm no longer actively working on it. I see a new patch revision came in from [~jzhuge] a while ago. John (or others), please feel free to take it if you're working on it. > Running libhdfs tests in mac fails > -- > > Key: HDFS-3296 > URL: https://issues.apache.org/jira/browse/HDFS-3296 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Reporter: Amareshwari Sriramadasu >Priority: Major > Attachments: HDFS-3296.001.patch, HDFS-3296.002.patch, > HDFS-3296.003.patch, HDFS-3296.004.patch > > > Running "ant -Dcompile.c++=true -Dlibhdfs=true test-c++-libhdfs" on Mac fails > with following error: > {noformat} > [exec] dyld: lazy symbol binding failed: Symbol not found: > _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] dyld: Symbol not found: _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] > /Users/amareshwari.sr/workspace/hadoop/src/c++/libhdfs/tests/test-libhdfs.sh: > line 122: 39485 Trace/BPT trap: 5 CLASSPATH=$HADOOP_CONF_DIR:$CLASSPATH > LD_PRELOAD="$LIB_JVM_DIR/libjvm.so:$LIBHDFS_INSTALL_DIR/libhdfs.so:" > $LIBHDFS_BUILD_DIR/$HDFS_TEST > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16623) IllegalArgumentException in LifelineSender
[ https://issues.apache.org/jira/browse/HDFS-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-16623: - Fix Version/s: 3.4.0 3.2.4 3.3.4 > IllegalArgumentException in LifelineSender > -- > > Key: HDFS-16623 > URL: https://issues.apache.org/jira/browse/HDFS-16623 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 1h > Remaining Estimate: 0h > > In our production environment, an IllegalArgumentException occurred in the > LifelineSender at one DataNode which was undergoing GC at that time. > And the bug code is at line 1060 in BPServiceActor.java, because the sleep > time is negative. > {code:java} > while (shouldRun()) { > try { > if (lifelineNamenode == null) { > lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr); > } > sendLifelineIfDue(); > Thread.sleep(scheduler.getLifelineWaitTime()); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } catch (IOException e) { > LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, > e); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
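A minimal guard for the bug described above: clamp the scheduler's wait to zero before sleeping, since Thread.sleep throws IllegalArgumentException for negative arguments (which the scheduler can produce when the DataNode falls behind, e.g. during a long GC pause). The class and method names are illustrative:

```java
// Illustrative sketch: never hand a negative wait to Thread.sleep.
public class LifelineWait {
    static long safeWait(long scheduledWaitMs) {
        // a negative value means the lifeline is already overdue:
        // send immediately rather than throw
        return Math.max(0L, scheduledWaitMs);
    }
}
```

In the sender loop this becomes `Thread.sleep(safeWait(scheduler.getLifelineWaitTime()))`.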
[jira] [Updated] (HDFS-16207) Remove NN logs stack trace for non-existent xattr query
[ https://issues.apache.org/jira/browse/HDFS-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-16207: - Fix Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 Resolution: Fixed Status: Resolved (was: Patch Available) [~ahussein], thank you for the additional patch to backport to branch-2.10. I just committed it. > Remove NN logs stack trace for non-existent xattr query > --- > > Key: HDFS-16207 > URL: https://issues.apache.org/jira/browse/HDFS-16207 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Attachments: HDFS-16207-branch-2.10.001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The NN logs a full stack trace every time a getXAttrs is called for a > non-existent xattr. The logging has zero value add. The increased logging > load may harm performance. Something is now probing for xattrs resulting in > many lines of: > {code:bash} > 2021-09-02 13:48:03,340 [IPC Server handler 5 on default port 59951] INFO > ipc.Server (Server.java:logException(3149)) - IPC Server handler 5 on default > port 59951, call Call#17 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.getXAttrs from 127.0.0.1:59961 > java.io.IOException: At least one of the attributes provided was not found. 
> at > org.apache.hadoop.hdfs.server.namenode.FSDirXAttrOp.getXAttrs(FSDirXAttrOp.java:134) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getXAttrs(FSNamesystem.java:8472) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getXAttrs(NameNodeRpcServer.java:2317) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getXAttrs(ClientNamenodeProtocolServerSideTranslatorPB.java:1745) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1155) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1083) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3088) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
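The effect of the fix can be sketched as logging only the exception class and message for expected failures like a missing xattr, instead of the full stack trace. The helper below is an illustration, not Hadoop's actual server-side terse-exception mechanism:

```java
// Illustrative sketch: a one-line rendering of an "expected" exception,
// suitable for INFO-level server logs where the stack trace adds no value.
public class TerseLogging {
    static String terse(Throwable t) {
        return t.getClass().getName() + ": " + t.getMessage();
    }
}
```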
[jira] [Commented] (HDFS-16207) Remove NN logs stack trace for non-existent xattr query
[ https://issues.apache.org/jira/browse/HDFS-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412370#comment-17412370 ] Chris Nauroth commented on HDFS-16207: -- [~ahussein], thank you. I merged this to trunk, branch-3.3 and branch-3.2, resolving some trivial merge conflicts along the way. From the Affects Version/s, it sounds like you also wanted to merge to branch-2.10. Trying to merge to that branch fails with this compilation error: {code} ERROR] /home/cnauroth/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java:[524,9] cannot find symbol symbol: class XAttrNotFoundException location: class org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer {code} I stopped there. If you still want to merge to branch-2.10, please provide a separate patch compatible with the branch. > Remove NN logs stack trace for non-existent xattr query > --- > > Key: HDFS-16207 > URL: https://issues.apache.org/jira/browse/HDFS-16207 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The NN logs a full stack trace every time a getXAttrs is called for a > non-existent xattr. The logging has zero value add. The increased logging > load may harm performance. Something is now probing for xattrs resulting in > many lines of: > {code:bash} > 2021-09-02 13:48:03,340 [IPC Server handler 5 on default port 59951] INFO > ipc.Server (Server.java:logException(3149)) - IPC Server handler 5 on default > port 59951, call Call#17 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.getXAttrs from 127.0.0.1:59961 > java.io.IOException: At least one of the attributes provided was not found. 
> at > org.apache.hadoop.hdfs.server.namenode.FSDirXAttrOp.getXAttrs(FSDirXAttrOp.java:134) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getXAttrs(FSNamesystem.java:8472) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getXAttrs(NameNodeRpcServer.java:2317) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getXAttrs(ClientNamenodeProtocolServerSideTranslatorPB.java:1745) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1155) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1083) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3088) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16207) Remove NN logs stack trace for non-existent xattr query
[ https://issues.apache.org/jira/browse/HDFS-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412086#comment-17412086 ] Chris Nauroth commented on HDFS-16207: -- +1 Thank you for the contribution, Ahmed. I see you asked Kihwal to review, so I'll wait a day before merging in case you really wanted Kihwal's opinion specifically. > Remove NN logs stack trace for non-existent xattr query > --- > > Key: HDFS-16207 > URL: https://issues.apache.org/jira/browse/HDFS-16207 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The NN logs a full stack trace every time a getXAttrs is called for a > non-existent xattr. The logging has zero value add. The increased logging > load may harm performance. Something is now probing for xattrs resulting in > many lines of: > {code:bash} > 2021-09-02 13:48:03,340 [IPC Server handler 5 on default port 59951] INFO > ipc.Server (Server.java:logException(3149)) - IPC Server handler 5 on default > port 59951, call Call#17 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.getXAttrs from 127.0.0.1:59961 > java.io.IOException: At least one of the attributes provided was not found. 
> at > org.apache.hadoop.hdfs.server.namenode.FSDirXAttrOp.getXAttrs(FSDirXAttrOp.java:134) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getXAttrs(FSNamesystem.java:8472) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getXAttrs(NameNodeRpcServer.java:2317) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getXAttrs(ClientNamenodeProtocolServerSideTranslatorPB.java:1745) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1155) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1083) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3088) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.
[ https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054424#comment-16054424 ] Chris Nauroth commented on HDFS-9505: - FYI, I have filed HDFS-11995 for another inaccuracy in the HDFS Architecture documentation that remains even after this patch was committed. > HDFS Architecture documentation needs to be refreshed. > -- > > Key: HDFS-9505 > URL: https://issues.apache.org/jira/browse/HDFS-9505 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Chris Nauroth >Assignee: Masatake Iwasaki > Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1 > > Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch > > > The HDFS Architecture document is out of date with respect to the current > design of the system. > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html > There are multiple false statements and omissions of recent features. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11995) HDFS Architecture documentation incorrectly describes writing to a local temporary file.
[ https://issues.apache.org/jira/browse/HDFS-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-11995: - Priority: Minor (was: Major) > HDFS Architecture documentation incorrectly describes writing to a local > temporary file. > > > Key: HDFS-11995 > URL: https://issues.apache.org/jira/browse/HDFS-11995 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha3 >Reporter: Chris Nauroth >Priority: Minor > > The HDFS Architecture documentation has a section titled "Staging" that > describes clients writing to a local temporary file first before interacting > with the NameNode to allocate file metadata. This information is incorrect. > (Perhaps it was correct a long time ago, but it is no longer accurate with > respect to the current implementation.) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11995) HDFS Architecture documentation incorrectly describes writing to a local temporary file.
Chris Nauroth created HDFS-11995: Summary: HDFS Architecture documentation incorrectly describes writing to a local temporary file. Key: HDFS-11995 URL: https://issues.apache.org/jira/browse/HDFS-11995 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 3.0.0-alpha3 Reporter: Chris Nauroth The HDFS Architecture documentation has a section titled "Staging" that describes clients writing to a local temporary file first before interacting with the NameNode to allocate file metadata. This information is incorrect. (Perhaps it was correct a long time ago, but it is no longer accurate with respect to the current implementation.) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971465#comment-15971465 ] Chris Nauroth commented on HDFS-6962: - Yes, agreed with John. That might then lead to the question of why this wasn't included in branch-2. I have an earlier comment where I stated that the compatibility story looks good, but I thought it was a risky change close to the 2.8.0 cutoff: {quote} I think what you are proposing for configurability and extending the protocol messages makes sense as a way to provide deployments with a choice of which behavior to use. However, I'm reluctant to push it into 2.8.0 now due to the complexity of the changes required to support it. Considering something like a cross-cluster DistCp, with a mix of old and new versions in play, it could become very confusing to explain the end results to users. Unless you consider it urgent for 2.8.0, would you consider targeting it to the 3.x line, as I had done a while ago? {quote} If users are asking for this change in the 2.x line, I think we could probably make it happen. At this point, it would have to be tracked in a separate JIRA with a separate release note targeted to a 2.x release. However, if there isn't user demand for shipping the change in 2.x, then it's still probably safer to leave it in 3.x only. 
> ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Fix For: 3.0.0-alpha2 > > Attachments: disabled_new_client.log, disabled_old_client.log, > enabled_new_client.log, enabled_old_client.log, HDFS-6962.001.patch, > HDFS-6962.002.patch, HDFS-6962.003.patch, HDFS-6962.004.patch, > HDFS-6962.005.patch, HDFS-6962.006.patch, HDFS-6962.007.patch, > HDFS-6962.008.patch, HDFS-6962.009.patch, HDFS-6962.010.patch, > HDFS-6962.1.patch, run_compat_tests, run_unit_tests, test_plan.md > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ Set default ACLs on this directory: rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ Check ACLs on /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches the umaskmode defined in > hdfs-site.xml, everything OK! > default:group:readwrite:rwx allows the readwrite group rwx access for > inheritance. > default:user:toto:rwx allows the toto user rwx access for inheritance. 
> default:mask::rwx the inheritance mask is rwx, so no masking > 4/ Create a subdir to test inheritance of ACL > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ Check ACLs on /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has an rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart the namenode > > dfs.umaskmode > 010 > > 7/ Create a subdir to test inheritance of ACL with the new umaskmode parameter > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACL on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL value (user, group and other -- except the POSIX > owner -- ) with the group mask of the dfs.umaskmode property when creating a > directory with an inherited ACL. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-un
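The surprising `#effective:` output in the report is plain bit arithmetic once the umask's group bits have narrowed the mask: a named ACL entry's effective permission is its permission bits AND-ed with the mask entry. This is an illustration of POSIX ACL semantics, not HDFS code:

```java
// Illustrative sketch: rwx encoded as an octal digit (r=4, w=2, x=1).
// With umask 027 the inherited mask becomes r-x (5), so a named rwx (7)
// entry is effectively r-x; with umask 010 the mask becomes rw- (6).
public class AclMask {
    static int effective(int entryPerm, int mask) {
        return entryPerm & mask;
    }
}
```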
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966659#comment-15966659 ] Chris Nauroth commented on HDFS-11163: -- [~djp], sorry I missed the email update on the 2.8.1 release plan. Thank you for cherry-picking it into the new branch-2.8.1. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Fix For: 3.0.0-alpha3, 2.8.1 > > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, HDFS-11163-007.patch, HDFS-11163-branch-2.001.patch, > HDFS-11163-branch-2.002.patch, HDFS-11163-branch-2.003.patch, > temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-11163: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.1 3.0.0-alpha3 Status: Resolved (was: Patch Available) +1 for the latest patches. I have committed this to trunk, branch-2 and branch-2.8. [~surendrasingh], thank you for the contribution. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Fix For: 3.0.0-alpha3, 2.8.1 > > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, HDFS-11163-007.patch, HDFS-11163-branch-2.001.patch, > HDFS-11163-branch-2.002.patch, HDFS-11163-branch-2.003.patch, > temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
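The committed change centers on the skipped-file check quoted in the description: rather than ignoring files whose stored policy ID is unspecified, the Mover can fall back to the default policy. A rough, non-authoritative sketch of that idea (hypothetical constants and names; the real fix lives in the Java Mover and FsServerDefaults):

```python
# Hypothetical constants standing in for HdfsConstants values.
BLOCK_STORAGE_POLICY_ID_UNSPECIFIED = 0
DEFAULT_STORAGE_POLICY_ID = 7  # illustrative server default

def resolve_policy(policy_id: int,
                   default_policy_id: int = DEFAULT_STORAGE_POLICY_ID) -> int:
    """Instead of returning early for an unspecified policy (the old
    behavior), resolve it to the server default so the file's blocks
    are still scheduled for movement."""
    if policy_id == BLOCK_STORAGE_POLICY_ID_UNSPECIFIED:
        return default_policy_id
    return policy_id
```

With this resolution in place, a file whose policy was unset is treated as belonging to the default policy rather than being skipped.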
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964553#comment-15964553 ] Chris Nauroth commented on HDFS-11163: -- [~surendrasingh], it looks like HDFS-11163-branch-2.002.patch still doesn't apply cleanly. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, HDFS-11163-007.patch, HDFS-11163-branch-2.001.patch, > HDFS-11163-branch-2.002.patch, temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963498#comment-15963498 ] Chris Nauroth commented on HDFS-11163: -- [~surendrasingh], sorry, but the patches need to be rebased again. Sorry for the churn. I'll do my best to prioritize getting these committed right after you post new patches so that you don't have to rebase again. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, HDFS-11163-branch-2.001.patch, > temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927069#comment-15927069 ] Chris Nauroth commented on HDFS-11163: -- [~surendrasingh], I was about to commit this when I noticed that the patch is not compatible with branch-2. Can you please provide a version of the patch for branch-2? > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905960#comment-15905960 ] Chris Nauroth commented on HDFS-11163: -- The Checkstyle warnings are not worth addressing. The test failure is in unrelated code, and I can't repro it. +1 for patch revision 006. [~vinayrpet], just to make sure I'm clear, are you also +1 now? If so, I'd be happy to commit. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > HDFS-11163-006.patch, temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898092#comment-15898092 ] Chris Nauroth commented on HDFS-11163: -- The {{FsServerDefaults}} class is annotated {{Public}}, so let's maintain the existing constructor signature and add a new constructor that supports passing default storage policy ID. The old constructor can delegate to the new constructor with default storage policy ID of 0. That also would remove the need to change {{FtpConfigKeys}} and {{LocalConfigKeys}} in this patch. The logic is looking good to me, but I'd still like a second opinion review before committing anything. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, > HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, > temp-YARN-6278.HDFS-11163.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
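The compatibility pattern requested here, keeping the old public constructor and delegating to a new one that accepts the extra field, can be sketched as follows. This is illustrative only: a defaulted parameter stands in for the delegating Java constructor, and the field names are assumptions, not the real FsServerDefaults signature:

```python
class FsServerDefaults:
    """Sketch of a backward-compatible API extension: the new field gets a
    default value of 0, so existing call sites keep working unchanged."""
    def __init__(self, block_size: int, replication: int,
                 default_storage_policy_id: int = 0):
        self.block_size = block_size
        self.replication = replication
        self.default_storage_policy_id = default_storage_policy_id

# Legacy caller: unaware of the new field, gets policy ID 0.
old_style = FsServerDefaults(128 * 1024 * 1024, 3)
# Policy-aware caller: passes the new field explicitly.
new_style = FsServerDefaults(128 * 1024 * 1024, 3, 7)
```

The same shape in Java is two overloaded constructors, with the old one delegating to the new one and supplying 0.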
[jira] [Commented] (HDFS-11340) DataNode reconfigure for disks doesn't remove the failed volumes
[ https://issues.apache.org/jira/browse/HDFS-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898024#comment-15898024 ] Chris Nauroth commented on HDFS-11340: -- [~manojg], thank you for the patch. This looks good. I have 2 small requests: {code} // removed when the failure was detected by DataNode#checkDiskErorrAsync. {code} Please fix the typo in "Error". {code} void addVolumeFailureInfo(VolumeFailureInfo volumeFailureInfo) { if (!volumeFailureInfos.containsKey(volumeFailureInfo .getFailedStorageLocation())) { volumeFailureInfos.put(volumeFailureInfo.getFailedStorageLocation(), volumeFailureInfo); } } {code} Please enter a comment explaining why the {{containsKey}} check is necessary, since this was a point of confusion in earlier code review feedback. That way, other maintainers reading the code won't accidentally remove the {{containsKey}} check thinking that it's unnecessary. > DataNode reconfigure for disks doesn't remove the failed volumes > > > Key: HDFS-11340 > URL: https://issues.apache.org/jira/browse/HDFS-11340 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11340.01.patch, HDFS-11340.02.patch, > HDFS-11340.03.patch, HDFS-11340.04.patch > > > Say a DataNode (uuid:xyz) has disks D1 and D2. When D1 turns bad, JMX query > on FSDatasetState-xyz for "NumFailedVolumes" attr rightly shows the failed > volume count as 1 and the "FailedStorageLocations" attr has the failed > storage location as "D1". > It is possible to add or remove disks to this DataNode by running > {{reconfigure}} command. Let the failed disk D1 be removed from the conf and > the new conf has only one good disk D2. Upon running the reconfigure command > for this DataNode with this new disk conf, the expectation is DataNode would > no more have "NumFailedVolumes" or "FailedStorageLocations". 
But, even after > removing the failed disk from the conf and a successful reconfigure, DataNode > continues to show the "NumFailedVolumes" as 1 and "FailedStorageLocations" as > "D1" and it never gets reset. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
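The rationale behind the {{containsKey}} guard in the quoted patch is that the same volume failure can be reported more than once, and the first recorded VolumeFailureInfo, with its original detection time, should win. A minimal first-write-wins sketch (illustrative Python, not the Hadoop code; the dict fields are assumptions):

```python
# Map of failed storage location -> failure details, mirroring the
# volumeFailureInfos map in the patch under review.
volume_failure_infos = {}

def add_volume_failure_info(location: str, info: dict) -> None:
    """Record a volume failure, but never overwrite an existing entry:
    a later re-detection of the same failed volume must not clobber the
    original failure timestamp. setdefault plays the role of the
    containsKey guard in the Java patch."""
    volume_failure_infos.setdefault(location, info)

add_volume_failure_info("/data/d1", {"failed_at": 100, "lost_bytes": 1 << 40})
# Re-detection of the same volume: silently ignored.
add_volume_failure_info("/data/d1", {"failed_at": 250, "lost_bytes": 0})
```

Without the guard, the second call would replace the original record, which is exactly the confusion the review comment asks to document.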
[jira] [Commented] (HDFS-9409) DataNode shutdown does not guarantee full shutdown of all threads due to race condition.
[ https://issues.apache.org/jira/browse/HDFS-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854634#comment-15854634 ] Chris Nauroth commented on HDFS-9409: - Using a hidden configuration flag for this sounds appropriate to me. I agree that there is no need for a strict long wait on all threads in production operations if correctness doesn't depend on it. > DataNode shutdown does not guarantee full shutdown of all threads due to race > condition. > > > Key: HDFS-9409 > URL: https://issues.apache.org/jira/browse/HDFS-9409 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Chris Nauroth > > {{DataNode#shutdown}} is documented to return "only after shutdown is > complete". Even after completion of this method, it's possible that threads > started by the DataNode are still running. Race conditions in the shutdown > sequence may cause it to skip stopping and joining the {{BPServiceActor}} > threads. > This is likely not a big problem in normal operations, because these are > daemon threads that won't block overall process exit. It is more of a > problem for tests, because it makes it impossible to write reliable > assertions that these threads exited cleanly. For large test suites, it can > also cause an accumulation of unneeded threads, which might harm test > performance. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819266#comment-15819266 ] Chris Nauroth commented on HDFS-11163: -- bq. ...HdfsLocatedFileStatus already has the resolved storage policy. Thank you for clarifying. I missed this. In that case, I think it makes sense to put it in {{FsServerDefaults}}. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815949#comment-15815949 ] Chris Nauroth commented on HDFS-11163: -- bq. we can add one API to get default policy from namenode, so we can avoid getStoragePolicy RPC per file. If we went in this direction, then maybe it could fit into {{getServerDefaults}}. However, I think the challenge is that it really needs to be sensitive to path. If the storage policy is unspecified at an inode, then the real effective storage policy might be resolved via inheritance from the inode's ancestry. I can't think of a clever way to completely avoid additional RPCs, though perhaps a new API could help reduce it by memoizing results from the ancestry on the client side. This is tricky. Sorry I missed it in the review of HDFS-9534. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
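The path-sensitive resolution plus client-side memoization discussed here can be sketched as a walk up the inode ancestry with a cache. Everything below is illustrative: the inode table is hypothetical, and in HDFS this resolution actually happens on the NameNode:

```python
from functools import lru_cache

UNSPECIFIED = 0

# Hypothetical per-inode storage policies; only "/" has one set explicitly.
INODE_POLICY = {
    "/": 7,
    "/a": UNSPECIFIED,
    "/a/b": UNSPECIFIED,
    "/a/b/f": UNSPECIFIED,
}

@lru_cache(maxsize=None)
def effective_policy(path: str) -> int:
    """Resolve the effective policy by walking up the ancestry; the
    lru_cache plays the role of the client-side memoization suggested
    above, so shared ancestors are resolved only once."""
    policy = INODE_POLICY.get(path, UNSPECIFIED)
    if policy != UNSPECIFIED or path == "/":
        return policy
    parent = path.rsplit("/", 1)[0] or "/"
    return effective_policy(parent)
```

After one resolution of /a/b/f, subsequent lookups under /a hit the memoized ancestors instead of triggering new walks, which is the RPC-reduction idea in the comment.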
[jira] [Commented] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset
[ https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812601#comment-15812601 ] Chris Nauroth commented on HDFS-11163: -- [~surendrasingh], thank you for the patch. This looks correct to me. One thing I'm unsure about is the potential impact on performance of Mover. It will require an additional {{getStoragePolicy}} RPC per file with the default storage policy, whereas previously there was no RPC for those files. Unfortunately, I don't see a way to avoid that, at least not with the current APIs, because that's how we resolve inheritance of storage policies from parent paths. I would prefer to get an opinion from [~szetszwo]. > Mover should move the file blocks to default storage once policy is unset > - > > Key: HDFS-11163 > URL: https://issues.apache.org/jira/browse/HDFS-11163 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch > > > HDFS-9534 added new API in FileSystem to unset the storage policy. Once > policy is unset blocks should move back to the default storage policy. > Currently mover is not moving file blocks which have zero storage ID > {code} > // currently we ignore files with unspecified storage policy > if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9483: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I have committed this to trunk, branch-2 and branch-2.8. [~surendrasingh], thank you for contributing the patch. [~brahmareddy], thank you for help with the code review. > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Surendra Singh Lilhore > Fix For: 2.8.0 > > Attachments: HDFS-9483.001.patch, HDFS-9483.002.patch, HDFS-9483.patch > > > If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in > a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9483: Fix Version/s: 3.0.0-alpha2 > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Surendra Singh Lilhore > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-9483.001.patch, HDFS-9483.002.patch, HDFS-9483.patch > > > If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in > a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799645#comment-15799645 ] Chris Nauroth commented on HDFS-9483: - I'm +1 for patch 002. [~brahmareddy], you had suggested perhaps a link to SSL configuration information. Unfortunately, the only place I'm aware of with that information is the Encrypted Shuffle page, and it would be kind of odd to link to that from DistCp. I'll hold off committing in case you have further comments. > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Surendra Singh Lilhore > Attachments: HDFS-9483.001.patch, HDFS-9483.002.patch, HDFS-9483.patch > > > If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in > a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-11273: - Status: Patch Available (was: Open) > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch > > > TransferFsImage#doGetUrl function is required for JournalNode syncing as > well. We can move the code to a Utility class to avoid duplication of code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11208) Deadlock in WebHDFS on shutdown
[ https://issues.apache.org/jira/browse/HDFS-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788385#comment-15788385 ] Chris Nauroth commented on HDFS-11208: -- This is somewhat similar to a deadlock we encountered on {{S3AFileSystem#close}}, reported in HADOOP-13599. In that case, we were able to resolve the problem by removing {{synchronized}} from {{S3AFileSystem#close}}. This one is going to be trickier. > Deadlock in WebHDFS on shutdown > --- > > Key: HDFS-11208 > URL: https://issues.apache.org/jira/browse/HDFS-11208 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1 >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-11208-test-deadlock.patch > > > Currently on the client side if the {{DelegationTokenRenewer}} attempts to > renew a WebHdfs delegation token while the client system is shutting down > (i.e. {{FileSystem.Cache.ClientFinalizer}} is running) a deadlock may occur. > This happens because {{ClientFinalizer}} calls > {{FileSystem.Cache.closeAll()}} which first takes a lock on the > {{FileSystem.Cache}} object and then locks each file system in the cache as > it iterates over them. {{DelegationTokenRenewer}} takes a lock on a > filesystem object while it is renewing that filesystem's token, but within > {{TokenAspect.TokenManager.renew()}} (used for renewal of WebHdfs tokens) > {{FileSystem.get}} is called, which in turn takes a lock on the FileSystem > cache object, potentially causing deadlock if {{ClientFinalizer}} is > currently running. 
> See below for example deadlock output: > {code} > Found one Java-level deadlock: > = > "Thread-8572": > waiting to lock monitor 0x7eff401f9878 (object 0x00051ec3f930, a > dali.hdfs.web.WebHdfsFileSystem), > which is held by "FileSystem-DelegationTokenRenewer" > "FileSystem-DelegationTokenRenewer": > waiting to lock monitor 0x7f005c08f5c8 (object 0x00050389c8b8, a > dali.fs.FileSystem$Cache), > which is held by "Thread-8572" > Java stack information for the threads listed above: > === > "Thread-8572": > at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864) >- waiting to lock <0x00051ec3f930> (a >dali.hdfs.web.WebHdfsFileSystem) >at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449) >at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407) >- locked <0x00050389c8b8> (a dali.fs.FileSystem$Cache) >at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424) >- locked <0x00050389c8d0> (a >dali.fs.FileSystem$Cache$ClientFinalizer) >at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) >"FileSystem-DelegationTokenRenewer": >at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343) >- waiting to lock <0x00050389c8b8> (a dali.fs.FileSystem$Cache) >at dali.fs.FileSystem$Cache.get(FileSystem.java:2332) >at dali.fs.FileSystem.get(FileSystem.java:369) >at >dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92) >at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72) >at dali.security.token.Token.renew(Token.java:373) >at > > dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127) >- locked <0x00051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem) >at > > dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57) >at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258) > Found 1 deadlock. 
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
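The cycle in the trace above is a classic lock-ordering deadlock: ClientFinalizer acquires cache-then-filesystem while the token renewer acquires filesystem-then-cache. One standard remedy is to make every thread take the two locks in the same global order. A toy sketch of the safe ordering follows; this is an illustration of the principle, not the fix that was actually committed for this issue:

```python
import threading

cache_lock = threading.Lock()  # stands in for the FileSystem.Cache monitor
fs_lock = threading.Lock()     # stands in for a cached WebHdfsFileSystem

def close_all():
    """closeAll: cache lock first, then each filesystem's lock."""
    with cache_lock:
        with fs_lock:
            pass  # close the filesystem while holding both

def renew_token():
    """Safe renewer variant: resolve the filesystem under the cache lock,
    RELEASE it, and only then take the filesystem lock. Both threads now
    agree on cache-before-filesystem, so the reported cycle cannot form."""
    with cache_lock:
        fs = "resolved filesystem"  # FileSystem.get under the cache lock
    with fs_lock:
        return fs                   # renew using the resolved filesystem

t1 = threading.Thread(target=close_all)
t2 = threading.Thread(target=renew_token)
t1.start(); t2.start(); t1.join(); t2.join()
```

The deadlocked version differs only in holding fs_lock while calling into the cache, which inverts the order and creates the wait cycle shown in the stack dump.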
[jira] [Commented] (HDFS-11076) Add unit test for extended Acls
[ https://issues.apache.org/jira/browse/HDFS-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788374#comment-15788374 ] Chris Nauroth commented on HDFS-11076: -- I haven't reviewed this patch closely, but the scenarios listed in the description sound like things that would have been covered already in hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java. Did anyone review that suite to make sure the newly introduced test cases are not redundant? > Add unit test for extended Acls > --- > > Key: HDFS-11076 > URL: https://issues.apache.org/jira/browse/HDFS-11076 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Chen Liang >Assignee: Chen Liang > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-11076.001.patch, HDFS-11076.002.patch, > HDFS-11076.003.patch > > > This JIRA tries to add unit tests for extended ACLs in HDFS, to cover the > following scenarios: > # the default ACL of parent directory should be inherited by newly created > child directory and file > # the access ACL of parent directory should not be inherited by newly created > child directory and file > # changing the default ACL of parent directory should not change the ACL of > existing child directory and file > # child directory can add more default ACL in addition to the ACL inherited > from parent directory > # child directory can also restrict ACL based on the ACL inherited from > parent directory -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9911) TestDataNodeLifeline Fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755252#comment-15755252 ] Chris Nauroth commented on HDFS-9911: - [~linyiqun], thank you for the patch! > TestDataNodeLifeline Fails intermittently > -- > > Key: HDFS-9911 > URL: https://issues.apache.org/jira/browse/HDFS-9911 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.8.0 >Reporter: Anu Engineer >Assignee: Yiqun Lin > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-9911.001.patch, HDFS-9911.002.patch > > > In HDFS-1312 branch, we have a failure for this test. > {{org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime}} > {noformat} > Error Message > Expect metrics to count no lifeline calls. expected:<0> but was:<1> > Stacktrace > java.lang.AssertionError: Expect metrics to count no lifeline calls. > expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime(TestDataNodeLifeline.java:256) > {noformat} > Details can be found here. > https://builds.apache.org/job/PreCommit-HDFS-Build/14726/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeLifeline/testNoLifelineSentIfHeartbeatsOnTime/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9911) TestDataNodeLifeline Fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9911: Assignee: Yiqun Lin (was: Chris Nauroth) [~linyiqun], thank you for the analysis and volunteering to take over the patch. I am reassigning this to you. I suggest changing this to: {code} scheduleNextLifeline(nextHeartbeatTime); {code} This would make it consistent with other points in the code where the lifeline time is scheduled relative to the heartbeat time. It could help avoid confusion if 2 separate calls to {{monotonicNow()}} return 2 different timestamps (one for initialization of {{nextHeartbeatTime}} and the other for {{scheduleNextLifeline}}). > TestDataNodeLifeline Fails intermittently > -- > > Key: HDFS-9911 > URL: https://issues.apache.org/jira/browse/HDFS-9911 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.8.0 >Reporter: Anu Engineer >Assignee: Yiqun Lin > Fix For: 2.8.0 > > Attachments: HDFS-9911.001.patch > > > In HDFS-1312 branch, we have a failure for this test. > {{org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime}} > {noformat} > Error Message > Expect metrics to count no lifeline calls. expected:<0> but was:<1> > Stacktrace > java.lang.AssertionError: Expect metrics to count no lifeline calls. > expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime(TestDataNodeLifeline.java:256) > {noformat} > Details can be found here. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14726/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeLifeline/testNoLifelineSentIfHeartbeatsOnTime/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
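The scheduling suggestion in the comment above can be sketched as follows. This is a simplified stand-in, not the actual BPServiceActor code: the class shape, field names, and interval handling are illustrative assumptions. The point it demonstrates is that deriving the lifeline schedule from the already captured heartbeat time avoids two separate {{monotonicNow()}} reads returning slightly different timestamps.

```java
// Simplified sketch of the scheduling pattern discussed above.
// The real logic lives in BPServiceActor; all names here are illustrative.
public class LifelineSchedulerSketch {
    long nextHeartbeatTime;
    long nextLifelineTime;
    final long lifelineIntervalMs;

    LifelineSchedulerSketch(long lifelineIntervalMs) {
        this.lifelineIntervalMs = lifelineIntervalMs;
    }

    // Problematic pattern: two separate monotonicNow() calls can return
    // two different timestamps, skewing the lifeline schedule relative
    // to the heartbeat schedule.
    void scheduleFromSeparateClockReads() {
        nextHeartbeatTime = monotonicNow();
        scheduleNextLifeline(monotonicNow()); // may differ from nextHeartbeatTime
    }

    // Suggested pattern: reuse the captured heartbeat time as the base,
    // so both schedules share the same timestamp.
    void scheduleFromHeartbeatTime() {
        nextHeartbeatTime = monotonicNow();
        scheduleNextLifeline(nextHeartbeatTime);
    }

    void scheduleNextLifeline(long baseTime) {
        nextLifelineTime = baseTime + lifelineIntervalMs;
    }

    static long monotonicNow() {
        return System.nanoTime() / 1_000_000L;
    }
}
```

With the suggested pattern, the lifeline time is exactly the heartbeat time plus the lifeline interval, which is what makes the intermittent metrics assertion in the test deterministic.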
[jira] [Commented] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.
[ https://issues.apache.org/jira/browse/HDFS-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612496#comment-15612496 ] Chris Nauroth commented on HDFS-11063: -- [~kihwal], thank you. That's a good observation. We'll have to be mindful of this while implementing the change. > Set NameNode RPC server handler thread name with more descriptive information > about the RPC call. > - > > Key: HDFS-11063 > URL: https://issues.apache.org/jira/browse/HDFS-11063 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chris Nauroth > > We often run {{jstack}} on a NameNode process as a troubleshooting step if it > is suffering high load or appears to be hanging. By reading the stack trace, > we can identify if a caller is blocked inside an expensive operation. This > would be even more helpful if we updated the RPC server handler thread name > with more descriptive information about the RPC call. This could include the > calling user, the called RPC method, and the most significant argument to > that method (most likely the path). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.
[ https://issues.apache.org/jira/browse/HDFS-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610068#comment-15610068 ] Chris Nauroth commented on HDFS-11034: -- Hello [~GergelyNovak]. If the decommissioned host is removed from the {{dfs.hosts.exclude}} file, followed by running {{hdfs dfsadmin -refreshNodes}}, then the host is no longer considered to be excluded. If the DataNode process is still running, or if it's restarted accidentally, then that DataNode will re-register with the NameNode, come back into service and become a candidate for writing new blocks. I was imagining a new workflow, where the host remains decommissioned, but the administrator has a way to clear out the in-memory tracked state about that node. It's interesting that you brought up the exclude file. Since that's already the existing mechanism for inclusion/exclusion of hosts, I wonder if there is a way to enhance it to cover this use case, so that administrators wouldn't need to learn a new command. I'll think about it more (and comments are welcome from others who have ideas too). > Provide a command line tool to clear decommissioned DataNode information from > the NameNode without restarting. > -- > > Key: HDFS-11034 > URL: https://issues.apache.org/jira/browse/HDFS-11034 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chris Nauroth >Assignee: Gergely Novák > > Information about decommissioned DataNodes remains tracked in the NameNode > for the entire NameNode process lifetime. Currently, the only way to clear > this information is to restart the NameNode. This issue proposes to add a > way to clear this information online, without requiring a process restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.
[ https://issues.apache.org/jira/browse/HDFS-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609190#comment-15609190 ] Chris Nauroth commented on HDFS-11063: -- Here is an example thread from {{jstack}}: {code} "IPC Server handler 6 on 19000" #49 daemon prio=5 os_prio=31 tid=0x7ff6b3c84800 nid=0x9e03 waiting on condition [0x73762000] java.lang.Thread.State: TIMED_WAITING (sleeping) at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:431) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummaryInt(FSDirStatAndListingOp.java:515) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getContentSummary(FSDirStatAndListingOp.java:134) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:2941) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1311) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:920) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:467) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:990) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1795) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2535) {code} I'm thinking we could set the thread name so that it looks more like this: {code} "IPC Server handler 6 on 19000 getContentSummary(user=chris, startTime=12345, path=/)" #49 daemon 
prio=5 os_prio=31 tid=0x7ff6b3c84800 nid=0x9e03 waiting on condition [0x73762000] ... {code} This would clearly show that user "chris" was very naughty and called an expensive {{getContentSummary}} on the root. We could also determine how long the operation has been running based on the start time. This additional contextual information would have to be cleared out of the thread name after completion of the RPC method, so that when the thread is returned to the pool for handling later calls, it doesn't hold on to the information about the old call. Bonus points if we can find a way to do this generically in Hadoop Common in a way that gives meaningful thread names for all RPC servers, without code changes in the individual RPC servers. I have a feeling that the desire to include protocol-specific information (like the path argument) makes that impossible though, so I have filed this as an HDFS JIRA. > Set NameNode RPC server handler thread name with more descriptive information > about the RPC call. > - > > Key: HDFS-11063 > URL: https://issues.apache.org/jira/browse/HDFS-11063 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chris Nauroth > > We often run {{jstack}} on a NameNode process as a troubleshooting step if it > is suffering high load or appears to be hanging. By reading the stack trace, > we can identify if a caller is blocked inside an expensive operation. This > would be even more helpful if we updated the RPC server handler thread name > with more descriptive information about the RPC call. This could include the > calling user, the called RPC method, and the most significant argument to > that method (most likely the path). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
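The rename-and-restore idea described above can be sketched generically. Everything here apart from {{Thread#setName}}/{{getName}} is a hypothetical helper, not the actual Hadoop IPC Server code; the {{finally}} block is what guarantees a pooled handler thread does not carry stale call context after the RPC completes.

```java
import java.util.function.Supplier;

// Hedged sketch: temporarily augment the handler thread's name with call
// context, then restore the original name when the RPC finishes. The
// helper and its parameters are illustrative, not Hadoop's real API.
public class HandlerThreadNameSketch {
    public static <T> T callWithDescriptiveName(
            String user, String method, String path, Supplier<T> rpcBody) {
        Thread t = Thread.currentThread();
        String originalName = t.getName();
        long startTime = System.currentTimeMillis();
        t.setName(String.format("%s %s(user=%s, startTime=%d, path=%s)",
                originalName, method, user, startTime, path));
        try {
            return rpcBody.get();
        } finally {
            // Clear the per-call context before the thread returns to the
            // pool, so later calls don't inherit the old description.
            t.setName(originalName);
        }
    }
}
```

A jstack taken while {{rpcBody}} runs would then show the descriptive name, matching the example thread name shown in the comment above.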
[jira] [Created] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.
Chris Nauroth created HDFS-11063: Summary: Set NameNode RPC server handler thread name with more descriptive information about the RPC call. Key: HDFS-11063 URL: https://issues.apache.org/jira/browse/HDFS-11063 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chris Nauroth We often run {{jstack}} on a NameNode process as a troubleshooting step if it is suffering high load or appears to be hanging. By reading the stack trace, we can identify if a caller is blocked inside an expensive operation. This would be even more helpful if we updated the RPC server handler thread name with more descriptive information about the RPC call. This could include the calling user, the called RPC method, and the most significant argument to that method (most likely the path). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.
[ https://issues.apache.org/jira/browse/HDFS-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-11034: - Assignee: Gergely Novák > Provide a command line tool to clear decommissioned DataNode information from > the NameNode without restarting. > -- > > Key: HDFS-11034 > URL: https://issues.apache.org/jira/browse/HDFS-11034 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chris Nauroth >Assignee: Gergely Novák > > Information about decommissioned DataNodes remains tracked in the NameNode > for the entire NameNode process lifetime. Currently, the only way to clear > this information is to restart the NameNode. This issue proposes to add a > way to clear this information online, without requiring a process restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.
[ https://issues.apache.org/jira/browse/HDFS-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589341#comment-15589341 ] Chris Nauroth commented on HDFS-11034: -- We can add a new dfsadmin command to clear this state. It's important to note that for some operations workflows, it's valuable to retain the decommissioned node information. If the operator is working on a series of decommission/recommission steps, then this information is valuable to see which nodes are still remaining in decommissioned state. That likely means that the command line needs to accept an argument for a specific host instead of just blindly clearing all decommissioned node information. Remember to clear from both NameNodes in an HA pair. > Provide a command line tool to clear decommissioned DataNode information from > the NameNode without restarting. > -- > > Key: HDFS-11034 > URL: https://issues.apache.org/jira/browse/HDFS-11034 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chris Nauroth > > Information about decommissioned DataNodes remains tracked in the NameNode > for the entire NameNode process lifetime. Currently, the only way to clear > this information is to restart the NameNode. This issue proposes to add a > way to clear this information online, without requiring a process restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
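The per-host clearing discussed above can be sketched with a plain map. The class and method names are hypothetical; the real state lives inside the NameNode (DatanodeManager), and the actual dfsadmin plumbing is omitted. The sketch shows why the command should accept a specific host argument rather than blindly wiping all decommissioned-node information.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a registry of decommissioned nodes that
// supports clearing a single host. Names are hypothetical stand-ins
// for NameNode-internal state.
public class DecommissionedNodeRegistrySketch {
    private final Map<String, String> decommissionedNodes = new HashMap<>();

    public void markDecommissioned(String host, String details) {
        decommissionedNodes.put(host, details);
    }

    // Clears state for one host only, preserving the rest so an operator
    // in the middle of a decommission/recommission workflow keeps the
    // view of which nodes remain decommissioned.
    public boolean clearHost(String host) {
        return decommissionedNodes.remove(host) != null;
    }

    public int trackedCount() {
        return decommissionedNodes.size();
    }
}
```

In an HA deployment the same clear operation would need to be issued against both NameNodes, since each tracks this state independently in memory.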
[jira] [Created] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.
Chris Nauroth created HDFS-11034: Summary: Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting. Key: HDFS-11034 URL: https://issues.apache.org/jira/browse/HDFS-11034 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chris Nauroth Information about decommissioned DataNodes remains tracked in the NameNode for the entire NameNode process lifetime. Currently, the only way to clear this information is to restart the NameNode. This issue proposes to add a way to clear this information online, without requiring a process restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10950) Add unit tests to verify ACLs in safemode
[ https://issues.apache.org/jira/browse/HDFS-10950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543723#comment-15543723 ] Chris Nauroth commented on HDFS-10950: -- Hello [~xiaobingo]. Do you think this is already covered sufficiently in {{TestSafeMode#testOperationsWhileInSafeMode}}? {code} runFsFun("modifyAclEntries while in SM", new FSRun() { @Override public void run(FileSystem fs) throws IOException { fs.modifyAclEntries(file1, Lists.newArrayList()); }}); runFsFun("removeAclEntries while in SM", new FSRun() { @Override public void run(FileSystem fs) throws IOException { fs.removeAclEntries(file1, Lists.newArrayList()); }}); runFsFun("removeDefaultAcl while in SM", new FSRun() { @Override public void run(FileSystem fs) throws IOException { fs.removeDefaultAcl(file1); }}); runFsFun("removeAcl while in SM", new FSRun() { @Override public void run(FileSystem fs) throws IOException { fs.removeAcl(file1); }}); runFsFun("setAcl while in SM", new FSRun() { @Override public void run(FileSystem fs) throws IOException { fs.setAcl(file1, Lists.newArrayList()); }}); ... try { fs.getAclStatus(file1); } catch (IOException ioe) { fail("getAclStatus failed while in SM"); } {code} > Add unit tests to verify ACLs in safemode > - > > Key: HDFS-10950 > URL: https://issues.apache.org/jira/browse/HDFS-10950 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs, test >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > This proposes adding unit tests to validate that getting Acls works when > namenode is in safemode, while setting Acls fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-6277) WebHdfsFileSystem#toUrl does not perform character escaping for rename
[ https://issues.apache.org/jira/browse/HDFS-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-6277. - Resolution: Won't Fix This bug is present in the 1.x line, but not 2.x or 3.x. I'm resolving this as Won't Fix, because 1.x is no longer under active maintenance. > WebHdfsFileSystem#toUrl does not perform character escaping for rename > --- > > Key: HDFS-6277 > URL: https://issues.apache.org/jira/browse/HDFS-6277 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Ramya Sunil >Assignee: Chris Nauroth > > Found this issue while testing HDFS-6141. WebHdfsFileSystem#toUrl does not > perform character escaping for rename and causes the operation to fail. > This bug does not exist on 2.x > For e.g: > $ hadoop dfs -rmr 'webhdfs://:/tmp/test dirname with spaces' > Problem with Trash.Unexpected HTTP response: code=400 != 200, op=RENAME, > message=Bad Request. Consider using -skipTrash option > rmr: Failed to move to trash: webhdfs://:/tmp/test dirname > with spaces -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases
[ https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6255: Assignee: (was: Chris Nauroth) I'm not actively working on this, so I'm unassigning. > fuse_dfs will not adhere to ACL permissions in some cases > - > > Key: HDFS-6255 > URL: https://issues.apache.org/jira/browse/HDFS-6255 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fuse-dfs >Affects Versions: 2.4.0, 3.0.0-alpha1 >Reporter: Stephen Chu > > As hdfs user, I created a directory /tmp/acl_dir/ and set permissions to 700. > Then I set a new acl group:jenkins:rwx on /tmp/acl_dir. > {code} > jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -getfacl /tmp/acl_dir > # file: /tmp/acl_dir > # owner: hdfs > # group: supergroup > user::rwx > group::--- > group:jenkins:rwx > mask::rwx > other::--- > {code} > Through the FsShell, the jenkins user can list /tmp/acl_dir as well as create > a file and directory inside. > {code} > [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -touchz /tmp/acl_dir/testfile1 > [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -mkdir /tmp/acl_dir/testdir1 > hdfs dfs -ls /tmp/acl[jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -ls /tmp/acl_dir/ > Found 2 items > drwxr-xr-x - jenkins supergroup 0 2014-04-17 19:11 > /tmp/acl_dir/testdir1 > -rw-r--r-- 1 jenkins supergroup 0 2014-04-17 19:11 > /tmp/acl_dir/testfile1 > [jenkins@hdfs-vanilla-1 ~]$ > {code} > However, as the same jenkins user, when I try to cd into /tmp/acl_dir using a > fuse_dfs mount, I get permission denied. Same permission denied when I try to > create or list files. 
> {code} > [jenkins@hdfs-vanilla-1 tmp]$ ls -l > total 16 > drwxrwx--- 4 hdfsnobody 4096 Apr 17 19:11 acl_dir > drwx-- 2 hdfsnobody 4096 Apr 17 18:30 acl_dir_2 > drwxr-xr-x 3 mapred nobody 4096 Mar 11 03:53 mapred > drwxr-xr-x 4 jenkins nobody 4096 Apr 17 07:25 testcli > -rwx-- 1 hdfsnobody0 Apr 7 17:18 tf1 > [jenkins@hdfs-vanilla-1 tmp]$ cd acl_dir > bash: cd: acl_dir: Permission denied > [jenkins@hdfs-vanilla-1 tmp]$ touch acl_dir/testfile2 > touch: cannot touch `acl_dir/testfile2': Permission denied > [jenkins@hdfs-vanilla-1 tmp]$ mkdir acl_dir/testdir2 > mkdir: cannot create directory `acl_dir/testdir2': Permission denied > [jenkins@hdfs-vanilla-1 tmp]$ > {code} > The fuse_dfs debug output doesn't show any error for the above operations: > {code} > unique: 18, opcode: OPENDIR (27), nodeid: 2, insize: 48 >unique: 18, success, outsize: 32 > unique: 19, opcode: READDIR (28), nodeid: 2, insize: 80 > readdir[0] from 0 >unique: 19, success, outsize: 312 > unique: 20, opcode: GETATTR (3), nodeid: 2, insize: 56 > getattr /tmp >unique: 20, success, outsize: 120 > unique: 21, opcode: READDIR (28), nodeid: 2, insize: 80 >unique: 21, success, outsize: 16 > unique: 22, opcode: RELEASEDIR (29), nodeid: 2, insize: 64 >unique: 22, success, outsize: 16 > unique: 23, opcode: GETATTR (3), nodeid: 2, insize: 56 > getattr /tmp >unique: 23, success, outsize: 120 > unique: 24, opcode: GETATTR (3), nodeid: 3, insize: 56 > getattr /tmp/acl_dir >unique: 24, success, outsize: 120 > unique: 25, opcode: GETATTR (3), nodeid: 3, insize: 56 > getattr /tmp/acl_dir >unique: 25, success, outsize: 120 > unique: 26, opcode: GETATTR (3), nodeid: 3, insize: 56 > getattr /tmp/acl_dir >unique: 26, success, outsize: 120 > unique: 27, opcode: GETATTR (3), nodeid: 3, insize: 56 > getattr /tmp/acl_dir >unique: 27, success, outsize: 120 > unique: 28, opcode: GETATTR (3), nodeid: 3, insize: 56 > getattr /tmp/acl_dir >unique: 28, success, outsize: 120 > {code} > In other scenarios, ACL permissions are 
enforced successfully. For example, > as hdfs user I create /tmp/acl_dir_2 and set permissions to 777. I then set > the acl user:jenkins:--- on the directory. On the fuse mount, I am not able > to ls, mkdir, or touch to that directory as jenkins user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10824) MiniDFSCluster#storageCapacities has no effects on real capacity
[ https://issues.apache.org/jira/browse/HDFS-10824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10824: - Hadoop Flags: Reviewed +1 for patch 006. Thank you, Xiaobing. [~anu] or [~arpitagarwal], do you have any further comments? > MiniDFSCluster#storageCapacities has no effects on real capacity > > > Key: HDFS-10824 > URL: https://issues.apache.org/jira/browse/HDFS-10824 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10824.000.patch, HDFS-10824.001.patch, > HDFS-10824.002.patch, HDFS-10824.003.patch, HDFS-10824.004.patch, > HDFS-10824.005.patch, HDFS-10824.006.patch > > > It has been noticed MiniDFSCluster#storageCapacities has no effects on real > capacity. It can be reproduced by explicitly setting storageCapacities and > then call ClientProtocol#getDatanodeStorageReport(DatanodeReportType.LIVE) to > compare results. The following are storage report for one node with two > volumes after I set capacity as 300 * 1024. Apparently, the capacity is not > changed. 
> adminState|DatanodeInfo$AdminStates (id=6861) > |blockPoolUsed|215192| > |cacheCapacity|0| > |cacheUsed|0| > |capacity|998164971520| > |datanodeUuid|"839912e9-5bcb-45d1-81cf-9a9c9c02a00b" (id=6862)| > |dependentHostNames|LinkedList (id=6863)| > |dfsUsed|215192| > |hostName|"127.0.0.1" (id=6864)| > |infoPort|64222| > |infoSecurePort|0| > |ipAddr|"127.0.0.1" (id=6865)| > |ipcPort|64223| > |lastUpdate|1472682790948| > |lastUpdateMonotonic|209605640| > |level|0| > |location|"/default-rack" (id=6866)| > |maintenanceExpireTimeInMS|0| > |parent|null| > |peerHostName|null| > |remaining|20486512640| > |softwareVersion|null| > |upgradeDomain|null| > |xceiverCount|1| > |xferAddr|"127.0.0.1:64220" (id=6855)| > |xferPort|64220| > [0]StorageReport (id=6856) > |blockPoolUsed|4096| > |capacity|499082485760| > |dfsUsed|4096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6869)| > [1]StorageReport (id=6859) > |blockPoolUsed|211096| > |capacity|499082485760| > |dfsUsed|211096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6872)| -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10824) MiniDFSCluster#storageCapacities has no effects on real capacity
[ https://issues.apache.org/jira/browse/HDFS-10824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526833#comment-15526833 ] Chris Nauroth commented on HDFS-10824: -- [~xiaobingo], thank you for the update. I thought of one more thing. I think it's still necessary to send a heartbeat, so that the NameNode updates its view of the capacity numbers before any test assertions about capacity seen at the NameNode. I think it would be safe to add back a call to {{DataNodeTestUtils#triggerHeartbeat}} right after the call to {{FsVolumeImpl#setCapacityForTesting}}, because you are now waiting for the DataNode to be fully initialized before that logic runs. > MiniDFSCluster#storageCapacities has no effects on real capacity > > > Key: HDFS-10824 > URL: https://issues.apache.org/jira/browse/HDFS-10824 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10824.000.patch, HDFS-10824.001.patch, > HDFS-10824.002.patch, HDFS-10824.003.patch, HDFS-10824.004.patch, > HDFS-10824.005.patch > > > It has been noticed MiniDFSCluster#storageCapacities has no effects on real > capacity. It can be reproduced by explicitly setting storageCapacities and > then call ClientProtocol#getDatanodeStorageReport(DatanodeReportType.LIVE) to > compare results. The following are storage report for one node with two > volumes after I set capacity as 300 * 1024. Apparently, the capacity is not > changed. 
> adminState|DatanodeInfo$AdminStates (id=6861) > |blockPoolUsed|215192| > |cacheCapacity|0| > |cacheUsed|0| > |capacity|998164971520| > |datanodeUuid|"839912e9-5bcb-45d1-81cf-9a9c9c02a00b" (id=6862)| > |dependentHostNames|LinkedList (id=6863)| > |dfsUsed|215192| > |hostName|"127.0.0.1" (id=6864)| > |infoPort|64222| > |infoSecurePort|0| > |ipAddr|"127.0.0.1" (id=6865)| > |ipcPort|64223| > |lastUpdate|1472682790948| > |lastUpdateMonotonic|209605640| > |level|0| > |location|"/default-rack" (id=6866)| > |maintenanceExpireTimeInMS|0| > |parent|null| > |peerHostName|null| > |remaining|20486512640| > |softwareVersion|null| > |upgradeDomain|null| > |xceiverCount|1| > |xferAddr|"127.0.0.1:64220" (id=6855)| > |xferPort|64220| > [0]StorageReport (id=6856) > |blockPoolUsed|4096| > |capacity|499082485760| > |dfsUsed|4096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6869)| > [1]StorageReport (id=6859) > |blockPoolUsed|211096| > |capacity|499082485760| > |dfsUsed|211096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6872)| -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
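The ordering suggested above — set the test capacity, then trigger a heartbeat before making assertions — can be illustrated with simplified stand-ins. None of these types reflect the real {{FsVolumeImpl}}, {{DataNodeTestUtils}}, or NameNode classes; the sketch only shows that the NameNode's view is stale until a heartbeat carries the new capacity.

```java
// Illustrative sketch of why a heartbeat must follow
// setCapacityForTesting: the NameNode's view is only refreshed by the
// heartbeat, not by the local capacity change. All types are stand-ins.
public class CapacityHeartbeatSketch {
    static class Volume { long capacity; }
    static class NameNodeView { long reportedCapacity; }

    // Changes the DataNode-local capacity; the NameNode does not see it yet.
    static void setCapacityForTesting(Volume v, long capacity) {
        v.capacity = capacity;
    }

    // Stand-in for DataNodeTestUtils.triggerHeartbeat: propagates the
    // DataNode's current capacity to the NameNode's view.
    static void triggerHeartbeat(Volume v, NameNodeView nn) {
        nn.reportedCapacity = v.capacity;
    }
}
```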
[jira] [Commented] (HDFS-10824) MiniDFSCluster#storageCapacities has no effects on real capacity
[ https://issues.apache.org/jira/browse/HDFS-10824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524258#comment-15524258 ] Chris Nauroth commented on HDFS-10824: -- [~xiaobingo], thank you for sharing patch 004. Adding the {{waitDataNodeFullyStarted}} call for all DataNode restarts might slow down some tests that don't really need the wait. Do you think that call can be moved inside {{setDataNodeStorageCapacities}}? That way, you could do the wait only if {{storageCapacities}} is non-null and non-empty. > MiniDFSCluster#storageCapacities has no effects on real capacity > > > Key: HDFS-10824 > URL: https://issues.apache.org/jira/browse/HDFS-10824 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10824.000.patch, HDFS-10824.001.patch, > HDFS-10824.002.patch, HDFS-10824.003.patch, HDFS-10824.004.patch > > > It has been noticed MiniDFSCluster#storageCapacities has no effects on real > capacity. It can be reproduced by explicitly setting storageCapacities and > then call ClientProtocol#getDatanodeStorageReport(DatanodeReportType.LIVE) to > compare results. The following are storage report for one node with two > volumes after I set capacity as 300 * 1024. Apparently, the capacity is not > changed. 
> adminState|DatanodeInfo$AdminStates (id=6861) > |blockPoolUsed|215192| > |cacheCapacity|0| > |cacheUsed|0| > |capacity|998164971520| > |datanodeUuid|"839912e9-5bcb-45d1-81cf-9a9c9c02a00b" (id=6862)| > |dependentHostNames|LinkedList (id=6863)| > |dfsUsed|215192| > |hostName|"127.0.0.1" (id=6864)| > |infoPort|64222| > |infoSecurePort|0| > |ipAddr|"127.0.0.1" (id=6865)| > |ipcPort|64223| > |lastUpdate|1472682790948| > |lastUpdateMonotonic|209605640| > |level|0| > |location|"/default-rack" (id=6866)| > |maintenanceExpireTimeInMS|0| > |parent|null| > |peerHostName|null| > |remaining|20486512640| > |softwareVersion|null| > |upgradeDomain|null| > |xceiverCount|1| > |xferAddr|"127.0.0.1:64220" (id=6855)| > |xferPort|64220| > [0]StorageReport (id=6856) > |blockPoolUsed|4096| > |capacity|499082485760| > |dfsUsed|4096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6869)| > [1]StorageReport (id=6859) > |blockPoolUsed|211096| > |capacity|499082485760| > |dfsUsed|211096| > |failed|false| > |remaining|10243256320| > |storage|DatanodeStorage (id=6872)| -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
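The conditional wait proposed above can be sketched as a guard inside {{setDataNodeStorageCapacities}}. Method and parameter names mirror the discussion, but the bodies are placeholders, not MiniDFSCluster's actual code; the point is that tests without custom capacities skip the cost of waiting for full DataNode startup.

```java
// Sketch of the guard discussed above: only wait for full DataNode
// startup when custom storage capacities were actually requested.
// Names are illustrative; the real logic would live in MiniDFSCluster.
public class ConditionalWaitSketch {
    static boolean waitInvoked = false;

    static void setDataNodeStorageCapacities(long[][] storageCapacities) {
        // Skip the wait entirely for the common case of no custom capacities,
        // so unrelated tests are not slowed down by DataNode restarts.
        if (storageCapacities != null && storageCapacities.length > 0) {
            waitDataNodeFullyStarted();
            // ... then apply each capacity via FsVolumeImpl#setCapacityForTesting ...
        }
    }

    static void waitDataNodeFullyStarted() {
        waitInvoked = true; // placeholder for the real readiness-polling loop
    }
}
```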
[jira] [Commented] (HDFS-10824) MiniDFSCluster#storageCapacities has no effects on real capacity
[ https://issues.apache.org/jira/browse/HDFS-10824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497051#comment-15497051 ] Chris Nauroth commented on HDFS-10824: -- [~xiaobingo], thank you for the patch. It appears that at least one of the test failures, {{TestFsDatasetImpl}}, was caused by patch revision 003. That test passes for me on current trunk, and then it times out after I apply patch 003. I didn't fully investigate root cause. However, I did run jstack on the JUnit process to see what was happening. I've pasted the relevant stack trace for the main thread below. After restarting the mini-cluster, the thread is blocked while trying to trigger a heartbeat. Perhaps something in the patch has impacted reinitialization after DataNode restart, such as delivery of the initial block report. I hope this helps with investigation. {code} "main" #1 prio=5 os_prio=31 tid=0x7fee83801800 nid=0x1703 in Object.wait() [0x70218000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.triggerHeartbeatForTests(BPServiceActor.java:310) - locked <0x00079ae302c0> (a org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.triggerHeartbeatForTests(BPOfferService.java:592) at org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.triggerHeartbeat(DataNodeTestUtils.java:72) at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2289) - locked <0x0007400415e0> (a org.apache.hadoop.hdfs.MiniDFSCluster) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testAddVolumeWithSameStorageUuid(TestFsDatasetImpl.java:242) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} > MiniDFSCluster#storageCapacities has no effects on real capacity > > > Key: HDFS-10824 > URL: https://issues.apache.org/jira/browse/HDFS-10824 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: 
HDFS-10824.000.patch, HDFS-10824.001.patch, > HDFS-10824.002.patch, HDFS-10824.003.patch > > > It has been noticed that MiniDFSCluster#storageCapacities has no effect on real > capacity. It can be reproduced by explicitly setting storageCapacities and > then calling ClientProtocol#getDatanodeStorageReport(DatanodeReportType.LIVE) to > compare results. The following is the storage report for one node with two > volumes after I set the capacity to 300 * 1024. Apparently, the capacity is not > changed. > adminState|DatanodeInfo$AdminStates (id=6861) > |blockPoolUsed|215192| >
[jira] [Commented] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468292#comment-15468292 ] Chris Nauroth commented on HDFS-6962: - Oh no! I botched the commit message on this, so it says "Contributed by Chris Nauroth" instead of "Contributed by John Zhuge". This can't really be fixed, because it would require a force push and cause grief for anyone who has sync'd the repo since the commit. [~jzhuge], I'm really sorry about that. The JIRA issue remains assigned to you for proper attribution though. > ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, > HDFS-6962.003.patch, HDFS-6962.004.patch, HDFS-6962.005.patch, > HDFS-6962.006.patch, HDFS-6962.007.patch, HDFS-6962.008.patch, > HDFS-6962.009.patch, HDFS-6962.010.patch, HDFS-6962.1.patch, > disabled_new_client.log, disabled_old_client.log, enabled_new_client.log, > enabled_old_client.log, run_compat_tests, run_unit_tests, test_plan.md > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ set default ACLs on this directory rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ check ACLs /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | 
other::--- matches the umaskmode defined in > hdfs-site.xml, everything is OK! > default:group:readwrite:rwx allows the readwrite group rwx access for > inheritance. > default:user:toto:rwx allows the toto user rwx access for inheritance. > default:mask::rwx the inheritance mask is rwx, so no mask > 4/ Create a subdir to test inheritance of ACL > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ check ACLs /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has an rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart the namenode > > dfs.umaskmode > 010 > > 7/ Create a subdir to test inheritance of ACL with the new umaskmode parameter > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACL on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL values (user, group and other -- except the POSIX > owner -- ) with the group mask of the dfs.umaskmode property when creating a > directory with inherited ACLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
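The #effective annotations in the getfacl output above follow the POSIX ACL rule that a named user or group entry is intersected with the mask entry. A minimal sketch of that computation (illustrative only, not the HDFS implementation; `effective` is a hypothetical helper):

```java
public class AclMaskDemo {
    // POSIX ACL semantics: the effective rights of a named user/group entry
    // are the entry's rights ANDed with the mask entry, position by position.
    public static String effective(String entry, String mask) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            char e = entry.charAt(i);
            char m = mask.charAt(i);
            // a right survives only if both the entry and the mask grant it
            sb.append(e != '-' && m != '-' ? e : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // group:readwrite:rwx under mask::r-x is effectively r-x,
        // matching the "#effective:r-x" annotation in the report.
        System.out.println(effective("rwx", "r-x")); // r-x
        // under mask::rw- the same entry is effectively rw-
        System.out.println(effective("rwx", "rw-")); // rw-
    }
}
```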
[jira] [Updated] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6962: Fix Version/s: 3.0.0-alpha2 > ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security > Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) > Reporter: LINTE > Assignee: John Zhuge > Priority: Critical > Labels: hadoop, security > Fix For: 3.0.0-alpha2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6962: +1 for patch revision 010. I have committed this to trunk. [~jzhuge], thank you for your hard work on this patch. [~eddyxu], thank you for reviewing. > ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security > Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) > Reporter: LINTE > Assignee: John Zhuge > Priority: Critical > Labels: hadoop, security -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6962: Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) > ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security > Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) > Reporter: LINTE > Assignee: John Zhuge > Priority: Critical > Labels: hadoop, security -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6962) ACL inheritance conflicts with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6962: Release Note: The original implementation of HDFS ACLs applied the client's umask to the permissions when inheriting a default ACL defined on a parent directory. This behavior is a deviation from the POSIX ACL specification, which states that the umask has no influence when a default ACL propagates from parent to child. HDFS now offers the capability to ignore the umask in this case for improved compliance with POSIX. This change is considered backward-incompatible, so the new behavior is off by default and must be explicitly configured by setting dfs.namenode.posix.acl.inheritance.enabled to true in hdfs-site.xml. Please see the HDFS Permissions Guide for further details. > ACL inheritance conflicts with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security > Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) > Reporter: LINTE > Assignee: John Zhuge > Priority: Critical > Labels: hadoop, security 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
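Based on the release note above, opting in to the POSIX-compliant inheritance behavior is a single hdfs-site.xml setting (the key is the one named in the note; the surrounding fragment is illustrative):

```xml
<!-- hdfs-site.xml: enable POSIX-compliant ACL inheritance (off by default,
     because the change is backward-incompatible) -->
<property>
  <name>dfs.namenode.posix.acl.inheritance.enabled</name>
  <value>true</value>
</property>
```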
[jira] [Commented] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.
[ https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461883#comment-15461883 ] Chris Nauroth commented on HDFS-9038: - [~brahmareddy], [~arpitagarwal] and [~vinayrpet], thank you for your dedication working through this issue. I am +1 to proceed with the patch after addressing Arpit's last round of feedback. > DFS reserved space is erroneously counted towards non-DFS used. > --- > > Key: HDFS-9038 > URL: https://issues.apache.org/jira/browse/HDFS-9038 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Brahma Reddy Battula > Attachments: GetFree.java, HDFS-9038-002.patch, HDFS-9038-003.patch, > HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, > HDFS-9038-007.patch, HDFS-9038-008.patch, HDFS-9038-009.patch, > HDFS-9038-010.patch, HDFS-9038.patch > > > HDFS-5215 changed the DataNode volume available space calculation to consider > the reserved space held by the {{dfs.datanode.du.reserved}} configuration > property. As a side effect, reserved space is now counted towards non-DFS > used. I don't believe it was intentional to change the definition of non-DFS > used. This issue proposes restoring the prior behavior: do not count > reserved space towards non-DFS used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
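The accounting question in HDFS-9038 can be sketched with simple arithmetic. This is illustrative only, not the exact DataNode formula; `nonDfsUsed` is a hypothetical helper showing the proposed definition in which reserved space is excluded from non-DFS used:

```java
public class NonDfsUsedDemo {
    // Proposed definition sketched from the issue description: non-DFS used
    // should reflect only foreign (non-HDFS) data on the volume, so the
    // operator-reserved space (dfs.datanode.du.reserved) is subtracted out
    // rather than being counted as non-DFS used.
    public static long nonDfsUsed(long capacity, long dfsUsed,
                                  long remaining, long reserved) {
        return capacity - reserved - dfsUsed - remaining;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 100 GB disk, 10 GB reserved, 40 GB of DFS blocks, 45 GB still free:
        // only 5 GB is genuinely non-DFS data.
        System.out.println(nonDfsUsed(100 * gb, 40 * gb, 45 * gb, 10 * gb) / gb); // 5
    }
}
```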
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461875#comment-15461875 ] Chris Nauroth commented on HDFS-6962: - [~jzhuge], thank you so much for your diligence with this patch. Revision 009 looks good to me, but the patch needs to be rebased for the current trunk. Would you please provide an updated patch? I'll get on the final review quickly so that we don't run the risk of the patch going stale again. Also, there were Checkstyle warnings in the last pre-commit run. The report is gone now, so I can't check to see if they were worthwhile to fix. If so, please include those fixes in the next revision too. > ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security > Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) > Reporter: LINTE > Assignee: John Zhuge > Priority: Critical > Labels: hadoop, security 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10689) "hdfs dfs -chmod 777" does not remove sticky bit
[ https://issues.apache.org/jira/browse/HDFS-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396500#comment-15396500 ] Chris Nauroth commented on HDFS-10689: -- bq. Now the question is if we declare this a bug fix that can be backported to branch-2, or if this behavior change is too incompatible. Given that sticky bits are pretty rare in general, I think it's safe for branch-2, but would welcome others' thoughts. Anything to add, Chris Nauroth? [~andrew.wang], thanks for the notification. I agree with the proposed change, but the compatibility aspects of changes like this are always tricky to consider. In this case, the change is something that potentially weakens authorization. If a user has some automation that runs chmod on a directory, and that user expects the current behavior that the sticky bit is preserved, then the effect would be to start allowing users to delete files owned by someone else. Admittedly, sticky bit usage is rare, typically only on /tmp, but I'd still be more comfortable with this as a 3.x change flagged backward-incompatible. > "hdfs dfs -chmod 777" does not remove sticky bit > > > Key: HDFS-10689 > URL: https://issues.apache.org/jira/browse/HDFS-10689 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Minor > Attachments: HDFS-10689.001.patch > > > When a directory permission is modified using the hdfs dfs chmod command and when > the octal/numeric format is used, the leading sticky bit is not fully honored. > 1. Create a dir dir_test_with_sticky_bit > 2. Apply sticky bit permission on the dir: hdfs dfs -chmod 1755 > /dir_test_with_sticky_bit > 3. Remove sticky bit permission on the dir: hdfs dfs -chmod 755 > /dir_test_with_sticky_bit > Expected: Remove the sticky bit on the dir, as it happens on Mac/Linux native > filesystem with native chmod. > 4. 
However, removing sticky bit permission by explicitly turning off the bit > works. hdfs dfs -chmod 0755 /dir_test_with_sticky_bit > {noformat} > manoj@~/work/hadev-pp: hdfs dfs -chmod 1755 /dir_test_with_sticky_bit > manoj@~/work/hadev-pp: hdfs dfs -ls / > Found 2 items > drwxr-xr-t - manoj supergroup 0 2016-07-25 11:42 > /dir_test_with_sticky_bit > drwxr-xr-x - manoj supergroup 0 2016-07-25 11:42 /user > manoj@~/work/hadev-pp: hdfs dfs -chmod 755 /dir_test_with_sticky_bit > manoj@~/work/hadev-pp: hdfs dfs -ls / > Found 2 items > drwxr-xr-t - manoj supergroup 0 2016-07-25 11:42 > /dir_test_with_sticky_bit <=== sticky bit still intact > drwxr-xr-x - manoj supergroup 0 2016-07-25 11:42 /user > manoj@~/work/hadev-pp: hdfs dfs -chmod 0755 /dir_test_with_sticky_bit > manoj@~/work/hadev-pp: hdfs dfs -ls / > Found 2 items > drwxr-xr-x - manoj supergroup 0 2016-07-25 11:42 > /dir_test_with_sticky_bit > drwxr-xr-x - manoj supergroup 0 2016-07-25 11:42 /user > manoj@~/work/hadev-pp: > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
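The behavior reported above can be modeled in a few lines. This is an illustrative sketch of the bug, not the actual FsShell/FsPermission parsing code; `applyMode` is a hypothetical helper that preserves existing special bits when only three octal digits are supplied:

```java
public class StickyBitDemo {
    // Sketch of the reported behavior: a three-digit mode leaves the special
    // bits (sticky/setuid/setgid, the 07000 group) untouched, while a
    // four-digit mode states them explicitly and so can clear them.
    public static int applyMode(int oldMode, String newMode) {
        int parsed = Integer.parseInt(newMode, 8);
        if (newMode.length() <= 3) {
            // only rwx bits supplied: keep the old special bits (the bug)
            return (oldMode & 07000) | (parsed & 0777);
        }
        return parsed; // special bits stated explicitly, e.g. "0755"
    }

    public static void main(String[] args) {
        int dir = 01755; // sticky bit set, as after "hdfs dfs -chmod 1755"
        System.out.println(Integer.toOctalString(applyMode(dir, "755")));  // 1755: sticky bit survives
        System.out.println(Integer.toOctalString(applyMode(dir, "0755"))); // 755: sticky bit cleared
    }
}
```

This matches the report: "chmod 755" leaves drwxr-xr-t in place, while "chmod 0755" clears the sticky bit.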
[jira] [Commented] (HDFS-10650) DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory permission
[ https://issues.apache.org/jira/browse/HDFS-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388627#comment-15388627 ] Chris Nauroth commented on HDFS-10650: -- I'm not aware of any history behind an intentional choice to use 666 as the default here. It looks incorrect to me. The only thing somewhat related that I remember is HADOOP-9155, which introduced the split of file vs. directory default permissions, but that didn't touch the {{applyUMask}} logic. It would be good to do a thorough review of all the code paths that end up routing through {{DFSClient#applyUMask}} to make sure this is safe. > DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory > permission > - > > Key: HDFS-10650 > URL: https://issues.apache.org/jira/browse/HDFS-10650 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Attachments: HDFS-10650.001.patch, HDFS-10650.002.patch > > > These 2 DFSClient methods should use default directory permission to create a > directory. > {code:java} > public boolean mkdirs(String src, FsPermission permission, > boolean createParent) throws IOException { > if (permission == null) { > permission = FsPermission.getDefault(); > } > {code} > {code:java} > public boolean primitiveMkdir(String src, FsPermission absPermission, > boolean createParent) > throws IOException { > checkOpen(); > if (absPermission == null) { > absPermission = > FsPermission.getDefault().applyUMask(dfsClientConf.uMask); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
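The umask application discussed above is a simple bit mask: the candidate permission ANDed with the complement of the umask. A sketch of why using the file default (0666) for a directory is wrong (`applyUMask` here is a hypothetical helper, not DFSClient#applyUMask itself):

```java
public class UmaskDemo {
    // Umask semantics: clear from the candidate permission every bit that is
    // set in the umask.
    public static int applyUMask(int perm, int umask) {
        return perm & ~umask;
    }

    public static void main(String[] args) {
        // Starting from the file default 0666 for a directory, as the bug
        // describes, umask 022 yields 0644 -- the directory is missing the
        // execute (traverse) bits.
        System.out.println(Integer.toOctalString(applyUMask(0666, 022))); // 644
        // Starting from the directory default 0777 yields the expected 0755.
        System.out.println(Integer.toOctalString(applyUMask(0777, 022))); // 755
    }
}
```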
[jira] [Commented] (HDFS-10666) Über-jira: Unit tests should not use fixed sleep interval to wait for conditions
[ https://issues.apache.org/jira/browse/HDFS-10666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387987#comment-15387987 ] Chris Nauroth commented on HDFS-10666: -- Excellent! Thank you for filing this, Mingliang. Another technique that has been helpful in the ZooKeeper tests is coordinating the JUnit thread with activities of the internal threads using a {{CountDownLatch}}. Sometimes it requires intrusive changes in the product code to make that work though. > Über-jira: Unit tests should not use fixed sleep interval to wait for > conditions > > > Key: HDFS-10666 > URL: https://issues.apache.org/jira/browse/HDFS-10666 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > There have been dozens of intermittently failing unit tests because they depend > on fixed-interval sleep to wait for conditions to be reached before assertion. > This umbrella jira is to replace these sleep statements with: > * {{GenericTestUtils.waitFor()}} to retry the conditions/assertions > * Trigger internal state change of {{MiniDFSCluster}}, e.g. > {{trigger\{BlockReports,HeartBeats,DeletionReports\}}} > * fails fast if specific exceptions are caught > * _ad-hoc fixes_ (TBD) > p.s. I don't know how closures in Java 8 come into play but I'd like to see > any effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
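The CountDownLatch technique mentioned in the comment above can be sketched as follows, using only the JDK. The test thread blocks on a latch that the worker counts down when the awaited condition is reached, instead of sleeping for a fixed interval and hoping the timing works out:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchDemo {
    // Start a worker and wait (with a timeout) for it to signal completion.
    // Returns true if the latch was counted down before the timeout elapsed.
    public static boolean runWorkerAndAwait(long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            // ... perform the work the test is waiting for ...
            done.countDown(); // signal the condition instead of relying on timing
        });
        worker.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runWorkerAndAwait(5000));
    }
}
```

The trade-off noted in the comment applies: the worker (product code) must expose or accept the latch, which can require intrusive changes.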
[jira] [Commented] (HDFS-10664) layoutVersion mismatch between Namenode VERSION file and Journalnode VERSION file after cluster upgrade
[ https://issues.apache.org/jira/browse/HDFS-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386685#comment-15386685 ] Chris Nauroth commented on HDFS-10664: -- [~aanand001c], thank you for filing this. I had an old note to myself to file a JIRA for this, which I had overlooked. I can confirm that this does happen. In practice, I haven't observed any negative side effects from this, but I agree that we should update that file for consistency. > layoutVersion mismatch between Namenode VERSION file and Journalnode VERSION > file after cluster upgrade > --- > > Key: HDFS-10664 > URL: https://issues.apache.org/jira/browse/HDFS-10664 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs >Affects Versions: 2.7.1 >Reporter: Amit Anand > > After a cluster is upgraded I see a mismatch in {{layoutVersion}} between NN > VERSION file and JN VERSION file. > Here is what I see: > Before cluster upgrade: > == > {code} > ## Version file from NN current directory > namespaceID=109645726 > clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de > cTime=0 > storageType=NAME_NODE > blockpoolID=BP-786201894-10.0.100.11-1466026941507 > layoutVersion=-60 > {code} > {code} > ## Version file from JN current directory > namespaceID=109645726 > clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de > cTime=0 > storageType=JOURNAL_NODE > layoutVersion=-60 > {code} > After cluster upgrade: > = > {code} > ## Version file from NN current directory > namespaceID=109645726 > clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de > cTime=0 > storageType=NAME_NODE > blockpoolID=BP-786201894-10.0.100.11-1466026941507 > layoutVersion=-63 > {code} > {code} > ## Version file from JN current directory > namespaceID=109645726 > clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de > cTime=0 > storageType=JOURNAL_NODE > layoutVersion=-60 > {code} > Since {{Namenode}} is what creates {{Journalnode}} {{VERSION}} file during > {{initializeSharedEdits}}, it should also 
update the file with correct > information after the cluster is upgraded and {{hdfs dfsadmin > -finalizeUpgrade}} has been executed.
[jira] [Updated] (HDFS-9852) hdfs dfs -setfacl error message is misleading
[ https://issues.apache.org/jira/browse/HDFS-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9852: Affects Version/s: (was: 3.0.0-alpha1) Fix Version/s: (was: 3.0.0-alpha1) 2.8.0 I cherry-picked this to branch-2 and branch-2.8. > hdfs dfs -setfacl error message is misleading > - > > Key: HDFS-9852 > URL: https://issues.apache.org/jira/browse/HDFS-9852 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9852.001.patch, HDFS-9852.002.patch > > > When I type > {noformat}hdfs dfs -setfacl -m default:user::rwx{noformat} > It prints error message: > {noformat} > -setfacl: is missing > Usage: hadoop fs [generic options] -setfacl [-R] [{-b|-k} {-m|-x } > ]|[--set ] > {noformat} > But actually, it's the path that I missed. A correct command should be > {noformat} > hdfs dfs -setfacl -m default:user::rwx /data > {noformat} > In fact, > {noformat}-setfacl -x | -m | --set{noformat} expects two parameters. > We should print error message like this if it misses one: > {noformat} > -setfacl: Missing either or > {noformat} > and print the following if it misses two: > {noformat} > -setfacl: Missing arguments: > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383330#comment-15383330 ] Chris Nauroth commented on HDFS-6962: - bq. One additional question before responding to your comments. I added getMasked and getUnmasked with default implementations to FsPermission which is public and stable. Is that ok? The alternative to this approach is to use instanceof to detect FsCreateModes object with an FsPermission reference. Adding new methods to a public/stable class is acceptable according to [Apache Hadoop Compatibility|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Compatibility.html] guidelines. We took a similar approach when adding the ACL bit. We added {{FsPermission#getAclBit}} with a default implementation. The HDFS-specific {{FsPermissionExtension}} subclass overrides that method. bq. I think it is ok. Will it affect our plan to backport the fix to CDH branches based on 2.6.0? I can't comment definitively on CDH concerns, but I expect that any distro could make the choice to apply the patch to prior maintenance lines if they come to a different risk assessment decision. The ACL code changes infrequently at this point, so I expect it would be trivial to backport, with low likelihood of complex merge conflicts. 
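The compatibility pattern described here (adding a new method with a safe default to the public base class and overriding it in the HDFS-side subclass) can be sketched as follows; the class names are simplified stand-ins for {{FsPermission}} and {{FsPermissionExtension}}, not the real API:

```java
public class AclBitDemo {
    // Stand-in for the public/stable base class: the new accessor gets a
    // safe default so existing subclasses and callers keep working.
    static class Permission {
        public boolean getAclBit() {
            return false;  // default: no ACL present
        }
    }

    // Stand-in for the HDFS-side subclass, which overrides the default
    // to report the real value.
    static class PermissionExtension extends Permission {
        private final boolean aclBit;
        PermissionExtension(boolean aclBit) { this.aclBit = aclBit; }
        @Override
        public boolean getAclBit() { return aclBit; }
    }

    public static void main(String[] args) {
        Permission plain = new Permission();
        Permission extended = new PermissionExtension(true);
        // Callers work against the base type; no instanceof checks needed.
        System.out.println(plain.getAclBit());     // false
        System.out.println(extended.getAclBit());  // true
    }
}
```

This keeps the downcast/instanceof logic out of callers, which is the advantage over detecting the subclass explicitly.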
> ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, > HDFS-6962.003.patch, HDFS-6962.004.patch, HDFS-6962.005.patch, > HDFS-6962.006.patch, HDFS-6962.1.patch, disabled_new_client.log, > disabled_old_client.log, enabled_new_client.log, enabled_old_client.log, run > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ set default ACLs on this directory rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ check ACLs /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches with the umaskmode defined in > hdfs-site.xml, everything ok ! > default:group:readwrite:rwx allow readwrite group with rwx access for > inhéritance. > default:user:toto:rwx allow toto user with rwx access for inhéritance. 
> default:mask::rwx inheritance mask is rwx, so no mask > 4/ Create a subdir to test inheritance of ACL > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ check ACLs /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart namenode > > dfs.umaskmode > 010 > > 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACL on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL value (user, group, and other, except the POSIX > owner) with the group mask of the dfs.umaskmode property when creating a > directory with inherited ACLs.
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383249#comment-15383249 ] Chris Nauroth commented on HDFS-6962: - Hello [~jzhuge]. I apologize for my delayed response. Thank you for working on this tricky issue. I think what you are proposing for configurability and extending the protocol messages makes sense as a way to provide deployments with a choice of which behavior to use. However, I'm reluctant to push it into 2.8.0 now due to the complexity of the changes required to support it. Considering something like a cross-cluster DistCp, with a mix of old and new versions in play, it could become very confusing to explain the end results to users. Unless you consider it urgent for 2.8.0, would you consider targeting it to the 3.x line, as I had done a while ago? I don't think we can realistically ship without the WebHDFS support in place. At this point, there is a user expectation of feature parity for ACL commands whether the target is an hdfs: path or a webhdfs: path. If you want to track WebHDFS work in a separate JIRA, then I think that's fine, but I wouldn't want to ship a non-alpha release lacking the WebHDFS support. I am concerned about adding the {{createModes}} member to {{INodeWithAdditionalFields}} because of the increased per-inode memory footprint in the NameNode. Even for a {{null}}, there is still the pointer cost. I assume this was done because it was the easiest way to get the masked vs. unmasked information passed all the way down to {{FSDirectory#copyINodeDefaultAcl}} during new file/directory creation. That information is not valuable beyond the lifetime of the creation operation, so paying memory to preserve it longer is unnecessary. I think we'll need to explore passing the unmasked information along separately from the inode object. 
Unfortunately, this will likely make the change more awkward, requiring changes to method signatures to accept more arguments. {code} if (modes == null) { LOG.warn("Received create request without unmasked create mode"); } {code} I expect this log statement would be noisy in practice. I recommend removing it or changing it to debug level if you find it helpful. The documentation of {{dfs.namenode.posix.acl.inheritance.enabled}} in hdfs-default.xml and HdfsPermissionsGuide.md looks good overall. I saw one typo in both places: "comppatible" instead of "compatible". Could you also add a clarifying statement that umask would be ignored if the parent has a default ACL? It could be as simple as "...will apply default ACLs from the parent directory to the create mode and ignore umask." In addition to the new tests you added to {{FSAclBaseTest}}, I recommend testing through the shell. The XML-driven shell tests don't have a way to reconfigure the mini-cluster under test. I expect you'll need to make a new test suite, similar to {{TestAclCLI}}, but with {{dfs.namenode.posix.acl.inheritance.enabled}} set to {{true}}. bq. PermissionStatus#applyUMask never used, remove it? bq. {{DFSClient#mkdirs}} and {{DFSClient#primitiveMkdir}} use file default if permission is null. Should use dir default permission? You might consider filing separate JIRAs for these two observations, so that we keep the scope here focused on the ACL inheritance issue. 
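The suggestion of passing the unmasked mode along the create call path, rather than persisting it on every inode, might look like the following sketch (all names and fields are illustrative, not the actual {{FsCreateModes}} API):

```java
public class CreateModesDemo {
    // Immutable holder for the masked/unmasked pair. It travels as a method
    // argument and dies with the create operation, so there is no long-lived
    // per-inode memory cost. (Illustrative sketch, not the real class.)
    static final class CreateModes {
        final short masked;
        final short unmasked;
        CreateModes(short masked, short unmasked) {
            this.masked = masked;
            this.unmasked = unmasked;
        }
    }

    // The inode keeps only the single permission it has always kept.
    static short inodePermission;

    // If the parent has a default ACL, the umask is ignored and the unmasked
    // mode applies; otherwise the masked mode applies, as in the doc wording
    // suggested above.
    static void createDirectory(CreateModes modes, boolean parentHasDefaultAcl) {
        inodePermission = parentHasDefaultAcl ? modes.unmasked : modes.masked;
    }

    public static void main(String[] args) {
        CreateModes modes = new CreateModes((short) 0755, (short) 0777);
        createDirectory(modes, true);
        System.out.println(Integer.toOctalString(inodePermission)); // 777
    }
}
```

The point of the design is visible in the signature: the extra argument widens some method signatures, but nothing outlives the create operation.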
> ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, > HDFS-6962.003.patch, HDFS-6962.004.patch, HDFS-6962.005.patch, > HDFS-6962.006.patch, HDFS-6962.1.patch, disabled_new_client.log, > disabled_old_client.log, enabled_new_client.log, enabled_old_client.log, run > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ set default ACLs on this directory rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ check ACLs /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches with the umaskmode defined in > hdfs-site.xml, everything ok ! > default:group:rea
[jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362829#comment-15362829 ] Chris Nauroth commented on HDFS-10594: -- During initial implementation, we made an intentional choice that a cache directive on a directory applies to its direct children only, not all descendants recursively. This behavior is documented here: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#Cache_directive I'm not in favor of changing this behavior, because it would be an unexpected change for users after an upgrade. It's possible that it would cause the DataNode to {{mlock}} a lot more files than pre-upgrade. This would cause either unpredictable caching if the new files exceed {{dfs.datanode.max.locked.memory}}, possibly caching files that are not useful to cache, or even worse, blowing out memory budget and causing insufficient memory for services and YARN containers running on the host. If there is a desire for this behavior, then a more graceful way to support it would be to introduce a notion of a recursive cache directive. This would preserve the existing default behavior of applying only to direct children. Users who want the recursive behavior could opt in by passing a new flag while creating the cache directive. > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10594.001.patch > > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. 
In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when > the child inode is also a directory with child inode files > of its own. As a result, the child's child files which belong to this path will not be cached.
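A recursive variant of the loop above could look like the sketch below, with simplified stand-ins for the inode classes (illustrative only; as noted earlier in this thread, the shipped behavior intentionally applies to direct children, so a recursive mode would need to be opt-in):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RescanDemo {
    // Minimal stand-ins for INode, INodeFile, and INodeDirectory.
    static class Node { final String name; Node(String name) { this.name = name; } }
    static class FileNode extends Node { FileNode(String name) { super(name); } }
    static class DirNode extends Node {
        final List<Node> children;
        DirNode(String name, List<Node> children) { super(name); this.children = children; }
    }

    // Recursive rescan: descend into subdirectories instead of stopping
    // at direct children. rescanned.add(...) stands in for rescanFile(...).
    static void rescan(Node node, List<String> rescanned) {
        if (node instanceof FileNode) {
            rescanned.add(node.name);
        } else if (node instanceof DirNode) {
            for (Node child : ((DirNode) node).children) {
                rescan(child, rescanned);
            }
        }
    }

    public static void main(String[] args) {
        Node tree = new DirNode("root", Arrays.asList(
                new FileNode("a"),
                new DirNode("sub", Arrays.asList(new FileNode("b")))));
        List<String> out = new ArrayList<>();
        rescan(tree, out);
        System.out.println(out); // prints [a, b]
    }
}
```

Under the non-recursive loop from the snippet, "b" would never be rescanned; the recursion is what picks it up.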
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345249#comment-15345249 ] Chris Nauroth commented on HDFS-6962: - [~jzhuge], I know {{FSAclBaseTest}} has test cases that cover inheritance of default ACLs to newly created files and sub-directories. Possibly the other suites you mentioned would have test cases for default ACL inheritance too. If those are what you're looking for, then you might not need a new {{TestAclInheritance}} suite. Sorry for my delay in reviewing this more deeply. I will aim for no later than next week to take a closer look. > ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, > HDFS-6962.1.patch > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ set default ACLs on this directory rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ check ACLs /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches with the umaskmode defined in > hdfs-site.xml, everything ok ! > default:group:readwrite:rwx allow readwrite group with rwx access for > inhéritance. > default:user:toto:rwx allow toto user with rwx access for inhéritance. 
> default:mask::rwx inheritance mask is rwx, so no mask > 4/ Create a subdir to test inheritance of ACL > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ check ACLs /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart namenode > > dfs.umaskmode > 010 > > 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACL on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL value (user, group, and other, except the POSIX > owner) with the group mask of the dfs.umaskmode property when creating a > directory with inherited ACLs.
[jira] [Updated] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.
[ https://issues.apache.org/jira/browse/HDFS-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10437: - Assignee: Arpit Agarwal (was: Xiaobing Zhou) Hadoop Flags: Reviewed +1 for patch 02, pending pre-commit. {code} -policyProviderProtocols = new ArrayList<>(services.length); +policyProviderProtocols = new HashSet<>(services.length); {code} Just a very minor nit: Passing a length to a collection constructor is usually meant to avoid a memory realloc when the expected length is known. With the switch from {{ArrayList}} to {{HashSet}}, that doesn't really work anymore though, because the argument is now interpreted as the internal hash table's capacity, with a realloc occurring after exceeding (capacity * load factor), which defaults to 0.75. Things like Guava's {{Maps#newHashMapWithExpectedSize}} internally do some math on the argument to scale it up and try to stay ahead of the load factor to prevent a realloc. It doesn't really matter much here, where it's just test code and the data set is tiny, so I'm still +1 for the patch. It's just a common pitfall of the {{HashMap}}/{{HashSet}} API. Thanks for the patch! > ReconfigurationProtocol not covered by HDFSPolicyProvider. > -- > > Key: HDFS-10437 > URL: https://issues.apache.org/jira/browse/HDFS-10437 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Chris Nauroth >Assignee: Arpit Agarwal > Attachments: HDFS-10437.01.patch, HDFS-10437.02.patch > > > The {{HDFSPolicyProvider}} class contains an entry for defining the security > policy of each HDFS RPC protocol interface. {{ReconfigurationProtocol}} is > not listed currently. This may indicate that reconfiguration functionality > is not working correctly in secured clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
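The load-factor arithmetic is easy to demonstrate: a {{HashSet}} created with capacity n resizes once it holds more than n * 0.75 elements, so avoiding a realloc means scaling the expected size up front. A small sketch (the formula mirrors the expected-size scaling Guava applies, but is written from scratch here):

```java
public class CapacityDemo {
    // With the default load factor of 0.75, to hold expectedSize elements
    // without a resize the hash table needs capacity > expectedSize / 0.75.
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // new HashSet<>(16) resizes after 12 insertions (16 * 0.75)...
        System.out.println(16 * 0.75);       // 12.0
        // ...so 16 expected elements want a capacity of at least 22.
        System.out.println(capacityFor(16)); // 22
    }
}
```

So passing the element count straight into the {{HashSet}} constructor, as in the patch, still incurs one resize; harmless in test code, but worth knowing about the API.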
[jira] [Updated] (HDFS-10353) Fix hadoop-hdfs-native-client compilation on Windows
[ https://issues.apache.org/jira/browse/HDFS-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10353: - Affects Version/s: (was: 3.0.0-alpha1) 2.8.0 Fix Version/s: (was: 3.0.0-alpha1) 2.8.0 [~ajisakaa], you are correct. I cherry-picked it to branch-2 and branch-2.8. I confirmed that both branches build the distro successfully on Windows. > Fix hadoop-hdfs-native-client compilation on Windows > > > Key: HDFS-10353 > URL: https://issues.apache.org/jira/browse/HDFS-10353 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.8.0 > Environment: windows >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Fix For: 2.8.0 > > Attachments: HDFS-10353.patch > > > After HADOOP-12892,,hadoop-hdfs-native-client compilation failing by throwing > the following... > {noformat} > F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\antrun\build-main.xml > [ERROR] -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project > hadoop-hdfs-native-client: An Ant BuildException has occured: > F:\GitCode\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\target\bin\RelWithDebInfo > does not exist. > around Ant part ... todir="F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target/bin">... 
> @ 14:98 in > F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\antrun\build-main.xml > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:158) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > Caused by: org.apache.maven.plugin.MojoExecutionException: An Ant > BuildException has occured: > F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\target\bin\RelWithDebInfo > does not exist. > around Ant part ... 
todir="F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target/bin">... > @ 14:98 in > F:\Trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\antrun\build-main.xml > at > org.apache.maven.plugin.antrun.AntRunMojo.execute(AntRunMojo.java:355) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > ... 19 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.
[ https://issues.apache.org/jira/browse/HDFS-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-10546. -- Resolution: Duplicate I just realized this duplicates HDFS-10353, which fixed the problem in trunk. We just need to cherry-pick that patch down to branch-2 and branch-2.8. I'll cover it over there. > hadoop-hdfs-native-client fails distro build when trying to copy libhdfs > binaries. > -- > > Key: HDFS-10546 > URL: https://issues.apache.org/jira/browse/HDFS-10546 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Blocker > > During the distro build, hadoop-hdfs-native-client copies the built libhdfs > binary artifacts for inclusion in the distro. It references an incorrect > path though. The copy fails and the build aborts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.
Chris Nauroth created HDFS-10546: Summary: hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries. Key: HDFS-10546 URL: https://issues.apache.org/jira/browse/HDFS-10546 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Blocker During the distro build, hadoop-hdfs-native-client copies the built libhdfs binary artifacts for inclusion in the distro. It references an incorrect path though. The copy fails and the build aborts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10533) Make DistCpOptions class immutable
[ https://issues.apache.org/jira/browse/HDFS-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332569#comment-15332569 ] Chris Nauroth commented on HDFS-10533: -- Is this targeted to 3.0.0-alpha1 so that there is freedom to make a backward-incompatible change? If so, then I know at least Hive and Falcon would be impacted, and possibly Oozie too. https://github.com/apache/hive/blob/release-2.0.1/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L1262-L1279 https://github.com/apache/falcon/blob/release-0.9-rc0/replication/src/main/java/org/apache/falcon/replication/FeedReplicator.java#L187-L239 https://github.com/apache/oozie/blob/release-4.2.0/sharelib/distcp/src/main/java/org/apache/oozie/action/hadoop/DistcpMain.java#L74-L96 If downstream projects expect a stable interface, then we should update the interface audience and stability annotations too. > Make DistCpOptions class immutable > -- > > Key: HDFS-10533 > URL: https://issues.apache.org/jira/browse/HDFS-10533 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > Currently the {{DistCpOptions}} class encapsulates all DistCp options, which > may be set from command-line (via the {{OptionsParser}}) or may be set > manually (eg construct an instance and call setters). As there are multiple > option fields and more (e.g. [HDFS-9868], [HDFS-10314]) to add, validating > them can be cumbersome. Ideally, the {{DistCpOptions}} object should be > immutable. The benefits are: > # {{DistCpOptions}} is simple and easier to use and share, plus it scales well > # validation is automatic, e.g. manually constructed {{DistCpOptions}} gets > validated before usage > # validation error message is well-defined which does not depend on the order > of setters > This jira is to track the effort of making the {{DistCpOptions}} immutable by > using a Builder pattern for creation. 
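The Builder approach proposed here might look like the following simplified sketch (field names and validation are illustrative, not the eventual {{DistCpOptions}} API):

```java
public class DistCpOptionsDemo {
    // Immutable options object: all fields final, constructed only through
    // the Builder, validated once in build().
    static final class Options {
        final String sourcePath;
        final String targetPath;
        final boolean overwrite;

        private Options(Builder b) {
            this.sourcePath = b.sourcePath;
            this.targetPath = b.targetPath;
            this.overwrite = b.overwrite;
        }

        static final class Builder {
            private final String sourcePath;
            private final String targetPath;
            private boolean overwrite;

            Builder(String sourcePath, String targetPath) {
                this.sourcePath = sourcePath;
                this.targetPath = targetPath;
            }

            Builder overwrite(boolean value) { this.overwrite = value; return this; }

            // Validation happens in one place, before the immutable object
            // can ever be observed, so errors no longer depend on setter order.
            Options build() {
                if (sourcePath == null || targetPath == null) {
                    throw new IllegalArgumentException("source and target are required");
                }
                return new Options(this);
            }
        }
    }

    public static void main(String[] args) {
        Options opts = new Options.Builder("/src", "/dst").overwrite(true).build();
        System.out.println(opts.sourcePath + " -> " + opts.targetPath);
    }
}
```

Because the built object is immutable, it can be shared across threads and passed through DistCp internals without defensive copies, which is the "simple and easier to use and share" benefit listed above.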
[jira] [Resolved] (HDFS-10502) Enabled memory locking and now HDFS won't start up
[ https://issues.apache.org/jira/browse/HDFS-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-10502. -- Resolution: Invalid Hello [~machey]. I recommend taking these questions to the u...@hadoop.apache.org mailing list. We use JIRA for tracking confirmed bugs and feature requests. We use u...@hadoop.apache.org for usage advice and troubleshooting. Regarding whether or not this is a recommended approach, I think it depends on a few other factors. Is the intent to use these cached files from Hadoop workloads, such as MapReduce jobs or Hive queries? If not, then I wonder if your use case might be better served by something more directly focused on general caching use cases, such as Redis or memcached. If your use case does involve Hadoop integration, then certainly Centralized Cache Management is worth exploring. Regarding the timeouts, I can tell from the exception that this is the heartbeat RPC sent from the DataNode to the NameNode. I recommend investigating connectivity between the DataNode and the NameNode and examining the logs from both sides to try to determine if something is going wrong in the handling of the heartbeat message. On one hand, a heartbeat timeout is not an error condition that is specific to Centralized Cache Management. It could happen whether or not you're using that feature. On the other hand, the heartbeat message does contain some optional information about the state of cache capacity and current usage at the DataNode. That information would trigger special handling logic at the NameNode side, so I suppose there is a chance that something in that logic is hanging up the heartbeat handling. Investigating the logs might reveal more. u...@hadoop.apache.org would be a good forum for further discussion of both of these topics. 
> Enabled memory locking and now HDFS won't start up > -- > > Key: HDFS-10502 > URL: https://issues.apache.org/jira/browse/HDFS-10502 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.7.2 > Environment: RHEL 6.8 >Reporter: Chris Machemer > > My goal is to speed up reads. I have about 500k small files (2k to 15k) and > I'm trying to use HDFS as a cache for serialized instances of java objects. > I've written the code to construct and serialize all the objects out to HDFS, > and am now hoping to improve read performance, because accessing the objects > from disk-based storage is proving to be too slow for my application's SLA's. > So my first question is, is using memory locking and hdfs cacheadmin pools > and directives the right way to go, to cache my objects into memory, or > should I create RAM disks, and do memory-based storage instead? > If hdfs cacheadmin is the way to go (it's the path I'm going down so far), > then I need to figure out if what's happening is a bug or if I've configured > something wrong, because when I start up HDFS with a gig of memory locked > (both in limits.d for ulimit -l and also in hdfs-site.xml) and the server > starts up, and presumably tries to cache things into memory, I get hours and > hours of timeouts in the logs like this: > 2016-06-08 07:42:50,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > IOException in offerService > java.net.SocketTimeoutException: Call From stgb-fe1.litle.com/10.1.9.66 to > localhost:8020 failed on socket timeout exception: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]; For more details > see: http://wiki.apache.org/hadoop/SocketTimeout > at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751) > at org.apache.hadoop.ipc.Client.call(Client.java:1479) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653) > at > org.apache.hado
[jira] [Updated] (HDFS-10488) WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating files/directories without specifying permissions
[ https://issues.apache.org/jira/browse/HDFS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10488: - Assignee: Wellington Chevreuil > WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating > files/directories without specifying permissions > -- > > Key: HDFS-10488 > URL: https://issues.apache.org/jira/browse/HDFS-10488 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-10488.002.patch, HDFS-10488.003.patch, > HDFS-10488.patch > > > WebHDFS methods for creating file/directories are always creating it with 755 > permissions as default, even ignoring any configured > *fs.permissions.umask-mode* in the case of directories. > Dfs CLI, however, applies the configured umask to 777 permission for > directories, or 666 permission for files. > Example below shows the different behaviour when creating directory via CLI > and WebHDFS: > {noformat} > 1) Creating a directory under '/test/' as 'test-user'. Configured > fs.permissions.umask-mode is 000: > $ sudo -u test-user hdfs dfs -mkdir /test/test-user1 > $ sudo -u test-user hdfs dfs -getfacl /test/test-user1 > # file: /test/test-user1 > # owner: test-user > # group: supergroup > user::rwx > group::rwx > other::rwx > 4) Doing the same via WebHDFS does not get the proper ACLs: > $ curl -i -X PUT > "http://namenode-host:50070/webhdfs/v1/test/test-user2?user.name=test-user&op=MKDIRS"; > > $ sudo -u test-user hdfs dfs -getfacl /test/test-user2 > # file: /test/test-user2 > # owner: test-user > # group: supergroup > user::rwx > group::r-x > other::r-x > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10488) WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating files/directories without specifying permissions
[ https://issues.apache.org/jira/browse/HDFS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319384#comment-15319384 ] Chris Nauroth commented on HDFS-10488: -- bq. So, Chris Nauroth, summarizing, fs.permissions.umask-mode should not be applied for WebHDFS created directories/files. I think a slight refinement of this is to say that it should not be applied by the WebHDFS server side (the NameNode). It may be applied by the WebHDFS client side. For example, the {{WebHdfsFileSystem}} class that ships in Hadoop does apply {{fs.permissions.umask-mode}} from the client side before calling the WebHDFS server side. bq. While working on this, I had found out the default permission (if no permission is specified while calling the method) for both directories and files created by WebHDFS currently is 755. However, defining "execution" permissions for HDFS files don't have any value. Should this be changed to give different default permissions for files and directories? This part is admittedly odd, and there is a long-standing open JIRA requesting a change to 644 as the default for files. That is HDFS-6434. This change is potentially backwards-incompatible, such as if someone has an existing workflow that round-trips a file through HDFS and expects it to be executable after getting it back out, though that's likely a remote edge case. If you'd like to proceed with HDFS-6434, then I'd suggest targeting trunk/Hadoop 3.x, where we currently can make backwards-incompatible changes. bq. Still on the default values, setting 755 as default can lead to confusion about umask being used. Since default umask is 022, users can conclude that the umask is being applied when they see newly created directories got 755. Should this be changed to more permissive permissions such as 777? I do think 777 makes sense from one perspective, but there is also a trade-off with providing behavior that is secure by default. 
In HDFS-2427, the project made the choice to go with 755, favoring secure default behavior (755) over the possibly more intuitive behavior (777). bq. When working on tests for WebHDFS CREATESYMLINK as suggested by Wei-Chiu Chuang, I realized this method is no longer supported. Should we simply remove from WebHDFS, or only document this is not supported anymore and leave it giving the current error? HDFS symlinks are currently in a state where the code is partially completed but dormant due to unresolved problems with backwards-compatibility and security. We might get past those hurdles someday, so I suggest leaving that code as is. We still run tests against the symlink code paths. This works by having the tests call the private {{FileSystem#enableSymlinks}} method to toggle on the dormant symlink code. > WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating > files/directories without specifying permissions > -- > > Key: HDFS-10488 > URL: https://issues.apache.org/jira/browse/HDFS-10488 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-10488.002.patch, HDFS-10488.003.patch, > HDFS-10488.patch > > > WebHDFS methods for creating file/directories are always creating it with 755 > permissions as default, even ignoring any configured > *fs.permissions.umask-mode* in the case of directories. > Dfs CLI, however, applies the configured umask to 777 permission for > directories, or 666 permission for files. > Example below shows the different behaviour when creating directory via CLI > and WebHDFS: > {noformat} > 1) Creating a directory under '/test/' as 'test-user'. 
Configured > fs.permissions.umask-mode is 000: > $ sudo -u test-user hdfs dfs -mkdir /test/test-user1 > $ sudo -u test-user hdfs dfs -getfacl /test/test-user1 > # file: /test/test-user1 > # owner: test-user > # group: supergroup > user::rwx > group::rwx > other::rwx > 4) Doing the same via WebHDFS does not get the proper ACLs: > $ curl -i -X PUT > "http://namenode-host:50070/webhdfs/v1/test/test-user2?user.name=test-user&op=MKDIRS"; > > $ sudo -u test-user hdfs dfs -getfacl /test/test-user2 > # file: /test/test-user2 > # owner: test-user > # group: supergroup > user::rwx > group::r-x > other::r-x > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
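The client-side umask application described in the comment above is a bitwise AND against the complement of the mask: the shell starts from 777 for directories and 666 for files, then applies `fs.permissions.umask-mode`. A minimal sketch of that arithmetic in Python (the helper name is illustrative, not a Hadoop API; `WebHdfsFileSystem` performs the equivalent masking in Java before calling the server):

```python
def apply_umask(base: int, umask: int) -> int:
    """Effective permission after a client-side umask is applied.

    Illustrative helper, not Hadoop code: clients such as the shell
    (via WebHdfsFileSystem or DistributedFileSystem) mask the base
    permission before the create/mkdirs request reaches the server.
    """
    return base & ~umask & 0o777

# Shell-style bases: 777 for directories, 666 for files.
dir_default = apply_umask(0o777, 0o022)   # typical directory default: 755
file_default = apply_umask(0o666, 0o022)  # typical file default: 644
open_dir = apply_umask(0o777, 0o000)      # umask 000 keeps the full 777
```

This also explains the behavior difference in the issue description: a raw `curl` call to WebHDFS skips this client-side step entirely, so the server's fixed 755 default applies instead.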
[jira] [Updated] (HDFS-10488) WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating files/directories without specifying permissions
[ https://issues.apache.org/jira/browse/HDFS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10488: - Status: Open (was: Patch Available) > WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating > files/directories without specifying permissions > -- > > Key: HDFS-10488 > URL: https://issues.apache.org/jira/browse/HDFS-10488 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-10488.002.patch, HDFS-10488.003.patch, > HDFS-10488.patch > > > WebHDFS methods for creating file/directories are always creating it with 755 > permissions as default, even ignoring any configured > *fs.permissions.umask-mode* in the case of directories. > Dfs CLI, however, applies the configured umask to 777 permission for > directories, or 666 permission for files. > Example below shows the different behaviour when creating directory via CLI > and WebHDFS: > {noformat} > 1) Creating a directory under '/test/' as 'test-user'. Configured > fs.permissions.umask-mode is 000: > $ sudo -u test-user hdfs dfs -mkdir /test/test-user1 > $ sudo -u test-user hdfs dfs -getfacl /test/test-user1 > # file: /test/test-user1 > # owner: test-user > # group: supergroup > user::rwx > group::rwx > other::rwx > 4) Doing the same via WebHDFS does not get the proper ACLs: > $ curl -i -X PUT > "http://namenode-host:50070/webhdfs/v1/test/test-user2?user.name=test-user&op=MKDIRS"; > > $ sudo -u test-user hdfs dfs -getfacl /test/test-user2 > # file: /test/test-user2 > # owner: test-user > # group: supergroup > user::rwx > group::r-x > other::r-x > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10488) WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating files/directories without specifying permissions
[ https://issues.apache.org/jira/browse/HDFS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317387#comment-15317387 ] Chris Nauroth commented on HDFS-10488: -- -1 for the proposed change. The umask ({{fs.permissions.umask-mode}}) is a concept that is applied at the client side by individual applications, usually via their usage of the FileSystem subclasses that implement a particular file system client. The umask is not applied by the API/protocol layer such as WebHDFS or NameNode RPC. As such, the behavior of the shell, which applies umask, is not always going to look consistent with the behavior of a direct curl WebHDFS call, which does not apply the umask. Using the shell to access WebHDFS gives consistent results, because the logic of the WebHdfsFileSystem class used by the shell will apply the umask. If this patch were committed, then it would become basically impossible to create files and directories with absolute permissions through WebHDFS. For example, suppose {{fs.permissions.umask-mode}} is set to 022, but an individual application has a desire to create a file with 775 permissions. This wouldn't work as expected, because server-side enforcement of the umask would restrict permissions on the resulting file to 755. The only way to work around this would be to reconfigure {{fs.permissions.umask-mode}} and restart the NameNode, which isn't operationally desirable. Worse than that, this would likely have the long-term effect of reducing {{fs.permissions.umask-mode}} to lowest common denominator, perhaps even 000, to accommodate all possible permissions at file creation time, thus weakening the benefit of umask as applied by client applications like the shell. As a final point against this change, please note that it could be considered backwards-incompatible. 
In my example above trying to create a file with 775 permissions, but the server-side umask forcing it to 755, it means that subsequent write actions by users in the same group will be unauthorized. This may break certain workflows. The area where there is a possibility for change is documentation to help raise user awareness of this. That could potentially go into the HDFS Permissions Guide page or the WebHDFS REST API page, or perhaps some combination of both. I would be happy to help review and +1 documentation changes. [~wchevreuil], despite my -1, thank you for writing up your experience with this and posting a patch. If you'd like to proceed with a documentation patch, please let me know, and I'll assign the issue to you. > WebHDFS CREATE and MKDIRS does not follow same rules as DFS CLI when creating > files/directories without specifying permissions > -- > > Key: HDFS-10488 > URL: https://issues.apache.org/jira/browse/HDFS-10488 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-10488.002.patch, HDFS-10488.003.patch, > HDFS-10488.patch > > > WebHDFS methods for creating file/directories are always creating it with 755 > permissions as default, even ignoring any configured > *fs.permissions.umask-mode* in the case of directories. > Dfs CLI, however, applies the configured umask to 777 permission for > directories, or 666 permission for files. > Example below shows the different behaviour when creating directory via CLI > and WebHDFS: > {noformat} > 1) Creating a directory under '/test/' as 'test-user'. 
Configured > fs.permissions.umask-mode is 000: > $ sudo -u test-user hdfs dfs -mkdir /test/test-user1 > $ sudo -u test-user hdfs dfs -getfacl /test/test-user1 > # file: /test/test-user1 > # owner: test-user > # group: supergroup > user::rwx > group::rwx > other::rwx > 4) Doing the same via WebHDFS does not get the proper ACLs: > $ curl -i -X PUT > "http://namenode-host:50070/webhdfs/v1/test/test-user2?user.name=test-user&op=MKDIRS"; > > $ sudo -u test-user hdfs dfs -getfacl /test/test-user2 > # file: /test/test-user2 > # owner: test-user > # group: supergroup > user::rwx > group::r-x > other::r-x > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
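The clamping effect described in the -1 comment above (a requested 775 silently becoming 755) can be sketched as the same mask arithmetic applied on the server side. This is a hypothetical illustration of what the rejected proposal would have done, not actual NameNode code:

```python
def clamp_with_server_umask(requested: int, umask: int) -> int:
    """What server-side umask enforcement (the rejected proposal) would
    do to an absolute permission explicitly requested by a client."""
    return requested & ~umask & 0o777

# An application explicitly asks for 775, but a server-side umask of 022
# clears the group-write bit, silently producing 755:
clamped = clamp_with_server_umask(0o775, 0o022)
```

Because the result drops group write, subsequent writes by other users in the same group would fail, which is the backwards-compatibility concern raised in the comment.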
[jira] [Updated] (HDFS-10430) Reuse FileSystem#access in TestAsyncDFS
[ https://issues.apache.org/jira/browse/HDFS-10430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10430: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) +1 for patch v002. I have committed this to trunk, branch-2 and branch-2.8. [~xiaobingo], thank you for the contribution. > Reuse FileSystem#access in TestAsyncDFS > --- > > Key: HDFS-10430 > URL: https://issues.apache.org/jira/browse/HDFS-10430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Fix For: 2.8.0 > > Attachments: HDFS-10430-HDFS-9924.000.patch, > HDFS-10430-HDFS-9924.001.patch, HDFS-10430-HDFS-9924.002.patch > > > In TestAsyncDFS, there are duplicate code to do access check. Here it tries > to reuse FileSystem#access for the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10467) Router-based HDFS federation
[ https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304646#comment-15304646 ] Chris Nauroth commented on HDFS-10467: -- [~elgoiri], thank you for sharing this. A similar discussion came up recently on the hdfs-...@hadoop.apache.org mailing list, so it appears you are not alone in this requirement. http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201605.mbox/%3C1462210332.1520687.595811233.2B297F6A%40webmail.messagingengine.com%3E > Router-based HDFS federation > > > Key: HDFS-10467 > URL: https://issues.apache.org/jira/browse/HDFS-10467 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 2.7.2 >Reporter: Inigo Goiri > Attachments: HDFS Router Federation.pdf > > > Add a Router to provide a federated view of multiple HDFS clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10430) Reuse FileSystem#access in TestAsyncDFS
[ https://issues.apache.org/jira/browse/HDFS-10430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303547#comment-15303547 ] Chris Nauroth commented on HDFS-10430: -- [~xiaobingo], this looks like a good change. Thank you for the patch. I have just 2 minor comments: # The {{checkAccessPermissions}} helper method seems unnecessary now that it's just a pass-through to a single line of code. Do you think it makes sense to move {{fs.access(path, mode);}} inline with {{testConcurrentAsyncAPI}} and remove the extra method? # Please remove the unused imports reported by Checkstyle. > Reuse FileSystem#access in TestAsyncDFS > --- > > Key: HDFS-10430 > URL: https://issues.apache.org/jira/browse/HDFS-10430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-10430-HDFS-9924.000.patch, > HDFS-10430-HDFS-9924.001.patch > > > In TestAsyncDFS, there are duplicate code to do access check. Here it tries > to reuse FileSystem#access for the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10430) Refactor FileSystem#checkAccessPermissions for better reuse from tests
[ https://issues.apache.org/jira/browse/HDFS-10430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298596#comment-15298596 ] Chris Nauroth commented on HDFS-10430: -- It's not immediately clear to me why another project's tests would need direct access to this method instead of using the public {{FileSystem#access}} method. Maybe seeing the proposed patch or pointing out examples would help clarify. The reason for the existence of the package-private {{FileSystem#checkAccessPermissions}} method is to provide code sharing between {{FileSystem}} and {{AbstractFileSystem}} for a default implementation of {{access}} in the base classes. However, that default implementation is not necessarily complete or correct for all file systems. For HDFS, {{DistributedFileSystem}} overrides {{access}} to use an RPC to the NameNode. The implementation of that RPC at the NameNode is different from the base class implementation, because it considers not only permissions but also HDFS ACLs. If {{checkAccessPermissions}} is made public, then there is a risk that applications would call it directly from main code, unaware that they could be bypassing ACL logic when connected to HDFS. > Refactor FileSystem#checkAccessPermissions for better reuse from tests > -- > > Key: HDFS-10430 > URL: https://issues.apache.org/jira/browse/HDFS-10430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > FileSystem#checkAccessPermissions could be used in a bunch of tests from > different projects, but it's in hadoop-common, which is not visible in some > cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
[ https://issues.apache.org/jira/browse/HDFS-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10438: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I have committed this to trunk, branch-2 and branch-2.8. [~arpitagarwal] and [~kihwal], thank you for the code reviews. > When NameNode HA is configured to use the lifeline RPC server, it should log > the address of that server. > > > Key: HDFS-10438 > URL: https://issues.apache.org/jira/browse/HDFS-10438 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: KWON BYUNGCHANG >Assignee: Chris Nauroth >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10438.001.patch, HDFS-10438.002.patch > > > As reported by [~magnum]: > I have configured below > {code} > dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 > dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 > {code} > servicerpc port is 8040, lifeline port is 8041. > however zkfc daemon is logging using servicerpc port. > It may cause confusion. > thank you. > {code} > 2016-05-19 19:18:40,566 WARN ha.HealthMonitor > (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for > NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources > available > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10424: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) [~arpitagarwal] and [~vinayrpet], thank you for the code reviews. I have committed this to trunk, branch-2 and branch-2.8. > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Assignee: Chris Nauroth >Priority: Blocker > Fix For: 2.8.0 > > Attachments: HDFS-10424-branch-2.8.001.patch, HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using security cluster authenticate with kerberos, as I checked the the > code, if security auth enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorize in ServiceAuthorizationManager, > AuthorizationException will be thrown at line 96. > Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10385) LocalFileSystem rename() function should return false when destination file exists
[ https://issues.apache.org/jira/browse/HDFS-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293881#comment-15293881 ] Chris Nauroth commented on HDFS-10385: -- [~boky01], I agree. I'm generally reluctant to change behavior of the local file system classes in the 2.x line. If it's helpful, we could consider possible changes for trunk/3.x and mark them as backwards-incompatible. > LocalFileSystem rename() function should return false when destination file > exists > -- > > Key: HDFS-10385 > URL: https://issues.apache.org/jira/browse/HDFS-10385 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.6.0 >Reporter: Aihua Xu >Assignee: Xiaobing Zhou > > Currently rename() of LocalFileSystem returns true and renames successfully > when the destination file exists. That seems to have different behavior from > DFSFileSystem. > If they can have the same behavior, then we can use one call to do rename > rather than checking if destination exists and then making rename() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
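The "check if destination exists and then rename" workaround mentioned in the issue description can be sketched as follows. This is an illustrative Python analogue, not Hadoop code; note that the existence check and the rename are not atomic, which is part of why a single `rename()` call with consistent semantics across file systems is preferable:

```python
import os
import tempfile

def rename_no_overwrite(src: str, dst: str) -> bool:
    """Rename src to dst, but refuse (return False) when dst exists.

    Mirrors the two-step pattern callers need today when
    LocalFileSystem.rename() silently overwrites an existing
    destination. The check and rename are racy (TOCTOU).
    """
    if os.path.exists(dst):
        return False
    os.rename(src, dst)
    return True

# Demonstration in a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "part-00000")
dst = os.path.join(workdir, "renamed")
open(src, "w").close()
first = rename_no_overwrite(src, dst)   # dst absent: rename succeeds
open(src, "w").close()                  # recreate src
second = rename_no_overwrite(src, dst)  # dst now exists: refused
```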
[jira] [Updated] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
[ https://issues.apache.org/jira/browse/HDFS-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10438: - Attachment: HDFS-10438.002.patch [~arpitagarwal], thank you for the review. I'm attaching patch v002 with just a cosmetic change to satisfy the Checkstyle line length warning. Does this still look good? > When NameNode HA is configured to use the lifeline RPC server, it should log > the address of that server. > > > Key: HDFS-10438 > URL: https://issues.apache.org/jira/browse/HDFS-10438 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: KWON BYUNGCHANG >Assignee: Chris Nauroth >Priority: Minor > Attachments: HDFS-10438.001.patch, HDFS-10438.002.patch > > > As reported by [~magnum]: > I have configured below > {code} > dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 > dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 > {code} > servicerpc port is 8040, lifeline port is 8041. > however zkfc daemon is logging using servicerpc port. > It may cause confusion. > thank you. > {code} > 2016-05-19 19:18:40,566 WARN ha.HealthMonitor > (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for > NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources > available > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292754#comment-15292754 ] Chris Nauroth commented on HDFS-10424: -- The Checkstyle warnings are triggered by existing patterns in the code. I'm not planning on cleaning those up within the scope of this patch. > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Assignee: Chris Nauroth >Priority: Blocker > Attachments: HDFS-10424-branch-2.8.001.patch, HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using security cluster authenticate with kerberos, as I checked the the > code, if security auth enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorize in ServiceAuthorizationManager, > AuthorizationException will be thrown at line 96. > Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10424: - Attachment: HDFS-10424-branch-2.8.001.patch [~arpitagarwal], thank you for the code review. I just noticed that I need a separate patch for branch-2.8, because {{ReconfigurationProtocol}} doesn't exist there. I'm attaching that now. Are you +1 for the branch-2.8 patch too? > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Assignee: Chris Nauroth >Priority: Blocker > Attachments: HDFS-10424-branch-2.8.001.patch, HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using security cluster authenticate with kerberos, as I checked the the > code, if security auth enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorize in ServiceAuthorizationManager, > AuthorizationException will be thrown at line 96. > Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.
[ https://issues.apache.org/jira/browse/HDFS-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292395#comment-15292395 ] Chris Nauroth commented on HDFS-10437: -- [~xiaobingo], thanks for confirmation. When we do that, let's also update the new {{TestHDFSPolicyProvider}} suite that I'm adding in HDFS-10424 to remove the special-case check for {{ReconfigurationProtocol}}. > ReconfigurationProtocol not covered by HDFSPolicyProvider. > -- > > Key: HDFS-10437 > URL: https://issues.apache.org/jira/browse/HDFS-10437 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Chris Nauroth > > The {{HDFSPolicyProvider}} class contains an entry for defining the security > policy of each HDFS RPC protocol interface. {{ReconfigurationProtocol}} is > not listed currently. This may indicate that reconfiguration functionality > is not working correctly in secured clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
[ https://issues.apache.org/jira/browse/HDFS-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292386#comment-15292386 ] Chris Nauroth commented on HDFS-10438: -- Just in case it's not obvious, the {{toString}} method I'm changing gets called from {{org.apache.hadoop.ha.HealthMonitor}} up in hadoop-common. That's why changing the {{toString}} code fixes the log message. > When NameNode HA is configured to use the lifeline RPC server, it should log > the address of that server. > > > Key: HDFS-10438 > URL: https://issues.apache.org/jira/browse/HDFS-10438 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: KWON BYUNGCHANG >Assignee: Chris Nauroth >Priority: Minor > Attachments: HDFS-10438.001.patch > > > As reported by [~magnum]: > I have configured below > {code} > dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 > dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 > {code} > servicerpc port is 8040, lifeline port is 8041. > however zkfc daemon is logging using servicerpc port. > It may cause confusion. > thank you. > {code} > 2016-05-19 19:18:40,566 WARN ha.HealthMonitor > (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for > NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources > available > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
[ https://issues.apache.org/jira/browse/HDFS-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10438: - Attachment: HDFS-10438.001.patch I'm attaching a patch that uses the lifeline address in the string that gets logged if a lifeline address is configured. I also updated a test to check for the correct string representation. While working on this, I noticed that {{MiniDFSCluster}} was not quite doing the right thing for copying the lifeline address configuration key in a multi-nameservice/multi-namenode setup. > When NameNode HA is configured to use the lifeline RPC server, it should log > the address of that server. > > > Key: HDFS-10438 > URL: https://issues.apache.org/jira/browse/HDFS-10438 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: KWON BYUNGCHANG >Assignee: Chris Nauroth >Priority: Minor > Attachments: HDFS-10438.001.patch > > > As reported by [~magnum]: > I have configured below > {code} > dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 > dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 > {code} > servicerpc port is 8040, lifeline port is 8041. > however zkfc daemon is logging using servicerpc port. > It may cause confusion. > thank you. > {code} > 2016-05-19 19:18:40,566 WARN ha.HealthMonitor > (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for > NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources > available > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
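[Editor's note] The fix described in the comment above can be sketched outside Hadoop. Assuming a hypothetical {{HealthCheckTarget}} holding both a service RPC address and an optional lifeline address (the real code lives in {{HAServiceTarget}} subclasses; class and method names here are illustrative only), the {{toString}} used by the health monitor's log message prefers the lifeline address when one is configured:

```java
/**
 * Illustrative sketch of the HDFS-10438 fix: when a lifeline RPC
 * address is configured, log messages should report that address
 * instead of the service RPC address. Names are hypothetical, not
 * the actual Hadoop classes.
 */
public class HealthCheckTarget {
    private final String serviceRpcAddress;   // e.g. my.host.com:8040
    private final String lifelineRpcAddress;  // e.g. my.host.com:8041; null if not configured

    public HealthCheckTarget(String serviceRpcAddress, String lifelineRpcAddress) {
        this.serviceRpcAddress = serviceRpcAddress;
        this.lifelineRpcAddress = lifelineRpcAddress;
    }

    /** Address the health monitor actually probes. */
    public String getHealthMonitorAddress() {
        // Prefer the lifeline address when it is configured.
        return lifelineRpcAddress != null ? lifelineRpcAddress : serviceRpcAddress;
    }

    @Override
    public String toString() {
        // HealthMonitor builds its WARN message from toString(), so the
        // lifeline address (when present) must surface here for the log
        // to show the port that was really checked.
        return "NameNode at " + getHealthMonitorAddress();
    }

    public static void main(String[] args) {
        HealthCheckTarget t = new HealthCheckTarget("my.host.com:8040", "my.host.com:8041");
        System.out.println(t); // prints "NameNode at my.host.com:8041"
    }
}
```

This mirrors why changing only a {{toString}} fixes the message: the caller in {{org.apache.hadoop.ha.HealthMonitor}} formats the target object directly.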
[jira] [Commented] (HDFS-9311) Support optional offload of NameNode HA service health checks to a separate RPC server.
[ https://issues.apache.org/jira/browse/HDFS-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292373#comment-15292373 ] Chris Nauroth commented on HDFS-9311: - [~magnum], thank you for your comment. I think we can improve that log message to use the lifeline address when it's configured. I filed a follow-up patch on HDFS-10438. > Support optional offload of NameNode HA service health checks to a separate > RPC server. > --- > > Key: HDFS-9311 > URL: https://issues.apache.org/jira/browse/HDFS-9311 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: HDFS-9311.001.patch, HDFS-9311.002.patch, > HDFS-9311.003.patch > > > When a NameNode is overwhelmed with load, it can lead to resource exhaustion > of the RPC handler pools (both client-facing and service-facing). > Eventually, this blocks the health check RPC issued from ZKFC, which triggers > a failover. Depending on fencing configuration, the former active NameNode > may be killed. In an overloaded situation, the new active NameNode is likely > to suffer the same fate, because client load patterns don't change after the > failover. This can degenerate into flapping between the 2 NameNodes without > real recovery. If a NameNode had been killed by fencing, then it would have > to transition through safe mode, further delaying time to recovery. > This issue proposes a separate, optional RPC server at the NameNode for > isolating the HA health checks. These health checks are lightweight > operations that do not suffer from contention issues on the namesystem lock > or other shared resources. Isolating the RPC handlers is sufficient to avoid > this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
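[Editor's note] The lifeline server described above is enabled per NameNode with the {{dfs.namenode.lifeline.rpc-address}} key, alongside the existing service RPC address (the key names match the reporter's configuration quoted in HDFS-10438; the nameservice, host, and port values below are placeholders):

```xml
<!-- hdfs-site.xml: optional lifeline RPC server for HA health checks.
     Nameservice/host values are placeholders for illustration. -->
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8040</value>
</property>
<property>
  <!-- ZKFC health checks go to this isolated port instead of 8040,
       so they are not starved by a saturated service RPC handler pool. -->
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8041</value>
</property>
```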
[jira] [Updated] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
[ https://issues.apache.org/jira/browse/HDFS-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10438: - Status: Patch Available (was: Open) > When NameNode HA is configured to use the lifeline RPC server, it should log > the address of that server. > > > Key: HDFS-10438 > URL: https://issues.apache.org/jira/browse/HDFS-10438 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: KWON BYUNGCHANG >Assignee: Chris Nauroth >Priority: Minor > Attachments: HDFS-10438.001.patch > > > As reported by [~magnum]: > I have configured below > {code} > dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 > dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 > {code} > servicerpc port is 8040, lifeline port is 8041. > however zkfc daemon is logging using servicerpc port. > It may cause confusion. > thank you. > {code} > 2016-05-19 19:18:40,566 WARN ha.HealthMonitor > (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for > NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources > available > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.
Chris Nauroth created HDFS-10438: Summary: When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server. Key: HDFS-10438 URL: https://issues.apache.org/jira/browse/HDFS-10438 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Reporter: KWON BYUNGCHANG Assignee: Chris Nauroth Priority: Minor As reported by [~magnum]: I have configured below {code} dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040 dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041 {code} servicerpc port is 8040, lifeline port is 8041. however zkfc daemon is logging using servicerpc port. It may cause confusion. thank you. {code} 2016-05-19 19:18:40,566 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(207)) - Service health check failed for NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.
[ https://issues.apache.org/jira/browse/HDFS-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292285#comment-15292285 ] Chris Nauroth commented on HDFS-10437: -- This came out of investigation on HDFS-10424, which reported a similar problem for {{DatanodeLifelineProtocol}}. The test I wrote for the patch attached there also indicated a possible problem for {{ReconfigurationProtocol}}. [~xiaobingo], would you please check? Cc [~arpitagarwal]. > ReconfigurationProtocol not covered by HDFSPolicyProvider. > -- > > Key: HDFS-10437 > URL: https://issues.apache.org/jira/browse/HDFS-10437 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Chris Nauroth > > The {{HDFSPolicyProvider}} class contains an entry for defining the security > policy of each HDFS RPC protocol interface. {{ReconfigurationProtocol}} is > not listed currently. This may indicate that reconfiguration functionality > is not working correctly in secured clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.
Chris Nauroth created HDFS-10437: Summary: ReconfigurationProtocol not covered by HDFSPolicyProvider. Key: HDFS-10437 URL: https://issues.apache.org/jira/browse/HDFS-10437 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.8.0 Reporter: Chris Nauroth The {{HDFSPolicyProvider}} class contains an entry for defining the security policy of each HDFS RPC protocol interface. {{ReconfigurationProtocol}} is not listed currently. This may indicate that reconfiguration functionality is not working correctly in secured clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
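[Editor's note] The policy-provider pattern described above maps each RPC protocol interface to a security policy ACL key; a protocol with no entry is rejected by authorization in a secured cluster. A self-contained sketch of the shape of such a registry (the real {{HDFSPolicyProvider}} extends Hadoop's {{PolicyProvider}} and registers {{Service}} entries; the stand-in interfaces and ACL key strings below are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in protocol interfaces; the real ones live in Hadoop. */
interface ClientProtocol {}
interface DatanodeLifelineProtocol {}
interface ReconfigurationProtocol {}

/**
 * Minimal sketch of the HDFSPolicyProvider idea: every RPC protocol
 * interface the servers expose must have a policy entry, or secured
 * authorization rejects that protocol (the bug in HDFS-10424/10437).
 */
public class PolicyRegistry {
    private final Map<Class<?>, String> aclKeyByProtocol = new LinkedHashMap<>();

    public PolicyRegistry register(Class<?> protocol, String aclKey) {
        aclKeyByProtocol.put(protocol, aclKey);
        return this;
    }

    /** Mirrors the authorization check: an unlisted protocol is rejected. */
    public boolean isAuthorized(Class<?> protocol) {
        return aclKeyByProtocol.containsKey(protocol);
    }

    public static PolicyRegistry defaultPolicy() {
        return new PolicyRegistry()
            .register(ClientProtocol.class, "security.client.protocol.acl")
            // Omitting either entry below reproduces the reported symptom:
            // AuthorizationException for that protocol under Kerberos.
            .register(DatanodeLifelineProtocol.class, "security.datanode.lifeline.protocol.acl")
            .register(ReconfigurationProtocol.class, "security.reconfiguration.protocol.acl");
    }
}
```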
[jira] [Updated] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10424: - Status: Patch Available (was: Open) > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Priority: Blocker > Attachments: HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using a security cluster authenticated with Kerberos. As I checked the > code, if security auth is enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorization runs in ServiceAuthorizationManager, > an AuthorizationException will be thrown at line 96. > Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned HDFS-10424: Assignee: Chris Nauroth > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Assignee: Chris Nauroth >Priority: Blocker > Attachments: HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using a security cluster authenticated with Kerberos. As I checked the > code, if security auth is enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorization runs in ServiceAuthorizationManager, > an AuthorizationException will be thrown at line 96. > Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10424) DatanodeLifelineProtocol not able to use under security cluster
[ https://issues.apache.org/jira/browse/HDFS-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-10424: - Attachment: HDFS-10424.001.patch [~gu-chi], thank you for the bug report. I was able to reproduce it in a secured cluster, and then I verified that the attached patch fixes it. This patch also includes a new test suite designed to catch similar kinds of bugs in the future. It works by scanning the list of protocol classes covered by {{HDFSPolicyProvider}} and then comparing that to *Protocol interfaces implemented by known RPC server classes. If it finds a protocol interface implemented by a server, but not covered in the policy, then it fails. This way, if we add new protocols, but forget to update {{HDFSPolicyProvider}}, then the test will fail during pre-commit. Interestingly, this test immediately exposed another potential offender: {{ReconfigurationProtocol}}. I've coded the test to skip checking that one for now in the interest of expediting the patch here. I'll file a separate JIRA for follow-up on that one and contact contributors who have worked on reconfiguration. > DatanodeLifelineProtocol not able to use under security cluster > --- > > Key: HDFS-10424 > URL: https://issues.apache.org/jira/browse/HDFS-10424 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: gu-chi >Priority: Blocker > Attachments: HDFS-10424.001.patch > > > {quote} > protocol org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol is > unauthorized for user * (auth:KERBEROS) | Server.java:1979 > {quote} > I am using a security cluster authenticated with Kerberos. As I checked the > code, if security auth is enabled, because the DatanodeLifelineProtocol is not > inside HDFSPolicyProvider, when authorization runs in ServiceAuthorizationManager, > an AuthorizationException will be thrown at line 96.
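[Editor's note] The test strategy described in the comment above can be sketched in plain Java. Assuming a list of server classes and a set of policy-covered protocols (the stand-ins below replace the actual NameNode/DataNode server classes scanned by {{TestHDFSPolicyProvider}}), the check collects every implemented interface whose name ends in "Protocol" and flags ones missing from the policy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-in protocols and a stand-in RPC server; illustrative only. */
interface ClientProtocol {}
interface DatanodeLifelineProtocol {}
class RpcServer implements ClientProtocol, DatanodeLifelineProtocol {}

public class PolicyCoverageCheck {
    /**
     * Returns the *Protocol interfaces implemented by the given server
     * classes that are absent from the policy set -- each one is a
     * protocol that would be rejected in a secured cluster. The real
     * test fails pre-commit when this list is non-empty.
     */
    public static List<Class<?>> findUncovered(List<Class<?>> servers,
                                               Set<Class<?>> policyCovered) {
        List<Class<?>> uncovered = new ArrayList<>();
        for (Class<?> server : servers) {
            for (Class<?> iface : server.getInterfaces()) {
                if (iface.getSimpleName().endsWith("Protocol")
                        && !policyCovered.contains(iface)) {
                    uncovered.add(iface);
                }
            }
        }
        return uncovered;
    }
}
```

With only {{ClientProtocol}} registered in the policy, the check reports {{DatanodeLifelineProtocol}} as uncovered, reproducing how the original test surfaced both this bug and {{ReconfigurationProtocol}}.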
> Please point me out if I am wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3296) Running libhdfs tests in mac fails
[ https://issues.apache.org/jira/browse/HDFS-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-3296: Attachment: HDFS-3296.003.patch [~aw], I have filed HADOOP-13177 with a one-line patch for the Surefire configuration change to set {{DYLD_LIBRARY_PATH}}. We can commit that one to move ahead with Jenkins runs on OS X. I'm also attaching a rebased v003 patch here for just the change in hadoop-hdfs-native-client. We won't want to commit this one, because this will just make {{test_libhdfs_zerocopy_hdfs_static}} hang indefinitely. We need to get to the bottom of the domain socket issues on OS X before we can commit this one. > Running libhdfs tests in mac fails > -- > > Key: HDFS-3296 > URL: https://issues.apache.org/jira/browse/HDFS-3296 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Reporter: Amareshwari Sriramadasu >Assignee: Chris Nauroth > Attachments: HDFS-3296.001.patch, HDFS-3296.002.patch, > HDFS-3296.003.patch > > > Running "ant -Dcompile.c++=true -Dlibhdfs=true test-c++-libhdfs" on Mac fails > with following error: > {noformat} > [exec] dyld: lazy symbol binding failed: Symbol not found: > _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] dyld: Symbol not found: _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] > /Users/amareshwari.sr/workspace/hadoop/src/c++/libhdfs/tests/test-libhdfs.sh: > line 122: 39485 Trace/BPT trap: 5 CLASSPATH=$HADOOP_CONF_DIR:$CLASSPATH > LD_PRELOAD="$LIB_JVM_DIR/libjvm.so:$LIBHDFS_INSTALL_DIR/libhdfs.so:" > $LIBHDFS_BUILD_DIR/$HDFS_TEST > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For 
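[Editor's note] The HADOOP-13177 change mentioned above amounts to forwarding {{DYLD_LIBRARY_PATH}} through Surefire so that forked test JVMs can load native libraries on OS X. A hedged sketch of such a pom.xml fragment, using Surefire's {{environmentVariables}} configuration (the exact property names and paths in the committed patch may differ):

```xml
<!-- Maven Surefire: forward the dynamic-linker search path to forked
     JVMs on OS X (Linux builds use LD_LIBRARY_PATH instead).
     Paths shown are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <environmentVariables>
      <DYLD_LIBRARY_PATH>${env.DYLD_LIBRARY_PATH}:${project.build.directory}/native/target/usr/local/lib</DYLD_LIBRARY_PATH>
    </environmentVariables>
  </configuration>
</plugin>
```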
additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3296) Running libhdfs tests in mac fails
[ https://issues.apache.org/jira/browse/HDFS-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-3296: Status: Open (was: Patch Available) > Running libhdfs tests in mac fails > -- > > Key: HDFS-3296 > URL: https://issues.apache.org/jira/browse/HDFS-3296 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Reporter: Amareshwari Sriramadasu >Assignee: Chris Nauroth > Attachments: HDFS-3296.001.patch, HDFS-3296.002.patch, > HDFS-3296.003.patch > > > Running "ant -Dcompile.c++=true -Dlibhdfs=true test-c++-libhdfs" on Mac fails > with following error: > {noformat} > [exec] dyld: lazy symbol binding failed: Symbol not found: > _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] dyld: Symbol not found: _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] > /Users/amareshwari.sr/workspace/hadoop/src/c++/libhdfs/tests/test-libhdfs.sh: > line 122: 39485 Trace/BPT trap: 5 CLASSPATH=$HADOOP_CONF_DIR:$CLASSPATH > LD_PRELOAD="$LIB_JVM_DIR/libjvm.so:$LIBHDFS_INSTALL_DIR/libhdfs.so:" > $LIBHDFS_BUILD_DIR/$HDFS_TEST > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-3296) Running libhdfs tests in mac fails
[ https://issues.apache.org/jira/browse/HDFS-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289618#comment-15289618 ] Chris Nauroth commented on HDFS-3296: - Hi [~aw]. Thanks for setting up a nightly on OS X! Even just a basic build to catch compilation errors is a huge help. The patch here won't help with any compilation problems. This patch was just a small step towards fixing the libhdfs tests on Mac by setting up {{DYLD_LIBRARY_PATH}} with the right shared library dependencies. It isn't sufficient though, because we still have a compatibility problem around domain socket usage. This manifests as test failures in {{TestDomainSocketWatcher}} and unfortunately some of the libhdfs tests just hang. If the immediate goal is a basic build on OS X, then what I'm currently seeing on trunk is a compilation error in the container executor. This was introduced by patch YARN-4594, and I commented on the situation here: https://issues.apache.org/jira/browse/YARN-4594?focusedCommentId=15139679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15139679 To summarize that discussion, if I can get the build environment to target the Mac OS X 10.10 SDK, then I suspect it would work. I wasn't able to follow up on it though. I'm curious if you have any thoughts on this. 
> Running libhdfs tests in mac fails > -- > > Key: HDFS-3296 > URL: https://issues.apache.org/jira/browse/HDFS-3296 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Reporter: Amareshwari Sriramadasu >Assignee: Chris Nauroth > Attachments: HDFS-3296.001.patch, HDFS-3296.002.patch > > > Running "ant -Dcompile.c++=true -Dlibhdfs=true test-c++-libhdfs" on Mac fails > with following error: > {noformat} > [exec] dyld: lazy symbol binding failed: Symbol not found: > _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] dyld: Symbol not found: _JNI_GetCreatedJavaVMs > [exec] Referenced from: > /Users/amareshwari.sr/workspace/hadoop/build/c++/Mac_OS_X-x86_64-64/lib/libhdfs.0.dylib > [exec] Expected in: flat namespace > [exec] > [exec] > /Users/amareshwari.sr/workspace/hadoop/src/c++/libhdfs/tests/test-libhdfs.sh: > line 122: 39485 Trace/BPT trap: 5 CLASSPATH=$HADOOP_CONF_DIR:$CLASSPATH > LD_PRELOAD="$LIB_JVM_DIR/libjvm.so:$LIBHDFS_INSTALL_DIR/libhdfs.so:" > $LIBHDFS_BUILD_DIR/$HDFS_TEST > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org