[jira] [Commented] (HDFS-13093) Quota set don't compute usage of unspecified storage policy content
[ https://issues.apache.org/jira/browse/HDFS-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348128#comment-16348128 ] liaoyuxiangqin commented on HDFS-13093: --- Thanks [~xyao] for reviewing this and for the detailed explanation and proposed solutions. The three options you proposed above can resolve the inconsistent result returned by the "hdfs dfs -count" CLI, but because the incorrect remaining quota cannot stop a client from continuing to write data to HDFS, the "hdfs dfs -count" CLI may return a negative remaining quota after the NN is restarted. {noformat} SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVE_QUOTA REM_ARCHIVE_QUOTA PROVIDED_QUOTA REM_PROVIDED_QUOTA PATHNAME none inf 6 G -3 G none inf none inf /hot {noformat} > Quota set don't compute usage of unspecified storage policy content > --- > > Key: HDFS-13093 > URL: https://issues.apache.org/jira/browse/HDFS-13093 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.0 > Environment: hdfs: hadoop-3.1.0-SNAPSHOT > node: 1 namenode, 9 datanodes >Reporter: liaoyuxiangqin >Priority: Major > Original Estimate: 48h > Remaining Estimate: 48h > > Test with the following steps: > 1. hdfs dfs -mkdir /hot > 2. hdfs dfs -put 1G.img /hot/file1 > 3. hdfs dfsadmin -setSpaceQuota 6442450944 -storageType DISK /hot > 4. hdfs storagepolicies -setStoragePolicy -path /hot -policy HOT > 5. hdfs dfs -count -q -h -v -t DISK /hot > {code:java} > SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVE_QUOTA > REM_ARCHIVE_QUOTA PROVIDED_QUOTA REM_PROVIDED_QUOTA PATHNAME > none inf 6 G 6 G none inf none inf /hot{code} > In step 5 I expected the remaining quota to be 3 G (quota - 1 G * 3 replicas), but > it is actually 6 G. > If I swap the order of step 3 and step 4, the remaining quota equals the > expected 3 G.
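The expected arithmetic in the report can be spelled out in a short sketch. Everything below (class and method names, the helper) is purely illustrative and not Hadoop code; only the numbers come from the steps above.

```java
// Illustrative sketch of the per-storage-type quota arithmetic described
// in the report. QuotaMath and remainingQuota are hypothetical names.
public class QuotaMath {
    static final long G = 1024L * 1024 * 1024; // 1 GiB

    // Remaining quota for a storage type: quota minus consumed space,
    // where consumed space counts every replica of the file.
    static long remainingQuota(long quota, long fileSize, int replication) {
        return quota - fileSize * replication;
    }

    public static void main(String[] args) {
        // Step 5: 6G DISK quota, one 1G file with 3 replicas -> 3G expected
        System.out.println(remainingQuota(6 * G, G, 3) == 3 * G);
        // If more replica bytes get written than the quota allows, the
        // remaining quota goes negative, as in the -3G output above.
        System.out.println(remainingQuota(6 * G, G, 9) == -3 * G);
    }
}
```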
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12897) getErasureCodingPolicy should handle .snapshot dir better
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348129#comment-16348129 ] Hudson commented on HDFS-12897: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13597 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13597/]) HDFS-12897. getErasureCodingPolicy should handle .snapshot dir better. (xiao: rev ae2177d296a322d13708b85aaa8a971b8dcce128) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirErasureCodingOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicies.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicyWithSnapshot.java > getErasureCodingPolicy should handle .snapshot dir better > - > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Fix For: 3.1.0, 3.0.1 > > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. 
> *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348115#comment-16348115 ] Hudson commented on HDFS-13060: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13596/]) HDFS-13060. Adding a BlacklistBasedTrustedChannelResolver for (xyao: rev af015c0b2359be317132e2cf35735429f4f34ea7) * (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/CombinedIPList.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/package-info.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestBlackListBasedTrustedChannelResolver.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/BlackListBasedTrustedChannelResolver.java > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trust channel resolver.
It allows you > to put the IP address/network mask of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trust channel resolver for > cases where only certain machines (IPs) are untrusted, without adding each trusted > IP individually. >
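As a rough standalone sketch of the blacklist idea: such a resolver inverts the whitelist logic, trusting every peer unless its IP is listed. This is not the actual Hadoop class (the real one extends TrustedChannelResolver and works with InetAddress); the class and method below are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Standalone sketch of a blacklist-based trust decision. A trusted
// channel skips encryption, so only blacklisted peers are reported
// as untrusted and therefore get encrypted traffic.
public class BlacklistSketch {
    private final Set<String> blacklistedIps;

    public BlacklistSketch(Set<String> blacklistedIps) {
        this.blacklistedIps = new HashSet<>(blacklistedIps);
    }

    // Every peer is trusted unless its IP appears in the blacklist.
    public boolean isTrusted(String peerIp) {
        return !blacklistedIps.contains(peerIp);
    }
}
```

The contrast with the whitelist resolver is that the default answer flips: unlisted peers are trusted (plaintext allowed) rather than untrusted.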
[jira] [Updated] (HDFS-12897) getErasureCodingPolicy should handle .snapshot dir better
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12897: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.1 3.1.0 Status: Resolved (was: Patch Available) > getErasureCodingPolicy should handle .snapshot dir better > - > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Fix For: 3.1.0, 3.0.1 > > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. > *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12897) getErasureCodingPolicy should handle .snapshot dir better
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348107#comment-16348107 ] Xiao Chen commented on HDFS-12897: -- +1 on patch 5, failed tests are not related to the change here. Committed to trunk and branch-3.0. Thanks [~Harsha1206] for reporting the issue, [~GeLiXin] for the fix and [~rakeshr] for the review! > getErasureCodingPolicy should handle .snapshot dir better > - > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. > *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12897) Path not found when getErasureCodingPolicy on a .snapshot dir
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12897: - Summary: Path not found when getErasureCodingPolicy on a .snapshot dir (was: Path not found when we get the ec policy for a .snapshot dir) > Path not found when getErasureCodingPolicy on a .snapshot dir > - > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. > *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12897) getErasureCodingPolicy should handle .snapshot dir better
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12897: - Summary: getErasureCodingPolicy should handle .snapshot dir better (was: Path not found when getErasureCodingPolicy on a .snapshot dir) > getErasureCodingPolicy should handle .snapshot dir better > - > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. > *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13060: -- Component/s: security datanode > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a build-int whitelist based trust channel resolver. It allows you > to put IP address/Network Mask of trusted client/server in whitelist files to > skip encryption for certain traffics. > This ticket is opened to add a blacklist based trust channel resolver for > cases only certain machines (IPs) are untrusted without adding each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13060: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks [~ajayydv] for the contribution. I've committed the patch to the trunk and branch-3.0. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a build-int whitelist based trust channel resolver. It allows you > to put IP address/Network Mask of trusted client/server in whitelist files to > skip encryption for certain traffics. > This ticket is opened to add a blacklist based trust channel resolver for > cases only certain machines (IPs) are untrusted without adding each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13095) Improve slice tree traversal implementation
[ https://issues.apache.org/jira/browse/HDFS-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348072#comment-16348072 ] Xiao Chen commented on HDFS-13095: -- I think it's also arguable whether snapshots should be supported for sps. What's the use case where we need to support snapshot on sps? The major consideration for re-encryption is snapshots are supposed to be immutable (at least not to further complicate the semantic), and simplicity of code / test / support. > Improve slice tree traversal implementation > --- > > Key: HDFS-13095 > URL: https://issues.apache.org/jira/browse/HDFS-13095 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > > This task is to refine the existing slice tree traversal logic in > [ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74] > class. > Please refer Daryn's review comments > {quote}*FSTreeTraverser* > I need to study this more but I have grave concerns this will work correctly > in a mutating namesystem. Ex. renames and deletes esp. in combination with > snapshots. Looks like there's a chance it will go off in the weeds when > backtracking out of a renamed directory. > traverseDir may NPE if it's traversing a tree in a snapshot and one of the > ancestors is deleted. > Not sure why it's bothering to re-check permissions during the crawl. The > storage policy is inherited by the entire tree, regardless of whether the > sub-contents are accessible. The effect of this patch is the storage policy > is enforced for all readable files, non-readable violate the new storage > policy, new non-readable will conform to the new storage policy. Very > convoluted. Since new files will conform, should just process the entire > tree. 
> {quote}
[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-10453: --- Fix Version/s: 2.7.6 > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > The ReplicationMonitor thread can get stuck for a long time and, with low > probability, lose data. Consider a typical scenario: > (1) create and close a file with the default replication (3); > (2) increase the replication of the file (to 10); > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > If the ReplicationMonitor stall reappears, the NameNode will print logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > ..
> 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node will be selected after traversing the > whole cluster because no node satisfies the goodness criteria > (the remaining space must reach the required size Long.MAX_VALUE). > During stage #3 the ReplicationMonitor is stuck for a long time, especially in a large > cluster. invalidateBlocks & neededReplications continue to grow and are never > consumed; in the worst case data is lost. > This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks > and removing them from neededReplications.
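The fix proposed in the last paragraph amounts to a simple guard before target selection. The sketch below is a standalone illustration, not the actual ReplicationMonitor code; the only fact taken from the description is that deletion sets a block's NumBytes to NO_ACK (Long.MAX_VALUE).

```java
// Sketch of the proposed guard: before calling chooseTarget, skip any
// block whose size was set to the NO_ACK sentinel by deletion, instead
// of scanning the whole cluster for targets that can never be found.
public class NoAckGuard {
    // Per the description, deletion sets the block's NumBytes to
    // NO_ACK (Long.MAX_VALUE) to signal no explicit ACK is needed.
    static final long NO_ACK = Long.MAX_VALUE;

    // A NO_ACK block was already deleted: drop it from the
    // replication queue rather than schedule replication work.
    static boolean shouldSkipReplication(long blockNumBytes) {
        return blockNumBytes == NO_ACK;
    }
}
```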
[jira] [Commented] (HDFS-12897) Path not found when we get the ec policy for a .snapshot dir
[ https://issues.apache.org/jira/browse/HDFS-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348058#comment-16348058 ] Rakesh R commented on HDFS-12897: - Thanks [~GeLiXin], good unit test cases. +1 latest patch looks good to me. {quote}sure, I will create a jira soon and try to fix it. {quote} Makes sense, good to handle separately. > Path not found when we get the ec policy for a .snapshot dir > > > Key: HDFS-12897 > URL: https://issues.apache.org/jira/browse/HDFS-12897 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding, hdfs, snapshots >Affects Versions: 3.0.0-alpha1, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: LiXin Ge >Priority: Major > Attachments: HDFS-12897.001.patch, HDFS-12897.002.patch, > HDFS-12897.003.patch, HDFS-12897.004.patch, HDFS-12897.005.patch > > > Scenario:- > --- > Operation on snapshot dir. > *EC policy* > bin> ./hdfs ec -getPolicy -path /dir/ > RS-3-2-1024k > bin> ./hdfs ec -getPolicy -path /dir/.snapshot/ > {{FileNotFoundException: Path not found: /dir/.snapshot}} > bin> ./hdfs dfs -ls /dir/.snapshot/ > Found 2 items > drwxr-xr-x - user group 0 2017-12-05 12:27 /dir/.snapshot/s1 > drwxr-xr-x - user group 0 2017-12-05 12:28 /dir/.snapshot/s2 > *Storagepolicies* > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/.snapshot/ > {{The storage policy of /dir/.snapshot/ is unspecified}} > bin> ./hdfs storagepolicies -getStoragePolicy -path /dir/ > The storage policy of /dir/: > BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], > replicationFallbacks=[]} > *Which is the correct behavior ?* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348057#comment-16348057 ] Xiaoyu Yao commented on HDFS-13060: --- +1 for v3 patch. I'll fix the minor checkstyle comment issue upon commit. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a build-int whitelist based trust channel resolver. It allows you > to put IP address/Network Mask of trusted client/server in whitelist files to > skip encryption for certain traffics. > This ticket is opened to add a blacklist based trust channel resolver for > cases only certain machines (IPs) are untrusted without adding each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348056#comment-16348056 ] He Xiaoqiao commented on HDFS-10453: --- [~ajayydv] Thank you for the suggestion. I have attached a new patch [#HDFS-10453-branch-2.7.006.patch] for branch-2.7 that first checks whether the {{block}} is abandoned or reopened for append; this avoids the endless loop of failing to choose targets for deleted blocks. FYI, please correct me if I am wrong. Thanks again. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > The ReplicationMonitor thread can get stuck for a long time and, with low > probability, lose data. Consider a typical scenario: > (1) create and close a file with the default replication (3); > (2) increase the replication of the file (to 10); > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > If the ReplicationMonitor stall reappears, the NameNode will print logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > ..
> 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node is selected even after traversing the > whole cluster, because no candidate node can satisfy the goodness criteria > (remaining space must reach the required size Long.MAX_VALUE). > During stage (3) the ReplicationMonitor is stuck for a long time, especially in a large > cluster. invalidateBlocks & neededReplications keep growing with no > consumer, and in the worst case data is lost. > This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks > and removing them from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
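The proposed fix (skip chooseTarget for blocks already marked BlockCommand.NO_ACK and drop them from neededReplications) can be sketched as a toy example. This is illustrative plain Java, not the actual BlockManager code; the Block class and queue here are simplified stand-ins for the real HDFS data structures:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class NoAckSkipSketch {
    // Mirrors BlockCommand.NO_ACK: a deleted block's length is set to Long.MAX_VALUE.
    static final long NO_ACK = Long.MAX_VALUE;

    public static class Block {
        public final long numBytes;
        public Block(long numBytes) { this.numBytes = numBytes; }
    }

    /** Returns how many blocks were actually scheduled for replication. */
    public static int computeReplicationWork(Queue<Block> neededReplications) {
        int scheduled = 0;
        while (!neededReplications.isEmpty()) {
            Block b = neededReplications.poll();
            if (b.numBytes == NO_ACK) {
                // Block was deleted while queued: no datanode can ever satisfy a
                // required size of Long.MAX_VALUE, so drop it instead of
                // traversing the whole cluster in chooseTarget.
                continue;
            }
            scheduled++; // placeholder for chooseTarget(...) and scheduling the move
        }
        return scheduled;
    }

    public static void main(String[] args) {
        Queue<Block> q = new ArrayDeque<>();
        q.add(new Block(128L * 1024 * 1024)); // live block: scheduled
        q.add(new Block(NO_ACK));             // deleted block: skipped
        System.out.println(computeReplicationWork(q)); // prints 1
    }
}
```

The key point is that the deletion marker is checked before target selection, so the ReplicationMonitor never burns a full cluster traversal on a block that can no longer be placed.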
[jira] [Commented] (HDFS-13068) RBF: Add router admin option to manage safe mode
[ https://issues.apache.org/jira/browse/HDFS-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348053#comment-16348053 ] genericqa commented on HDFS-13068: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 5s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}167m 7s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13068 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908712/HDFS-13068.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux b41cacb647f2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0bee384 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22914/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22914/testReport/ | | Max. process+thread count | 2853 (vs. ulimit of 5000) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output |
[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-10453: --- Attachment: HDFS-10453-branch-2.7.006.patch > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13093) Quota set don't compute usage of unspecified storage policy content
[ https://issues.apache.org/jira/browse/HDFS-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348042#comment-16348042 ] Xiaoyu Yao commented on HDFS-13093: --- Thanks [~liaoyuxiangqin] for reporting this. This is likely related to HDFS-8898, where the new getQuotaUsage() API returns a cached quota usage directly from the INode, without the recursive traversal and recalculation that the getContentSummary() API performs. Before HDFS-8898, the "hdfs dfs -count" CLI used getContentSummary(), which is expensive because it always walks the whole sub-tree to recalculate quota and usage. That guarantees correctness regardless of the order of steps 3 and 4. After HDFS-8898, the "hdfs dfs -count" CLI switched to the getQuotaUsage() API, and the cached INode quota usage for a storage type will be 0 if you don't set the storage policy before the storage type quota. This is because storage type usage is strongly tied to the storage policy: if no storage policy is set first, we cannot determine the quota usage for the different storage types, so 0 is the default in this case. I believe this is a transient incorrectness that can be addressed by one of the following three options. 1. This is a corner case. Document the procedure of setting the storage policy before setting the storage type quota; otherwise, the getQuotaUsage() API and the "hdfs dfs -count" CLI will return an inconsistent result. No fix needed. 2. The cached quota usage will be recalculated correctly anyway upon the next NN restart; no fix needed. 3. If we really want to allow setting the storage type quota before setting the storage policy, we can provide an option for the "hdfs dfs -count" CLI to use getContentSummary() to report the accurate usage. cc: [~mingma], [~kihwal] for additional comments. 
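The cached-counter behavior described above can be shown with a toy model. This is a deliberately simplified sketch in plain Java, not HDFS code; the `remaining` helper and the byte constants are illustrative stand-ins for the INode's per-storage-type quota accounting:

```java
public class QuotaCacheSketch {
    /** remaining = quota - cached usage; "none"/"inf" handling is omitted. */
    public static long remaining(long quotaBytes, long cachedUsageBytes) {
        return quotaBytes - cachedUsageBytes;
    }

    public static void main(String[] args) {
        long quotaBytes = 6L << 30; // setSpaceQuota 6 G with -storageType DISK
        long fileBytes  = 1L << 30; // one 1 G file with replication factor 3

        // Policy set AFTER the quota: at quota-set time there is no policy
        // mapping replicas to DISK, so the cached DISK usage stays 0 and
        // "remaining" shows the full 6 G (the inconsistent result).
        long staleUsage = 0L;
        System.out.println(remaining(quotaBytes, staleUsage) >> 30); // prints 6

        // Policy set BEFORE the quota (or after an NN restart recalculation):
        // usage = 1 G * 3 replicas, so "remaining" shows the expected 3 G.
        long recalculated = fileBytes * 3;
        System.out.println(remaining(quotaBytes, recalculated) >> 30); // prints 3
    }
}
```

This is the same arithmetic the reporter expected in step 5: 6 G quota minus 1 G * 3 replicas leaves 3 G, but the stale cached usage of 0 makes the CLI report 6 G.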
> Quota set don't compute usage of unspecified storage policy content > --- > > Key: HDFS-13093 > URL: https://issues.apache.org/jira/browse/HDFS-13093 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.0 > Environment: hdfs: hadoop-3.1.0-SNAPSHOT > node:1 namenode, 9 datanodes >Reporter: liaoyuxiangqin >Priority: Major > Original Estimate: 48h > Remaining Estimate: 48h > > test with the following steps: > 1. hdfs dfs -mkdir /hot > 2. hdfs dfs -put 1G.img /hot/file1 > 3. hdfs dfsadmin -setSpaceQuota 6442450944 -storageType DISK /hot > 4. hdfs storagepolicies -setStoragePolicy -path /hot -policy HOT > 5. hdfs dfs -count -q -h -v -t DISK /hot > {code:java} > SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVE_QUOTA > REM_ARCHIVE_QUOTA PROVIDED_QUOTA REM_PROVIDED_QUOTA PATHNAME > none inf 6 G 6 G none inf none inf /hot{code} > In step 5, I expected the remaining quota to be 3 G (quota - 1 G * 3 replicas), but > it is actually 6 G. > If I swap the order of steps 3 and 4, the remaining quota equals the 3 G I > expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13095) Improve slice tree traversal implementation
[ https://issues.apache.org/jira/browse/HDFS-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348021#comment-16348021 ] Rakesh R commented on HDFS-13095: - Thank you [~xiaochen] for the quick reply. I understand that {{snapshot}} handling is not required in the EDEK logic, so we will handle this condition specifically for SPS. > Improve slice tree traversal implementation > --- > > Key: HDFS-13095 > URL: https://issues.apache.org/jira/browse/HDFS-13095 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > > This task is to refine the existing slice tree traversal logic in the > [ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74] > class. > Please refer to Daryn's review comments: > {quote}*FSTreeTraverser* > I need to study this more but I have grave concerns this will work correctly > in a mutating namesystem. Ex. renames and deletes esp. in combination with > snapshots. Looks like there's a chance it will go off in the weeds when > backtracking out of a renamed directory. > traverseDir may NPE if it's traversing a tree in a snapshot and one of the > ancestors is deleted. > Not sure why it's bothering to re-check permissions during the crawl. The > storage policy is inherited by the entire tree, regardless of whether the > sub-contents are accessible. The effect of this patch is the storage policy > is enforced for all readable files, non-readable violate the new storage > policy, new non-readable will conform to the new storage policy. Very > convoluted. Since new files will conform, should just process the entire > tree. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated HDFS-12512: --- Status: Patch Available (was: Open) > RBF: Add WebHDFS > > > Key: HDFS-12512 > URL: https://issues.apache.org/jira/browse/HDFS-12512 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Íñigo Goiri >Assignee: Wei Yan >Priority: Major > Labels: RBF > Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, > HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch > > > The Router currently does not support WebHDFS. It needs to implement > something similar to {{NamenodeWebHdfsMethods}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated HDFS-12512: --- Status: Open (was: Patch Available) > RBF: Add WebHDFS > > > Key: HDFS-12512 > URL: https://issues.apache.org/jira/browse/HDFS-12512 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Íñigo Goiri >Assignee: Wei Yan >Priority: Major > Labels: RBF > Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, > HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch > > > The Router currently does not support WebHDFS. It needs to implement > something similar to {{NamenodeWebHdfsMethods}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347472#comment-16347472 ] Rakesh R edited comment on HDFS-10285 at 2/1/18 5:07 AM: - Thank you very much [~daryn] for your time and useful comments/thoughts. My reply follows; please take a look. +Comment-1)+ {quote}BlockManager Shouldn’t spsMode be volatile? Although I question why it’s here. {quote} [Rakesh's reply] Agreed, will do the changes. +Comment-2)+ {quote}Adding SPS methods to this class implies an unexpected coupling of the SPS service to the block manager. Please move them out to prove it’s not tightly coupled. {quote} [Rakesh's reply] Agreed. We are planning to create {{StoragePolicySatisfyManager}} and keep all the related APIs there. +Comment-3)+ {quote}BPServiceActor Is it actually sending back the moved blocks? Aren’t IBRs sufficient? BlockStorageMovementCommand/BlocksStorageMoveAttemptFinished Again, not sure that a new DN command is necessary, and why does it specifically report back successful moves instead of relying on IBRs? I would actually expect the DN to be completely ignorant of a SPS move vs any other move. {quote} [Rakesh's reply] We have explored the IBR approach and the required code changes. If SPS relied on it, an *extra* check would be required to determine whether a new block arrived due to an SPS move or some other operation, and that check would run quite often, since other operations far outnumber SPS block moves. Currently, the {{blksMovementsFinished}} list is sent back separately; each movement-finished block can be easily and quickly recognized by the Satisfier on the NN side, which updates the tracking details. If you agree this *extra* check is not an issue, we would be happy to implement the IBR approach. Secondly, BlockStorageMovementCommand was added to carry the block and its src/target pairs, which is needed for the move operation, and we used this command to help decouple the SPS code. 
+Comment-4)+ {quote}DataNode Why isn’t this just a block transfer? How is transferring between DNs any different than across storages? {quote} [Rakesh's reply] I see that the Mover also uses the {{REPLACE_BLOCK}} call, and we followed the same approach in SPS. Am I missing anything here? +Comment-5)+ {quote}DatanodeDescriptor Why use a synchronized linked list to offer/poll instead of BlockingQueue? {quote} [Rakesh's reply] Agreed, will do the changes. +Comment-6)+ {quote}DatanodeManager I know it’s configurable, but realistically, when would you ever want to give storage movement tasks equal footing with under-replication? Is there really a use case for not valuing durability? {quote} [Rakesh's reply] We don't have a particular use case, though. One scenario we considered: a user configures SSDs and they fill up quickly. In that case, cleaning up could be treated as a high priority. If you feel this is not a real case, I'm OK with removing this config so that SPS always uses only the remaining slots. +Comment-7)+ {quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already a very bad method that should be avoided for anything but jmx – even then it’s a concern. I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty large and tagging all the storage reports is not going to be cheap. verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap? Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then it gets another datanode report to determine the number of live nodes to decide if it should sleep before processing the next path. 
The number of nodes from the prior cached view of the world should suffice. {quote} [Rakesh's reply] Good point. Some time back, Uma and I thought about the caching part. We currently depend on this API for the datanode storage types and remaining-space details. I think it requires two different mechanisms for internal and external SPS. For internal SPS, how about referring directly to {{DatanodeManager#datanodeMap}} for every file? For external SPS, IIUC you are suggesting a caching mechanism: get the storage report once and cache it in ExternalContext, refreshing this local cache periodically. Say, after every 5 minutes (just an arbitrary number I put here; if you have some period in mind, please suggest it), when getDatanodeStorageReport is called the cache is treated as expired and fetched fresh; within the 5 minutes it is served from the cache. Does this make sense to you? Another point we thought of is, right now for checking whether
[jira] [Updated] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13092: - Affects Version/s: 3.0.0 Priority: Minor (was: Major) > Reduce verbosity for ThrottledAsyncChecker.java:schedule > > > Key: HDFS-13092 > URL: https://issues.apache.org/jira/browse/HDFS-13092 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Minor > Fix For: 3.1.0, 3.0.1 > > Attachments: HDFS-13092.001.patch > > > ThrottledAsyncChecker.java:schedule prints a log message every time a disk > check is scheduled. However, if the previous check was triggered less than > "minMsBetweenChecks" ago, the task is not scheduled. This > JIRA reduces the log verbosity by printing the message only when the task > will actually be scheduled. > {code} > 2018-01-29 00:51:44,467 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,470 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,477 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,480 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,486 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,501 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/13/hadoop/hdfs/data/current > 2018-01-29 00:51:44,507 INFO checker.ThrottledAsyncChecker > 
(ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,533 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,536 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,543 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,544 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,548 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/3/hadoop/hdfs/data/current > 2018-01-29 00:51:44,549 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/5/hadoop/hdfs/data/current > 2018-01-29 00:51:44,550 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/6/hadoop/hdfs/data/current > 2018-01-29 00:51:44,551 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,554 INFO checker.ThrottledAsyncChecker > 
(ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/9/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/14/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) -
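The throttling and logging change described in this issue can be sketched as a toy example. This is illustrative plain Java, not the actual ThrottledAsyncChecker code; the class, method names, and the injected clock value are simplified assumptions for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class ThrottleSketch {
    private final long minMsBetweenChecks;
    private final Map<String, Long> lastCheckMs = new HashMap<>();

    public ThrottleSketch(long minMsBetweenChecks) {
        this.minMsBetweenChecks = minMsBetweenChecks;
    }

    /**
     * Returns true only when a check is really scheduled; the INFO line is
     * printed only in that case, which is the gist of the proposed change.
     * nowMs is injected instead of calling System.currentTimeMillis() so the
     * behavior is easy to demonstrate deterministically.
     */
    public boolean schedule(String volume, long nowMs) {
        Long last = lastCheckMs.get(volume);
        if (last != null && nowMs - last < minMsBetweenChecks) {
            return false; // throttled: stay silent instead of logging INFO
        }
        lastCheckMs.put(volume, nowMs);
        System.out.println("Scheduling a check for " + volume);
        return true;
    }

    public static void main(String[] args) {
        ThrottleSketch t = new ThrottleSketch(10 * 60 * 1000L); // 10 min gap
        t.schedule("/grid/2/hadoop/hdfs/data/current", 0L);     // scheduled, logged
        t.schedule("/grid/2/hadoop/hdfs/data/current", 5_000L); // throttled, silent
    }
}
```

With logging moved inside the "actually scheduled" branch, the repeated back-to-back "Scheduling a check for ..." lines in the quoted log disappear for throttled volumes.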
[jira] [Updated] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13092: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.1 3.1.0 Status: Resolved (was: Patch Available) Seems this JIRA can be resolved. > Reduce verbosity for ThrottledAsyncChecker.java:schedule
[jira] [Commented] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes
[ https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347959#comment-16347959 ] liaoyuxiangqin commented on HDFS-7134: -- [~gurmukhd] I have tested this on Hadoop 3.1.0, and the issue no longer appears. > Replication count for a block should not update till the blocks have settled > on Datanodes > - > > Key: HDFS-7134 > URL: https://issues.apache.org/jira/browse/HDFS-7134 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 1.2.1, 2.6.0, 2.7.3 > Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP > Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux > [hadoop@nn1 conf]$ cat /etc/redhat-release > CentOS release 6.5 (Final) >Reporter: gurmukh singh >Priority: Critical > Labels: HDFS > > The count for the number of replicas for a block should not change till the > blocks have settled on the datanodes. > Test Case: > Hadoop Cluster with 1 namenode and 3 datanodes. > nn1.cluster1.com(192.168.1.70) > dn1.cluster1.com(192.168.1.72) > dn2.cluster1.com(192.168.1.73) > dn3.cluster1.com(192.168.1.74) > Cluster up and running fine with replication set to "1" for parameter > "dfs.replication on all nodes" > > dfs.replication > 1 > > To reduce the wait time, I have reduced the dfs.heartbeat and recheck > parameters. > on datanode2 (192.168.1.72) > [hadoop@dn2 ~]$ hadoop fs -Ddfs.replication=2 -put from_dn2 / > [hadoop@dn2 ~]$ hadoop fs -ls /from_dn2 > Found 1 items > -rw-r--r-- 2 hadoop supergroup 17 2014-09-23 13:33 /from_dn2 > On Namenode > === > As expected, the copy was done from datanode2, and one copy goes locally. > [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations > FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 > 13:53:16 IST 2014 > /from_dn2 17 bytes, 1 block(s): OK > 0. 
blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, > 192.168.1.73:50010] > The blocks can also be seen on the datanodes' disks, under the "current" > directory. > Now, shut down datanode2 (192.168.1.73); as expected, the block moves to another > datanode to maintain a replication of 2: > [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations > FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 > 13:54:21 IST 2014 > /from_dn2 17 bytes, 1 block(s): OK > 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, > 192.168.1.72:50010] > But now, if I bring datanode2 back, although the namenode sees that > this block is now in 3 places and fires an invalidate command for > datanode1 (192.168.1.72), the replication count on the namenode is bumped to 3 > immediately. > [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations > FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 > 13:56:12 IST 2014 > /from_dn2 17 bytes, 1 block(s): OK > 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, > 192.168.1.72:50010, 192.168.1.73:50010] > On datanode1, the invalidate command has been fired immediately and the > block deleted. > = > 2014-09-23 13:54:17,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: > /192.168.1.72:50010 > 2014-09-23 13:54:17,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Received blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: > /192.168.1.72:50010 size 17 > 2014-09-23 13:55:28,720 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Scheduling blk_8132629811771280764_1175 file > /space/disk1/current/blk_8132629811771280764 for deletion > 2014-09-23 13:55:28,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Deleted blk_8132629811771280764_1175 at file > /space/disk1/current/blk_8132629811771280764 > The namenode still shows 3 replicas, even after one has been deleted and even > after more than 30 minutes. > [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations > FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 > 14:21:27 IST 2014 > /from_dn2 17 bytes, 1 block(s): OK > 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, > 192.168.1.72:50010, 192.168.1.73:50010] > This could be dangerous if someone removes a copy or the other 2 datanodes fail. > On Datanode 1 > = > Before datanode1 is brought back: > [hadoop@dn1 conf]$ ls -l /space/disk*/current > /space/disk1/current: > total 28 > -rw-rw-r-- 1 hadoop hadoop 13 Sep 21 09:09 blk_2278001646987517832 > -rw-rw-r-- 1 hadoop hadoop 11 Sep 21 09:09 blk_2278001646987517832_1171.meta > -rw-rw-r-- 1 hadoop hadoop 17 Sep 23 13:54 blk_8132629811771280764 > -rw-rw-r-- 1
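The fix the reporter is asking for can be sketched as a toy model: the replication that fsck reports should count only settled replicas, excluding copies already scheduled for invalidation. This is plain Python, not HDFS code; the class and method names are hypothetical and the over-replication policy here picks an extra copy arbitrarily.

```python
class BlockInfo:
    """Toy model (not HDFS code) of per-block replica accounting where the
    reported replication excludes replicas already scheduled for deletion."""

    def __init__(self, expected_replication):
        self.expected = expected_replication
        self.replicas = set()            # datanodes currently holding a copy
        self.pending_invalidate = set()  # datanodes told to delete their copy

    def add_replica(self, dn):
        self.replicas.add(dn)
        if len(self.replicas) > self.expected:
            # over-replicated: pick one extra copy and schedule it for deletion
            extra = next(iter(self.replicas - self.pending_invalidate))
            self.pending_invalidate.add(extra)

    def ack_deletion(self, dn):
        # the datanode confirmed the delete; the copy is gone for real now
        self.replicas.discard(dn)
        self.pending_invalidate.discard(dn)

    def reported_replication(self):
        # what fsck would print: only settled copies, not ones pending delete
        return len(self.replicas - self.pending_invalidate)

# Scenario from the report: replication factor 2, then dn2 rejoins
# with a stale copy, so one copy gets scheduled for deletion.
blk = BlockInfo(expected_replication=2)
blk.add_replica("dn3")
blk.add_replica("dn1")
blk.add_replica("dn2")  # rejoining node triggers the invalidate
```

In this model the reported replication stays at 2 throughout, instead of being bumped to 3 until the invalidated copy is acknowledged as deleted.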
[jira] [Commented] (HDFS-13043) RBF: Expose the state of the Routers in the federation
[ https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347949#comment-16347949 ] Yiqun Lin commented on HDFS-13043: -- Failed unit tests are not related. LGTM, +1. [~elgoiri], as the work for tracking the Router state is all done in trunk, what's the plan for RBF phase 2? Subcluster rebalancer or something else? > RBF: Expose the state of the Routers in the federation > -- > > Key: HDFS-13043 > URL: https://issues.apache.org/jira/browse/HDFS-13043 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, > HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, > HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, > HDFS-13043.008.patch, HDFS-13043.009.patch, router-info.png > > > The Router should expose the state of the other Routers in the federation > through a user UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13068) RBF: Add router admin option to manage safe mode
[ https://issues.apache.org/jira/browse/HDFS-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13068: - Attachment: HDFS-13068.003.patch > RBF: Add router admin option to manage safe mode > > > Key: HDFS-13068 > URL: https://issues.apache.org/jira/browse/HDFS-13068 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13068.001.patch, HDFS-13068.002.patch, > HDFS-13068.003.patch > > > HDFS-13044 adds a safe mode to reject requests. We should have an option to > manually set the Router into safe mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13068) RBF: Add router admin option to manage safe mode
[ https://issues.apache.org/jira/browse/HDFS-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347945#comment-16347945 ] Yiqun Lin commented on HDFS-13068: -- Thanks for the review, [~elgoiri]. Attaching the updated patch to address the comments. > RBF: Add router admin option to manage safe mode > > > Key: HDFS-13068 > URL: https://issues.apache.org/jira/browse/HDFS-13068 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13068.001.patch, HDFS-13068.002.patch, > HDFS-13068.003.patch > > > HDFS-13044 adds a safe mode to reject requests. We should have an option to > manually set the Router into safe mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347931#comment-16347931 ] genericqa commented on HDFS-13060: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 43s{color} | {color:orange} root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 9s{color} | {color:red} hadoop-common in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 33s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 2m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}221m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.TestRaceWhenRelogin | | | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | | | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13060 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908673/HDFS-13060.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b5b3365fff81 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-12997) Move logging to slf4j in BlockPoolSliceStorage and Storage
[ https://issues.apache.org/jira/browse/HDFS-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347903#comment-16347903 ] Akira Ajisaka commented on HDFS-12997: -- +1, thanks Ajay. > Move logging to slf4j in BlockPoolSliceStorage and Storage > --- > > Key: HDFS-12997 > URL: https://issues.apache.org/jira/browse/HDFS-12997 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-12997.001.patch, HDFS-12997.002.patch, > HDFS-12997.003.patch, HDFS-12997.004.patch, HDFS-12997.005.patch, > HDFS-12997.006.patch, HDFS-12997.007.patch > > > Move logging to slf4j in BlockPoolSliceStorage and Storage classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Huo updated HDFS-13056: -- Attachment: hdfs-file-composite-crc32-v3.pdf > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, > Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, > hdfs-file-composite-crc32-v2.pdf, hdfs-file-composite-crc32-v3.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). 
> > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on a hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
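The CRC-composition property the proposal relies on can be illustrated with a short, hedged sketch: this is the classic zlib-style crc32_combine translated to pure Python, not the HDFS patch itself. Two independently computed chunk CRCs merge into the CRC of the concatenation using only the second chunk's length, which is exactly what lets a composite file CRC stay agnostic to chunk and block boundaries.

```python
import zlib

def _gf2_times_vec(mat, vec):
    # Multiply a GF(2) 32x32 matrix (list of column ints) by a bit vector.
    total = 0
    i = 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def _gf2_square(mat):
    # Square a GF(2) matrix: each column is the matrix applied to itself.
    return [_gf2_times_vec(mat, mat[n]) for n in range(32)]

def crc32_combine(crc1, crc2, len2):
    """CRC32 of A||B computed from crc32(A), crc32(B), and len(B) only."""
    if len2 <= 0:
        return crc1
    # Operator that feeds one zero bit through the reflected CRC-32 polynomial.
    odd = [0xEDB88320] + [1 << n for n in range(31)]
    even = _gf2_square(odd)   # two zero bits
    odd = _gf2_square(even)   # four zero bits
    while True:
        even = _gf2_square(odd)       # first pass: eight zero bits = one byte
        if len2 & 1:
            crc1 = _gf2_times_vec(even, crc1)
        len2 >>= 1
        if not len2:
            break
        odd = _gf2_square(even)
        if len2 & 1:
            crc1 = _gf2_times_vec(odd, crc1)
        len2 >>= 1
        if not len2:
            break
    return crc1 ^ crc2

# Two chunks checksummed independently, then combined.
a, b = b"hello, ", b"composite crc world"
assert crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b)) == zlib.crc32(a + b)
```

The combined value is independent of where the split falls, so per-block CRCs with different bytes-per-checksum or block sizes can still roll up to the same file-level CRC.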
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347851#comment-16347851 ] Dennis Huo commented on HDFS-13056: --- Uploaded an initial end-to-end working draft against trunk which supports CRC32/CRC32C, partial file prefixes, arbitrary bytes-per-crc or blocksize, and replicated vs striped encodings. Still a TODO to support the striped-reconstruction path, and adding stripe support made everything a lot messier, so some refactoring is in order. Also, unit tests are still pending, but manual testing in a real setup works:
{code:java}
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-default1.dat
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-default2.dat
$ hdfs dfs -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-bpc1024.dat
$ hdfs dfs -Ddfs.blocksize=67108864 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-blocksize64mb.dat
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest-unaligned.dat hdfs:///tmp/random-crctest-unaligned1.dat
$ hdfs dfs -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest-unaligned.dat hdfs:///tmp/random-crctest-unaligned2.dat
$ hdfs dfs -Ddfs.checksum.type=CRC32 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-gzipcrc32-1.dat
$ hdfs dfs -Ddfs.checksum.type=CRC32 -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmp/random-crctest-gzipcrc32-2.dat
$ hdfs dfs -mkdir hdfs:///tmpec
$ hdfs ec -enablePolicy -policy XOR-2-1-1024k
$ hdfs ec -setPolicy -path hdfs:///tmpec -policy XOR-2-1-1024k
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-default1.dat
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-default2.dat
$ hdfs dfs -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-bpc1024.dat
$ hdfs dfs -Ddfs.blocksize=67108864 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-blocksize64mb.dat
$ hdfs dfs -cp gs://hadoop-cloud-dev-dhuo/random-crctest-unaligned.dat hdfs:///tmpec/random-crctest-unaligned1.dat
$ hdfs dfs -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest-unaligned.dat hdfs:///tmpec/random-crctest-unaligned2.dat
$ hdfs dfs -Ddfs.checksum.type=CRC32 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-gzipcrc32-1.dat
$ hdfs dfs -Ddfs.checksum.type=CRC32 -Ddfs.bytes-per-checksum=1024 -cp gs://hadoop-cloud-dev-dhuo/random-crctest.dat hdfs:///tmpec/random-crctest-gzipcrc32-2.dat
$ hdfs dfs -checksum hdfs:///tmp/random-crctest*.dat
hdfs:///tmp/random-crctest-blocksize64mb.dat  MD5-of-131072MD5-of-512CRC32C   02028baa940ef6ed21fb4bd6224ce917d127
hdfs:///tmp/random-crctest-bpc1024.dat        MD5-of-131072MD5-of-1024CRC32C  0402930b0d7ad333786a839b044ed8d18d2d
hdfs:///tmp/random-crctest-default1.dat       MD5-of-262144MD5-of-512CRC32C   0204c0baeeacbc4b5a3c8af5152944fe2d79
hdfs:///tmp/random-crctest-default2.dat       MD5-of-262144MD5-of-512CRC32C   0204c0baeeacbc4b5a3c8af5152944fe2d79
hdfs:///tmp/random-crctest-gzipcrc32-1.dat    MD5-of-262144MD5-of-512CRC32    020449d52fdd25aa08559e20536acc34d51d
hdfs:///tmp/random-crctest-gzipcrc32-2.dat    MD5-of-131072MD5-of-1024CRC32   04021d5468ea4093ddb3741790b8dc3b9a57
hdfs:///tmp/random-crctest-unaligned1.dat     MD5-of-262144MD5-of-512CRC32C   02040da665dadca0df00456206f234d5f8b0
hdfs:///tmp/random-crctest-unaligned2.dat     MD5-of-131072MD5-of-1024CRC32C  040227c2198f48224a0ddb92c4dc4addd28b
$ hdfs dfs -checksum hdfs:///tmpec/random-crctest*.dat
18/02/01 01:15:54 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.2-hadoop2
hdfs:///tmpec/random-crctest-blocksize64mb.dat  MD5-of-131072MD5-of-512CRC32C   02025b54faaa368ed81b25984a746c767d39
hdfs:///tmpec/random-crctest-bpc1024.dat        MD5-of-131072MD5-of-1024CRC32C  040289a128b1e1995256bdb34fb95720dafc
hdfs:///tmpec/random-crctest-default1.dat       MD5-of-262144MD5-of-512CRC32C   020407ee18e8f4909647adf085ec0f464d1a
hdfs:///tmpec/random-crctest-default2.dat       MD5-of-262144MD5-of-512CRC32C   020407ee18e8f4909647adf085ec0f464d1a
hdfs:///tmpec/random-crctest-gzipcrc32-1.dat    MD5-of-262144MD5-of-512CRC32    0204d79ad1fa00fad2f0adb18f49f2e90bb3
hdfs:///tmpec/random-crctest-gzipcrc32-2.dat    MD5-of-131072MD5-of-1024CRC32   0402126ac7bc467c59942734bd8ebf690440
{code}
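As a hedged illustration of why the MD5-of-MD5 checksums above differ across bytes-per-checksum settings while a CRC-based composite would not (plain Python, not the patch): an MD5 over per-chunk MD5s depends on the chunking, whereas a CRC carried across chunks does not. The streaming form shown here is a simplification of true CRC composition, which per the design doc needs only per-chunk CRCs and lengths, but it demonstrates the chunk-independence property.

```python
import hashlib
import zlib

# Arbitrary sample content standing in for a file's bytes.
data = bytes(range(256)) * 64

def md5_of_chunk_md5s(data, chunk_size):
    # Toy analogue of the hierarchical aggregation: the outer MD5 is fed
    # per-chunk digests, so the result depends on the chunk size chosen.
    outer = hashlib.md5()
    for i in range(0, len(data), chunk_size):
        outer.update(hashlib.md5(data[i:i + chunk_size]).digest())
    return outer.hexdigest()

def streamed_crc32(data, chunk_size):
    # A CRC carried across chunks; the result is independent of chunking.
    crc = 0
    for i in range(0, len(data), chunk_size):
        crc = zlib.crc32(data[i:i + chunk_size], crc)
    return crc

# Different chunkings give different MD5-of-MD5 aggregates (barring an
# MD5 collision), but identical CRCs.
assert md5_of_chunk_md5s(data, 512) != md5_of_chunk_md5s(data, 1024)
assert streamed_crc32(data, 512) == streamed_crc32(data, 1024) == zlib.crc32(data)
```

This mirrors the listing above, where -Ddfs.bytes-per-checksum=1024 changed the reported FileChecksum for byte-identical files.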
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Huo updated HDFS-13056: -- Attachment: HDFS-13056.001.patch > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, > Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, > hdfs-file-composite-crc32-v2.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). 
> > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on a hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347786#comment-16347786 ] genericqa commented on HDFS-13062: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 25s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 13s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade | | | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13062 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908648/HDFS-13062.06.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3d0f1b19df7f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3ce2190 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22911/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22911/testReport/ | | Max. process+thread count | 3986 (vs. ulimit of 5000) | | modules | C:
[jira] [Commented] (HDFS-13098) RBF: Datanodes interacting with Routers
[ https://issues.apache.org/jira/browse/HDFS-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347783#comment-16347783 ] Íñigo Goiri commented on HDFS-13098: Currently, we do the assignment of DNs to subclusters using external tools that generate {{hdfs-site.xml}}. These tools could be moved into the RBF infrastructure. I had some initial conversation about this topic with [~curino]. One of his concerns was to avoid passing every single heartbeat through the Routers. To solve this, we could make the DNs register just the first time through the Router and afterwards switch to heartbeating directly into the actual Namenodes. I think this could also apply to YARN federation and we could share some infrastructure; [~subru], [~giovanni.fumarola], any thoughts here? Currently, this is initial brainstorming with not much design done yet, so feedback is welcome. > RBF: Datanodes interacting with Routers > --- > > Key: HDFS-13098 > URL: https://issues.apache.org/jira/browse/HDFS-13098 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Priority: Major > > Datanodes talk to particular Namenodes. We could use the Router > infrastructure for the Datanodes to register/heartbeat into them, and the > Routers would forward this to particular Namenodes. This would make the > assignment of Datanodes to subclusters potentially more dynamic. > The implementation would potentially include: > * Router to implement part of DatanodeProtocol > * Forwarding DN messages into Routers > * Policies to assign datanodes to subclusters > * Datanodes to make blockpool configuration dynamic -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
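The flow sketched in the comment above, register once through a Router, then heartbeat directly to the assigned Namenodes, could look roughly like this toy model. All class and method names here are hypothetical (plain Python, not an actual Router or DatanodeProtocol API), and the round-robin assignment stands in for whatever subcluster policy RBF would actually use.

```python
class Router:
    """Toy router that assigns a DN to a subcluster on first registration."""

    def __init__(self, subclusters):
        self.subclusters = subclusters   # subcluster name -> list of NN addresses
        self.assignments = {}            # dn_id -> subcluster name (sticky)
        self._next = 0

    def register(self, dn_id):
        # Simple round-robin policy; real policies could weigh capacity, racks, ...
        if dn_id not in self.assignments:
            name = sorted(self.subclusters)[self._next % len(self.subclusters)]
            self._next += 1
            self.assignments[dn_id] = name
        # The DN gets back the concrete Namenodes to heartbeat to from now on.
        return self.subclusters[self.assignments[dn_id]]

class Datanode:
    def __init__(self, dn_id):
        self.dn_id = dn_id
        self.namenodes = []

    def start(self, router):
        # Only the initial registration goes through the Router.
        self.namenodes = router.register(self.dn_id)

    def heartbeat_targets(self):
        # Subsequent heartbeats bypass the Router entirely.
        return self.namenodes

# Two subclusters; the first DN to register lands in ns1, the second in ns2.
router = Router({"ns1": ["nn1-a:8020", "nn1-b:8020"], "ns2": ["nn2-a:8020"]})
dn = Datanode("dn-1")
dn.start(router)
```

This keeps per-heartbeat load off the Routers, matching the concern raised, while still letting the Router infrastructure own the DN-to-subcluster assignment.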
[jira] [Created] (HDFS-13098) RBF: Datanodes interacting with Routers
Íñigo Goiri created HDFS-13098: -- Summary: RBF: Datanodes interacting with Routers Key: HDFS-13098 URL: https://issues.apache.org/jira/browse/HDFS-13098 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Íñigo Goiri Datanodes talk to particular Namenodes. We could use the Router infrastructure for the Datanodes to register/heartbeat into them, and the Routers would forward this to particular Namenodes. This would make the assignment of Datanodes to subclusters potentially more dynamic. The implementation would potentially include: * Router to implement part of DatanodeProtocol * Forwarding DN messages into Routers * Policies to assign datanodes to subclusters * Datanodes to make blockpool configuration dynamic -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
[ https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347764#comment-16347764 ] genericqa commented on HDFS-13097: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} HDFS-10285 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 28s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} HDFS-10285 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} HDFS-10285 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 14 new + 1057 unchanged - 0 fixed = 1071 total (was 1057) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 52s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 27s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}154m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlocksToMoveStorage(BlockStorageMovementCommand$BlockMovingInfo) At DatanodeDescriptor.java:org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlocksToMoveStorage(BlockStorageMovementCommand$BlockMovingInfo) At DatanodeDescriptor.java:[line 1087] | | | Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.getBlocksToMoveStorages(int) At DatanodeDescriptor.java:org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.getBlocksToMoveStorages(int) At DatanodeDescriptor.java:[line 1109] | | Failed junit tests | hadoop.hdfs.TestDistributedFileSystemWithECFile | | | hadoop.hdfs.server.namenode.TestQuotaByStorageType | | | hadoop.hdfs.TestReadWhileWriting | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 | | |
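The two FindBugs hits above flag client-side locking on a java.util.concurrent.BlockingQueue: synchronizing on a j.u.c. collection's monitor does not interact with the collection's internal locks, so it buys nothing for the queue's own operations. A self-contained illustration of the flagged pattern and the usual fix (class and method names here are illustrative, not the DatanodeDescriptor code itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockMoveQueueSketch {
  private final BlockingQueue<String> blocksToMove = new LinkedBlockingQueue<>();

  // Anti-pattern FindBugs flags: synchronizing on the queue object itself.
  // The queue's operations are already thread-safe via internal locks/CAS,
  // so this monitor guards nothing the queue does internally.
  void addBlockFlagged(String blockInfo) {
    synchronized (blocksToMove) {
      blocksToMove.offer(blockInfo);
    }
  }

  // Typical fix: rely on the queue's built-in thread safety...
  void addBlock(String blockInfo) {
    blocksToMove.offer(blockInfo);
  }

  // ...and use a bulk helper like drainTo instead of a hand-rolled
  // synchronized loop when several elements must be taken at once.
  List<String> getBlocksToMove(int max) {
    List<String> out = new ArrayList<>(max);
    blocksToMove.drainTo(out, max);
    return out;
  }
}
```

If a compound invariant genuinely spans several queue operations, a separate explicit lock object (held by every participant) is the correct tool, not the queue's monitor.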
[jira] [Comment Edited] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347753#comment-16347753 ] Ajay Kumar edited comment on HDFS-10453 at 1/31/18 11:12 PM: - Hi [~hexiaoqiao], Thanks for working on this. Patch looks good to me. One minor suggestion: I think we can simplify the patch a bit by merging the new check {{if (rw.block.getNumBytes() == BlockCommand.NO_ACK)}} with {{if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock())))}} inside {{BlockManager#computeReplicationWorkForBlocks}} L1501. was (Author: ajayydv): Hi [~hexiaoqiao], Thanks for working on this. Patch looks good to me. One minor suggestion: I think we can simplify the patch a bit by merging the new check {{if (rw.block.getNumBytes() == BlockCommand.NO_ACK)}} with {{if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock())))}} inside {{computeReplicationWorkForBlocks}} L1501. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. 
> if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads 
(#NameNodeRpcServer and #ReplicationMonitor) > process the same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to > replicate and leaves the global lock. > (2) FSNamesystem#delete is invoked to delete the blocks and then clears the references in > the blocksMap, neededReplications, etc.; the block's numBytes is set to > NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does > not need an explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node will be selected after traversing the > whole cluster because no candidate satisfies the goodness criteria > (remaining space must reach the required size Long.MAX_VALUE). > During stage (3) the ReplicationMonitor is stuck
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347753#comment-16347753 ] Ajay Kumar commented on HDFS-10453: --- Hi [~hexiaoqiao], Thanks for working on this. Patch looks good to me. One minor suggestion: I think we can simplify the patch a bit by merging the new check {{if (rw.block.getNumBytes() == BlockCommand.NO_ACK)}} with {{if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock())))}} inside {{computeReplicationWorkForBlocks}} L1501. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > if the ReplicationMonitor stall reappears, the NameNode prints logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. 
> 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node will be selected after traversing the > whole cluster because no candidate satisfies the goodness criteria > (remaining space must reach the required size Long.MAX_VALUE). > During stage (3) the ReplicationMonitor is stuck for a long time, especially in a large > cluster; invalidateBlocks & neededReplications keep growing with nothing > consuming them, and at worst data can be lost. > This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks > and removing them from neededReplications.
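The proposed fix (skip chooseTarget for NO_ACK blocks and drop them from neededReplications) can be sketched in a self-contained form. This is a simplified model, not the BlockManager code itself: NO_ACK mirrors BlockCommand.NO_ACK (Long.MAX_VALUE), which the delete path stamps into a block's numBytes; the surrounding class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReplicationSketch {
  static final long NO_ACK = Long.MAX_VALUE; // mirrors BlockCommand.NO_ACK

  static class Block {
    final String id;
    volatile long numBytes;
    Block(String id, long numBytes) { this.id = id; this.numBytes = numBytes; }
  }

  /**
   * Mimics the proposed guard in computeReplicationWorkForBlocks: a block
   * whose size was stamped NO_ACK by a concurrent delete is removed from
   * neededReplications instead of being handed to chooseTarget, where no
   * datanode could ever satisfy a Long.MAX_VALUE space requirement.
   */
  static List<Block> selectBlocksToReplicate(List<Block> neededReplications) {
    List<Block> work = new ArrayList<>();
    Iterator<Block> it = neededReplications.iterator();
    while (it.hasNext()) {
      Block b = it.next();
      if (b.numBytes == NO_ACK) {
        it.remove();   // block was deleted concurrently; forget it
        continue;      // never reaches chooseTarget
      }
      work.add(b);
    }
    return work;
  }
}
```

The early check is cheap (one long comparison per block) and bounds the stall: the monitor no longer traverses the whole cluster for a target that cannot exist.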
[jira] [Commented] (HDFS-13043) RBF: Expose the state of the Routers in the federation
[ https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347752#comment-16347752 ] genericqa commented on HDFS-13043: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 38s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.TestDFSClientRetries | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13043 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908652/HDFS-13043.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux e60a4d8092ec 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3ce2190 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22912/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22912/testReport/ | | Max. process+thread count | 4713 (vs. ulimit of 5000) | | modules | C:
[jira] [Commented] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347747#comment-16347747 ] Hanisha Koneru commented on HDFS-13062: --- Thanks [~bharatviswa]. +1 for patch v06. Test failures are unrelated. The FindBugs error is inaccurate - we do use {{validateAndCreateJournalDir}}. > Provide support for JN to use separate journal disk per namespace > - > > Key: HDFS-13062 > URL: https://issues.apache.org/jira/browse/HDFS-13062 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13062.00.patch, HDFS-13062.01.patch, > HDFS-13062.02.patch, HDFS-13062.03.patch, HDFS-13062.04.patch, > HDFS-13062.05.patch, HDFS-13062.06.patch > > > In a federated HA setup, provide support for a separate journal disk for each > namespace.
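Assuming the patch follows Hadoop's usual per-nameservice key-suffix convention for journal storage (the exact key names should be confirmed against the committed patch - the fragment below is a hypothetical illustration, not the patch's actual configuration), a federated HA deployment could then point each namespace at its own disk:

```xml
<!-- hdfs-site.xml: hypothetical per-nameservice journal directories -->
<property>
  <name>dfs.journalnode.edits.dir.ns1</name>
  <value>/disk1/journal</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir.ns2</name>
  <value>/disk2/journal</value>
</property>
```

Separating the edit-log directories per namespace keeps one namespace's journal I/O from competing with another's on the same spindle.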
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347728#comment-16347728 ] Ajay Kumar commented on HDFS-13060: --- Added a package-info file in patch v3 to address the checkstyle issue. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trusted channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trusted channel resolver. It allows you > to put the IP address/network mask of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trusted channel resolver for > cases where only certain machines (IPs) are untrusted, without adding each trusted > IP individually.
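The blacklist idea inverts the whitelist resolver's default: everything is trusted (and thus unencrypted) except listed peers. A self-contained sketch of that logic - deliberately not importing Hadoop classes, and only mirroring the shape of TrustedChannelResolver's isTrusted(InetAddress) check; real code would load the blacklist from a file the way the HDFS-5910 whitelist resolver does:

```java
import java.net.InetAddress;
import java.util.Set;

/**
 * Illustrative blacklist-based trust check. Unlike the default resolver
 * (never trusted, so always encrypt) or the whitelist resolver (trusted
 * only if listed), this trusts every peer except those blacklisted.
 */
class BlacklistResolverSketch {
  private final Set<String> blacklistedIps;

  BlacklistResolverSketch(Set<String> blacklistedIps) {
    this.blacklistedIps = blacklistedIps;
  }

  /** A channel is trusted (no encryption) unless the peer is blacklisted. */
  boolean isTrusted(InetAddress peer) {
    return !blacklistedIps.contains(peer.getHostAddress());
  }

  /** String-based convenience overload; no resolution, plain comparison. */
  boolean isTrusted(String peerIp) {
    return !blacklistedIps.contains(peerIp);
  }
}
```

This keeps configuration small when almost all machines are trusted: only the handful of untrusted IPs need listing, instead of enumerating every trusted one.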
[jira] [Updated] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13060: -- Attachment: HDFS-13060.003.patch > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch, HDFS-13060.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trusted channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trusted channel resolver. It allows you > to put the IP address/network mask of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trusted channel resolver for > cases where only certain machines (IPs) are untrusted, without adding each trusted > IP individually.
[jira] [Commented] (HDFS-13073) Cleanup code in InterQJournalProtocol.proto
[ https://issues.apache.org/jira/browse/HDFS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347712#comment-16347712 ] genericqa commented on HDFS-13073: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}140m 1s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}206m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.ha.TestHAAppend | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13073 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908631/HDFS-13073.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 5287a3a05d0e 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3ce2190 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22909/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347677#comment-16347677 ] genericqa commented on HDFS-13062:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 2s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}139m 6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m 43s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Private method org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(File) is never called At JournalNode.java:called At JournalNode.java:[lines 202-207] |
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.TestSafeModeWithStripedFile |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.TestErasureCodingMultipleRacks |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.TestDFSStorageStateRecovery |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| | hadoop.hdfs.web.TestFSMainOperationsWebHdfs |
| | hadoop.hdfs.TestDFSStripedOutputStream |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
| | hadoop.hdfs.TestFileAppend4 |
| | hadoop.hdfs.TestReadStripedFileWithDNFailure |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 |
| | hadoop.hdfs.TestPread |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue |
[jira] [Assigned] (HDFS-12545) Autotune NameNode RPC handler threads according to number of datanodes in cluster
[ https://issues.apache.org/jira/browse/HDFS-12545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar reassigned HDFS-12545:
-
Assignee: (was: Ajay Kumar)

> Autotune NameNode RPC handler threads according to number of datanodes in cluster
> ---------------------------------------------------------------------------------
>
> Key: HDFS-12545
> URL: https://issues.apache.org/jira/browse/HDFS-12545
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ajay Kumar
> Priority: Major
>
> Autotune NameNode RPC handler threads according to the number of datanodes in the cluster.
> Currently, RPC handlers are controlled by {{dfs.namenode.handler.count}} at cluster start. This Jira is to discuss the best way to auto-tune it according to the number of datanodes and any other relevant input. Updating it to {{max(dfs.namenode.handler.count, min(200, 20 * log2(number of DataNodes)))}} on NameNode start is one possible way. (This heuristic is from the [Hadoop Operations|http://shop.oreilly.com/product/0636920025085.do] book.)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
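The quoted heuristic is easy to sanity-check in isolation. Below is a minimal standalone sketch; the method name and the floor-based log2 rounding are assumptions made for illustration, since the Jira does not pin down a rounding rule:

```java
public class RpcHandlerAutotune {
    /**
     * Heuristic from the "Hadoop Operations" book, as quoted in the Jira:
     * max(configured handler count, min(200, 20 * log2(number of DataNodes))).
     * Uses floor(log2(n)) via bit counting; the rounding choice is an assumption.
     */
    public static int suggestHandlerCount(int configuredHandlers, int dataNodes) {
        if (dataNodes < 1) {
            return configuredHandlers;
        }
        // floor(log2(dataNodes)) without floating point
        int log2 = 31 - Integer.numberOfLeadingZeros(dataNodes);
        return Math.max(configuredHandlers, Math.min(200, 20 * log2));
    }

    public static void main(String[] args) {
        // 64 DataNodes: 20 * 6 = 120 handlers, above a default of 10.
        System.out.println(suggestHandlerCount(10, 64));    // 120
        // 4096 DataNodes: 20 * 12 = 240, capped at 200.
        System.out.println(suggestHandlerCount(10, 4096));  // 200
    }
}
```

Note that the `max(...)` outer term means an operator-configured value larger than the heuristic always wins, so autotuning can only raise, never lower, the handler count.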
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347625#comment-16347625 ] genericqa commented on HDFS-13060:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 46s{color} | {color:orange} root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 49s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}232m 42s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDistributedFileSystemWithECFileWithRandomECPolicy |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.server.federation.router.TestRouterSafemode |
| | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.TestDFSClientRetries |
| | hadoop.hdfs.TestReplaceDatanodeOnFailure |
| |
[jira] [Commented] (HDFS-13095) Improve slice tree traversal implementation
[ https://issues.apache.org/jira/browse/HDFS-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347616#comment-16347616 ] Xiao Chen commented on HDFS-13095:
--
Thanks [~rakeshr] for creating the Jira and [~daryn] for reviewing (re-encryption, essentially :)). On re-encryption we chose not to change snapshots due to the immutable nature of snapshots (an old EDEK can still work if the EZ key version is still there). Good point about permissions; perhaps, since this is enforced to be the hdfs superuser role, we can skip perm checks...

> Improve slice tree traversal implementation
> -------------------------------------------
>
> Key: HDFS-13095
> URL: https://issues.apache.org/jira/browse/HDFS-13095
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Major
>
> This task is to refine the existing slice tree traversal logic in the [ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74] class.
> Please refer to Daryn's review comments:
> {quote}*FSTreeTraverser*
> I need to study this more, but I have grave concerns that this will work correctly in a mutating namesystem, e.g. renames and deletes, especially in combination with snapshots. Looks like there's a chance it will go off in the weeds when backtracking out of a renamed directory.
> traverseDir may NPE if it's traversing a tree in a snapshot and one of the ancestors is deleted.
> Not sure why it's bothering to re-check permissions during the crawl. The storage policy is inherited by the entire tree, regardless of whether the sub-contents are accessible. The effect of this patch is that the storage policy is enforced for all readable files, existing non-readable files violate the new storage policy, and new non-readable files will conform to the new storage policy. Very convoluted. Since new files will conform, it should just process the entire tree.
> {quote}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347614#comment-16347614 ] Ajay Kumar commented on HDFS-13060:
---
[~xyao] thanks for the review; created [HADOOP-15202] for "deprecation of CombinedIPWhiteList".

> Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
> ------------------------------------------------------------------------
>
> Key: HDFS-13060
> URL: https://issues.apache.org/jira/browse/HDFS-13060
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Ajay Kumar
> Priority: Major
> Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, HDFS-13060.002.patch
>
> HDFS-5910 introduces encryption negotiation between client and server based on a customizable TrustedChannelResolver class. The TrustedChannelResolver is invoked on both the client and server side. If the resolver indicates that the channel is trusted, then the data transfer will not be encrypted even if dfs.encrypt.data.transfer is set to true.
> The default trust channel resolver implementation returns false, indicating that the channel is not trusted, which always enables encryption. HDFS-5910 also added a built-in whitelist-based trust channel resolver. It allows you to put the IP address/network mask of trusted clients/servers in whitelist files to skip encryption for certain traffic.
> This ticket is opened to add a blacklist-based trust channel resolver for cases where only certain machines (IPs) are untrusted, without having to list each trusted IP individually.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
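The whitelist-vs-blacklist distinction in the ticket can be illustrated with plain Java and no Hadoop dependencies. The class and method names below are invented for the sketch and are not the actual patch's API: a whitelist resolver trusts only listed peers, while the proposed blacklist variant trusts every peer except the listed ones.

```java
import java.util.Set;

public class BlacklistTrustDemo {
    // Illustrative stand-in for a blacklist-based TrustedChannelResolver:
    // a peer is trusted unless its IP is explicitly blacklisted.
    static boolean isTrusted(Set<String> blacklistedIps, String peerIp) {
        return !blacklistedIps.contains(peerIp);
    }

    public static void main(String[] args) {
        Set<String> blacklist = Set.of("10.0.0.99");
        // Unlisted peers are trusted, so encryption can be skipped for them...
        System.out.println(isTrusted(blacklist, "10.0.0.1"));  // true
        // ...while blacklisted peers are untrusted and their traffic is encrypted.
        System.out.println(isTrusted(blacklist, "10.0.0.99")); // false
    }
}
```

This inverts the whitelist default: with a whitelist, an unlisted peer is untrusted (encrypted); with a blacklist, an unlisted peer is trusted, which is the "only a few machines are untrusted" case the ticket targets.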
[jira] [Commented] (HDFS-13061) SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
[ https://issues.apache.org/jira/browse/HDFS-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347601#comment-16347601 ] Ajay Kumar commented on HDFS-13061:
---
[~xyao], thanks for the review and commit.

> SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
> -------------------------------------------------------------------------------------
>
> Key: HDFS-13061
> URL: https://issues.apache.org/jira/browse/HDFS-13061
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Ajay Kumar
> Priority: Major
> Fix For: 3.1.0
> Attachments: HDFS-13061.000.patch, HDFS-13061.001.patch, HDFS-13061.002.patch, HDFS-13061.003.patch
>
> HDFS-5910 introduces encryption negotiation between client and server based on a customizable TrustedChannelResolver class. The TrustedChannelResolver is invoked on both the client and server side. If the resolver indicates that the channel is trusted, then the data transfer will not be encrypted even if dfs.encrypt.data.transfer is set to true.
> SaslDataTransferClient#checkTrustAndSend asks the channel resolver whether the client and server addresses are trusted, respectively. It currently decides the channel is untrusted (and enforces encryption) only if both the client and the server are untrusted. *This ticket is opened to change it to not trust (and to encrypt) if either the client or the server address is not trusted.*
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
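The bolded change reduces to a one-line predicate: the channel counts as trusted only when both endpoints are trusted, so any partially trusted channel is encrypted. A minimal sketch follows; the names are illustrative and this is not the actual SaslDataTransferClient code:

```java
public class ChannelTrust {
    // Post-HDFS-13061 semantics: trust requires BOTH sides to be trusted.
    static boolean channelTrusted(boolean clientTrusted, boolean serverTrusted) {
        return clientTrusted && serverTrusted;
    }

    // Encryption is applied whenever the channel is not fully trusted.
    static boolean shouldEncrypt(boolean clientTrusted, boolean serverTrusted) {
        return !channelTrusted(clientTrusted, serverTrusted);
    }

    public static void main(String[] args) {
        System.out.println(shouldEncrypt(true, true));   // false: fully trusted, skip encryption
        System.out.println(shouldEncrypt(true, false));  // true: partially trusted, encrypt
        System.out.println(shouldEncrypt(false, true));  // true: partially trusted, encrypt
    }
}
```

The pre-fix behavior described in the ticket corresponds to `clientTrusted || serverTrusted`, which wrongly skips encryption when only one endpoint is trusted.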
[jira] [Updated] (HDFS-13043) RBF: Expose the state of the Routers in the federation
[ https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13043: --- Attachment: HDFS-13043.009.patch > RBF: Expose the state of the Routers in the federation > -- > > Key: HDFS-13043 > URL: https://issues.apache.org/jira/browse/HDFS-13043 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, > HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, > HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, > HDFS-13043.008.patch, HDFS-13043.009.patch, router-info.png > > > The Router should expose the state of the other Routers in the federation > through a user UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347575#comment-16347575 ] Surendra Singh Lilhore edited comment on HDFS-10285 at 1/31/18 8:38 PM: Thanks [~daryn] for reviews. Created Part1 Jira HDFS-13097 to fix few comments was (Author: surendrasingh): Thanks [~daryn] for reviews. Create Part1 Jira HDFS-13097 to fix few comments > Storage Policy Satisfier in Namenode > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Attachments: HDFS-10285-consolidated-merge-patch-00.patch, > HDFS-10285-consolidated-merge-patch-01.patch, > HDFS-10285-consolidated-merge-patch-02.patch, > HDFS-10285-consolidated-merge-patch-03.patch, > HDFS-10285-consolidated-merge-patch-04.patch, > HDFS-10285-consolidated-merge-patch-05.patch, > HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, > Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, > Storage-Policy-Satisfier-in-HDFS-May10.pdf, > Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf > > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on directory/file to specify the user preference, where > to store the physical block. When user set the storage policy before writing > data, then the blocks could take advantage of storage policy preferences and > stores physical block accordingly. > If user set the storage policy after writing and completing the file, then > the blocks would have been written with default storage policy (nothing but > DISK). User has to run the ‘Mover tool’ explicitly by specifying all such > file names as a list. 
In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool as different > nodes can write files separately and file can have different paths. > Another scenarios is, when user rename the files from one effected storage > policy file (inherited policy from parent directory) to another storage > policy effected directory, it will not copy inherited storage policy from > source. So it will take effect from destination file/dir parent storage > policy. This rename operation is just a metadata change in Namenode. The > physical blocks still remain with source storage policy. > So, Tracking all such business logic based file names could be difficult for > admins from distributed nodes(ex: region servers) and running the Mover tool. > Here the proposal is to provide an API from Namenode itself for trigger the > storage policy satisfaction. A Daemon thread inside Namenode should track > such calls and process to DN as movement commands. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
[ https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore updated HDFS-13097: -- Status: Patch Available (was: Open) > [SPS]: Fix the branch review comments(Part1) > > > Key: HDFS-13097 > URL: https://issues.apache.org/jira/browse/HDFS-13097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-13097-HDFS-10285.01.patch > > > Fix the branch review comment. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347575#comment-16347575 ] Surendra Singh Lilhore commented on HDFS-10285: --- Thanks [~daryn] for reviews. Create Part1 Jira HDFS-13097 to fix few comments > Storage Policy Satisfier in Namenode > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Attachments: HDFS-10285-consolidated-merge-patch-00.patch, > HDFS-10285-consolidated-merge-patch-01.patch, > HDFS-10285-consolidated-merge-patch-02.patch, > HDFS-10285-consolidated-merge-patch-03.patch, > HDFS-10285-consolidated-merge-patch-04.patch, > HDFS-10285-consolidated-merge-patch-05.patch, > HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, > Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, > Storage-Policy-Satisfier-in-HDFS-May10.pdf, > Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf > > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on directory/file to specify the user preference, where > to store the physical block. When user set the storage policy before writing > data, then the blocks could take advantage of storage policy preferences and > stores physical block accordingly. > If user set the storage policy after writing and completing the file, then > the blocks would have been written with default storage policy (nothing but > DISK). User has to run the ‘Mover tool’ explicitly by specifying all such > file names as a list. In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool as different > nodes can write files separately and file can have different paths. 
> Another scenarios is, when user rename the files from one effected storage > policy file (inherited policy from parent directory) to another storage > policy effected directory, it will not copy inherited storage policy from > source. So it will take effect from destination file/dir parent storage > policy. This rename operation is just a metadata change in Namenode. The > physical blocks still remain with source storage policy. > So, Tracking all such business logic based file names could be difficult for > admins from distributed nodes(ex: region servers) and running the Mover tool. > Here the proposal is to provide an API from Namenode itself for trigger the > storage policy satisfaction. A Daemon thread inside Namenode should track > such calls and process to DN as movement commands. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
[ https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347573#comment-16347573 ] Surendra Singh Lilhore commented on HDFS-13097: --- Attached v1 patch > [SPS]: Fix the branch review comments(Part1) > > > Key: HDFS-13097 > URL: https://issues.apache.org/jira/browse/HDFS-13097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-13097-HDFS-10285.01.patch > > > Fix the branch review comment. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
[ https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore updated HDFS-13097: -- Attachment: HDFS-13097-HDFS-10285.01.patch > [SPS]: Fix the branch review comments(Part1) > > > Key: HDFS-13097 > URL: https://issues.apache.org/jira/browse/HDFS-13097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-13097-HDFS-10285.01.patch > > > Fix the branch review comment. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347571#comment-16347571 ] Bharat Viswanadham commented on HDFS-13062:
---
Hi [~hanishakoneru], thanks for the offline discussion about updating getLogDir to use validateAndCreateJournalDir(dir) instead of validateAndCreateJournalDir(). I had mistakenly changed this in the v05 patch and have reverted it to match the v04 patch. I also removed validateAndCreateJournalDir() and instead call validateAndCreateJournalDir(dir) in a for loop over the localDir entries in the start() method.

> Provide support for JN to use separate journal disk per namespace
> -----------------------------------------------------------------
>
> Key: HDFS-13062
> URL: https://issues.apache.org/jira/browse/HDFS-13062
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Attachments: HDFS-13062.00.patch, HDFS-13062.01.patch, HDFS-13062.02.patch, HDFS-13062.03.patch, HDFS-13062.04.patch, HDFS-13062.05.patch, HDFS-13062.06.patch
>
> In a federated HA setup, provide support for a separate journal disk for each namespace.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
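The per-directory validation loop described in the comment can be pictured with a small standalone sketch. The helper below is a hypothetical stand-in for JournalNode#validateAndCreateJournalDir(File), written from the comment's description rather than copied from the patch:

```java
import java.io.File;

public class JournalDirDemo {
    // Illustrative stand-in for JournalNode#validateAndCreateJournalDir(File):
    // create the directory if missing, then check it is a writable directory.
    static void validateAndCreateJournalDir(File dir) {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IllegalStateException("Cannot create journal dir " + dir);
        }
        if (!dir.isDirectory() || !dir.canWrite()) {
            throw new IllegalStateException("Journal dir " + dir + " is not a writable directory");
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "jn-demo");
        // One journal dir per namespace, validated in a loop as described for v06.
        for (String ns : new String[] {"ns1", "ns2"}) {
            File dir = new File(base, ns);
            validateAndCreateJournalDir(dir);
            System.out.println(dir.getName() + " ok");
        }
    }
}
```

Iterating over each configured directory (instead of a single no-argument validation method) is what lets each namespace fail independently when its own disk is missing or read-only.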
[jira] [Commented] (HDFS-13043) RBF: Expose the state of the Routers in the federation
[ https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347569#comment-16347569 ] genericqa commented on HDFS-13043:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 58s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13043 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908615/HDFS-13043.008.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc |
| uname | Linux 53766c00bf50 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d481344 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/22907/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[jira] [Updated] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-13062: -- Attachment: HDFS-13062.06.patch > Provide support for JN to use separate journal disk per namespace > - > > Key: HDFS-13062 > URL: https://issues.apache.org/jira/browse/HDFS-13062 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13062.00.patch, HDFS-13062.01.patch, > HDFS-13062.02.patch, HDFS-13062.03.patch, HDFS-13062.04.patch, > HDFS-13062.05.patch, HDFS-13062.06.patch > > > In Federated HA setup, provide support for separate journal disk for each > namespace. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
[ https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347565#comment-16347565 ] Surendra Singh Lilhore commented on HDFS-13097: --- Fixing the comments below: *Comment-1)* {quote}BlockManager Shouldn’t spsMode be volatile? Although I question why it’s here. {quote} [Rakesh's reply] Agreed, will do the changes. *Comment-2)* {quote}Adding SPS methods to this class implies an unexpected coupling of the SPS service to the block manager. Please move them out to prove it’s not tightly coupled. {quote} [Rakesh's reply] Agreed. I'm planning to create {{StoragePolicySatisfyManager}} and keep all the related APIs there. *Comment-5)* {quote}DatanodeDescriptor Why use a synchronized linked list to offer/poll instead of BlockingQueue? {quote} [Rakesh's reply] Agreed, will do the changes. *Comment-8)* {quote}DFSUtil DFSUtil.removeOverlapBetweenStorageTypes and {{DFSUtil.getSPSWorkMultiplier }}. These aren’t generally useful methods so why are they in DFSUtil? Why aren’t they in the only calling class StoragePolicySatisfier? {quote} [Rakesh's reply] Agreed, will do the changes. *Comment-11)* {quote}HdfsServerConstants The xattr is called user.hdfs.sps.xattr. Why does the xattr name actually contain the word “xattr”? {quote} [Rakesh's reply] Sure, will remove the “xattr” word. *Comment-12)* {quote}NameNode Super trivial but using the plural pronoun “we” in this exception message is odd. Changing the value isn’t a joint activity. For enabling or disabling storage policy satisfier, we must pass either none/internal/external string value only {quote} [Rakesh's reply] Oops, sorry for the mistake. Will change it. *Comment-16)* {quote}FSDirStatAndListOp Not sure why javadoc was changed to add needLocation. It's already present and now doubled up.{quote} [Rakesh's reply] Agreed, will correct it. *Comment-18)* {quote}DFS_MOVER_MOVERTHREADS_DEFAULT is 1000 per DN? 
If the DN is concurrently doing 1000 moves, it's not in a good state, disk io is probably saturated, and this will only make it much worse. 10 is probably more than sufficient.{quote} [Rakesh's reply] Agreed, will reduce it to a smaller value of 10. > [SPS]: Fix the branch review comments(Part1) > > > Key: HDFS-13097 > URL: https://issues.apache.org/jira/browse/HDFS-13097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > > Fix the branch review comments.
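Comment-5 above agrees to swap a synchronized linked list for a BlockingQueue. A minimal sketch of what that swap could look like — the class and field names here are illustrative, not the actual DatanodeDescriptor code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockMoveTaskQueue {
    // LinkedBlockingQueue is already thread-safe, so offer/poll need no
    // external synchronization, unlike a manually synchronized LinkedList.
    private final BlockingQueue<Long> pendingBlockIds = new LinkedBlockingQueue<>();

    public boolean offer(long blockId) {
        return pendingBlockIds.offer(blockId);
    }

    public Long poll() {
        // Non-blocking: returns null when the queue is empty.
        return pendingBlockIds.poll();
    }

    public static void main(String[] args) {
        BlockMoveTaskQueue q = new BlockMoveTaskQueue();
        q.offer(1001L);
        q.offer(1002L);
        System.out.println(q.poll()); // 1001 (FIFO order)
        System.out.println(q.poll()); // 1002
        System.out.println(q.poll()); // null
    }
}
```

Blocking variants (`put`/`take`) are also available if the consumer should wait for work instead of polling.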
[jira] [Created] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)
Surendra Singh Lilhore created HDFS-13097: - Summary: [SPS]: Fix the branch review comments(Part1) Key: HDFS-13097 URL: https://issues.apache.org/jira/browse/HDFS-13097 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix the branch review comment.
[jira] [Resolved] (HDFS-11419) BlockPlacementPolicyDefault is choosing datanode in an inefficient way
[ https://issues.apache.org/jira/browse/HDFS-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-11419. -- Resolution: Done > BlockPlacementPolicyDefault is choosing datanode in an inefficient way > -- > > Key: HDFS-11419 > URL: https://issues.apache.org/jira/browse/HDFS-11419 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > > Currently in {{BlockPlacementPolicyDefault}}, {{chooseTarget}} will end up > calling into {{chooseRandom}}, which will first find a random datanode by > calling > {code}DatanodeDescriptor chosenNode = chooseDataNode(scope, > excludedNodes);{code}, then it checks whether that returned datanode > satisfies the storage type requirement > {code}storage = chooseStorage4Block( > chosenNode, blocksize, results, entry.getKey());{code} > If yes, {{numOfReplicas--;}}; otherwise, the node is added to the excluded nodes, > and the loop runs again until {{numOfReplicas}} is down to 0. > A problem here is that the storage type is not considered until after > a random node has already been returned. We've seen a case where a cluster has a > large number of datanodes, while only a few satisfy the storage type > condition. So, for the most part, this code blindly picks random datanodes > that do not satisfy the storage type requirement. > To make matters worse, the way {{NetworkTopology#chooseRandom}} works is > that, given a set of excluded nodes, it first finds a random datanode, then, > if it is in the excluded set, tries to find another random node. So the more > excluded nodes there are, the more likely a random node will be in the > excluded set, in which case an iteration is basically wasted. > Therefore, this JIRA proposes to augment/modify the relevant classes in a way > that datanodes can be found more efficiently. There are currently two > different high-level solutions we are considering: > 1. 
add some field to the Node base types to describe the storage type info, and > when searching for a node, take such field(s) into account, and do not > return a node that does not meet the storage type requirement. > 2. change the {{NetworkTopology}} class to be aware of storage types, e.g. for > each storage type, there is one tree subset that connects all the nodes with > that type, and a search happens on only one such subset, so unexpected > storage types are simply not in the search space. > Thanks [~szetszwo] for the offline discussion, and thanks [~linyiqun] for > pointing out a wrong statement (corrected now) in the description. Any > further comments are more than welcome.
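Option 1 above can be sketched in a few lines: instead of picking a random node blindly and rejecting it afterwards, the candidate set is restricted to nodes that carry the required storage type before the random pick. All names here are hypothetical, not the actual {{BlockPlacementPolicyDefault}} code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class StorageTypeAwareChooser {
    enum StorageType { DISK, SSD, ARCHIVE }

    static class Node {
        final String name;
        final StorageType type;
        Node(String name, StorageType type) { this.name = name; this.type = type; }
    }

    // Filter by storage type first, then pick randomly among the survivors,
    // so no iteration is wasted on nodes that can never satisfy the request.
    static Node chooseRandom(List<Node> nodes, StorageType required, Random rand) {
        List<Node> candidates = new ArrayList<>();
        for (Node n : nodes) {
            if (n.type == required) {
                candidates.add(n);
            }
        }
        return candidates.isEmpty() ? null
            : candidates.get(rand.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<>();
        cluster.add(new Node("dn1", StorageType.DISK));
        cluster.add(new Node("dn2", StorageType.SSD));
        cluster.add(new Node("dn3", StorageType.DISK));
        // dn2 is the only SSD node, so it is the only possible result.
        System.out.println(chooseRandom(cluster, StorageType.SSD, new Random()).name); // dn2
    }
}
```

Option 2 (storage-type-aware topology subtrees) achieves the same effect without materializing a candidate list per request, at the cost of maintaining one subtree per storage type.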
[jira] [Commented] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347525#comment-16347525 ] Wei Yan commented on HDFS-12512: Saw "java.lang.OutOfMemoryError: unable to create new native thread" exceptions in the test log; the VM crashed, which generated the two error log files (and Yetus complained about the ASF license of these files). Will wait for a while and then retrigger the job. > RBF: Add WebHDFS > > > Key: HDFS-12512 > URL: https://issues.apache.org/jira/browse/HDFS-12512 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Íñigo Goiri >Assignee: Wei Yan >Priority: Major > Labels: RBF > Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, > HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch > > > The Router currently does not support WebHDFS. It needs to implement > something similar to {{NamenodeWebHdfsMethods}}.
[jira] [Commented] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347510#comment-16347510 ] genericqa commented on HDFS-12512: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 13 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 14 new + 121 unchanged - 8 fixed = 135 total (was 129) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 42s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs | | | hadoop.fs.contract.hdfs.TestHDFSContractDelete | | | hadoop.fs.permission.TestStickyBit | | | hadoop.hdfs.TestFileAppend | | | hadoop.fs.contract.router.web.TestRouterWebHDFSContractRename | | | hadoop.fs.TestSymlinkHdfsFileSystem | | | hadoop.fs.contract.router.web.TestRouterWebHDFSContractMkdir | | | hadoop.fs.contract.router.web.TestRouterWebHDFSContractRootDirectory | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | | | hadoop.hdfs.TestMaintenanceState | | | hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate | | | hadoop.hdfs.TestDFSStartupVersions | | | hadoop.fs.contract.hdfs.TestHDFSContractCreate | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.security.TestPermission | | | hadoop.hdfs.TestExternalBlockReader | | | hadoop.hdfs.TestAbandonBlock | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 | | | hadoop.fs.contract.router.web.TestRouterWebHDFSContractDelete | |
[jira] [Created] (HDFS-13096) HDFS group quota
Ruslan Dautkhanov created HDFS-13096: Summary: HDFS group quota Key: HDFS-13096 URL: https://issues.apache.org/jira/browse/HDFS-13096 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, fs, hdfs, nn Affects Versions: 3.0.0, 2.7.5, 2.8.3 Reporter: Ruslan Dautkhanov We have groups of people that have their own set of HDFS directories. For example, they have an HDFS staging place for new files: /datascience /analysts ... but at the same time they have Hive warehouse directories /hivewarehouse/datascience /hivewarehouse/analysts ... and on top of that they also have some files stored under /user/${username}/ It's always been a challenge to maintain a combined quota on all HDFS locations a particular group of people owns, as we're currently forced to set a separate quota for each directory independently. It would be great if HDFS had a quota tied either - to a set of HDFS locations; - or to a group of people (where `group` is defined as the HDFS group a particular file/directory belongs to). Linux allows defining quotas at the group level (e.g. `edquota -g devel`); it would be great to have the same at the HDFS level. Other thoughts and ideas?
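Absent a native group quota, the combined limit described above has to be enforced by hand: sum the usage of every directory the group owns and compare the total against one agreed limit. A hypothetical sketch of that bookkeeping (directory names taken from the examples above; the usage numbers are invented):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupQuotaCheck {
    // Sum the space used across all directories belonging to one group.
    static long combinedUsage(Map<String, Long> usageByDir, List<String> groupDirs) {
        long total = 0;
        for (String dir : groupDirs) {
            total += usageByDir.getOrDefault(dir, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Pretend these usages (in GB) came from a periodic "hdfs dfs -du -s" sweep.
        Map<String, Long> usageGb = new HashMap<>();
        usageGb.put("/datascience", 40L);
        usageGb.put("/hivewarehouse/datascience", 55L);
        usageGb.put("/user/alice", 10L);

        List<String> groupDirs = Arrays.asList(
            "/datascience", "/hivewarehouse/datascience", "/user/alice");

        long groupQuotaGb = 100L;
        long used = combinedUsage(usageGb, groupDirs);
        System.out.println(used + " GB used of " + groupQuotaGb + " GB"); // 105 GB used of 100 GB
        System.out.println(used > groupQuotaGb ? "over quota" : "ok");    // over quota
    }
}
```

A native group quota would move exactly this aggregation into the NameNode, instead of leaving it to external scripts that can only react after the fact.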
[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347472#comment-16347472 ] Rakesh R commented on HDFS-10285: - Thank you very much [~daryn] for your time and useful comments/thoughts. My reply follows; please take a look. +Comment-1)+ {quote}BlockManager Shouldn’t spsMode be volatile? Although I question why it’s here. {quote} [Rakesh's reply] Agreed, will do the changes. +Comment-2)+ {quote}Adding SPS methods to this class implies an unexpected coupling of the SPS service to the block manager. Please move them out to prove it’s not tightly coupled. {quote} [Rakesh's reply] Agreed. We are planning to create {{StoragePolicySatisfyManager}} and keep all the related APIs there. +Comment-3)+ {quote}BPServiceActor Is it actually sending back the moved blocks? Aren’t IBRs sufficient? BlockStorageMovementCommand/BlocksStorageMoveAttemptFinished Again, not sure that a new DN command is necessary, and why does it specifically report back successful moves instead of relying on IBRs? I would actually expect the DN to be completely ignorant of a SPS move vs any other move. {quote} [Rakesh's reply] We have explored the IBR approach and the required code changes. If SPS relied on this, it would require an *extra* check to know whether a new block occurred due to an SPS move or something else, and that check would run quite often, considering how many other ops there are compared to SPS block moves. Currently, it sends back the {{blksMovementsFinished}} list separately; each movement-finished block can be easily/quickly recognized by the Satisfier on the NN side, which updates the tracking details. If you agree this *extra* check is not an issue, then we would be happy to implement the IBR approach. Secondly, BlockStorageMovementCommand was added to carry the block vs src/target pairs, which is needed for the move operation, and we tried to decouple the SPS code using this command. 
+Comment-4)+ {quote}DataNode Why isn’t this just a block transfer? How is transferring between DNs any different than across storages? {quote} [Rakesh's reply] I could see the Mover is also using the {{REPLACE_BLOCK}} call, and we just followed the same approach in SPS. Am I missing anything here? +Comment-5)+ {quote}DatanodeDescriptor Why use a synchronized linked list to offer/poll instead of BlockingQueue? {quote} [Rakesh's reply] Agreed, will do the changes. +Comment-6)+ {quote}DatanodeManager I know it’s configurable, but realistically, when would you ever want to give storage movement tasks equal footing with under-replication? Is there really a use case for not valuing durability? {quote} [Rakesh's reply] We don't have any particular use case, though. One scenario we thought of is a user who configured SSDs which filled up quickly; in that case, cleaning up could be considered a high priority. If you feel this is not a real case, then I'm OK with removing this config so that SPS always uses only the remaining slots. +Comment-7)+ {quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already a very bad method that should be avoided for anything but jmx – even then it’s a concern. I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty large and tagging all the storage reports is not going to be cheap. verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap? Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then it gets another datanode report to determine the number of live nodes to decide if it should sleep before processing the next path. 
The number of nodes from the prior cached view of the world should suffice. {quote} [Rakesh's reply] Good point. Some time back, Uma and I thought about the caching part. We currently depend on this API for the datanode storage types and remaining-space details. I think it requires two different mechanisms for internal and external SPS. For internal SPS, how about directly referring to {{DatanodeManager#datanodeMap}} for every file? For external SPS, IIUC you are suggesting a caching mechanism. How about getting the storage report once and caching it in ExternalContext? This local cache can be refreshed periodically. Say, after every 5 mins (just an arbitrary number; if you have some period in mind, please suggest), when getDatanodeStorageReport is called the cache is treated as expired and fetched freshly; within 5 mins it is served from the cache. Does this make sense to you? Another point we thought of is that right now, to check whether a node has enough space, it goes to the NN. With
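The periodically refreshed cache proposed in the reply to Comment-7 can be sketched generically: a fresh fetch happens only when the cached report is older than the TTL (5 minutes in the reply; an arbitrary value here too). This is an illustrative sketch, not the actual ExternalContext code:

```java
import java.util.function.Supplier;

public class ExpiringStorageReportCache<T> {
    private final Supplier<T> fetcher;  // e.g. a call to getDatanodeStorageReport
    private final long ttlMillis;
    private T cached;
    private long fetchedAt = Long.MIN_VALUE;

    public ExpiringStorageReportCache(Supplier<T> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    // Refetch only when the cached copy is older than the TTL; otherwise
    // serve the (possibly slightly stale) cached report.
    public synchronized T get(long nowMillis) {
        if (cached == null || nowMillis - fetchedAt > ttlMillis) {
            cached = fetcher.get();
            fetchedAt = nowMillis;
        }
        return cached;
    }

    public static void main(String[] args) {
        final int[] fetchCount = {0};
        ExpiringStorageReportCache<String> cache =
            new ExpiringStorageReportCache<>(() -> "report-" + (++fetchCount[0]),
                                             5 * 60 * 1000L);
        System.out.println(cache.get(0L));        // report-1 (first fetch)
        System.out.println(cache.get(60_000L));   // report-1 (within TTL, cached)
        System.out.println(cache.get(301_000L));  // report-2 (TTL expired, refetched)
    }
}
```

The time is passed in explicitly only to keep the sketch testable; a real implementation would use a monotonic clock internally.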
[jira] [Created] (HDFS-13095) Improve slice tree traversal implementation
Rakesh R created HDFS-13095: --- Summary: Improve slice tree traversal implementation Key: HDFS-13095 URL: https://issues.apache.org/jira/browse/HDFS-13095 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R This task is to refine the existing slice tree traversal logic in the [ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74] class. Please refer to Daryn's review comments: {quote}*FSTreeTraverser* I need to study this more but I have grave concerns this will work correctly in a mutating namesystem. Ex. renames and deletes esp. in combination with snapshots. Looks like there's a chance it will go off in the weeds when backtracking out of a renamed directory. traverseDir may NPE if it's traversing a tree in a snapshot and one of the ancestors is deleted. Not sure why it's bothering to re-check permissions during the crawl. The storage policy is inherited by the entire tree, regardless of whether the sub-contents are accessible. The effect of this patch is the storage policy is enforced for all readable files, non-readable violate the new storage policy, new non-readable will conform to the new storage policy. Very convoluted. Since new files will conform, should just process the entire tree. {quote}
[jira] [Commented] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347445#comment-16347445 ] Hudson commented on HDFS-13092: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13593 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13593/]) HDFS-13092. Reduce verbosity for ThrottledAsyncChecker#schedule. (hanishakoneru: rev 3ce2190b581526ad2d49e8c3a47be1547037310c) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/ThrottledAsyncChecker.java > Reduce verbosity for ThrottledAsyncChecker.java:schedule > > > Key: HDFS-13092 > URL: https://issues.apache.org/jira/browse/HDFS-13092 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDFS-13092.001.patch > > > ThrottledAsyncChecker.java:schedule prints a log message every time a disk > check is scheduled. However, if the previous check was triggered less than > "minMsBetweenChecks" ago, the task is not scheduled. This > JIRA reduces the log verbosity by printing the message only when the task > will actually be scheduled. 
> {code} > 2018-01-29 00:51:44,467 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,470 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,477 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,480 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,486 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,501 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/13/hadoop/hdfs/data/current > 2018-01-29 00:51:44,507 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,533 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,536 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,543 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,544 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,548 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > 
/grid/3/hadoop/hdfs/data/current > 2018-01-29 00:51:44,549 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/5/hadoop/hdfs/data/current > 2018-01-29 00:51:44,550 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/6/hadoop/hdfs/data/current > 2018-01-29 00:51:44,551 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,554 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/9/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for >
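The fix described in HDFS-13092 — log only when the check will actually be scheduled — can be sketched as a tiny throttler (hypothetical names; this is not the real ThrottledAsyncChecker code):

```java
import java.util.HashMap;
import java.util.Map;

public class ThrottledCheckScheduler {
    private final long minMsBetweenChecks;
    private final Map<String, Long> lastScheduled = new HashMap<>();

    public ThrottledCheckScheduler(long minMsBetweenChecks) {
        this.minMsBetweenChecks = minMsBetweenChecks;
    }

    // Returns true only when the check is actually scheduled; the log line
    // is emitted only on that path, so throttled calls stay silent.
    public synchronized boolean schedule(String volume, long nowMs) {
        Long last = lastScheduled.get(volume);
        if (last != null && nowMs - last < minMsBetweenChecks) {
            return false; // throttled: no check, and no log line
        }
        lastScheduled.put(volume, nowMs);
        System.out.println("Scheduling a check for " + volume);
        return true;
    }

    public static void main(String[] args) {
        ThrottledCheckScheduler s = new ThrottledCheckScheduler(10_000L);
        s.schedule("/grid/2/hadoop/hdfs/data/current", 0L);      // logged
        s.schedule("/grid/2/hadoop/hdfs/data/current", 3_000L);  // silent (throttled)
        s.schedule("/grid/2/hadoop/hdfs/data/current", 15_000L); // logged again
    }
}
```

Logging on the scheduled path only is what removes the duplicate "Scheduling a check for" lines shown in the quoted log above.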
[jira] [Updated] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13092: -- Issue Type: Improvement (was: Bug) > Reduce verbosity for ThrottledAsyncChecker.java:schedule > > > Key: HDFS-13092 > URL: https://issues.apache.org/jira/browse/HDFS-13092 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDFS-13092.001.patch > > > ThrottledAsyncChecker.java:schedule prints a log message every time a disk > check is scheduled. However if the previous check was triggered lesser than > the frequency at "minMsBetweenChecks" then the task is not scheduled. This > jira will reduce the log verbosity by printing the message only when the task > will be scheduled. > {code} > 2018-01-29 00:51:44,467 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,470 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,477 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,480 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,486 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,501 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/13/hadoop/hdfs/data/current > 2018-01-29 00:51:44,507 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > 
/grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,533 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,536 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,543 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,544 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,548 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/3/hadoop/hdfs/data/current > 2018-01-29 00:51:44,549 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/5/hadoop/hdfs/data/current > 2018-01-29 00:51:44,550 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/6/hadoop/hdfs/data/current > 2018-01-29 00:51:44,551 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,554 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > 
/grid/9/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/14/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,564
[jira] [Comment Edited] (HDFS-13073) Cleanup code in InterQJournalProtocol.proto
[ https://issues.apache.org/jira/browse/HDFS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347408#comment-16347408 ] Bharat Viswanadham edited comment on HDFS-13073 at 1/31/18 7:08 PM: Fixed checkstyle issues in patch v01. Test cases are not added because the added method calls the existing getEditLogManifest, and this is already tested by the TestJournalNodeSync test cases. was (Author: bharatviswa): Fixed checkstyle issues in patch v01. > Cleanup code in InterQJournalProtocol.proto > --- > > Key: HDFS-13073 > URL: https://issues.apache.org/jira/browse/HDFS-13073 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13073.00.patch, HDFS-13073.01.patch > > > We can reuse the messages in QJournalProtocol.proto, instead of redefining > them again in InterQJournalProtocol.proto.
[jira] [Commented] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347416#comment-16347416 ] Hanisha Koneru commented on HDFS-13092: --- Committed to trunk and branch-3.0. > Reduce verbosity for ThrottledAsyncChecker.java:schedule > > > Key: HDFS-13092 > URL: https://issues.apache.org/jira/browse/HDFS-13092 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDFS-13092.001.patch > > > ThrottledAsyncChecker.java:schedule prints a log message every time a disk > check is scheduled. However, if the previous check was triggered less than > "minMsBetweenChecks" milliseconds earlier, the task is not scheduled. This > jira reduces the log verbosity by printing the message only when the task > is actually scheduled. > {code} > 2018-01-29 00:51:44,467 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,470 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,477 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,480 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,486 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,501 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/13/hadoop/hdfs/data/current > 2018-01-29 00:51:44,507 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - 
Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,533 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,536 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,543 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,544 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,548 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/3/hadoop/hdfs/data/current > 2018-01-29 00:51:44,549 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/5/hadoop/hdfs/data/current > 2018-01-29 00:51:44,550 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/6/hadoop/hdfs/data/current > 2018-01-29 00:51:44,551 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,554 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - 
Scheduling a check for > /grid/9/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/14/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current >
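[Editor's note] The throttling behavior this jira describes can be modeled in a few lines. This is an illustrative Python sketch, not the actual ThrottledAsyncChecker Java; the class and field names below are hypothetical:

```python
import time

class ThrottledChecker:
    """Toy model: only log when a disk check is actually scheduled."""

    def __init__(self, min_ms_between_checks):
        self.min_gap = min_ms_between_checks / 1000.0
        self.last_check = {}  # target path -> monotonic timestamp of last scheduled check

    def schedule(self, target):
        now = time.monotonic()
        last = self.last_check.get(target)
        if last is not None and (now - last) < self.min_gap:
            # Throttled: the check is skipped, so the log line is skipped too.
            return False
        self.last_check[target] = now
        print(f"Scheduling a check for {target}")  # log only on the scheduled path
        return True
```

A second schedule() call for the same volume inside the throttle window returns False without logging, which is exactly the verbosity reduction the patch makes.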
[jira] [Commented] (HDFS-13061) SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
[ https://issues.apache.org/jira/browse/HDFS-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347412#comment-16347412 ] Hudson commented on HDFS-13061: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13592 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13592/]) HDFS-13061. SaslDataTransferClient#checkTrustAndSend should not trust a (xyao: rev 37b753656849d0864ed3c8858edf3b85515cbf39) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java > SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted > channel > - > > Key: HDFS-13061 > URL: https://issues.apache.org/jira/browse/HDFS-13061 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13061.000.patch, HDFS-13061.001.patch, > HDFS-13061.002.patch, HDFS-13061.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > SaslDataTransferClient#checkTrustAndSend asks the channel resolver whether the > client and server addresses are trusted, respectively. It decides the channel > is untrusted, and thus enforces encryption, only if both the client and server > are untrusted. 
*This ticket is opened to change it to not trust (and encrypt) if > either client or server address are not trusted.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
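[Editor's note] The fix in the bolded sentence reduces to a one-operator change in the trust decision. A minimal sketch of that boolean logic (illustrative Python, not the actual Java in SaslDataTransferClient):

```python
def channel_trusted_before(client_trusted, server_trusted):
    # Pre-fix behavior described in the issue: the channel counted as
    # trusted (so encryption was skipped) when either endpoint was trusted.
    return client_trusted or server_trusted

def channel_trusted_after(client_trusted, server_trusted):
    # Post-fix behavior: a partially trusted channel is treated as
    # untrusted, so the transfer is encrypted unless BOTH are trusted.
    return client_trusted and server_trusted
```

The partially trusted case (one endpoint trusted, one not) flips from "skip encryption" to "encrypt", which is the whole point of the ticket.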
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347407#comment-16347407 ] Dennis Huo commented on HDFS-13056: --- I ended up going with modifying the existing protocol, since otherwise the same splitting of the BlockGroupChecksum method for striped encodings ends up getting unwieldy. I've uploaded an amended v2 design doc outlining the pros and cons we've discussed for the DataTransferProtocol. It turns out this approach is also useful for dealing with merging stripe cells; I'm still finalizing that piece of the design. > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, > hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. 
More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). > > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on the hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
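[Editor's note] The CRC composition this proposal relies on is well known from zlib: given CRC(A), CRC(B), and len(B), the CRC of the concatenation A||B can be computed without rereading the data, by advancing CRC(A) through len(B) zero bytes in GF(2). A Python sketch of that technique, ported from zlib's crc32_combine; illustrative background only, not code from the HDFS patch:

```python
import zlib

def _gf2_matrix_times(mat, vec):
    # Multiply the 32x32 GF(2) matrix `mat` (list of 32 row ints) by `vec`.
    total, i = 0, 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def _gf2_matrix_square(square, mat):
    # square = mat * mat over GF(2).
    for n in range(32):
        square[n] = _gf2_matrix_times(mat, mat[n])

def crc32_combine(crc1, crc2, len2):
    """Return CRC(A||B) given crc1 = CRC(A), crc2 = CRC(B), len2 = len(B)."""
    if len2 <= 0:
        return crc1
    even, odd = [0] * 32, [0] * 32
    odd[0] = 0xEDB88320           # CRC-32 reversed polynomial: one-zero-bit operator
    row = 1
    for n in range(1, 32):
        odd[n] = row
        row <<= 1
    _gf2_matrix_square(even, odd)  # operator for two zero bits
    _gf2_matrix_square(odd, even)  # operator for four zero bits
    while True:                    # advance crc1 through len2 zero bytes
        _gf2_matrix_square(even, odd)
        if len2 & 1:
            crc1 = _gf2_matrix_times(even, crc1)
        len2 >>= 1
        if not len2:
            break
        _gf2_matrix_square(odd, even)
        if len2 & 1:
            crc1 = _gf2_matrix_times(odd, crc1)
        len2 >>= 1
        if not len2:
            break
    return (crc1 ^ crc2) & 0xFFFFFFFF

# Combining the per-part CRCs reproduces the CRC of the whole stream,
# which is why a composite CRC can be block/chunk-layout agnostic.
whole = zlib.crc32(b"hello world")
combined = crc32_combine(zlib.crc32(b"hello "), zlib.crc32(b"world"), 5)
```

Because composition only needs each part's CRC and length, the aggregate is independent of where the block and chunk boundaries fall — the property the "COMPOSITE-CRC" FileChecksum type exploits.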
[jira] [Commented] (HDFS-13073) Cleanup code in InterQJournalProtocol.proto
[ https://issues.apache.org/jira/browse/HDFS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347408#comment-16347408 ] Bharat Viswanadham commented on HDFS-13073: --- Fixed checkstyle issues in patch v01. > Cleanup code in InterQJournalProtocol.proto > --- > > Key: HDFS-13073 > URL: https://issues.apache.org/jira/browse/HDFS-13073 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13073.00.patch, HDFS-13073.01.patch > > > We can reuse the messages in QJournalProtocol.proto, instead of redefining > again in InterQJournalProtocol.proto. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13073) Cleanup code in InterQJournalProtocol.proto
[ https://issues.apache.org/jira/browse/HDFS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-13073: -- Attachment: HDFS-13073.01.patch > Cleanup code in InterQJournalProtocol.proto > --- > > Key: HDFS-13073 > URL: https://issues.apache.org/jira/browse/HDFS-13073 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13073.00.patch, HDFS-13073.01.patch > > > We can reuse the messages in QJournalProtocol.proto, instead of redefining > again in InterQJournalProtocol.proto. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Huo updated HDFS-13056: -- Attachment: hdfs-file-composite-crc32-v2.pdf > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, > hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). 
> > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on the hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13058) Fix dfs.namenode.shared.edits.dir in TestJournalNode
[ https://issues.apache.org/jira/browse/HDFS-13058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347402#comment-16347402 ] Bharat Viswanadham commented on HDFS-13058: --- Test failures are not related; I ran them locally and they passed. > Fix dfs.namenode.shared.edits.dir in TestJournalNode > > > Key: HDFS-13058 > URL: https://issues.apache.org/jira/browse/HDFS-13058 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13058.00.patch, HDFS-13058.01.patch > > > In TestJournalNode.java > dfs.namenode.shared.edits.dir is set as below. > conf.set(DFSConfigKeys.DFS_NAMENODE_SHARED_EDITS_DIR_KEY +".ns1" +".nn1", > "qjournal://journalnode0:9900;journalnode1:9901"); > > From HDFS documentation: > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html] > The URI should be of the form: > {{qjournal://*host1:port1*;*host2:port2*;*host3:port3*/*journalId*}}. > > I found this while working on another jira, when parsing this > dfs.namenode.shared.edits.dir property threw an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
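[Editor's note] To make the mismatch concrete: the documented form requires a trailing /journalId component, which the test's value omits. A small hypothetical validator (Python sketch for illustration; the helper name and parsing logic are assumptions, not Hadoop code):

```python
from urllib.parse import urlparse

def parse_qjournal_uri(uri):
    """Parse qjournal://host1:port1;host2:port2;host3:port3/journalId."""
    parsed = urlparse(uri)
    if parsed.scheme != "qjournal":
        raise ValueError("expected qjournal:// scheme")
    journal_id = parsed.path.lstrip("/")
    if not journal_id:
        # This is exactly what the TestJournalNode value was missing.
        raise ValueError("missing trailing /journalId component")
    hosts = []
    for hp in parsed.netloc.split(";"):
        host, _, port = hp.partition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad host:port pair: {hp!r}")
        hosts.append((host, int(port)))
    return hosts, journal_id
```

With this check, a well-formed value parses into host/port pairs plus the journal ID, while the patch's original test value ("qjournal://journalnode0:9900;journalnode1:9901") is rejected for the missing journal ID.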
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347397#comment-16347397 ] Xiaoyu Yao commented on HDFS-13060: --- Thanks [~ajayydv] for the update. Patch v2 LGTM, +1 pending Jenkins. Please also file a ticket to deprecate CombinedIPWhiteList in favor of CombinedIPList for the whitelist-based resolver as well, to reduce the duplicated code. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trust channel resolver. It allows you > to put IP addresses/network masks of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trust channel resolver for > cases where only certain machines (IPs) are untrusted, without adding each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13061) SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
[ https://issues.apache.org/jira/browse/HDFS-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13061: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks [~ajayydv] for the contribution. I've committed the patch to trunk and branch-3.0. > SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted > channel > - > > Key: HDFS-13061 > URL: https://issues.apache.org/jira/browse/HDFS-13061 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13061.000.patch, HDFS-13061.001.patch, > HDFS-13061.002.patch, HDFS-13061.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > SaslDataTransferClient#checkTrustAndSend asks the channel resolver whether the > client and server addresses are trusted, respectively. It decides the channel > is untrusted, and thus enforces encryption, only if both the client and server > are untrusted. *This ticket is opened to change it to not trust (and encrypt) if > either the client or server address is not trusted.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule
[ https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347383#comment-16347383 ] Hanisha Koneru commented on HDFS-13092: --- Thanks for the patch, [~msingh]. LGTM. Test failures are unrelated; the tests pass locally. +1. > Reduce verbosity for ThrottledAsyncChecker.java:schedule > > > Key: HDFS-13092 > URL: https://issues.apache.org/jira/browse/HDFS-13092 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDFS-13092.001.patch > > > ThrottledAsyncChecker.java:schedule prints a log message every time a disk > check is scheduled. However, if the previous check was triggered less than > "minMsBetweenChecks" milliseconds earlier, the task is not scheduled. This > jira reduces the log verbosity by printing the message only when the task > is actually scheduled. > {code} > 2018-01-29 00:51:44,467 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,470 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,477 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,480 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/4/hadoop/hdfs/data/current > 2018-01-29 00:51:44,486 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,501 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/13/hadoop/hdfs/data/current > 2018-01-29 00:51:44,507 INFO checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/11/hadoop/hdfs/data/current > 2018-01-29 00:51:44,533 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,536 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,543 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,544 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,548 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/3/hadoop/hdfs/data/current > 2018-01-29 00:51:44,549 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/5/hadoop/hdfs/data/current > 2018-01-29 00:51:44,550 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/6/hadoop/hdfs/data/current > 2018-01-29 00:51:44,551 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/2/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/10/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,552 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,554 INFO checker.ThrottledAsyncChecker > 
(ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/9/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/8/hadoop/hdfs/data/current > 2018-01-29 00:51:44,555 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/14/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for > /grid/12/hadoop/hdfs/data/current > 2018-01-29 00:51:44,560 INFO checker.ThrottledAsyncChecker > (ThrottledAsyncChecker.java:schedule(107)) -
[jira] [Created] (HDFS-13094) Refactor TestJournalNode
Bharat Viswanadham created HDFS-13094: - Summary: Refactor TestJournalNode Key: HDFS-13094 URL: https://issues.apache.org/jira/browse/HDFS-13094 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham This Jira is created from a review comment by [~arpitagarwal] in HDFS-13062: "We have used this testName to add testcase-specific behavior in the past but it is fragile. Perhaps we should open a separate Jira to move this behavior to testcase-specific init routines by using a test base class and derived classes for individual unit tests." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347375#comment-16347375 ] Bharat Viswanadham commented on HDFS-13062: --- Hi [~hanishakoneru], thanks for the review. Addressed review comments in patch v05. * {quote}In {{setConf()}}, why are we getting the nameserviceIds from the config key {{DFS_INTERNAL_NAMESERVICES_KEY}} before {{DFS_NAMESERVICES}}? IIUC from HDFS-6376, which introduced the internal nameservices key, it is meant for datanodes to distinguish between which nameservices to connect to. JournalNodes should not be using this configuration to deduce the nameservice Ids. Please correct me if I am wrong.{quote}If DFS_INTERNAL_NAMESERVICES_KEY is set, those nameservices belong to the running cluster, whereas DFS_NAMESERVICES can contain both the cluster's own nameservices and external clusters' nameservices that it will connect to. That is the reason for checking DFS_INTERNAL_NAMESERVICES_KEY first. > Provide support for JN to use separate journal disk per namespace > - > > Key: HDFS-13062 > URL: https://issues.apache.org/jira/browse/HDFS-13062 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13062.00.patch, HDFS-13062.01.patch, > HDFS-13062.02.patch, HDFS-13062.03.patch, HDFS-13062.04.patch, > HDFS-13062.05.patch > > > In Federated HA setup, provide support for separate journal disk for each > namespace. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
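[Editor's note] The key precedence discussed in this comment can be sketched as follows. This is illustrative Python under stated assumptions (the real logic lives in the JournalNode's Java setConf(); the helper below and its dict-based config are hypothetical):

```python
def resolve_local_nameservices(conf):
    # dfs.internal.nameservices lists only the nameservices owned by this
    # cluster, so when present it takes precedence.
    internal = conf.get("dfs.internal.nameservices", "")
    if internal:
        return internal.split(",")
    # dfs.nameservices may additionally contain external clusters'
    # nameservices that this cluster merely connects to.
    return [ns for ns in conf.get("dfs.nameservices", "").split(",") if ns]
```

With both keys set, only the internal list is used, which matches the rationale in the reply: external nameservices listed in dfs.nameservices must not get journal directories on this cluster's JournalNodes.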
[jira] [Updated] (HDFS-13062) Provide support for JN to use separate journal disk per namespace
[ https://issues.apache.org/jira/browse/HDFS-13062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-13062: -- Attachment: HDFS-13062.05.patch > Provide support for JN to use separate journal disk per namespace > - > > Key: HDFS-13062 > URL: https://issues.apache.org/jira/browse/HDFS-13062 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13062.00.patch, HDFS-13062.01.patch, > HDFS-13062.02.patch, HDFS-13062.03.patch, HDFS-13062.04.patch, > HDFS-13062.05.patch > > > In Federated HA setup, provide support for separate journal disk for each > namespace. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347281#comment-16347281 ] zhenzhao wang edited comment on HDFS-13056 at 1/31/18 6:44 PM: --- Thanks for the detailed info. Both adding a new method and modifying the existing protocol sound good to me. Part of the reason I added a data checksum option at the top level is that CRC32/CRC32C is a more generic checksum method (name) which is easy to understand. E.g. if I want to copy a file from HDFS to GCS in distcp, the file checksum type or algorithm from GCS is CRC32C, and I hope I could use the same checksum type/name to get the checksum from HDFS for verification. But I understand your concern too; as you said, it's difficult to come up with an entirely satisfactory approach. Both approaches make sense to me. I now have a patch to verify the data integrity in distcp by specifying the source and target fs checksum types explicitly, and will modify it accordingly once this feature is complete. As for the CRC, your approach is much faster. CRC(concatenate(A, B)) = CRC(concatenate(A, \{length of B}))^CRC(B). Shift-right is faster than the matrix approach when calculating concatenate(A, \{length of B}), though the complexity of both is O(log(\{length of B})). was (Author: wzzdreamer): Thanks for the detailed info. Both adding new method and modifying existing protocol sounds good to me. Part of the reason I added a data checksum option in top level is because CRC32/CRC32C is more generic checksum method (name) which is easy to understand. E.g. if I want copy a file from HDFS to GCS in distcp, the file checksum type or algorithm from GCS is CRC32C. And I hope I could use a same checksum type/name to get checksum from HDFS for verification. But I understand your concern too, as you said, it's difficult to come up with an entirely satisfactory approach. Both approach make sense to me. 
Now I got a patch to verify the data integrity in distcp by specifying the source and target fs checksum type explicitly, will modify it according once this feature is accomplished. As for the CRC, your approach is much faster. CRC(concatenate(A, B)) = CRC(concatenate(A, \{length of B}))^CRC(B). Shift-right is faster than the matrix approach while calculating concatenate(A, \{length of B}) though the complexity are all O Log(\{length of B}). > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, > hdfs-file-composite-crc32-v1.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. 
More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). > > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on the hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-10453: Assignee: He Xiaoqiao > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453.001.patch > > > The ReplicationMonitor thread can get stuck for a long time and, with low > probability, lose data. Consider the typical scenario: > (1) create and close a file with the default replicas (3); > (2) increase the replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > When the ReplicationMonitor gets stuck, the NameNode prints logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. 
> 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because two threads (#NameNodeRpcServer and #ReplicationMonitor) > process the same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to > replicate and leaves the global lock. > (2) FSNamesystem#delete is invoked to delete the blocks and then clears the references in > the blocksMap, neededReplications, etc. The block's numBytes is set to > NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does > not need an explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node is selected after traversing the > whole cluster, because no node can satisfy the goodness criteria > (no node's remaining space can meet the required size of Long.MAX_VALUE). > During stage #3 the ReplicationMonitor is stuck for a long time, especially in a large > cluster. invalidateBlocks & neededReplications keep growing with no > consumers; at worst, data is lost. > This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks > and removing them from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
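The proposed skip can be sketched as follows. This is an illustrative Python model only, not the actual HDFS Java code: the function name `should_choose_target` and the set-based `needed_replications` queue are hypothetical stand-ins for the ReplicationMonitor internals described above.

```python
NO_ACK = 2**63 - 1  # mirrors BlockCommand.NO_ACK, i.e. Java's Long.MAX_VALUE

def should_choose_target(block_id, num_bytes, needed_replications):
    """Return True if chooseTarget should run for this block.

    A block whose size reads NO_ACK was deleted by a concurrent
    FSNamesystem#delete between computeReplicationWorkForBlocks and now;
    drop it from the needed-replication queue instead of scanning the
    whole cluster for a node with Long.MAX_VALUE of free space.
    """
    if num_bytes == NO_ACK:
        needed_replications.discard(block_id)  # block is gone, stop tracking it
        return False                           # skip chooseTarget entirely
    return True                                # proceed to chooseTarget as before
```

The key point is that the guard runs after re-acquiring the lock, so a delete that raced with the earlier scheduling pass is observed via the NO_ACK marker rather than via a futile cluster-wide placement search.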
[jira] [Comment Edited] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347201#comment-16347201 ] Ajay Kumar edited comment on HDFS-13060 at 1/31/18 6:08 PM: [~xyao], thanks for the review. Updated patch v2 to address the suggestions. Created [HDFS-13090] to support a composite trusted channel resolver. was (Author: ajayydv): [~xyao], thanks for review. Updated patch v2 to address suggestions. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trust channel resolver. It allows you > to put the IP addresses/network masks of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trust channel resolver for > cases where only certain machines (IPs) are untrusted, without having to add each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347281#comment-16347281 ] zhenzhao wang commented on HDFS-13056: -- Thanks for the detailed info. Both adding a new method and modifying the existing protocol sound good to me. Part of the reason I added a data checksum option at the top level is that CRC32/CRC32C is a more generic checksum method (name) that is easy to understand. E.g., if I want to copy a file from HDFS to GCS in distcp, the file checksum type or algorithm from GCS is CRC32C, and I hope I can use the same checksum type/name to get a checksum from HDFS for verification. But I understand your concern too; as you said, it's difficult to come up with an entirely satisfactory approach. Both approaches make sense to me. I now have a patch that verifies data integrity in distcp by specifying the source and target fs checksum types explicitly, and will modify it accordingly once this feature is completed. As for the CRC, your approach is much faster: CRC(concatenate(A, B)) = CRC(concatenate(A, \{length of B} zero bytes)) ^ CRC(B). Shift-right is faster than the matrix approach when calculating concatenate(A, \{length of B} zero bytes), though the complexity of both is O(log(\{length of B})). 
> Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.poc1.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, > hdfs-file-composite-crc32-v1.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). > > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. 
> > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
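The CRC composability identity discussed in the comments above can be checked numerically. Below is a minimal sketch (not HDFS code) using Python's `zlib.crc32`; note that because zlib's CRC-32 includes the standard init/final XOR conditioning, the pure linear identity CRC(A||B) = CRC(A || zeros) ^ CRC(B) picks up one extra correction term, CRC(zeros):

```python
import zlib

def crc32_concat(crc_a_padded: int, crc_b: int, len_b: int) -> int:
    """CRC-32 of A||B, given CRC(A || len_b zero bytes) and CRC(B).

    For zlib's conditioned CRC-32 the identity is:
        crc(A||B) = crc(A || 0^|B|) ^ crc(B) ^ crc(0^|B|)
    (for an unconditioned CRC the crc(0^|B|) term vanishes).
    """
    return crc_a_padded ^ crc_b ^ zlib.crc32(b"\x00" * len_b)

# Demonstration: combine the two piece CRCs and compare to the direct CRC.
a, b = b"hello ", b"composite crc"
combined = crc32_concat(zlib.crc32(a + b"\x00" * len(b)), zlib.crc32(b), len(b))
direct = zlib.crc32(a + b)
```

A production implementation would of course avoid hashing the literal zero padding and instead advance the CRC by |B| zero bytes in O(log |B|) time, via either the matrix-squaring or the shift-based method mentioned in the comment.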
[jira] [Comment Edited] (HDFS-13061) SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
[ https://issues.apache.org/jira/browse/HDFS-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347274#comment-16347274 ] Xiaoyu Yao edited comment on HDFS-13061 at 1/31/18 5:55 PM: Thanks [~ajayydv] for the update. +1 for the v3 patch. The test failures are unrelated. I will commit it shortly. was (Author: xyao): Thanks [~ajayydv] for the update. +1 for the v4 patch. The test failures are unrelated. I will commit it shortly. > SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted > channel > - > > Key: HDFS-13061 > URL: https://issues.apache.org/jira/browse/HDFS-13061 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13061.000.patch, HDFS-13061.001.patch, > HDFS-13061.002.patch, HDFS-13061.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > SaslDataTransferClient#checkTrustAndSend asks the channel resolver whether the > client and server addresses are trusted, respectively. It treats the channel > as untrusted (and enforces encryption) only if both the client and the server are > untrusted. *This ticket is opened to change it to not trust (and to encrypt) if > either the client or the server address is not trusted.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13061) SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel
[ https://issues.apache.org/jira/browse/HDFS-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347274#comment-16347274 ] Xiaoyu Yao commented on HDFS-13061: --- Thanks [~ajayydv] for the update. +1 for the v4 patch. The test failures are unrelated. I will commit it shortly. > SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted > channel > - > > Key: HDFS-13061 > URL: https://issues.apache.org/jira/browse/HDFS-13061 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13061.000.patch, HDFS-13061.001.patch, > HDFS-13061.002.patch, HDFS-13061.003.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > SaslDataTransferClient#checkTrustAndSend asks the channel resolver whether the > client and server addresses are trusted, respectively. It treats the channel > as untrusted (and enforces encryption) only if both the client and the server are > untrusted. *This ticket is opened to change it to not trust (and to encrypt) if > either the client or the server address is not trusted.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
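The semantics change proposed in HDFS-13061 can be sketched in a few lines. This is an illustrative Python model, not the actual SaslDataTransferClient code; `StaticResolver` and `channel_is_trusted` are hypothetical names standing in for the TrustedChannelResolver machinery:

```python
class StaticResolver:
    """Stand-in for a TrustedChannelResolver: trusts a fixed set of addresses."""
    def __init__(self, trusted_addrs):
        self.trusted_addrs = set(trusted_addrs)

    def is_trusted(self, addr):
        return addr in self.trusted_addrs

def channel_is_trusted(resolver, client_addr, server_addr):
    # Proposed semantics: encryption may be skipped only when BOTH endpoints
    # are trusted; a partially trusted channel is treated as untrusted and
    # therefore encrypted.  (The pre-fix behavior effectively required both
    # endpoints to be untrusted before enforcing encryption.)
    return resolver.is_trusted(client_addr) and resolver.is_trusted(server_addr)
```

The design choice is conservative: any uncertainty about either endpoint falls back to encryption rather than plaintext.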
[jira] [Commented] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347266#comment-16347266 ] Íñigo Goiri commented on HDFS-12512: [^HDFS-12512.004.patch] looks good. Waiting for Yetus to come back. > RBF: Add WebHDFS > > > Key: HDFS-12512 > URL: https://issues.apache.org/jira/browse/HDFS-12512 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Íñigo Goiri >Assignee: Wei Yan >Priority: Major > Labels: RBF > Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, > HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch > > > The Router currently does not support WebHDFS. It needs to implement > something similar to {{NamenodeWebHdfsMethods}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13068) RBF: Add router admin option to manage safe mode
[ https://issues.apache.org/jira/browse/HDFS-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347260#comment-16347260 ] Íñigo Goiri commented on HDFS-13068: [^HDFS-13068.002.patch] looks good other than the checkstyle issues. Minor comments: * In {{HDFSCommands.md}}, you have {{enter}} twice instead of {{leave}} in the table. * In {{HDFSRouterFederation.md}} ** Change {{There is a manuall way provided to manage Safe Mode for the Router.}} to {{There is a manual way to manage the Safe Mode for the Router.}} ** Change {{by following command}} to {{using the following command}} > RBF: Add router admin option to manage safe mode > > > Key: HDFS-13068 > URL: https://issues.apache.org/jira/browse/HDFS-13068 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13068.001.patch, HDFS-13068.002.patch > > > HDFS-13044 adds a safe mode to reject requests. We should have an option to > manually set the Router into safe mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13043) RBF: Expose the state of the Routers in the federation
[ https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13043: --- Attachment: HDFS-13043.008.patch > RBF: Expose the state of the Routers in the federation > -- > > Key: HDFS-13043 > URL: https://issues.apache.org/jira/browse/HDFS-13043 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, > HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, > HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, > HDFS-13043.008.patch, router-info.png > > > The Router should expose the state of the other Routers in the federation > through a user UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12512) RBF: Add WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated HDFS-12512: --- Attachment: HDFS-12512.004.patch > RBF: Add WebHDFS > > > Key: HDFS-12512 > URL: https://issues.apache.org/jira/browse/HDFS-12512 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs >Reporter: Íñigo Goiri >Assignee: Wei Yan >Priority: Major > Labels: RBF > Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, > HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch > > > The Router currently does not support WebHDFS. It needs to implement > something similar to {{NamenodeWebHdfsMethods}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13044) RBF: Add a safe mode for the Router
[ https://issues.apache.org/jira/browse/HDFS-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13044: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.1 2.9.1 2.10.0 3.1.0 Status: Resolved (was: Patch Available) Thanks [~linyiqun] for the review. I did the commit to {{branch-3.0}}, {{branch-2}}, and {{branch-2.9}}. [~linyiqun] had already done {{trunk}}. > RBF: Add a safe mode for the Router > --- > > Key: HDFS-13044 > URL: https://issues.apache.org/jira/browse/HDFS-13044 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1 > > Attachments: HDFS-13004.000.patch, HDFS-13044-branch-3.0.000.patch, > HDFS-13044.001.patch, HDFS-13044.002.patch, HDFS-13044.003.patch, > HDFS-13044.004.patch, HDFS-13044.005.patch > > > When a Router cannot communicate with the State Store, it should enter into a > safe mode that disallows certain operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347201#comment-16347201 ] Ajay Kumar commented on HDFS-13060: --- [~xyao], thanks for the review. Updated patch v2 to address the suggestions. > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trust channel resolver. It allows you > to put the IP addresses/network masks of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trust channel resolver for > cases where only certain machines (IPs) are untrusted, without having to add each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
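The blacklist resolver described above inverts the whitelist logic: a channel is trusted unless the peer's IP falls inside a blacklisted network. A minimal Python sketch of that idea (illustrative only; the real resolver is a Java class configured from blacklist files):

```python
import ipaddress

class BlacklistBasedResolver:
    """A channel is trusted unless the peer's IP is inside a blacklisted
    network -- the inverse of the whitelist-based resolver, so only the
    untrusted machines need to be enumerated."""

    def __init__(self, blacklisted_cidrs):
        self.nets = [ipaddress.ip_network(c) for c in blacklisted_cidrs]

    def is_trusted(self, peer_ip):
        addr = ipaddress.ip_address(peer_ip)
        return not any(addr in net for net in self.nets)
```

With a large fleet where only a few subnets are untrusted, this keeps the configuration small, whereas the whitelist resolver would require listing every trusted IP.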
[jira] [Updated] (HDFS-13060) Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13060: -- Attachment: HDFS-13060.002.patch > Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver > > > Key: HDFS-13060 > URL: https://issues.apache.org/jira/browse/HDFS-13060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13060.000.patch, HDFS-13060.001.patch, > HDFS-13060.002.patch > > > HDFS-5910 introduces encryption negotiation between client and server based > on a customizable TrustedChannelResolver class. The TrustedChannelResolver is > invoked on both the client and server side. If the resolver indicates that the > channel is trusted, then the data transfer will not be encrypted even if > dfs.encrypt.data.transfer is set to true. > The default trust channel resolver implementation returns false, indicating > that the channel is not trusted, which always enables encryption. HDFS-5910 > also added a built-in whitelist-based trust channel resolver. It allows you > to put the IP addresses/network masks of trusted clients/servers in whitelist files to > skip encryption for certain traffic. > This ticket is opened to add a blacklist-based trust channel resolver for > cases where only certain machines (IPs) are untrusted, without having to add each trusted > IP individually. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346967#comment-16346967 ] Kihwal Lee commented on HDFS-11187: --- bq. Not sure why yetus said it failed tests. Some tests failed to terminate normally, so surefire did not report any failures, but maven did. Look at the output. [https://builds.apache.org/job/PreCommit-HDFS-Build/22888/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of the > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding the FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of the last partial checksum in memory and reducing disk access. > I am separating the optimization into a new jira, because maintaining the > in-memory checksum state requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346942#comment-16346942 ] Wei-Chiu Chuang commented on HDFS-11187: The latest patch actually passed all tests. Not sure why yetus said it failed tests. [~yzhangal] could you please review the patch? Thank you > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of the > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding the FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of the last partial checksum in memory and reducing disk access. > I am separating the optimization into a new jira, because maintaining the > in-memory checksum state requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12504) Ozone: Improve SQLCLI performance
[ https://issues.apache.org/jira/browse/HDFS-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346858#comment-16346858 ] genericqa commented on HDFS-12504: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 10m 22s{color} | {color:red} Docker failed to build yetus/hadoop:d11161b. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12504 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12891263/HDFS-12504-HDFS-7240.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/22904/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: Improve SQLCLI performance > - > > Key: HDFS-12504 > URL: https://issues.apache.org/jira/browse/HDFS-12504 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Yuanbo Liu >Priority: Major > Labels: performance > Attachments: HDFS-12504-HDFS-7240.001.patch > > > In my test, my {{ksm.db}} has *3017660* entries with a total size of *128mb*; > the SQLCLI tool runs for over *2 hours* but still does not finish exporting the DB. This > is because it iterates over each entry and inserts it into another SQLite DB > file one at a time, which is not efficient. We need to improve this to run more > efficiently on large DB files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
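The usual fix for the per-row insert pattern described in this ticket is to batch the inserts inside a single transaction, so SQLite commits once instead of once per entry. The sketch below is illustrative only (the actual SQLCLI is Java and its schema differs; `export_entries` and the `kv` table are hypothetical):

```python
import sqlite3

def export_entries(entries, db_path=":memory:"):
    """Batch-insert (key, value) pairs into a SQLite database.

    One executemany() inside a single transaction replaces a per-entry
    INSERT (one implicit commit per row), which is what makes an export
    of millions of entries crawl.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")
    with conn:  # commit once for the whole batch
        conn.executemany("INSERT INTO kv VALUES (?, ?)", entries)
    return conn
```

For DBs too large to buffer, the same idea applies chunk-wise: accumulate a few thousand rows, insert them with one `executemany()` per chunk, and commit per chunk rather than per row.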
[jira] [Assigned] (HDFS-12504) Ozone: Improve SQLCLI performance
[ https://issues.apache.org/jira/browse/HDFS-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain reassigned HDFS-12504: -- Assignee: Yuanbo Liu (was: Weiwei Yang) > Ozone: Improve SQLCLI performance > - > > Key: HDFS-12504 > URL: https://issues.apache.org/jira/browse/HDFS-12504 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Yuanbo Liu >Priority: Major > Labels: performance > Attachments: HDFS-12504-HDFS-7240.001.patch > > > In my test, my {{ksm.db}} has *3017660* entries with a total size of *128mb*; > the SQLCLI tool runs for over *2 hours* but still does not finish exporting the DB. This > is because it iterates over each entry and inserts it into another SQLite DB > file one at a time, which is not efficient. We need to improve this to run more > efficiently on large DB files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org