[jira] [Updated] (HDFS-7610) Fix removal of dynamically added DN volumes
[ https://issues.apache.org/jira/browse/HDFS-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-7610: -- Labels: 2.6.1-candidate (was: ) > Fix removal of dynamically added DN volumes > --- > > Key: HDFS-7610 > URL: https://issues.apache.org/jira/browse/HDFS-7610 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: HDFS-7610.000.patch, HDFS-7610.001.patch > > > In the hot swap feature, {{FsDatasetImpl#addVolume}} uses the base volume dir > (e.g. "{{/foo/data0}}") instead of the volume's current dir > ("{{/foo/data/current}}") to construct {{FsVolumeImpl}}. As a result, the DataNode > cannot remove this newly added volume, because its > {{FsVolumeImpl#getBasePath}} returns "{{/foo}}". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
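The bug above can be illustrated with a small sketch; `basePathOf` is a hypothetical stand-in assuming, as the report implies, that the base path is derived as the parent of the directory the volume was constructed with, so passing the base dir instead of the "current" dir shifts the result up one level:

```java
import java.nio.file.Paths;

// Hedged sketch (not the actual FsVolumeImpl code): if getBasePath()
// simply returns the parent of the directory the volume was built from,
// constructing the volume with the base dir instead of the "current" dir
// makes later removal-by-base-path fail to find the volume.
public class VolumePathSketch {
    // Assumption: the base path is derived as the parent of the given dir.
    static String basePathOf(String volumeDir) {
        return Paths.get(volumeDir).getParent().toString();
    }

    public static void main(String[] args) {
        // Correct construction: pass the volume's "current" dir.
        System.out.println(basePathOf("/foo/data0/current")); // /foo/data0
        // Buggy construction: pass the base dir itself.
        System.out.println(basePathOf("/foo/data0"));         // /foo
    }
}
```

The second call reproduces the reported symptom: a volume built from "/foo/data0" answers "/foo" as its base path.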
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694855#comment-14694855 ] Yongjun Zhang commented on HDFS-8828: - Hi [~yufeigu], Thanks for the new rev 006, which tries to address the issue we discussed (avoiding re-copying an already copied dir/file which was moved to a newly created dir since the last snapshot/distcp). I have some more comments:
# Change {{Number of path in the copy list}} to {{Number of paths in the copy list}}.
# Change
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Path in the copy list: " + lastFileStatus.getPath().toUri().getPath());
}
{code}
to (add an idx and only print in usediff && debug mode):
{code}
if (options.shouldUseDiff() && LOG.isDebugEnabled()) {
  LOG.debug("Copy list entry " + idx + ": " + lastFileStatus.getPath().toUri().getPath());
}
++idx;
{code}
# Add some more explanation to the javadoc of {{static HashSet getExcludeList(Path dir, DiffInfo[] renameDiffs, Path prefix)}}, such as:
{code}
Given a newly created directory newDir in the snapshot diff, if a previously copied file/directory itemX is moved (renamed) to below newDir, itemX should be excluded so it will not be copied again.
{code}
# The goal of this jira is to only copy modified/created files, all of which would have entries in the snapshot diff report, so why do we have to call {{traverseDirectory}} to recursively traverse everything in {{doBuildListingWithSnapshotDiff(..)}} in this mode? Sounds to me that we only need to look at each snapshot diff item and its direct children. ("mv ./x/y ./p/q" would make two entries in the snapshot diff: ./x and ./y, so we do need to care about the first-level children of a snapshot diff entry.) Right? If so, to reuse the code in {{traverseDirectory}}, we can modify {{traverseDirectory}} to support a mode that only handles the current source and its first-level children, but not recursively. Thanks.
> Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only the files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or between a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp, so it still relies on default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report, so we can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
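The exclude-list idea from the review comments above can be sketched as follows. This is a hedged simplification: `getExcludeList` and its plain-string arguments are hypothetical stand-ins for the patch's `getExcludeList(Path, DiffInfo[], Path)`, not the actual distcp types.

```java
import java.util.*;

// Hedged sketch: if an already-copied item was renamed to somewhere under a
// directory newly created since the last snapshot, it must be excluded from
// the new copy list, or distcp would copy it a second time.
public class ExcludeListSketch {
    // renames maps each old path to its new path (the rename diffs).
    static Set<String> getExcludeList(String newDir, Map<String, String> renames) {
        Set<String> exclude = new HashSet<>();
        for (String target : renames.values()) {
            // Renamed to below newDir -> already copied under its old name.
            if (target.startsWith(newDir + "/")) {
                exclude.add(target);
            }
        }
        return exclude;
    }

    public static void main(String[] args) {
        Map<String, String> renames = Map.of(
            "/src/x/itemX", "/src/newDir/itemX",   // moved under a new dir
            "/src/a", "/src/b");                   // unrelated rename
        System.out.println(getExcludeList("/src/newDir", renames)); // [/src/newDir/itemX]
    }
}
```

Only the item moved under the newly created directory lands in the exclude set; the unrelated rename is untouched.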
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694879#comment-14694879 ] Yi Liu commented on HDFS-8859: -- Thanks [~szetszwo] for the review! Updated the patch to address your comments. {quote} How about calling it LightWeightResizableGSet? {quote} Agree, renamed it in the new patch. {quote} From your calculation, the patch improve each block replica object size about 45%. The JIRA summary is misleading. It seems claiming that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500MB. Is it correct? {quote} It's correct. Actually I added {{ReplicaMap}} to the JIRA summary (yes, in {{()}}, :) ), considering that {{ReplicaMap}} is the major long-lived in-memory object of the Datanode; of course, there are other aspects (most are transient: data read/write buffers, rpc buffers, etc.), I just highlighted the improvement. {quote} Subclass can call super.put(..) {quote} Updated in the new patch; I had just used a new internal method. {quote} There is a rewrite for LightWeightGSet.remove(..) {quote} I reverted it in the new patch and kept the original one. The original implementation had duplicate logic; we can share the same logic for all the {{if...else}} branches. {quote} I think we need some long running tests to make sure the correctness. See TestGSet.runMultipleTestGSet() {quote} Agree, updated it in the new patch. For the test failures of {{003}}, it's because there is one place (BlockPoolSlice) that adds a replicaInfo to the replicaMap from a tmp replicaMap while the replicaInfo is still in the tmp one; we can remove it from the tmp one before adding (for LightWeightGSet, an element is not allowed to exist in two gsets). 
In the {{002}} patch, the failure didn't exist: we had a new implementation of {{SetIterator}} which was very similar to the logic in Java's HashMap and a bit different from the original one, but both are correct; the major difference is the time of finding the next element. In the new patch, I keep the original one and make a few changes in BlockPoolSlice. All tests run successfully locally for the new patch. > Improve DataNode (ReplicaMap) memory footprint to save about 45% > > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch > > > By using the following approach we can save about *45%* of the memory footprint for each > block replica in DataNode memory (this JIRA only talks about the *ReplicaMap* in the > DataNode); the details are: > In ReplicaMap, > {code} > private final Map<String, Map<Long, ReplicaInfo>> map = > new HashMap<String, Map<Long, ReplicaInfo>>(); > {code} > Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas > in memory. The key is the block id of the block replica, which is already > included in {{ReplicaInfo}}, so this memory can be saved. Also, a HashMap Entry > has an object overhead. We can implement a lightweight Set which is similar > to {{LightWeightGSet}}, but not of fixed size ({{LightWeightGSet}} uses a fixed > size for the entries array, usually a big value; an example is > {{BlocksMap}}, and this avoids full gc since there is no need to resize). We > should also be able to get an element by key. 
> Following is a comparison of the memory footprint if we implement a lightweight set > as described: > We can save: > {noformat} > SIZE (bytes)  ITEM > 20            The Key: Long (12 bytes object overhead + 8 bytes long) > 12            HashMap Entry object overhead > 4             reference to the key in Entry > 4             reference to the value in Entry > 4             hash in Entry > {noformat} > Total: -44 bytes > We need to add: > {noformat} > SIZE (bytes)  ITEM > 4             a reference to the next element in ReplicaInfo > {noformat} > Total: +4 bytes > So in total we can save 40 bytes for each block replica. > And currently one finalized replica needs around 46 bytes (note: we ignore > memory alignment here). > We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block replica > in DataNode. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
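The arithmetic above can be checked in a few lines; the constants mirror the byte counts listed in the description (memory alignment ignored, as the description states):

```java
// Quick check of the per-replica savings claimed above: 44 bytes saved per
// map entry, 4 bytes added to ReplicaInfo, against ~46 bytes for one
// finalized replica object.
public class ReplicaMapSavings {
    static final int SAVED = 20 + 12 + 4 + 4 + 4; // Long key + entry overhead + 3 entry fields = 44
    static final int ADDED = 4;                   // next-element reference in ReplicaInfo
    static final int REPLICA = 46;                // approx. size of a finalized replica

    // 1 - (added + replica) / (saved + replica)
    static double savings() {
        return 1.0 - (double) (ADDED + REPLICA) / (SAVED + REPLICA);
    }

    public static void main(String[] args) {
        // 1 - 50/90 = 0.444..., i.e. about 45% per map entry
        System.out.printf("savings per replica: %.1f%%%n", savings() * 100);
    }
}
```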
[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8859: - Attachment: HDFS-8859.004.patch > Improve DataNode (ReplicaMap) memory footprint to save about 45% > > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch, HDFS-8859.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8891: - Attachment: HDFS-8891.001.patch First patch, please review. > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8891.001.patch > > > FSDirConcatOp.verifySrcFiles may change the src files order, but it should keep their > order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8891: - Status: Patch Available (was: Open) > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8891.001.patch > > > FSDirConcatOp.verifySrcFiles may change the src files order, but it should keep their > order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
[ https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695077#comment-14695077 ] Hadoop QA commented on HDFS-8808: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 28s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 27s | The applied patch generated 1 new checkstyle issues (total was 574, now 574). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 177m 4s | Tests failed in hadoop-hdfs. 
| | | | 221m 22s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDFSClientRetries | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750240/HDFS-8808-03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 40f8151 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11986/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11986/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11986/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11986/console | This message was automatically generated. > dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby > > > Key: HDFS-8808 > URL: https://issues.apache.org/jira/browse/HDFS-8808 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Gautam Gopalakrishnan >Assignee: Zhe Zhang > Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, > HDFS-8808-02.patch, HDFS-8808-03.patch > > > The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the > speed with which the fsimage is copied between the namenodes during regular > use. However, as a side effect, this also limits transfers when the > {{-bootstrapStandby}} option is used. This option is often used during > upgrades and could potentially slow down the entire workflow. The request > here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth > setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
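The throttling behavior at issue in HDFS-8808 can be sketched as follows; this is a hedged, simplified model of how a bytes-per-second cap translates into sleep time, loosely in the spirit of Hadoop's DataTransferThrottler. `sleepMillis` and its parameters are illustrative, not the actual API.

```java
// Hedged sketch of a bandwidthPerSec-style throttle: given the bytes sent
// so far and the elapsed time, compute how long the sender must sleep to
// stay within the configured rate. A rate of 0 means unthrottled -- which
// is effectively what exempting -bootstrapStandby from the setting gives.
public class ThrottleSketch {
    static long sleepMillis(long bytesSent, long elapsedMs, long bytesPerSec) {
        if (bytesPerSec <= 0) {
            return 0; // unthrottled
        }
        // Time the transfer *should* have taken at the configured rate.
        long expectedMs = bytesSent * 1000 / bytesPerSec;
        return Math.max(0, expectedMs - elapsedMs);
    }

    public static void main(String[] args) {
        // Sending 10 MB instantly against a 1 MB/s cap forces a 10 s sleep.
        System.out.println(sleepMillis(10_000_000, 0, 1_000_000)); // 10000
        // With the cap disabled there is no sleep at all.
        System.out.println(sleepMillis(10_000_000, 0, 0));         // 0
    }
}
```

This is why a low `dfs.image.transfer.bandwidthPerSec`, useful for routine checkpoint transfers, can badly slow a one-off `-bootstrapStandby` image copy.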
[jira] [Created] (HDFS-8892) ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too
Ravikumar created HDFS-8892: --- Summary: ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too Key: HDFS-8892 URL: https://issues.apache.org/jira/browse/HDFS-8892 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.7.1 Reporter: Ravikumar Priority: Minor Currently the CacheCleaner thread checks only for cache-expiry times. It would be nice if it handled invalid slots too, in an extra pass over the evictable map…
for (ShortCircuitReplica replica : evictable.values()) {
  if (!replica.getSlot().isValid()) {
    purge(replica);
  }
}
//Existing code...
int numDemoted = demoteOldEvictableMmaped(curMs);
int numPurged = 0;
Long evictionTimeNs = Long.valueOf(0);
….
…..
Apps like HBase can tweak the expiry/staleness/cache-size params in the DFS-Client, so that a ShortCircuitReplica will never be closed except when its Slot is declared invalid. I assume slot-invalidation will happen during block-invalidation/deletes {Primarily triggered by compaction/shard-takeover etc..} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
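The extra pass proposed above can be sketched as a standalone snippet. `Replica`, `Slot`, and `findInvalid` here are simplified stand-ins for `ShortCircuitReplica` and its shared-memory slot, not the actual hdfs-client types.

```java
import java.util.*;

// Hedged sketch of the proposed extra cleaner pass: in addition to the
// existing expiry-based eviction, collect every evictable replica whose
// slot has been invalidated (e.g. after block deletion) so the cleaner
// can purge it immediately instead of waiting for expiry.
public class CacheCleanerSketch {
    static class Slot {
        private final boolean valid;
        Slot(boolean valid) { this.valid = valid; }
        boolean isValid() { return valid; }
    }

    static class Replica {
        final String blockId;
        final Slot slot;
        Replica(String blockId, Slot slot) { this.blockId = blockId; this.slot = slot; }
    }

    static List<Replica> findInvalid(Collection<Replica> evictable) {
        List<Replica> invalid = new ArrayList<>();
        for (Replica r : evictable) {
            if (!r.slot.isValid()) {
                invalid.add(r);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        List<Replica> evictable = List.of(
            new Replica("blk_1", new Slot(true)),
            new Replica("blk_2", new Slot(false)));
        System.out.println(findInvalid(evictable).size()); // 1
    }
}
```

Only the replica whose slot went invalid is selected for purging; replicas with valid slots stay cached until normal expiry.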
[jira] [Assigned] (HDFS-8892) ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too
[ https://issues.apache.org/jira/browse/HDFS-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8892: - Assignee: kanaka kumar avvaru > ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too > - > > Key: HDFS-8892 > URL: https://issues.apache.org/jira/browse/HDFS-8892 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Ravikumar >Assignee: kanaka kumar avvaru >Priority: Minor > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8859: - Summary: Improve DataNode ReplicaMap memory footprint to save about 45% (was: Improve DataNode (ReplicaMap) memory footprint to save about 45%) > Improve DataNode ReplicaMap memory footprint to save about 45% > -- > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch, HDFS-8859.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
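The intrusive, resizable set idea behind HDFS-8859 can be sketched minimally: elements carry their own key (the block id) and a `next` reference, so no per-entry wrapper object or boxed `Long` key is needed, and unlike the fixed-size `LightWeightGSet` the bucket array grows on demand. This is illustrative only, assuming a bare-bones layout; the real `LightWeightResizableGSet` differs.

```java
// Hedged sketch of an intrusive resizable hash set: the element itself
// stores the key and the chain pointer, which is where the ~44 bytes per
// entry of HashMap overhead are saved.
public class IntrusiveSetSketch {
    static class Replica {
        final long blockId;   // the key lives inside the element
        Replica next;         // replaces the HashMap Entry's next pointer
        Replica(long id) { this.blockId = id; }
    }

    private Replica[] buckets = new Replica[8];
    private int size;

    private int index(long key, int len) { return (int) (key & (len - 1)); }

    // Assumes the element is not already present in any set.
    void put(Replica r) {
        if (size >= buckets.length * 3 / 4) resize();
        int i = index(r.blockId, buckets.length);
        r.next = buckets[i];
        buckets[i] = r;
        size++;
    }

    Replica get(long blockId) {
        for (Replica r = buckets[index(blockId, buckets.length)]; r != null; r = r.next) {
            if (r.blockId == blockId) return r;
        }
        return null;
    }

    // Unlike the fixed-size LightWeightGSet, grow the bucket array as needed.
    private void resize() {
        Replica[] old = buckets;
        buckets = new Replica[old.length * 2];
        for (Replica r : old) {
            while (r != null) {
                Replica next = r.next;
                int i = index(r.blockId, buckets.length);
                r.next = buckets[i];
                buckets[i] = r;
                r = next;
            }
        }
    }

    public static void main(String[] args) {
        IntrusiveSetSketch set = new IntrusiveSetSketch();
        for (long i = 0; i < 20; i++) set.put(new Replica(i));
        System.out.println(set.get(7).blockId); // 7
    }
}
```

The intrusive layout is also why, as noted in the comments above, an element may not live in two gsets at once: its single `next` field can only belong to one chain.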
[jira] [Comment Edited] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694879#comment-14694879 ] Yi Liu edited comment on HDFS-8859 at 8/13/15 12:02 PM: Thanks [~szetszwo] for the review! Updated the patch to address your comments. {quote} How about calling it LightWeightResizableGSet? {quote} Agree, renamed it in the new patch. {quote} From your calculation, the patch improve each block replica object size about 45%. The JIRA summary is misleading. It seems claiming that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500MB. Is it correct? {quote} It's correct. I did add {{ReplicaMap}} to the JIRA summary (yes, in {{()}}, :) ), considering that {{ReplicaMap}} is the major long-lived in-memory object of the Datanode, which could be large; of course, there are other aspects (many are transient: data read/write buffers, rpc buffers, etc.), I just highlighted the improvement. Let me remove the {{()}}. {quote} Subclass can call super.put(..) {quote} Updated in the new patch; I had just used a new internal method. {quote} There is a rewrite for LightWeightGSet.remove(..) {quote} I reverted it in the new patch and kept the original one. The original implementation had duplicate logic; we can share the same logic for all the {{if...else}} branches. {quote} I think we need some long running tests to make sure the correctness. See TestGSet.runMultipleTestGSet() {quote} Agree, updated it in the new patch. For the test failures of {{003}}, it's because there is one place (BlockPoolSlice) that adds a replicaInfo to the replicaMap from a tmp replicaMap while the replicaInfo is still in the tmp one; we can remove it from the tmp one before adding (for LightWeightGSet, an element is not allowed to exist in two gsets). 
In the {{002}} patch, the failure didn't exist: we had a new implementation of {{SetIterator}} which was very similar to the logic in Java's HashMap and a bit different from the original one. But both are correct; the major difference is the time of finding the next element. In the new patch, I keep the original one and make a few changes in BlockPoolSlice. All tests run successfully locally for the new patch. > Improve DataNode (ReplicaMap) memory footprint to save about 45% > > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch, HDFS-8859.004.patch > >
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695116#comment-14695116 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #286 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/286/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > This was found by [~kihwal] as part of the HDFS-8865 work in this > [comment|https://issues.apache.org/jira/browse/HDFS-8865?focusedCommentId=14660904&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660904]. > The unit tests > testQuotaByStorageTypePersistenceInFsImage/testQuotaByStorageTypePersistenceInFsEdit > failed to detect this because they were using an obsolete > FSDirectory instance. Once the highlighted line below is added, the issue can be > reproduced. > {code} > >fsdir = cluster.getNamesystem().getFSDirectory(); > INode testDirNodeAfterNNRestart = fsdir.getINode4Write(testDir.toString()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695119#comment-14695119 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #286 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/286/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > It would be better for administrators if {code} GETCONTENTSUMMARY {code} is > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695123#comment-14695123 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1016 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1016/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695126#comment-14695126 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1016 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1016/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695178#comment-14695178 ] Hadoop QA commented on HDFS-8859: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 21s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 50s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 22m 33s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 76m 49s | Tests failed in hadoop-hdfs. 
| | | | 145m 35s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.ha.TestZKFailoverController | | | hadoop.net.TestNetUtils | | | hadoop.hdfs.TestReplication | | | hadoop.hdfs.TestSafeMode | | | hadoop.hdfs.TestDatanodeRegistration | | | hadoop.hdfs.tools.TestDebugAdmin | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.hdfs.TestDatanodeReport | | | hadoop.hdfs.TestDFSShellGenericOptions | | | hadoop.hdfs.TestParallelRead | | | hadoop.hdfs.tools.TestStoragePolicyCommands | | | hadoop.hdfs.TestDFSRemove | | | hadoop.hdfs.qjournal.TestSecureNNWithQJM | | | hadoop.hdfs.web.TestWebHdfsTokens | | | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.TestPersistBlocks | | | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.TestQuota | | | hadoop.hdfs.TestDFSClientFailover | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForAcl | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | | hadoop.hdfs.web.TestWebHDFS | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.TestFileLengthOnClusterRestart | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary | | | hadoop.hdfs.TestFSOutputSummer | | | hadoop.hdfs.TestEncryptionZonesWithHA | | | hadoop.hdfs.TestBlockReaderFactory | | | hadoop.hdfs.TestDFSFinalize | | | hadoop.hdfs.TestDisableConnCache | | | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr | | | hadoop.hdfs.web.TestHttpsFileSystem | | | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter | | | hadoop.hdfs.web.TestWebHDFSAcl | | | hadoop.hdfs.TestHDFSTrash | | | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer | | | hadoop.hdfs.web.TestWebHDFSForHA | | | 
hadoop.hdfs.TestBlockMissingException | | | hadoop.hdfs.TestPipelines | | | hadoop.hdfs.TestRenameWhileOpen | | | hadoop.hdfs.TestFileCreationClient | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.TestFileAppend3 | | | hadoop.hdfs.TestBalancerBandwidth | | | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer | | | hadoop.hdfs.TestSeekBug | | | hadoop.hdfs.TestParallelShortCircuitReadUnCached | | | hadoop.hdfs.TestBlockReaderLocal | | | hadoop.hdfs.TestListFilesInFileContext | | | hadoop.hdfs.web.TestWebHDFSXAttr | | | hadoop.hdfs.TestFileStatus | | | hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | Timed out tests | org.apache.hadoop.hdfs.TestFileCreation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 53bef9c | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://build
[jira] [Created] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
Rushabh S Shah created HDFS-8893: Summary: DNs with failed volumes stop serving during rolling upgrade Key: HDFS-8893 URL: https://issues.apache.org/jira/browse/HDFS-8893 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rushabh S Shah Priority: Critical When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker to each of their volumes. If one of the volumes is bad, this will fail. When this failure happens, the DN does not update the key it received from the NN. Unfortunately we had one failed volume on all the 3 datanodes which had replicas. Keys expire after 20 hours so at about 20 hours into the rolling upgrade, the DNs with failed volumes will stop serving clients. Here is the stack trace on the datanode side: {noformat} 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN datanode.DataNode: IOException in offerService java.io.IOException: Read-only file system at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:947) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721) at org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833) at java.lang.Thread.run(Thread.java:722) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
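The failure mode above suggests a per-volume fault-isolation pattern: attempt the marker write on every volume and record the failures, rather than letting one bad disk abort the whole pass (and with it the key update). A minimal, hypothetical sketch — the class, method, and the simulated failure are illustrative assumptions, not the actual DataNode code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: write a marker to every volume, isolating
// per-volume failures so one bad disk does not abort the operation.
class MarkerWriter {
    /** Attempts the marker write on each volume; returns the volumes that failed. */
    static List<String> writeMarkers(List<String> volumes, Set<String> failedDisks) {
        List<String> failed = new ArrayList<>();
        for (String vol : volumes) {
            try {
                if (failedDisks.contains(vol)) {
                    // Stand-in for "java.io.IOException: Read-only file system"
                    throw new RuntimeException("Read-only file system: " + vol);
                }
                // Marker written successfully on a healthy volume.
            } catch (RuntimeException e) {
                failed.add(vol); // record the failure and keep going
            }
        }
        return failed;
    }
}
```

A real fix would also have to decide how a DN with a partially failed marker write reports back to the NN; the sketch only shows the isolation step.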
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-8893: - Assignee: Daryn Sharp > DNs with failed volumes stop serving during rolling upgrade > --- > > Key: HDFS-8893 > URL: https://issues.apache.org/jira/browse/HDFS-8893 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Daryn Sharp >Priority: Critical > > When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker > to each of their volumes. If one of the volumes is bad, this will fail. When > this failure happens, the DN does not update the key it received from the NN. > Unfortunately we had one failed volume on all the 3 datanodes which had > replicas. > Keys expire after 20 hours so at about 20 hours into the rolling upgrade, the > DNs with failed volumes will stop serving clients. > Here is the stack trace on the datanode side: > {noformat} > 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN > datanode.DataNode: IOException in offerService > java.io.IOException: Read-only file system > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:947) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677) > at > 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833) > at java.lang.Thread.run(Thread.java:722) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695320#comment-14695320 ] Hadoop QA commented on HDFS-8891: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 175m 19s | Tests failed in hadoop-hdfs. 
| | | | 219m 12s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750259/HDFS-8891.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 53bef9c | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11988/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11988/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11988/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11988/console | This message was automatically generated. > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8891.001.patch > > > FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their > order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695387#comment-14695387 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2213/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > This was found by [~kihwal] as part of HDFS-8865 work in this > [comment|https://issues.apache.org/jira/browse/HDFS-8865?focusedCommentId=14660904&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660904]. > The unit test > testQuotaByStorageTypePersistenceInFsImage/testQuotaByStorageTypePersistenceInFsEdit > failed to detect this because they were using an obsolete > FsDirectory instance. Once added the highlighted line below, the issue can be > reproed. > {code} > >fsdir = cluster.getNamesystem().getFSDirectory(); > INode testDirNodeAfterNNRestart = fsdir.getINode4Write(testDir.toString()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-7926: -- Labels: (was: 2.6.1-candidate) Removing the 2.6.1-candidate label as truncate is not a feature in 2.6. > NameNode implementation of ClientProtocol.truncate(..) is not idempotent > > > Key: HDFS-7926 > URL: https://issues.apache.org/jira/browse/HDFS-7926 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.7.0 > > Attachments: h7926_20150313.patch, h7926_20150313b.patch > > > If dfsclient drops the first response of a truncate RPC call, the retry by > retry cache will fail with "DFSClient ... is already the current lease > holder". The truncate RPC is annotated as @Idempotent in ClientProtocol but > the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695390#comment-14695390 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2213 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2213/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > it would be better for administrators if {code} GETCONTENTSUMMARY {code} are > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695454#comment-14695454 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #275 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/275/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > This was found by [~kihwal] as part of HDFS-8865 work in this > [comment|https://issues.apache.org/jira/browse/HDFS-8865?focusedCommentId=14660904&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660904]. > The unit test > testQuotaByStorageTypePersistenceInFsImage/testQuotaByStorageTypePersistenceInFsEdit > failed to detect this because they were using an obsolete > FsDirectory instance. Once added the highlighted line below, the issue can be > reproed. > {code} > >fsdir = cluster.getNamesystem().getFSDirectory(); > INode testDirNodeAfterNNRestart = fsdir.getINode4Write(testDir.toString()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695457#comment-14695457 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #275 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/275/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > it would be better for administrators if {code} GETCONTENTSUMMARY {code} are > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695467#comment-14695467 ] Yufei Gu commented on HDFS-8828: Hi [~yzhangal], Thank you for the detailed review. For 3, we do need to traverse recursively because a created directory item in a snapshot diff report could have multiple levels of subdirectories. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list that includes only the files/dirs which changed between two snapshots > (or a snapshot and a normal dir). This speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp, so it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report, which > minimizes the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
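The exclusion rule debated in this review (skip an already-copied item that was renamed to somewhere under a newly created directory, so it is not copied twice) reduces to path-prefix matching. A toy sketch under that assumption — the class and the simplified signature are illustrative, not the actual DistCp helper:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the exclude-list idea: given the rename targets
// from a snapshot diff and a newly created directory, collect the renamed
// items that landed under that directory.
class ExcludeList {
    static Set<String> getExcludeList(String newDir, List<String> renameTargets) {
        Set<String> exclude = new HashSet<>();
        String prefix = newDir.endsWith("/") ? newDir : newDir + "/";
        for (String target : renameTargets) {
            // An item moved below newDir was already copied in a previous run,
            // so it should not be copied again as part of newDir's subtree.
            if (target.startsWith(prefix)) {
                exclude.add(target);
            }
        }
        return exclude;
    }
}
```

The recursion question in the comment above is orthogonal: even with such an exclude list, a created directory still has to be walked to enumerate its (possibly nested) new children.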
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695479#comment-14695479 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #283 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/283/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > This was found by [~kihwal] as part of HDFS-8865 work in this > [comment|https://issues.apache.org/jira/browse/HDFS-8865?focusedCommentId=14660904&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660904]. > The unit test > testQuotaByStorageTypePersistenceInFsImage/testQuotaByStorageTypePersistenceInFsEdit > failed to detect this because they were using an obsolete > FsDirectory instance. Once added the highlighted line below, the issue can be > reproed. > {code} > >fsdir = cluster.getNamesystem().getFSDirectory(); > INode testDirNodeAfterNNRestart = fsdir.getINode4Write(testDir.toString()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695482#comment-14695482 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #283 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/283/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > it would be better for administrators if {code} GETCONTENTSUMMARY {code} are > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695521#comment-14695521 ] Hudson commented on HDFS-8622: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2232 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2232/]) HDFS-8622. Implement GETCONTENTSUMMARY operation for WebImageViewer. Contributed by Jagadesh Kiran N. (aajisaka: rev 40f815131e822f5b7a8e6a6827f4b85b31220c43) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForContentSummary.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageHandler.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FSImageLoader.java > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > it would be better for administrators if {code} GETCONTENTSUMMARY {code} are > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8879) Quota by storage type usage incorrectly initialized upon namenode restart
[ https://issues.apache.org/jira/browse/HDFS-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695518#comment-14695518 ] Hudson commented on HDFS-8879: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2232 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2232/]) HDFS-8879. Quota by storage type usage incorrectly initialized upon namenode restart. Contributed by Xiaoyu Yao. (xyao: rev 3e715a4f4c46bcd8b3054cb0566e526c46bd5d66) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java > Quota by storage type usage incorrectly initialized upon namenode restart > - > > Key: HDFS-8879 > URL: https://issues.apache.org/jira/browse/HDFS-8879 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8879.01.patch > > > This was found by [~kihwal] as part of HDFS-8865 work in this > [comment|https://issues.apache.org/jira/browse/HDFS-8865?focusedCommentId=14660904&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660904]. > The unit test > testQuotaByStorageTypePersistenceInFsImage/testQuotaByStorageTypePersistenceInFsEdit > failed to detect this because they were using an obsolete > FsDirectory instance. Once added the highlighted line below, the issue can be > reproed. > {code} > >fsdir = cluster.getNamesystem().getFSDirectory(); > INode testDirNodeAfterNNRestart = fsdir.getINode4Write(testDir.toString()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8865) Improve quota initialization performance
[ https://issues.apache.org/jira/browse/HDFS-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695574#comment-14695574 ] Xiaoyu Yao commented on HDFS-8865: -- Thanks for the patch, [~kihwal]! It looks pretty good to me. Just a few comments: 1. The numbers for a large namespace look impressive. Do you have numbers for small/medium namespaces? 2. Is it possible to add some profiling info between these logs below so that we can easily find how long it takes to finish quota initialization from the log? {code} LOG.info("Initializing quota with " + threads + " thread(s)"); ... LOG.info("Quota initialization complete.\n" + counts); {code} 3. Can you change to parameterized logging to avoid argument construction when the log statement is disabled? For example, {code} LOG.debug("Setting quota for {}\n{}", dir, myCounts); {code} 4. NIT: typo chached -> cached? {code} // Directly access the name system to obtain the current chached usage. {code} 5. Now that HDFS-8879 is in, can you rebase and update the patch? Thanks! > Improve quota initialization performance > > > Key: HDFS-8865 > URL: https://issues.apache.org/jira/browse/HDFS-8865 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-8865.patch, HDFS-8865.v2.checkstyle.patch, > HDFS-8865.v2.patch > > > After replaying edits, the whole file system tree is recursively scanned in > order to initialize the quota. For a big namespace, this can take a very long > time. Since this is done during namenode failover, it also affects failover > latency. > By using the Fork-Join framework, I was able to greatly reduce the > initialization time. The following is the test result using the fsimage from > one of the big name nodes we have. > || threads || seconds|| > | 1 (existing) | 55| > | 1 (fork-join) | 68 | > | 4 | 16 | > | 8 | 8 | > | 12 | 6 | > | 16 | 5 | > | 20 | 4 | -- This message was sent by Atlassian JIRA (v6.3.4#6332)
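The benchmark table above comes from parallelizing the recursive namespace scan with the Fork-Join framework. A toy sketch of that pattern over a plain tree — the `Node` structure and size counting are stand-ins, not the real INode classes or quota counts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Toy fork-join tree aggregation: each task sums its subtree,
// forking one subtask per child.
class QuotaInit {
    static class Node {
        long size;
        List<Node> children = new ArrayList<>();
        Node(long size) { this.size = size; }
    }

    static class CountTask extends RecursiveTask<Long> {
        private final Node node;
        CountTask(Node node) { this.node = node; }

        @Override
        protected Long compute() {
            long total = node.size;
            List<CountTask> subtasks = new ArrayList<>();
            for (Node child : node.children) {
                CountTask t = new CountTask(child);
                t.fork();          // run subtree counts in parallel
                subtasks.add(t);
            }
            for (CountTask t : subtasks) {
                total += t.join(); // aggregate child results
            }
            return total;
        }
    }

    static long countTree(Node root, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new CountTask(root));
        } finally {
            pool.shutdown();
        }
    }
}
```

The single-thread fork-join row being slower than the existing code (68s vs 55s) is consistent with this shape: task creation adds overhead that only pays off once multiple threads run subtrees concurrently.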
[jira] [Commented] (HDFS-8890) Allow admin to specify which blockpools the balancer should run on
[ https://issues.apache.org/jira/browse/HDFS-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695580#comment-14695580 ] Tsz Wo Nicholas Sze commented on HDFS-8890: --- We probably already have this feature since we can specify paths when running Balancer. > Allow admin to specify which blockpools the balancer should run on > -- > > Key: HDFS-8890 > URL: https://issues.apache.org/jira/browse/HDFS-8890 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Chris Trezzo >Assignee: Chris Trezzo > > Currently the balancer runs on all blockpools. Allow an admin to run the > balancer on a set of blockpools. This will enable the balancer to skip > blockpools that should not be balanced. For example, a tmp blockpool that has > a large amount of churn. > An example of the command line interface would be an additional flag that > specifies the blockpools by id: > -blockpools > BP-6299761-10.55.116.188-1415904647555,BP-47348528-10.51.120.139-1415904199257 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8854) Erasure coding: add ECPolicy to replace schema+cellSize in hadoop-hdfs
[ https://issues.apache.org/jira/browse/HDFS-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8854: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Jenkins still generating unrelated failures sometimes, but we have 1 successful [run | https://builds.apache.org/job/Hadoop-HDFS-7285-Merge/84/]. Committed to both HDFS-7285-merge and HDFS-7285. Thanks Walter for the contribution, and Rakesh for reviewing! > Erasure coding: add ECPolicy to replace schema+cellSize in hadoop-hdfs > -- > > Key: HDFS-8854 > URL: https://issues.apache.org/jira/browse/HDFS-8854 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Walter Su >Assignee: Walter Su > Fix For: HDFS-7285 > > Attachments: HDFS-8854-Consolidated-20150806.02.txt, > HDFS-8854-HDFS-7285-merge.03.patch, HDFS-8854-HDFS-7285-merge.03.txt, > HDFS-8854-HDFS-7285.00.patch, HDFS-8854-HDFS-7285.01.patch, > HDFS-8854-HDFS-7285.02.patch, HDFS-8854-HDFS-7285.03.patch, HDFS-8854.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8890) Allow admin to specify which blockpools the balancer should run on
[ https://issues.apache.org/jira/browse/HDFS-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695585#comment-14695585 ] Tsz Wo Nicholas Sze commented on HDFS-8890: --- Oops, my previous comment is incorrect. Mixing something up. > Allow admin to specify which blockpools the balancer should run on > -- > > Key: HDFS-8890 > URL: https://issues.apache.org/jira/browse/HDFS-8890 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Chris Trezzo >Assignee: Chris Trezzo > > Currently the balancer runs on all blockpools. Allow an admin to run the > balancer on a set of blockpools. This will enable the balancer to skip > blockpools that should not be balanced. For example, a tmp blockpool that has > a large amount of churn. > An example of the command line interface would be an additional flag that > specifies the blockpools by id: > -blockpools > BP-6299761-10.55.116.188-1415904647555,BP-47348528-10.51.120.139-1415904199257 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8894) Set SO_KEEPALIVE on DN server sockets
Nathan Roberts created HDFS-8894: Summary: Set SO_KEEPALIVE on DN server sockets Key: HDFS-8894 URL: https://issues.apache.org/jira/browse/HDFS-8894 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Nathan Roberts SO_KEEPALIVE is not set on things like datastreamer sockets which can cause lingering ESTABLISHED sockets when there is a network glitch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
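Enabling SO_KEEPALIVE on an accepted socket is a one-line call in Java. A minimal sketch with plain `java.net` sockets (not the DataNode's actual socket setup), showing the option set on both ends so either side's kernel will probe a half-open connection:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {
    // Open a loopback connection, enable SO_KEEPALIVE on both sockets,
    // and report whether the option took effect.
    static boolean keepAliveEnabled() {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            // With keep-alive on, the OS periodically probes idle
            // connections and tears down ones the peer has silently lost.
            accepted.setKeepAlive(true);
            client.setKeepAlive(true);
            return accepted.getKeepAlive() && client.getKeepAlive();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        assert keepAliveEnabled();
    }
}
```

Note the probe interval itself is an OS-level setting (e.g. `tcp_keepalive_time` on Linux), not controllable through this API.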
[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
[ https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695599#comment-14695599 ] Zhe Zhang commented on HDFS-8808: - Both reported test issues are unrelated and pass locally. The error message from Jenkins test result of {{testIdempotentAllocateBlockAndClose}} is interesting though. We should examine it in a separate JIRA. The checkstyle issue was pre-existing. > dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby > > > Key: HDFS-8808 > URL: https://issues.apache.org/jira/browse/HDFS-8808 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Gautam Gopalakrishnan >Assignee: Zhe Zhang > Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, > HDFS-8808-02.patch, HDFS-8808-03.patch > > > The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the > speed with which the fsimage is copied between the namenodes during regular > use. However, as a side effect, this also limits transfers when the > {{-bootstrapStandby}} option is used. This option is often used during > upgrades and could potentially slow down the entire workflow. The request > here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth > setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemanja Matkovic updated HDFS-8078: --- Description: 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) 
Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error "-get: 2401 is not an IP string literal." This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. was: /patch1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. 
NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataX
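The last-colon parsing suggested for the second exception can be sketched as below. `HostPortSketch` and `splitHostPort` are hypothetical names, not the patch's code, and bracketed IPv6 literals (`[addr]:port`) would need extra handling not shown here:

```java
public class HostPortSketch {
    // Split "host:port" on the LAST colon so bare IPv6 literals such as
    // 2401:db00:1010:70ba:face:0:8:0:50010 parse correctly; splitting on
    // the first colon (or using split(":")) would truncate the address.
    static String[] splitHostPort(String addr) {
        int i = addr.lastIndexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("no port in " + addr);
        }
        return new String[] { addr.substring(0, i), addr.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] v4 = splitHostPort("10.55.116.188:50010");
        assert v4[0].equals("10.55.116.188") && v4[1].equals("50010");

        String[] v6 = splitHostPort("2401:db00:1010:70ba:face:0:8:0:50010");
        assert v6[0].equals("2401:db00:1010:70ba:face:0:8:0");
        assert v6[1].equals("50010");
    }
}
```

As the description notes, `lastIndexOf` is also marginally cheaper than `split`, which compiles a regex and allocates an array for every call.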
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemanja Matkovic updated HDFS-8078: --- Description: /patch1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) 
Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error "-get: 2401 is not an IP string literal." This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. was: 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. 
NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataX
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695634#comment-14695634 ] Jing Zhao commented on HDFS-8891: - Thanks for working on this, Yong! Agree we should keep the srcs order. For the fix, maybe we only need to replace "HashSet" to "LinkedHashSet"? > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8891.001.patch > > > FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
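The suggested `HashSet` → `LinkedHashSet` swap works because `LinkedHashSet` still de-duplicates but iterates in insertion order. A minimal sketch (hypothetical `SrcOrderSketch` class, not `FSDirConcatOp` itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class SrcOrderSketch {
    // De-duplicate the src paths while preserving the caller's ordering:
    // LinkedHashSet keeps insertion order, where HashSet would reorder
    // entries by hash bucket.
    static List<String> dedupKeepOrder(List<String> srcs) {
        return new ArrayList<>(new LinkedHashSet<>(srcs));
    }

    public static void main(String[] args) {
        List<String> out =
            dedupKeepOrder(Arrays.asList("/a/f3", "/a/f1", "/a/f2", "/a/f1"));
        // Duplicate "/a/f1" dropped, original order of first occurrences kept.
        assert out.equals(Arrays.asList("/a/f3", "/a/f1", "/a/f2"));
    }
}
```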
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695685#comment-14695685 ] Yongjun Zhang commented on HDFS-8828: - Hello [~yufeigu], I expect every new CREATE/MODIFICATION below the newly created dir would also have an entry in the snapshot diff report (maybe except the first level children case described in my last comment), is this not the case? Thanks. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch > > > Some users reported huge time cost to build file copy list in distcp. (30 > hours for 1.6M files). We can leverage snapshot diff report to build file > copy list including files/dirs which are changes only between two snapshots > (or a snapshot and a normal dir). It speed up the process in two folds: 1. > less copy list building time. 2. less file copy MR jobs. > HDFS snapshot diff report provide information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronize deletion and rename, then fallback to > the default distcp. So it still relies on default distcp to building complete > list of files under the source dir. This patch only puts creation and > modification files into the copy list based on snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8859: -- Priority: Major (was: Critical) This is a good change although it does not reduce the overall datanode memory footprint much. (For 10m blocks, it only reduces 400MB memory. However, a datanode does not even have 1m blocks in practice.) > Improve DataNode ReplicaMap memory footprint to save about 45% > -- > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch, HDFS-8859.004.patch > > > By using following approach we can save about *45%* memory footprint for each > block replica in DataNode memory (This JIRA only talks about *ReplicaMap* in > DataNode), the details are: > In ReplicaMap, > {code} > private final Map> map = > new HashMap>(); > {code} > Currently we use a HashMap {{Map}} to store the replicas > in memory. The key is block id of the block replica which is already > included in {{ReplicaInfo}}, so this memory can be saved. Also HashMap Entry > has a object overhead. We can implement a lightweight Set which is similar > to {{LightWeightGSet}}, but not a fixed size ({{LightWeightGSet}} uses fix > size for the entries array, usually it's a big value, an example is > {{BlocksMap}}, this can avoid full gc since no need to resize), also we > should be able to get Element through key. 
> Following is comparison of memory footprint If we implement a lightweight set > as described: > We can save: > {noformat} > SIZE (bytes) ITEM > 20The Key: Long (12 bytes object overhead + 8 > bytes long) > 12HashMap Entry object overhead > 4 reference to the key in Entry > 4 reference to the value in Entry > 4 hash in Entry > {noformat} > Total: -44 bytes > We need to add: > {noformat} > SIZE (bytes) ITEM > 4 a reference to next element in ReplicaInfo > {noformat} > Total: +4 bytes > So totally we can save 40bytes for each block replica > And currently one finalized replica needs around 46 bytes (notice: we ignore > memory alignment here). > We can save 1 - (4 + 46) / (44 + 46) = *45%* memory for each block replica > in DataNode. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
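The intrusive-set idea above — each replica carries its own chain pointer, so no per-entry `HashMap.Entry` objects or boxed `Long` keys are needed — can be sketched as follows. This is a simplified illustration (fixed bucket count, no resizing or removal), not `LightWeightGSet` or the actual patch:

```java
// Element carries its own hash-chain link: the single extra reference
// (+4 bytes) that replaces the ~44 bytes of key and Entry overhead.
class Replica {
    final long blockId;
    Replica next;                        // intrusive bucket-chain pointer
    Replica(long blockId) { this.blockId = blockId; }
}

public class IntrusiveSetSketch {
    // Power-of-two bucket array; real implementations resize as needed.
    private final Replica[] buckets = new Replica[16];

    private int index(long blockId) {
        return (int) (blockId & (buckets.length - 1));
    }

    public void put(Replica r) {
        int i = index(r.blockId);
        r.next = buckets[i];             // push onto the bucket's chain
        buckets[i] = r;
    }

    public Replica get(long blockId) {
        for (Replica r = buckets[index(blockId)]; r != null; r = r.next) {
            if (r.blockId == blockId) {
                return r;
            }
        }
        return null;
    }

    // 42 and 58 share bucket 10 (mod 16), exercising the chain walk.
    static boolean demo() {
        IntrusiveSetSketch set = new IntrusiveSetSketch();
        set.put(new Replica(42L));
        set.put(new Replica(58L));
        return set.get(42L) != null && set.get(58L) != null
            && set.get(7L) == null;
    }

    public static void main(String[] args) {
        assert demo();
    }
}
```

The trade-off is that an element can belong to only one such set at a time, since the link lives in the element itself.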
[jira] [Updated] (HDFS-8861) Remove unnecessary log from method FSNamesystem.getCorruptFiles
[ https://issues.apache.org/jira/browse/HDFS-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-8861: Resolution: Won't Fix Status: Resolved (was: Patch Available) Close it and leave getCorruptFiles unchanged, that warn log is fine. However HDFS-8522 patch is necessary. > Remove unnecessary log from method FSNamesystem.getCorruptFiles > --- > > Key: HDFS-8861 > URL: https://issues.apache.org/jira/browse/HDFS-8861 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou >Priority: Minor > Attachments: HDFS-8861.1.patch > > > The log in FSNamesystem.getCorruptFiles will print out too many messages > mixed with other log entries, which makes whole log quite verbose, hard to > understood and analyzed, especially in those cases where SuperuserPrivilege > check and Operation check are not satisfied in frequent calls of > listCorruptFileBlocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8883) NameNode Metrics : Add FSNameSystem lock Queue Length
[ https://issues.apache.org/jira/browse/HDFS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8883: Priority: Major (was: Minor) > NameNode Metrics : Add FSNameSystem lock Queue Length > - > > Key: HDFS-8883 > URL: https://issues.apache.org/jira/browse/HDFS-8883 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 2.8.0 > > Attachments: HDFS-8883.001.patch > > > FSNameSystemLock can have contention when NameNode is under load. This patch > adds LockQueueLength -- the number of threads waiting on FSNameSystemLock -- > as a metric in NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695800#comment-14695800 ] Chang Li commented on HDFS-6407: [~wheat9] how soon could you check in this code? Are you still waiting for some more reviews? > new namenode UI, lost ability to sort columns in datanode tab > - > > Key: HDFS-6407 > URL: https://issues.apache.org/jira/browse/HDFS-6407 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Nathan Roberts >Assignee: Haohui Mai >Priority: Critical > Labels: BB2015-05-TBR > Attachments: 002-datanodes-sorted-capacityUsed.png, > 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, > HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, > HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, > HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, > HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting > 2.png, sorting table.png > > > old ui supported clicking on column header to sort on that column. The new ui > seems to have dropped this very useful feature. > There are a few tables in the Namenode UI to display datanodes information, > directory listings and snapshots. > When there are many items in the tables, it is useful to have ability to sort > on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
Andrew Wang created HDFS-8895: - Summary: Remove deprecated BlockStorageLocation APIs Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-7649: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) +1 Committed for 2.8.0. Thanks [~brahmareddy]. > Multihoming docs should emphasize using hostnames in configurations > --- > > Key: HDFS-7649 > URL: https://issues.apache.org/jira/browse/HDFS-7649 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Arpit Agarwal >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-7649.patch > > > The docs should emphasize that master and slave configurations should use > hostnames wherever possible. > Link to current docs: > https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696000#comment-14696000 ] Yongjun Zhang commented on HDFS-8828: - Hi [~yufeigu], Thanks for answering my question in person. So for newly created dir, there is indeed one entry "CREATE" in the snapshot diff report, and no entries for new elements created below this dir. So please take care of my comment 1, 2 in my previous review, plus: 3. Suggest to change the {{getExcludeList}} method to {{getTraverseExcludeList}} (hopefully a better name) and with the following javadoc as we agreed. {code} This method returns a list of items to be excluded when recursively traversing newDir to build the copy list. Specifically, given a newly created directory newDir (a CREATE entry in the snapshot diff), if a previously copied file/directory itemX is moved (a RENAME entry in the snapshot diff) into newDir, itemX should be excluded when recursively traversing newDir in #traverseDirectory, so that it will not to be copied again. If the same itemX also has a MODIFY entry in the snapshot diff report, meaning it was modified after it was previously copied, it will still be added to the copy list (handled in the main loop of doBuildListingWithSnapshotDiff). {code} 4. Do refactoring to consolidate duplicated code in test code that we discussed. Hi [~jingzhao], I had quite some side discussion with Yufei, I am +1 on the change after the above comments are addressed. Would you please take a look at it if you wish? I'm targeting at committing it next Monday. Thanks much. 
> Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch > > > Some users reported huge time cost to build file copy list in distcp. (30 > hours for 1.6M files). We can leverage snapshot diff report to build file > copy list including files/dirs which are changes only between two snapshots > (or a snapshot and a normal dir). It speed up the process in two folds: 1. > less copy list building time. 2. less file copy MR jobs. > HDFS snapshot diff report provide information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronize deletion and rename, then fallback to > the default distcp. So it still relies on default distcp to building complete > list of files under the source dir. This patch only puts creation and > modification files into the copy list based on snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
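The `getTraverseExcludeList` behavior described in comment 3 above can be sketched roughly as below. The class and types are hypothetical stand-ins, not the distcp patch's actual code: collect the RENAME targets that landed under the newly created directory, so that traversing it skips items that were already copied before the rename:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcludeListSketch {
    // Hypothetical stand-in for a RENAME entry in a snapshot diff report.
    static class RenameDiff {
        final String source, target;
        RenameDiff(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    // Rename targets under newDir were already copied before being moved,
    // so traversal of newDir should exclude them rather than re-copy them.
    // (A separate MODIFY entry would still put a changed item back on the
    // copy list in the main diff loop.)
    static Set<String> getTraverseExcludeList(String newDir,
                                              List<RenameDiff> renames) {
        Set<String> exclude = new HashSet<>();
        String prefix = newDir.endsWith("/") ? newDir : newDir + "/";
        for (RenameDiff d : renames) {
            if (d.target.startsWith(prefix)) {
                exclude.add(d.target);
            }
        }
        return exclude;
    }

    public static void main(String[] args) {
        List<RenameDiff> renames = Arrays.asList(
            new RenameDiff("/src/x/itemX", "/src/newDir/itemX"),
            new RenameDiff("/src/p/q", "/src/other/q"));
        Set<String> ex = getTraverseExcludeList("/src/newDir", renames);
        assert ex.contains("/src/newDir/itemX");   // moved under newDir
        assert ex.size() == 1;                     // unrelated rename ignored
    }
}
```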
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696023#comment-14696023 ] Jing Zhao commented on HDFS-8828: - Sure. I will review the patch. Thanks for the work, Yufei and Yongjun! > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch > > > Some users reported huge time cost to build file copy list in distcp. (30 > hours for 1.6M files). We can leverage snapshot diff report to build file > copy list including files/dirs which are changes only between two snapshots > (or a snapshot and a normal dir). It speed up the process in two folds: 1. > less copy list building time. 2. less file copy MR jobs. > HDFS snapshot diff report provide information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronize deletion and rename, then fallback to > the default distcp. So it still relies on default distcp to building complete > list of files under the source dir. This patch only puts creation and > modification files into the copy list based on snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Status: Open (was: Patch Available) > createNonRecursive support needed in WebHdfsFileSystem to support HBase > --- > > Key: HDFS-8435 > URL: https://issues.apache.org/jira/browse/HDFS-8435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Vinoth Sathappan >Assignee: Jakob Homan > Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, > HDFS-8435.002.patch > > > The WebHdfsFileSystem implementation doesn't support createNonRecursive. > HBase extensively depends on that for proper functioning. Currently, when the > region servers are started over web hdfs, they crash due with - > createNonRecursive unsupported for this filesystem class > org.apache.hadoop.hdfs.web.SWebHdfsFileSystem > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8888) Support volumes in HDFS
[ https://issues.apache.org/jira/browse/HDFS-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696039#comment-14696039 ] Konstantin Shvachko commented on HDFS-8888: --- Could you please explain your concept of volumes. HDFS already has one from federation. I guess you are thinking of something different? > Support volumes in HDFS > --- > > Key: HDFS-8888 > URL: https://issues.apache.org/jira/browse/HDFS-8888 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai > > There are multiple types of zones (e.g., snapshottable directories, > encryption zones, directories with quotas) which are conceptually close to > namespace volumes in traditional file systems. > This jira proposes to introduce the concept of volume to simplify the > implementation of snapshots and encryption zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Status: Patch Available (was: Open) > createNonRecursive support needed in WebHdfsFileSystem to support HBase > --- > > Key: HDFS-8435 > URL: https://issues.apache.org/jira/browse/HDFS-8435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Vinoth Sathappan >Assignee: Jakob Homan > Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, > HDFS-8435.002.patch, HDFS-8435.003.patch > > > The WebHdfsFileSystem implementation doesn't support createNonRecursive. > HBase extensively depends on that for proper functioning. Currently, when the > region servers are started over web hdfs, they crash due with - > createNonRecursive unsupported for this filesystem class > org.apache.hadoop.hdfs.web.SWebHdfsFileSystem > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Attachment: HDFS-8435.003.patch New patch that applies to both trunk and branch 2. The failed tests were because the default of the createParent param in WebHDFS was set to false, but it was then not used by the actual call and was overridden to true in the create call on the DFSClient. I've fixed this to honor the parameter and updated the spec to be correct. Good catch on the throw. Removed. I had played around with that uber test a bit. Using the annotation loses the explicit method name showing what went wrong on each test. I put as much into the helper method as looked reasonable (judgment call here); when I put more of the per-test logic into the helper (expected exception, subsequent message), it got really crowded and ugly. > createNonRecursive support needed in WebHdfsFileSystem to support HBase > --- > > Key: HDFS-8435 > URL: https://issues.apache.org/jira/browse/HDFS-8435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Vinoth Sathappan >Assignee: Jakob Homan > Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, > HDFS-8435.002.patch, HDFS-8435.003.patch > > > The WebHdfsFileSystem implementation doesn't support createNonRecursive. > HBase extensively depends on that for proper functioning. 
Currently, when the > region servers are started over web hdfs, they crash with - > createNonRecursive unsupported for this filesystem class > org.apache.hadoop.hdfs.web.SWebHdfsFileSystem > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
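The semantic contract HBase relies on here (createNonRecursive must fail when the parent directory is missing, instead of silently recreating it) can be sketched with plain java.nio on a local filesystem. This is only an analogy for the behavior described above, not the actual WebHDFS or Hadoop FileSystem API; the class and method names are invented for the sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NonRecursiveCreate {

    // Analogue of FileSystem#create: silently creates missing parents first.
    static void createRecursive(Path file) throws IOException {
        Files.createDirectories(file.getParent());
        Files.createFile(file);
    }

    // Analogue of FileSystem#createNonRecursive: fails when the parent is
    // absent, which a WAL writer can use to detect a lost log directory.
    static void createNonRecursive(Path file) throws IOException {
        if (!Files.isDirectory(file.getParent())) {
            throw new IOException("parent does not exist: " + file.getParent());
        }
        Files.createFile(file);
    }

    // True iff the non-recursive call rejects a missing parent while the
    // recursive call quietly succeeds on the same path.
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("nr-demo");
            Path walFile = base.resolve("gone").resolve("wal-1");
            boolean rejected = false;
            try {
                createNonRecursive(walFile);   // parent "gone/" is missing
            } catch (IOException expected) {
                rejected = true;
            }
            createRecursive(walFile);          // creates "gone/" and the file
            return rejected && Files.exists(walFile);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("non-recursive rejected missing parent: " + demo());
    }
}
```

HBase's WAL writer treats the failure as a signal that its log directory was lost, which is why a silent recursive create is unsafe for it.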
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696079#comment-14696079 ] Hudson commented on HDFS-7649: -- FAILURE: Integrated in Hadoop-trunk-Commit #8295 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8295/]) HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. (Contributed by Brahma Reddy Battula) (arp: rev ae57d60d8239916312bca7149e2285b2ed3b123a) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Multihoming docs should emphasize using hostnames in configurations > --- > > Key: HDFS-7649 > URL: https://issues.apache.org/jira/browse/HDFS-7649 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Arpit Agarwal >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-7649.patch > > > The docs should emphasize that master and slave configurations should use > hostnames wherever possible. > Link to current docs: > https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696103#comment-14696103 ] Ravi Prakash commented on HDFS-6407: The patch looks good to me. +1. > new namenode UI, lost ability to sort columns in datanode tab > - > > Key: HDFS-6407 > URL: https://issues.apache.org/jira/browse/HDFS-6407 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Nathan Roberts >Assignee: Haohui Mai >Priority: Critical > Labels: BB2015-05-TBR > Attachments: 002-datanodes-sorted-capacityUsed.png, > 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, > HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, > HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, > HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, > HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting > 2.png, sorting table.png > > > The old UI supported clicking on a column header to sort on that column. The new UI > seems to have dropped this very useful feature. > There are a few tables in the Namenode UI to display datanodes information, > directory listings and snapshots. > When there are many items in the tables, it is useful to have the ability to sort > on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8895: -- Status: Patch Available (was: Open) > Remove deprecated BlockStorageLocation APIs > --- > > Key: HDFS-8895 > URL: https://issues.apache.org/jira/browse/HDFS-8895 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-8895.001.patch > > > HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so > it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8895: -- Attachment: HDFS-8895.001.patch Patch attached, deleting lots of the code. I looked at the original patch at HDFS-3672 for guidance as to what to delete; I'd appreciate a second look to make sure I didn't miss anything. > Remove deprecated BlockStorageLocation APIs > --- > > Key: HDFS-8895 > URL: https://issues.apache.org/jira/browse/HDFS-8895 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-8895.001.patch > > > HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so > it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8895: -- Release Note: This removes the deprecated DistributedFileSystem#getFileBlockStorageLocations API used for getting VolumeIds of block replicas. Instead, use BlockLocation#getStorageIds to get very similar information. > Remove deprecated BlockStorageLocation APIs > --- > > Key: HDFS-8895 > URL: https://issues.apache.org/jira/browse/HDFS-8895 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-8895.001.patch > > > HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so > it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Status: Patch Available (was: Open) > Make Trash Interval configurable for each of the namespaces > --- > > Key: HDFS-6244 > URL: https://issues.apache.org/jira/browse/HDFS-6244 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, > HDFS-6244.v3.patch, HDFS-6244.v4.patch > > > Somehow we need to avoid the cluster filling up. > One solution is to have a different trash policy per namespace. However, if > we can simply make the property configurable per namespace, then the same > config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Status: Open (was: Patch Available) > Make Trash Interval configurable for each of the namespaces > --- > > Key: HDFS-6244 > URL: https://issues.apache.org/jira/browse/HDFS-6244 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, > HDFS-6244.v3.patch, HDFS-6244.v4.patch > > > Somehow we need to avoid the cluster filling up. > One solution is to have a different trash policy per namespace. However, if > we can simply make the property configurable per namespace, then the same > config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Attachment: HDFS-8828.007.patch > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp. So it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696162#comment-14696162 ] Yufei Gu commented on HDFS-8828: No, just one CREATE item in the snapshot diff report in this case. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp. So it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696164#comment-14696164 ] Yufei Gu commented on HDFS-8828: Hi [~yzhangal], Thanks very much for the code review. I've made the modifications and uploaded the new patch. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp. So it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696167#comment-14696167 ] Yufei Gu commented on HDFS-8828: Thank you, [~jingzhao]. Glad to have you review the code. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp. So it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696224#comment-14696224 ] Yongjun Zhang commented on HDFS-8828: - Thank you [~yufeigu] and [~jingzhao]! > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported a huge time cost to build the file copy list in distcp (30 > hours for 1.6M files). We can leverage the snapshot diff report to build a file > copy list including only files/dirs which changed between two snapshots > (or a snapshot and a normal dir). It speeds up the process in two ways: 1. > less copy-list building time. 2. fewer file copy MR jobs. > The HDFS snapshot diff report provides information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to > the default distcp. So it still relies on the default distcp to build the complete > list of files under the source dir. This patch only puts created and > modified files into the copy list based on the snapshot diff report. We can > minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696262#comment-14696262 ] Yi Liu commented on HDFS-8859: -- It seems Jenkins has some problem and all the runs timed out. I randomly selected 10 of the tests and they ran successfully and quickly, so let me re-trigger Jenkins. > Improve DataNode ReplicaMap memory footprint to save about 45% > -- > > Key: HDFS-8859 > URL: https://issues.apache.org/jira/browse/HDFS-8859 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, > HDFS-8859.003.patch, HDFS-8859.004.patch > > > By using the following approach we can save about *45%* of the memory footprint for each > block replica in DataNode memory (this JIRA only talks about the *ReplicaMap* in > the DataNode); the details are: > In ReplicaMap, > {code} > private final Map<String, Map<Long, ReplicaInfo>> map = > new HashMap<String, Map<Long, ReplicaInfo>>(); > {code} > Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas > in memory. The key is the block id of the block replica, which is already > included in {{ReplicaInfo}}, so this memory can be saved. Also, each HashMap Entry > has an object overhead. We can implement a lightweight set which is similar > to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} uses a fixed > size for the entries array, usually a big value; an example is > {{BlocksMap}}; this avoids full GC since there is no need to resize), and we > should still be able to get an element through its key. 
> Following is a comparison of the memory footprint if we implement a lightweight set > as described: > We can save: > {noformat} > SIZE (bytes) ITEM > 20 The Key: Long (12 bytes object overhead + 8 > bytes long) > 12 HashMap Entry object overhead > 4 reference to the key in Entry > 4 reference to the value in Entry > 4 hash in Entry > {noformat} > Total: -44 bytes > We need to add: > {noformat} > SIZE (bytes) ITEM > 4 a reference to the next element in ReplicaInfo > {noformat} > Total: +4 bytes > So in total we can save 40 bytes for each block replica. > And currently one finalized replica needs around 46 bytes (notice: we ignore > memory alignment here). > We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block replica > in DataNode. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
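The arithmetic in the description can be checked with a short standalone sketch. The per-field byte counts are the assumed values quoted above (object headers and 4-byte compressed references); the exact fraction is 1 - 50/90, about 44.4%, which the JIRA rounds to 45%:

```java
// Hedged re-derivation of the ~45% figure claimed above, using the byte
// counts from the description (assumes 4-byte compressed references).
public class ReplicaMapSavings {

    // Bytes that disappear per replica when the HashMap entry goes away.
    static int removedBytes() {
        int boxedLongKey = 12 + 8;  // Long key: object header + long payload
        int entryHeader = 12;       // HashMap.Entry object overhead
        int keyRef = 4, valueRef = 4, hashField = 4;
        return boxedLongKey + entryHeader + keyRef + valueRef + hashField; // 44
    }

    // Bytes the replacement structure adds per replica.
    static int addedBytes() {
        return 4;                   // "next" reference folded into ReplicaInfo
    }

    // Fraction of per-replica map memory saved.
    static double savedFraction() {
        int replicaInfo = 46;       // approx. size of a finalized replica
        return 1.0 - (double) (addedBytes() + replicaInfo)
                   / (removedBytes() + replicaInfo);
    }

    public static void main(String[] args) {
        System.out.printf("remove %d B, add %d B, save %.1f%%%n",
                removedBytes(), addedBytes(), savedFraction() * 100);
    }
}
```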
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696288#comment-14696288 ] Brahma Reddy Battula commented on HDFS-7649: [~arpitagarwal] thanks a lot for your review and commit!! > Multihoming docs should emphasize using hostnames in configurations > --- > > Key: HDFS-7649 > URL: https://issues.apache.org/jira/browse/HDFS-7649 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Arpit Agarwal >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-7649.patch > > > The docs should emphasize that master and slave configurations should use > hostnames wherever possible. > Link to current docs: > https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-7278: -- Labels: 2.6.1-candidate (was: ) > Add a command that allows sysadmins to manually trigger full block reports > from a DN > > > Key: HDFS-7278 > URL: https://issues.apache.org/jira/browse/HDFS-7278 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: HDFS-7278.002.patch, HDFS-7278.003.patch, > HDFS-7278.004.patch, HDFS-7278.005.patch > > > We should add a command that allows sysadmins to manually trigger full block > reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-7915: Attachment: HDFS-7915.branch-2.6.patch HADOOP-11802 depends on this issue. If we are going to cherry-pick HADOOP-11802, we need to cherry-pick this issue first. Attaching a patch for branch-2.6. > The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell > the DFSClient about it because of a network error > - > > Key: HDFS-7915 > URL: https://issues.apache.org/jira/browse/HDFS-7915 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.7.0 > > Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, > HDFS-7915.004.patch, HDFS-7915.005.patch, HDFS-7915.006.patch, > HDFS-7915.branch-2.6.patch > > > The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell > the DFSClient about it because of a network error. In > {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first > part (mark the slot as used) and fail at the second part (tell the DFSClient > what it did). The "try" block for unregistering the slot only covers a > failure in the first part, not the second part. In this way, a divergence can > form between the views of which slots are allocated on the DFSClient and on the > server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
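A minimal sketch of the fix pattern this issue describes, with invented names (this is not the actual DataXceiver code): widen the rollback scope so that a failure while replying to the client, not just a failure while allocating, also unregisters the slot:

```java
import java.io.IOException;

// Illustrative sketch: the slot is marked used (part 1) before the reply to
// the DFSClient (part 2). If only part 1 is covered by the unregistering
// logic, a failed reply leaves a slot the client never heard about.
public class SlotReplySketch {

    static boolean slotInUse;

    // networkFails simulates an IOException while replying to the client.
    static void requestShortCircuitFds(boolean networkFails) throws IOException {
        boolean success = false;
        try {
            slotInUse = true;                   // part 1: allocate the slot
            if (networkFails) {
                throw new IOException("connection reset while replying");
            }                                   // part 2: tell the DFSClient
            success = true;
        } finally {
            if (!success) {
                slotInUse = false;              // roll back on ANY failure
            }
        }
    }

    // True iff a failed reply still leaves the slot registered (a leak).
    static boolean leaksOnReplyFailure() {
        try {
            requestShortCircuitFds(true);
        } catch (IOException expected) {
            // the client never learned about the slot
        }
        return slotInUse;
    }

    public static void main(String[] args) {
        System.out.println("slot leaked after failed reply: " + leaksOnReplyFailure());
    }
}
```

With the narrow original try scope, the `finally` above would not run for the reply failure, and the DataNode's and DFSClient's views of allocated slots would diverge exactly as the description explains.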
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8070: Attachment: HDFS-8070.branch-2.6.patch Attaching a patch for branch-2.6. If we are going to include HADOOP-11802, we need to include HDFS-7915 and this issue as well. > Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode > --- > > Key: HDFS-8070 > URL: https://issues.apache.org/jira/browse/HDFS-8070 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.0 >Reporter: Gopal V >Assignee: Colin Patrick McCabe >Priority: Blocker > Fix For: 2.7.1 > > Attachments: HDFS-8070.001.patch, HDFS-8070.branch-2.6.patch > > > HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded > split-generation. > I hit this immediately after I upgraded the data, so I wonder if the > ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 > Client? > {code} > 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC > pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) > expr = (not leaf-0) > 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] > shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to > release short-circuit shared memory slot Slot(slotIdx=2, > shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending > ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. > Closing shared memory segment. 
> java.io.IOException: ERROR_INVALID: there is no shared memory segment > registered with shmId a86ee34576d93c4964005d90b0d97c38 > at > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC > pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) > expr = (not leaf-0) > 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] > shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, > parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got > IOException calling shutdown(SHUT_RDWR) > java.nio.channels.ClosedChannelException > at > org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) > at > org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) > at > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) > at > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC > pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) > expr = (not leaf-0) > 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] > shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to > release short-circuit shared memory slot Slot(slotIdx=4, > shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending > ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. > Closing shared memory segment. > java.io.IOException: ERROR_INVALID: there is no shared memory segment > registered with shmId a86ee34576d93c4964005d90b0d97c38 > at > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) > at > java.util.concurrent.E
[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8891: - Attachment: HDFS-8891.002.patch Thanks [~jingzhao] for the review. Uploaded the second patch based on [~jingzhao]'s comment. It also needs changes to the UT code. > HDFS concat should keep srcs order > -- > > Key: HDFS-8891 > URL: https://issues.apache.org/jira/browse/HDFS-8891 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch > > > FSDirConcatOp.verifySrcFiles may change the src files' order, but it should keep > their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HDFS-7980: -- Attachment: hadoop-241.patch > Incremental BlockReport will dramatically slow down the startup of a namenode > -- > > Key: HDFS-7980 > URL: https://issues.apache.org/jira/browse/HDFS-7980 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hui Zheng >Assignee: Walter Su > Labels: 2.6.1-candidate > Fix For: 2.7.1 > > Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, > HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch, > hadoop-241.patch > > > In the current implementation the datanode will call the > reportReceivedDeletedBlocks() method, that is an IncrementalBlockReport, before > calling the bpNamenode.blockReport() method. So in a large (several thousands > of datanodes) and busy cluster it will slow down (by more than one hour) the > startup of the namenode. > {code} > List<DatanodeCommand> blockReport() throws IOException { > // send block report if timer has expired. > final long startTime = now(); > if (startTime - lastBlockReport <= dnConf.blockReportInterval) { > return null; > } > final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); > // Flush any block information that precedes the block report. Otherwise > // we have a chance that we will miss the delHint information > // or we will report an RBW replica after the BlockReport already reports > // a FINALIZED one. > reportReceivedDeletedBlocks(); > lastDeletedReport = startTime; > . > // Send the reports to the NN. > int numReportsSent = 0; > int numRPCs = 0; > boolean success = false; > long brSendStartTime = now(); > try { > if (totalBlockCount < dnConf.blockReportSplitThreshold) { > // Below split threshold, send all reports in a single message. > DatanodeCommand cmd = bpNamenode.blockReport( > bpRegistration, bpos.getBlockPoolId(), reports); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696329#comment-14696329 ] Ming Ma commented on HDFS-6244: --- Thanks [~l201514]! The patch adds a key with the prefix "dfs.federation" to CommonConfigurationKeysPublic. Not sure if that is a good place to put it, given that federation is specific to HDFS while CommonConfigurationKeysPublic and Trash are under hadoop-common-project and might be designed to be used by any FileSystem. Your earlier patch had the NameNode read the new property defined in hdfs-site.xml and set the value for {{fs.trash.interval}} before creating {{Trash}}. Any reason not to go with that? {{dfs.federation.trash.interval.ns.}} might be misleading, as "ns" might mean nanosecond; "minutes" might be better. Another thing: maybe we can drop "federation" from the name; {{dfs.trash.interval.minutes}} is good enough, just like how {{dfs.namenode.rpc-address}} is used as a prefix for different namespaces. It might be useful to add some description for the new property and how it overrides {{fs.trash.interval}}. The patch includes an unrelated FairSchedulerPage change. > Make Trash Interval configurable for each of the namespaces > --- > > Key: HDFS-6244 > URL: https://issues.apache.org/jira/browse/HDFS-6244 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, > HDFS-6244.v3.patch, HDFS-6244.v4.patch > > > Somehow we need to avoid the cluster filling up. > One solution is to have a different trash policy per namespace. However, if > we can simply make the property configurable per namespace, then the same > config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
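A hypothetical hdfs-site.xml fragment for the per-namespace override being discussed. Both property names below are illustrative only; the naming was still under discussion in this thread, and only {{fs.trash.interval}} is an established Hadoop key:

```xml
<!-- Established global default, read by the Trash emptier (hadoop-common). -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value> <!-- minutes; one day -->
</property>

<!-- Hypothetical per-namespace override for nameservice "ns1", following
     the dfs.namenode.rpc-address.<nameservice> suffix pattern suggested
     in the comment above. Name and semantics are assumptions, not the
     committed configuration key. -->
<property>
  <name>dfs.trash.interval.minutes.ns1</name>
  <value>4320</value> <!-- three days for this namespace only -->
</property>
```

The design question in the thread is where such a key should live: reading it in the NameNode and mapping it onto {{fs.trash.interval}} keeps the HDFS-specific prefix out of hadoop-common.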
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HDFS-7980: -- Attachment: (was: hadoop-241.patch) > Incremental BlockReport will dramatically slow down the startup of a namenode > -- > > Key: HDFS-7980 > URL: https://issues.apache.org/jira/browse/HDFS-7980 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hui Zheng >Assignee: Walter Su > Labels: 2.6.1-candidate > Fix For: 2.7.1 > > Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, > HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch > > > In the current implementation the datanode will call the > reportReceivedDeletedBlocks() method that is a IncrementalBlockReport before > calling the bpNamenode.blockReport() method. So in a large(several thousands > of datanodes) and busy cluster it will slow down(more than one hour) the > startup of namenode. > {code} > List blockReport() throws IOException { > // send block report if timer has expired. > final long startTime = now(); > if (startTime - lastBlockReport <= dnConf.blockReportInterval) { > return null; > } > final ArrayList cmds = new ArrayList(); > // Flush any block information that precedes the block report. Otherwise > // we have a chance that we will miss the delHint information > // or we will report an RBW replica after the BlockReport already reports > // a FINALIZED one. > reportReceivedDeletedBlocks(); > lastDeletedReport = startTime; > . > // Send the reports to the NN. > int numReportsSent = 0; > int numRPCs = 0; > boolean success = false; > long brSendStartTime = now(); > try { > if (totalBlockCount < dnConf.blockReportSplitThreshold) { > // Below split threshold, send all reports in a single message. > DatanodeCommand cmd = bpNamenode.blockReport( > bpRegistration, bpos.getBlockPoolId(), reports); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696349#comment-14696349 ] Kai Sasaki commented on HDFS-8287: -- I rebased the patch on HDFS-7285. Could you please check it? Thank you! > DFSStripedOutputStream.writeChunk should not wait for writing parity > - > > Key: HDFS-8287 > URL: https://issues.apache.org/jira/browse/HDFS-8287 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Kai Sasaki > Attachments: HDFS-8287-HDFS-7285.00.patch, > HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, > HDFS-8287-HDFS-7285.03.patch > > > When a striping cell is full, writeChunk computes and generates parity > packets. It sequentially calls waitAndQueuePacket, so the user client cannot > continue to write data until it finishes. > We should allow the user client to continue writing instead of blocking it > while parity is written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-8287: - Attachment: HDFS-8287-HDFS-7285.03.patch > DFSStripedOutputStream.writeChunk should not wait for writing parity > - > > Key: HDFS-8287 > URL: https://issues.apache.org/jira/browse/HDFS-8287 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Kai Sasaki > Attachments: HDFS-8287-HDFS-7285.00.patch, > HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, > HDFS-8287-HDFS-7285.03.patch > > > When a stripping cell is full, writeChunk computes and generates parity > packets. It sequentially calls waitAndQueuePacket so that user client cannot > continue to write data until it finishes. > We should allow user client to continue writing instead but not blocking it > when writing parity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696376#comment-14696376 ] Zhihua Deng commented on HDFS-7980: --- Recently, we encountered the same problem in our cluster running version 2.4.1 and created a patch (https://github.com/dengzhhu653/hdfs-2.4.1/blob/master/hadoop-241.patch) based on the patches attached here. It lets the restarted NN process the first full report with the faster processFirstBlockReport method, and adds the condition AddBlockResult.ADDED == result to the addStoredBlockImmediate method where FSNamesystem invokes incrementSafeBlockCount. I am not sure whether the patch has any potential issues when applied to our cluster; any advice and opinions will be greatly appreciated, thanks! > Incremental BlockReport will dramatically slow down the startup of a namenode > -- > > Key: HDFS-7980 > URL: https://issues.apache.org/jira/browse/HDFS-7980 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hui Zheng >Assignee: Walter Su > Labels: 2.6.1-candidate > Fix For: 2.7.1 > > Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, > HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch > > > In the current implementation the datanode will call the > reportReceivedDeletedBlocks() method, which is an incremental block report, before > calling the bpNamenode.blockReport() method. So in a large (several thousands > of datanodes) and busy cluster it will slow down (by more than one hour) the > startup of the namenode. > {code} > List<DatanodeCommand> blockReport() throws IOException { > // send block report if timer has expired. > final long startTime = now(); > if (startTime - lastBlockReport <= dnConf.blockReportInterval) { > return null; > } > final ArrayList<DatanodeCommand> cmds = new ArrayList<>(); > // Flush any block information that precedes the block report. 
Otherwise > // we have a chance that we will miss the delHint information > // or we will report an RBW replica after the BlockReport already reports > // a FINALIZED one. > reportReceivedDeletedBlocks(); > lastDeletedReport = startTime; > . > // Send the reports to the NN. > int numReportsSent = 0; > int numRPCs = 0; > boolean success = false; > long brSendStartTime = now(); > try { > if (totalBlockCount < dnConf.blockReportSplitThreshold) { > // Below split threshold, send all reports in a single message. > DatanodeCommand cmd = bpNamenode.blockReport( > bpRegistration, bpos.getBlockPoolId(), reports); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
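The guard described in the comment above can be modeled in miniature. This is a toy stand-in for the real BlockManager (class and method names echo the comment but the data structures are invented): the safe-block counter is incremented only when the replica was newly ADDED, so a replica already recorded, e.g. via an earlier incremental report, is not counted twice during startup.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the "only count when ADDED" condition; not the real
// BlockManager. A duplicate report must not bump the safe-block count.
public class SafeBlockCounter {
    enum AddBlockResult { ADDED, ALREADY_EXIST }

    private final Set<Long> stored = new HashSet<>();
    private long safeBlockCount = 0;

    AddBlockResult addStoredBlockImmediate(long blockId) {
        // Set.add returns false when the replica is already recorded.
        return stored.add(blockId) ? AddBlockResult.ADDED
                                   : AddBlockResult.ALREADY_EXIST;
    }

    void processReportedBlock(long blockId) {
        AddBlockResult result = addStoredBlockImmediate(blockId);
        if (result == AddBlockResult.ADDED) {  // the condition the patch adds
            safeBlockCount++;
        }
    }

    long getSafeBlockCount() { return safeBlockCount; }
}
```

Without the `ADDED` check, a block reported first incrementally and then again in the full report would be counted twice toward leaving safe mode.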
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696383#comment-14696383 ] Hadoop QA commented on HDFS-8828: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 49s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 6m 26s | Tests passed in hadoop-distcp. 
| | | | 43m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750404/HDFS-8828.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0a03054 | | hadoop-distcp test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11990/artifact/patchprocess/testrun_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11990/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11990/console | This message was automatically generated. > Utilize Snapshot diff report to build copy list in distcp > - > > Key: HDFS-8828 > URL: https://issues.apache.org/jira/browse/HDFS-8828 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, snapshots >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, > HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, > HDFS-8828.006.patch, HDFS-8828.007.patch > > > Some users reported huge time cost to build file copy list in distcp. (30 > hours for 1.6M files). We can leverage snapshot diff report to build file > copy list including files/dirs which are changes only between two snapshots > (or a snapshot and a normal dir). It speed up the process in two folds: 1. > less copy list building time. 2. less file copy MR jobs. > HDFS snapshot diff report provide information about file/directory creation, > deletion, rename and modification between two snapshots or a snapshot and a > normal directory. HDFS-7535 synchronize deletion and rename, then fallback to > the default distcp. So it still relies on default distcp to building complete > list of files under the source dir. 
This patch only puts created and > modified files into the copy list based on the snapshot diff report, so we > can minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
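The exclusion rule discussed in the review thread (an already-copied item renamed under a newly created directory must not be copied again) can be sketched like this. The class and method names are hypothetical, not distcp's actual code; paths are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of building the copy list from snapshot-diff entries
// while excluding rename targets under a newly created directory newDir.
public class SnapshotDiffCopyList {
    /** createdOrModified: created/modified paths from the diff traversal.
     *  renameTargets: target paths of renames in the diff report.
     *  newDir: a directory created since the last snapshot. */
    public static List<String> build(List<String> createdOrModified,
                                     List<String> renameTargets,
                                     String newDir) {
        Set<String> exclude = new HashSet<>();
        for (String target : renameTargets) {
            if (target.startsWith(newDir + "/")) {
                exclude.add(target);   // already copied before the rename
            }
        }
        List<String> copyList = new ArrayList<>();
        for (String path : createdOrModified) {
            if (!exclude.contains(path)) {
                copyList.add(path);
            }
        }
        return copyList;
    }
}
```

In the real patch the exclude set comes from the rename entries of the {{SnapshotDiffReport}}; the point is only that membership in the exclude set is checked before a path enters the copy list.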
[jira] [Commented] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696394#comment-14696394 ] Hadoop QA commented on HDFS-8435: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 24m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 7m 46s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 10m 10s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 58s | Site still builds. | | {color:red}-1{color} | checkstyle | 3m 38s | The applied patch generated 1 new checkstyle issues (total was 104, now 105). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 23m 20s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 118m 24s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 28s | Tests passed in hadoop-hdfs-client. 
| | | | 200m 44s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.ha.TestZKFailoverController | | | hadoop.net.TestNetUtils | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750390/HDFS-8435.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 0a03054 | | javac | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11989/console | This message was automatically generated. 
> createNonRecursive support needed in WebHdfsFileSystem to support HBase > --- > > Key: HDFS-8435 > URL: https://issues.apache.org/jira/browse/HDFS-8435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.6.0 >Reporter: Vinoth Sathappan >Assignee: Jakob Homan > Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, > HDFS-8435.002.patch, HDFS-8435.003.patch > > > The WebHdfsFileSystem implementation doesn't support createNonRecursive. > HBase extensively depends on that for proper functioning. Currently, when the > region servers are started over WebHDFS, they crash with - > createNonRecursive unsupported for this filesystem class > org.apache.hadoop.hdfs.web.SWebHdfsFileSystem > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112) > at > org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85) > at > org.apac
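The contract HBase relies on is that create fails when the parent directory is missing, rather than silently creating intermediate directories. The following is a local-filesystem stand-in for that semantics (it is not the WebHdfsFileSystem implementation, and the class name is invented):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the createNonRecursive contract: only the leaf file may be
// created; a missing parent directory is an error, never auto-created.
public class NonRecursiveCreate {
    public static void createNonRecursive(Path file) throws IOException {
        Path parent = file.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            throw new IOException("Parent does not exist: " + parent);
        }
        Files.createFile(file);  // creates the leaf only
    }
}
```

HBase's WAL writer uses this failure mode to detect log-directory races, which is why a filesystem that only offers recursive create breaks it.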
[jira] [Commented] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696399#comment-14696399 ] Hadoop QA commented on HDFS-8895: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 10s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 0m 26s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 28s | Tests passed in hadoop-hdfs-client. 
| | | | 50m 47s | | \\ \\ || Reason || Tests || | Failed build | hadoop-hdfs | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750398/HDFS-8895.001.patch | | Optional Tests | javac unit javadoc findbugs checkstyle | | git revision | trunk / 0a03054 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11991/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11991/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11991/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11991/console | This message was automatically generated. > Remove deprecated BlockStorageLocation APIs > --- > > Key: HDFS-8895 > URL: https://issues.apache.org/jira/browse/HDFS-8895 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-8895.001.patch > > > HDFS-8887 supercedes DistributedFileSystem#getFileBlockStorageLocations, so > it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8622) Implement GETCONTENTSUMMARY operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8622: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks [~jagadesh.kiran] for the continuous work! > Implement GETCONTENTSUMMARY operation for WebImageViewer > > > Key: HDFS-8622 > URL: https://issues.apache.org/jira/browse/HDFS-8622 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jagadesh Kiran N >Assignee: Jagadesh Kiran N > Fix For: 2.8.0 > > Attachments: HDFS-8622-00.patch, HDFS-8622-01.patch, > HDFS-8622-02.patch, HDFS-8622-03.patch, HDFS-8622-04.patch, > HDFS-8622-05.patch, HDFS-8622-06.patch, HDFS-8622-07.patch, > HDFS-8622-08.patch, HDFS-8622-09.patch, HDFS-8622-10.patch > > > It would be better for administrators if {code} GETCONTENTSUMMARY {code} were > supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
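What a GETCONTENTSUMMARY response aggregates can be illustrated with a toy tree. This is a hypothetical demo class, not the WebImageViewer code: it computes the three core fields of a content summary (total length, file count, directory count) over an in-memory tree.

```java
import java.util.List;

// Toy illustration of content-summary aggregation over a directory tree.
public class ContentSummaryDemo {
    static class Node {
        final boolean dir;
        final long length;
        final List<Node> children;
        Node(long length) { this.dir = false; this.length = length; this.children = List.of(); }
        Node(List<Node> children) { this.dir = true; this.length = 0; this.children = children; }
    }

    /** Returns {totalLength, fileCount, directoryCount} for the subtree. */
    public static long[] summarize(Node n) {
        if (!n.dir) {
            return new long[] {n.length, 1, 0};        // a file
        }
        long[] totals = new long[] {0, 0, 1};          // count this directory
        for (Node child : n.children) {
            long[] c = summarize(child);
            totals[0] += c[0];
            totals[1] += c[1];
            totals[2] += c[2];
        }
        return totals;
    }
}
```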
[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696458#comment-14696458 ] Hadoop QA commented on HDFS-8287: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 43s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 39s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:red}-1{color} | eclipse:eclipse | 0m 15s | The patch failed to build with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 25s | Post-patch findbugs hadoop-hdfs-project/hadoop-hdfs compilation is broken. | | {color:green}+1{color} | findbugs | 0m 25s | The patch does not introduce any new Findbugs (version ) warnings. 
| | {color:red}-1{color} | native | 0m 23s | Failed to build the native portion of hadoop-common prior to running the unit tests in hadoop-hdfs-project/hadoop-hdfs | | | | 37m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750432/HDFS-8287-HDFS-7285.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1d37a88 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11993/artifact/patchprocess/patchReleaseAuditProblems.txt | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11993/console | This message was automatically generated. > DFSStripedOutputStream.writeChunk should not wait for writing parity > - > > Key: HDFS-8287 > URL: https://issues.apache.org/jira/browse/HDFS-8287 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Kai Sasaki > Attachments: HDFS-8287-HDFS-7285.00.patch, > HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, > HDFS-8287-HDFS-7285.03.patch > > > When a stripping cell is full, writeChunk computes and generates parity > packets. It sequentially calls waitAndQueuePacket so that user client cannot > continue to write data until it finishes. > We should allow user client to continue writing instead but not blocking it > when writing parity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-6939) Support path-based filtering of inotify events
[ https://issues.apache.org/jira/browse/HDFS-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore reassigned HDFS-6939: Assignee: Surendra Singh Lilhore > Support path-based filtering of inotify events > -- > > Key: HDFS-6939 > URL: https://issues.apache.org/jira/browse/HDFS-6939 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, qjm >Reporter: James Thomas >Assignee: Surendra Singh Lilhore > > Users should be able to specify that they only want events involving > particular paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8894) Set SO_KEEPALIVE on DN server sockets
[ https://issues.apache.org/jira/browse/HDFS-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8894: - Assignee: kanaka kumar avvaru > Set SO_KEEPALIVE on DN server sockets > - > > Key: HDFS-8894 > URL: https://issues.apache.org/jira/browse/HDFS-8894 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: kanaka kumar avvaru > > SO_KEEPALIVE is not set on things like datastreamer sockets which can cause > lingering ESTABLISHED sockets when there is a network glitch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
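The proposed change amounts to enabling SO_KEEPALIVE on the DataNode's server-side sockets. A minimal illustration with plain `java.net` (not the DataNode's actual socket setup) looks like this:

```java
import java.net.Socket;
import java.net.SocketException;

// Enabling SO_KEEPALIVE makes the OS probe idle connections and eventually
// tear down half-open ones left behind by a network glitch, instead of
// leaving them ESTABLISHED indefinitely.
public class KeepAliveSocket {
    public static Socket configure(Socket s) throws SocketException {
        s.setKeepAlive(true);  // sets the SO_KEEPALIVE socket option
        return s;
    }
}
```

The probe interval itself is an OS-level setting (e.g. `tcp_keepalive_time` on Linux), so the Java side only opts in.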
[jira] [Commented] (HDFS-8824) Do not use small blocks for balancing the cluster
[ https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696520#comment-14696520 ] Jitendra Nath Pandey commented on HDFS-8824: +1 for the latest patch. > Do not use small blocks for balancing the cluster > - > > Key: HDFS-8824 > URL: https://issues.apache.org/jira/browse/HDFS-8824 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h8824_20150727b.patch, h8824_20150811b.patch > > > Balancer gets datanode block lists from NN and then move the blocks in order > to balance the cluster. It should not use the blocks with small size since > moving the small blocks generates a lot of overhead and the small blocks do > not help balancing the cluster much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
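The idea in the description reduces to a size filter on move candidates. The sketch below is illustrative only: the threshold name and value are assumptions, not the configuration the patch actually introduces.

```java
import java.util.ArrayList;
import java.util.List;

// Skip blocks below a minimum size when choosing balancer move candidates:
// moving many tiny blocks adds per-block RPC and scheduling overhead while
// freeing very little space on the source datanode.
public class BlockMoveFilter {
    static final long MIN_BLOCK_SIZE = 10L * 1024 * 1024; // assumed 10 MB floor

    public static List<Long> selectMovable(List<Long> blockSizes) {
        List<Long> movable = new ArrayList<>();
        for (long size : blockSizes) {
            if (size >= MIN_BLOCK_SIZE) {
                movable.add(size);
            }
        }
        return movable;
    }
}
```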
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696524#comment-14696524 ] Vinayakumar B commented on HDFS-7213: - Cherry-picked to 2.6.1. > processIncrementalBlockReport performance degradation > - > > Key: HDFS-7213 > URL: https://issues.apache.org/jira/browse/HDFS-7213 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Eric Payne >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.6.1 > > Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt > > > {{BlockManager#processIncrementalBlockReport}} has a debug line that is > missing a {{isDebugEnabled}} check. The write lock is being held. Coupled > with the increase in incremental block reports from receiving blocks, under > heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
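The fix pattern behind this issue is the standard `isDebugEnabled` guard. The sketch below uses a stand-in logger interface (not the real BlockManager or commons-logging code) to show why the guard matters: without it, the message string is concatenated on every call even when debug logging is off, and here that work happened while the namesystem write lock was held.

```java
// Stand-in demonstration of guarding debug-log message construction.
public class GuardedLogging {
    interface Log {
        boolean isDebugEnabled();
        void debug(String msg);
    }

    static int concatenations = 0;  // counts how often the message is built

    static String expensiveMessage(long blockId) {
        concatenations++;
        return "processed incremental report for block " + blockId;
    }

    public static void processBlock(Log log, long blockId) {
        // ... state mutation under the write lock would happen here ...
        if (log.isDebugEnabled()) {          // the guard the fix adds
            log.debug(expensiveMessage(blockId));
        }
    }
}
```

With the guard, `expensiveMessage` is never evaluated at INFO level, so the hot path inside the lock does no string work. (SLF4J's `{}` parameterized logging achieves the same effect without an explicit guard.)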
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Fix Version/s: (was: 2.7.0) 2.6.1 > processIncrementalBlockReport performance degradation > - > > Key: HDFS-7213 > URL: https://issues.apache.org/jira/browse/HDFS-7213 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Eric Payne >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.6.1 > > Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt > > > {{BlockManager#processIncrementalBlockReport}} has a debug line that is > missing a {{isDebugEnabled}} check. The write lock is being held. Coupled > with the increase in incremental block reports from receiving blocks, under > heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Fix Version/s: 2.7.0 > processIncrementalBlockReport performance degradation > - > > Key: HDFS-7213 > URL: https://issues.apache.org/jira/browse/HDFS-7213 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Eric Payne >Priority: Critical > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt > > > {{BlockManager#processIncrementalBlockReport}} has a debug line that is > missing a {{isDebugEnabled}} check. The write lock is being held. Coupled > with the increase in incremental block reports from receiving blocks, under > heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Labels: (was: 2.6.1-candidate) > processIncrementalBlockReport performance degradation > - > > Key: HDFS-7213 > URL: https://issues.apache.org/jira/browse/HDFS-7213 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Eric Payne >Priority: Critical > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt > > > {{BlockManager#processIncrementalBlockReport}} has a debug line that is > missing a {{isDebugEnabled}} check. The write lock is being held. Coupled > with the increase in incremental block reports from receiving blocks, under > heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7235: Labels: (was: 2.6.1-candidate) > DataNode#transferBlock should report blocks that don't exist using > reportBadBlock > - > > Key: HDFS-7235 > URL: https://issues.apache.org/jira/browse/HDFS-7235 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, > HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, > HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch > > > When to decommission a DN, the process hangs. > What happens is, when NN chooses a replica as a source to replicate data on > the to-be-decommissioned DN to other DNs, it favors choosing this DN > to-be-decommissioned as the source of transfer (see BlockManager.java). > However, because of the bad disk, the DN would detect the source block to be > transfered as invalidBlock with the following logic in FsDatasetImpl.java: > {code} > /** Does the block exist and have the given state? */ > private boolean isValid(final ExtendedBlock b, final ReplicaState state) { > final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), > b.getLocalBlock()); > return replicaInfo != null > && replicaInfo.getState() == state > && replicaInfo.getBlockFile().exists(); > } > {code} > The reason that this method returns false (detecting invalid block) is > because the block file doesn't exist due to bad disk in this case. > The key issue we found here is, after DN detects an invalid block for the > above reason, it doesn't report the invalid block back to NN, thus NN doesn't > know that the block is corrupted, and keeps sending the data transfer request > to the same DN to be decommissioned, again and again. This caused an infinite > loop, so the decommission process hangs. 
> Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7235: Fix Version/s: 2.6.1 > DataNode#transferBlock should report blocks that don't exist using > reportBadBlock > - > > Key: HDFS-7235 > URL: https://issues.apache.org/jira/browse/HDFS-7235 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, > HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, > HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch > > > When to decommission a DN, the process hangs. > What happens is, when NN chooses a replica as a source to replicate data on > the to-be-decommissioned DN to other DNs, it favors choosing this DN > to-be-decommissioned as the source of transfer (see BlockManager.java). > However, because of the bad disk, the DN would detect the source block to be > transfered as invalidBlock with the following logic in FsDatasetImpl.java: > {code} > /** Does the block exist and have the given state? */ > private boolean isValid(final ExtendedBlock b, final ReplicaState state) { > final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), > b.getLocalBlock()); > return replicaInfo != null > && replicaInfo.getState() == state > && replicaInfo.getBlockFile().exists(); > } > {code} > The reason that this method returns false (detecting invalid block) is > because the block file doesn't exist due to bad disk in this case. > The key issue we found here is, after DN detects an invalid block for the > above reason, it doesn't report the invalid block back to NN, thus NN doesn't > know that the block is corrupted, and keeps sending the data transfer request > to the same DN to be decommissioned, again and again. This caused an infinite > loop, so the decommission process hangs. 
> Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
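The shape of the fix described above can be sketched as follows. This is a hypothetical, simplified model (not the actual HDFS patch): the class, fields, and method names are stand-ins, and the NameNode RPC is replaced by a list that records what was reported. It only illustrates the idea that transferBlock should report an invalid replica instead of silently dropping the transfer request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the HDFS-7235 fix idea. All names are
// illustrative stand-ins for DataNode/FsDatasetImpl internals.
public class TransferBlockSketch {
    // Stand-in for the NameNode RPC; records which blocks were reported bad.
    static List<String> reportedBadBlocks = new ArrayList<>();

    // Mirrors the isValid() logic quoted in the issue: the replica must be
    // known in the volume map AND its block file must still exist on disk.
    static boolean isValid(boolean replicaKnown, boolean blockFileExists) {
        return replicaKnown && blockFileExists;
    }

    // Before the fix: an invalid replica was detected and the transfer was
    // skipped, but the NN was never told, so it retried forever.
    // After the fix: report the bad block so the NN stops choosing this DN.
    static boolean transferBlock(String block, boolean replicaKnown,
                                 boolean blockFileExists) {
        if (!isValid(replicaKnown, blockFileExists)) {
            reportedBadBlocks.add(block);  // break the retry loop
            return false;
        }
        return true;  // proceed with the transfer
    }

    public static void main(String[] args) {
        // Replica is in the volume map but its file is gone (bad disk):
        boolean sent = transferBlock("blk_1", true, false);
        System.out.println(sent + " reported=" + reportedBadBlocks);
    }
}
```

Once the NN learns the replica is corrupt, it can pick a different source for re-replication instead of retrying the same dying DN, which is what unblocks the decommission.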
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696535#comment-14696535 ] Vinayakumar B commented on HDFS-7235: - Cherry-picked to 2.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696547#comment-14696547 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8298/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > processIncrementalBlockReport performance degradation > - > > Key: HDFS-7213 > URL: https://issues.apache.org/jira/browse/HDFS-7213 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Eric Payne >Priority: Critical > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt > > > {{BlockManager#processIncrementalBlockReport}} has a debug line that is > missing a {{isDebugEnabled}} check. The write lock is being held. Coupled > with the increase in incremental block reports from receiving blocks, under > heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
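The guard pattern the HDFS-7213 description calls for can be sketched as below. This is an illustrative model, not the actual patch: the logger is replaced by a boolean flag and a counter so the cost of building the message can be observed. The point is that the string concatenation for the debug message, which was happening unconditionally while the write lock was held, is skipped entirely when debug logging is off.

```java
// Hypothetical sketch of the isDebugEnabled guard pattern from HDFS-7213.
// The flag and counter are stand-ins for LOG.isDebugEnabled() and the
// (expensive) message construction done under the write lock.
public class DebugGuardSketch {
    static boolean debugEnabled = false;  // stand-in for LOG.isDebugEnabled()
    static int messagesBuilt = 0;         // counts expensive message builds

    static void processIncrementalBlockReport(String block) {
        // Before the fix: the message string was concatenated on every call,
        // even with debug logging disabled, while holding the write lock.
        // After the fix: the guard skips message construction entirely.
        if (debugEnabled) {
            String msg = "BLOCK* processIncrementalBlockReport: " + block;
            messagesBuilt++;
            System.out.println(msg);
        }
        // ... actual block-report processing would happen here ...
    }

    public static void main(String[] args) {
        processIncrementalBlockReport("blk_42");
        // With debug off, no message is ever built on the hot path:
        System.out.println("built=" + messagesBuilt);
    }
}
```

Under a heavy incremental-block-report load this per-call concatenation adds up, which is why the unguarded line measurably degraded NN throughput.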
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696548#comment-14696548 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8298/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696550#comment-14696550 ] Vinayakumar B commented on HDFS-7263: - Cherry-picked to 2.6.1 > Snapshot read can reveal future bytes for appended files. > - > > Key: HDFS-7263 > URL: https://issues.apache.org/jira/browse/HDFS-7263 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Tao Luo > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, > TestSnapshotRead.java > > > The following sequence of steps will produce extra bytes that should not be > visible, because they are not in the snapshot: > * Create a file of size L, where {{L % blockSize != 0}}. > * Create a snapshot. > * Append bytes to the file. > * Read the file in the snapshot (not the current file). > * You will see that bytes are read beyond the original file size L. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
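The invariant the steps above violate can be sketched as follows. This is a hypothetical model, not the HDFS-7263 patch: it reduces the snapshot read path to the one length check that matters. A read through a snapshot must be clamped to the file length L recorded at snapshot time, even though the live (appended) file now has more bytes in its last block.

```java
// Hypothetical sketch of the snapshot-read clamp from HDFS-7263.
// Method and parameter names are illustrative stand-ins.
public class SnapshotReadSketch {
    // Returns how many bytes a snapshot read may deliver: bytes past
    // snapshotLength exist in the live file (appended after the snapshot)
    // but must never be visible through the snapshot path.
    static long readThroughSnapshot(long snapshotLength, long offset,
                                    int requested) {
        long available = snapshotLength - offset;
        if (available <= 0) {
            return -1;  // EOF in the snapshot view
        }
        return Math.min(requested, available);
    }

    public static void main(String[] args) {
        long snapLen = 1000;  // length L at snapshot time; live file is larger
        // Without the clamp, a 600-byte read at offset 900 could return
        // bytes beyond L, revealing "future" appended data.
        System.out.println(readThroughSnapshot(snapLen, 900, 600));  // 100
    }
}
```

The {{L % blockSize != 0}} condition in the reproduction matters because the bug lives in the partially filled last block: the appended bytes land in the same block the snapshot already references, so only a length check, not block membership, can hide them.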
[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7263: Fix Version/s: 2.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7263: Labels: (was: 2.6.1-candidate) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696558#comment-14696558 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-trunk-Commit #8299 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8299/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)