[jira] [Created] (HDFS-12888) NameNode web UI shows stale config values after cli refresh
Zhe Zhang created HDFS-12888: Summary: NameNode web UI shows stale config values after cli refresh Key: HDFS-12888 URL: https://issues.apache.org/jira/browse/HDFS-12888 Project: Hadoop HDFS Issue Type: Bug Components: ui Affects Versions: 2.7.4 Reporter: Zhe Zhang To reproduce: # Load the web UI's /conf page # Use {{hdfs -refresh}} to update a configuration value # Load /conf again; it will still show the old configuration value -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12502) nntop should support a category based on FilesInGetListingOps
[ https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-12502: -- > nntop should support a category based on FilesInGetListingOps > - > > Key: HDFS-12502 > URL: https://issues.apache.org/jira/browse/HDFS-12502 > Project: Hadoop HDFS > Issue Type: Improvement > Components: metrics >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Fix For: 2.9.0, 2.8.3, 3.0.0, 3.1.0 > > Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, > HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch > > > Large listing ops can oftentimes be the main contributor to NameNode > slowness. The aggregate cost of listing ops is proportional to the > {{FilesInGetListingOps}} rather than the number of listing ops. Therefore > it'd be very useful for nntop to support this category.
[jira] [Created] (HDFS-12502) nntop should support category based on FilesInGetListingOps
Zhe Zhang created HDFS-12502: Summary: nntop should support category based on FilesInGetListingOps Key: HDFS-12502 URL: https://issues.apache.org/jira/browse/HDFS-12502 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhe Zhang
[jira] [Created] (HDFS-12379) NameNode getListing should use FileStatus instead of HdfsFileStatus
Zhe Zhang created HDFS-12379: Summary: NameNode getListing should use FileStatus instead of HdfsFileStatus Key: HDFS-12379 URL: https://issues.apache.org/jira/browse/HDFS-12379 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang The public {{listStatus}} APIs in {{FileSystem}} and {{DistributedFileSystem}} expose {{FileStatus}} instead of {{HdfsFileStatus}}. Therefore it is a waste to create the more expensive {{HdfsFileStatus}} objects on the NameNode. It should be a simple change similar to HDFS-11641. Marking this as incompatible because the wire protocol change is incompatible. It is unclear which downstream apps are affected by this incompatibility; perhaps those directly using curl, or those writing their own HDFS client.
[jira] [Created] (HDFS-12345) Scale testing HDFS NameNode
Zhe Zhang created HDFS-12345: Summary: Scale testing HDFS NameNode Key: HDFS-12345 URL: https://issues.apache.org/jira/browse/HDFS-12345 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang
[jira] [Created] (HDFS-12284) Router support for Kerberos and delegation tokens
Zhe Zhang created HDFS-12284: Summary: Router support for Kerberos and delegation tokens Key: HDFS-12284 URL: https://issues.apache.org/jira/browse/HDFS-12284 Project: Hadoop HDFS Issue Type: Sub-task Components: security Reporter: Zhe Zhang Assignee: xiangguang zheng HDFS Router should support Kerberos authentication and issuing / managing HDFS delegation tokens.
[jira] [Created] (HDFS-11743) Revert HDFS-7933 from branch-2.7 (fsck reporting decommissioning replicas)
Zhe Zhang created HDFS-11743: Summary: Revert HDFS-7933 from branch-2.7 (fsck reporting decommissioning replicas) Key: HDFS-11743 URL: https://issues.apache.org/jira/browse/HDFS-11743 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Blocker
[jira] [Created] (HDFS-11732) Backport HDFS-8498 to branch-2.7: Blocks can be committed with wrong size
Zhe Zhang created HDFS-11732: Summary: Backport HDFS-8498 to branch-2.7: Blocks can be committed with wrong size Key: HDFS-11732 URL: https://issues.apache.org/jira/browse/HDFS-11732 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.3 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical
[jira] [Reopened] (HDFS-9005) Provide configuration support for upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-9005: - > Provide configuration support for upgrade domain > > > Key: HDFS-9005 > URL: https://issues.apache.org/jira/browse/HDFS-9005 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, > HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch > > > As part of the upgrade domain feature, we need to provide a mechanism to > specify the upgrade domain for each datanode. One way to accomplish that is to > allow admins to specify an upgrade domain script that takes a DN ip or hostname as > input and returns the upgrade domain. Then the namenode will use it at run time to > set {{DatanodeInfo}}'s upgrade domain string. The configuration can be > something like:
> {noformat}
> <property>
>   <name>dfs.namenode.upgrade.domain.script.file.name</name>
>   <value>/etc/hadoop/conf/upgrade-domain.sh</value>
> </property>
> {noformat}
> similar to the topology script.
[jira] [Reopened] (HDFS-8873) Allow the directoryScanner to be rate-limited
[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8873: - > Allow the directoryScanner to be rate-limited > - > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Daniel Templeton > Labels: 2.7.2-candidate > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, > HDFS-8873.006.patch, HDFS-8873.007.patch, HDFS-8873.008.patch, > HDFS-8873.009.patch, HDFS-8873-branch-2.7.009.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791 for details). > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for a full directory listing, which translates to > 655 seconds)
[jira] [Created] (HDFS-11709) StandbyCheckpointer should handle an non-existing legacyOivImageDir gracefully
Zhe Zhang created HDFS-11709: Summary: StandbyCheckpointer should handle a non-existent legacyOivImageDir gracefully Key: HDFS-11709 URL: https://issues.apache.org/jira/browse/HDFS-11709 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 2.6.1 Reporter: Zhe Zhang Assignee: Erik Krogen Priority: Critical In {{StandbyCheckpointer}}, if the legacy OIV directory is not properly created, or was deleted for some reason (e.g. mis-operation), all checkpoint ops will fail. Not only will the ANN not receive new fsimages, but the JNs will fill up with edit log files, eventually causing the NN to crash.
{code}
// Save the legacy OIV image, if the output dir is defined.
String outputDir = checkpointConf.getLegacyOivImageDir();
if (outputDir != null && !outputDir.isEmpty()) {
  img.saveLegacyOIVImage(namesystem, outputDir, canceler);
}
{code}
It doesn't make sense to let such an unimportant part (saving the OIV image) abort all checkpoints and cause the NN to crash (and possibly lose data).
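A possible hardening can be sketched as follows: wrap the optional OIV save so a failure is logged and skipped rather than aborting the whole checkpoint. The class and method names here ({{OivGuardSketch}}, {{trySaveLegacyOivImage}}) are invented for illustration; this is not the actual HDFS-11709 patch.

```java
// Hedged sketch: isolate legacy OIV image saving so a missing output
// directory cannot abort the whole checkpoint. saveLegacyOivImage below is
// a stand-in for img.saveLegacyOIVImage(...) in the snippet above.
import java.io.File;

public class OivGuardSketch {
    // Stand-in for the real save call; throws if the dir is absent.
    static void saveLegacyOivImage(String outputDir) {
        if (!new File(outputDir).isDirectory()) {
            throw new IllegalStateException("OIV dir missing: " + outputDir);
        }
    }

    /** Returns true if the (optional) OIV image was saved, false otherwise. */
    static boolean trySaveLegacyOivImage(String outputDir) {
        if (outputDir == null || outputDir.isEmpty()) {
            return false; // feature not configured; nothing to do
        }
        try {
            saveLegacyOivImage(outputDir);
            return true;
        } catch (RuntimeException e) {
            // Log and continue: a failed OIV save must not fail the checkpoint.
            System.err.println("Skipping legacy OIV image: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // A non-existent dir no longer propagates an exception.
        System.out.println(trySaveLegacyOivImage("/no/such/dir"));
        System.out.println(trySaveLegacyOivImage(null));
    }
}
```

The key design point is that the checkpoint path treats the OIV save as best-effort: any exception is contained at this call site.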
[jira] [Reopened] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave
[ https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8872: - > Reporting of missing blocks is different in fsck and namenode ui/metasave > - > > Key: HDFS-8872 > URL: https://issues.apache.org/jira/browse/HDFS-8872 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > The Namenode ui and metasave will not report a block as missing if the only > replica is on a decommissioning/decommissioned node, while fsck will show it as > MISSING. > Since a decommissioned node can be formatted/removed at any time, we can actually > lose the block. > It's better to alert on the namenode ui if the only copy is on a > decommissioned/decommissioning node.
[jira] [Resolved] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave
[ https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8872. - Resolution: Duplicate > Reporting of missing blocks is different in fsck and namenode ui/metasave > - > > Key: HDFS-8872 > URL: https://issues.apache.org/jira/browse/HDFS-8872 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah
[jira] [Resolved] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8015. - Resolution: Duplicate Resolving this issue since HDFS-9719 already introduced {{StripedBlockWriter}} > Erasure Coding: local and remote block writer for coding work in DataNode > - > > Key: HDFS-8015 > URL: https://issues.apache.org/jira/browse/HDFS-8015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Li Bo > Attachments: HDFS-8015-000.patch, HDFS-8015-001.patch > > > As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure > coding, to perform encoding or decoding we need to be able to write data > blocks locally or remotely. This issue is to provide a block writer facility on the > DataNode side. It is worth considering the similar work done on the client side, so > that the two can eventually be unified.
[jira] [Resolved] (HDFS-8014) Erasure Coding: local and remote block reader for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8014. - Resolution: Duplicate Resolving the issue as HDFS-9719 already creates {{StripedBlockReader}} > Erasure Coding: local and remote block reader for coding work in DataNode > - > > Key: HDFS-8014 > URL: https://issues.apache.org/jira/browse/HDFS-8014 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Zhe Zhang > > As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure > coding, to perform encoding or decoding we need first to be able to read > data blocks locally or remotely. This issue is to provide a block reader facility on the > DataNode side. It is worth considering the similar work done on the client side, so > that the two can eventually be unified.
[jira] [Created] (HDFS-11345) Document the configuration key for FSNamesystem lock fairness
Zhe Zhang created HDFS-11345: Summary: Document the configuration key for FSNamesystem lock fairness Key: HDFS-11345 URL: https://issues.apache.org/jira/browse/HDFS-11345 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, namenode Reporter: Zhe Zhang Assignee: Erik Krogen Priority: Minor
[jira] [Created] (HDFS-11157) Enhance documentation around cpLock
Zhe Zhang created HDFS-11157: Summary: Enhance documentation around cpLock Key: HDFS-11157 URL: https://issues.apache.org/jira/browse/HDFS-11157 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, ha, namenode Reporter: Zhe Zhang Priority: Minor The {{cpLock}} was introduced in HDFS-7097. Among all operations allowed on SbNN, some acquire this lock (e.g. {{restoreFailedStorage}}) and some do not, including {{metasave}}, {{refreshNodes}}, {{setSafeMode}}. We should enhance the documentation around {{cpLock}} to explain the above. Also, maybe {{metasave}} and {{refreshNodes}} do not need fsn write lock?
[jira] [Reopened] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-7964: - I think this'd be a very useful addition to branch-2.8 and branch-2.7, especially since the patch has been used in a 2.6 production cluster for a long time. Attaching branch-2.8/branch-2.7 patches to verify with Jenkins. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. logEdit is > called within the namespace write lock, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers.
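The async-edit-logging idea described above can be sketched in plain Java: handlers enqueue an edit and receive a future, while a single background thread drains the queue in batches (group commit), "syncs", and then completes the futures so responses can be sent. All names here are illustrative; this is not the actual HDFS-7964 implementation.

```java
// Hedged sketch of async edit logging with postponed responses: logEdit is
// a cheap enqueue instead of a blocking logSync; the durability guarantee
// is preserved because the response future completes only after the sync.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class AsyncEditLogSketch {
    private final BlockingQueue<CompletableFuture<Void>> pending =
        new LinkedBlockingQueue<>();
    private final Thread syncer = new Thread(this::drainLoop, "edit-syncer");

    public AsyncEditLogSketch() {
        syncer.setDaemon(true);
        syncer.start();
    }

    /** Called by a handler: cheap enqueue instead of blocking in logSync. */
    public CompletableFuture<Void> logEdit(String op) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        pending.add(done); // a real log would serialize 'op' here
        return done;       // respond to the client when this completes
    }

    private void drainLoop() {
        List<CompletableFuture<Void>> batch = new ArrayList<>();
        while (true) {
            try {
                batch.add(pending.take());  // wait for at least one edit
                pending.drainTo(batch);     // group-commit whatever piled up
                // ... a durable sync to the journal would happen here ...
                batch.forEach(f -> f.complete(null));
                batch.clear();
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AsyncEditLogSketch log = new AsyncEditLogSketch();
        log.logEdit("mkdir /a").get(5, TimeUnit.SECONDS); // durable => respond
        System.out.println("synced");
    }
}
```

The handler thread is freed immediately after the enqueue, which is the point of the JIRA: the durability wait moves off the RPC handler onto the response path.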
[jira] [Resolved] (HDFS-1499) mv the namenode NameSpace and BlocksMap to hbase to save the namenode memory
[ https://issues.apache.org/jira/browse/HDFS-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-1499. - Resolution: Duplicate Resolving the old JIRA since many similar JIRAs have been raised, including HDFS-8286. > mv the namenode NameSpace and BlocksMap to hbase to save the namenode memory > > > Key: HDFS-1499 > URL: https://issues.apache.org/jira/browse/HDFS-1499 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: dl.brain.ln > > The NameNode stores all its metadata in the main memory of the machine on > which it is deployed. With the file count and block number growing, the namenode > machine can't hold any more files and blocks in its memory, which restricts > HDFS cluster growth. Many people are talking and thinking about this > problem. Google's next version of GFS uses Bigtable to store the metadata of > the DFS, and that seems to work. What if we use HBase the same way? > In the namenode structure, the namespace of the filesystem and the maps of > block -> datanodes and datanode -> blocks kept in memory consume most > of the namenode's heap. What if we store those data structures in hbase to > decrease the namenode's memory?
[jira] [Created] (HDFS-11051) Test Balancer behavior when some block moves are slow
Zhe Zhang created HDFS-11051: Summary: Test Balancer behavior when some block moves are slow Key: HDFS-11051 URL: https://issues.apache.org/jira/browse/HDFS-11051 Project: Hadoop HDFS Issue Type: Test Components: balancer & mover Reporter: Zhe Zhang
[jira] [Created] (HDFS-10977) Balancer should query NameNode with a timeout
Zhe Zhang created HDFS-10977: Summary: Balancer should query NameNode with a timeout Key: HDFS-10977 URL: https://issues.apache.org/jira/browse/HDFS-10977 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Zhe Zhang Assignee: Zhe Zhang We found a case where {{Dispatcher}} was stuck at {{getBlockList}} *forever* (well, several hours when we found it).
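One general pattern for bounding a potentially-hanging call like {{getBlockList}} is to run it through a {{Future}} and time out the wait. The sketch below uses a hypothetical {{fetchBlockList}} stand-in for the NameNode RPC; the pattern, not the API, is the point, and it is not the actual HDFS-10977 fix.

```java
// Hedged sketch: bound a possibly-stuck remote call with Future.get(timeout),
// so the Balancer thread can give up instead of blocking forever.
import java.util.concurrent.*;

public class TimedRpcSketch {
    // Stand-in for the NameNode call; delayMs simulates a slow/stuck server.
    static String fetchBlockList(long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);
        return "blocks";
    }

    static String getBlockListWithTimeout(long delayMs, long timeoutMs)
            throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = ex.submit(() -> fetchBlockList(delayMs));
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;      // caller can retry or skip this source node
        } finally {
            ex.shutdownNow(); // interrupt the stuck call
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getBlockListWithTimeout(1, 2000));   // fast call
        System.out.println(getBlockListWithTimeout(5000, 50));  // stuck: null
    }
}
```

In Hadoop itself this would more likely be expressed as an RPC-level socket/call timeout, but the Future-based form shows the control flow compactly.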
[jira] [Created] (HDFS-10967) Add configuration for BlockPlacementPolicy to deprioritize near-full DataNodes
Zhe Zhang created HDFS-10967: Summary: Add configuration for BlockPlacementPolicy to deprioritize near-full DataNodes Key: HDFS-10967 URL: https://issues.apache.org/jira/browse/HDFS-10967 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Large production clusters are likely to have heterogeneous nodes in terms of storage capacity, memory, and CPU cores. It is not always possible to proportionally ingest data into DataNodes based on their remaining storage capacity. Therefore it's possible for a subset of DataNodes to be much closer to full capacity than the rest. Notice that this heterogeneity is most likely rack-by-rack -- i.e. _m_ whole racks with low-storage nodes and _n_ whole racks with high-storage nodes. So it'd be very useful if we can deprioritize those near-full DataNodes as destinations for the 2nd and 3rd replicas.
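The idea above can be illustrated with a toy target-selection filter: drop candidates whose used ratio exceeds a threshold, then prefer the emptiest of the rest. The threshold, names, and method are invented for illustration; HDFS's real placement logic lives in {{BlockPlacementPolicyDefault}} and is considerably more involved.

```java
// Hedged sketch: deprioritize near-full DataNodes when choosing replica
// targets. NEAR_FULL_RATIO is an assumed, hypothetical config value.
import java.util.*;
import java.util.stream.Collectors;

public class NearFullSketch {
    static final double NEAR_FULL_RATIO = 0.90; // assumed threshold

    /** usedRatios maps node name -> fraction of capacity used. */
    static List<String> preferredTargets(Map<String, Double> usedRatios) {
        return usedRatios.entrySet().stream()
            .filter(e -> e.getValue() < NEAR_FULL_RATIO) // drop near-full DNs
            .sorted(Map.Entry.comparingByValue())        // emptiest first
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> dns = new HashMap<>();
        dns.put("dn-small", 0.95); // on a low-storage rack, near-full
        dns.put("dn-big1", 0.40);
        dns.put("dn-big2", 0.55);
        System.out.println(preferredTargets(dns)); // dn-small is deprioritized
    }
}
```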
[jira] [Created] (HDFS-10966) Enhance Dispatcher logic on deciding when to give up a source DataNode
Zhe Zhang created HDFS-10966: Summary: Enhance Dispatcher logic on deciding when to give up a source DataNode Key: HDFS-10966 URL: https://issues.apache.org/jira/browse/HDFS-10966 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Zhe Zhang Assignee: Mark Wagner When a {{Dispatcher}} thread works on a source DataNode, in each iteration it tries to execute a {{PendingMove}}. If no block is moved after 5 iterations, this source (over-utilized) DataNode is given up for this Balancer iteration (20 mins). This is problematic if the source DataNode was heavily loaded at the beginning of the iteration: it will quickly encounter 5 unsuccessful moves and be abandoned. We should enhance this logic, e.g. by using elapsed time instead of the number of iterations.
{code}
// Check if the previous move was successful
} else {
  // source node cannot find a pending block to move, iteration +1
  noPendingMoveIteration++;
  // in case no blocks can be moved for source node's task,
  // jump out of while-loop after 5 iterations.
  if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
    LOG.info("Failed to find a pending move " + noPendingMoveIteration
        + " times.  Skipping " + this);
    resetScheduledSize();
  }
}
{code}
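The suggested change, giving up based on elapsed wall-clock time without progress rather than a fixed count of failed attempts, can be sketched as below. The field and method names, and the five-minute budget, are illustrative assumptions, not Balancer's actual fields.

```java
// Hedged sketch: track the last successful PendingMove and give up a source
// only after a time budget with no progress, so a node that was merely busy
// early in the iteration is not abandoned after 5 quick failures.
public class SourceGiveUpSketch {
    static final long MAX_NO_MOVE_MILLIS = 5 * 60 * 1000; // assumed budget

    long lastSuccessfulMove = System.currentTimeMillis();

    /** Record a successful PendingMove: reset the no-progress clock. */
    void onMoveSucceeded(long nowMillis) {
        lastSuccessfulMove = nowMillis;
    }

    /** True once this source has made no progress for the whole budget. */
    boolean shouldGiveUp(long nowMillis) {
        return nowMillis - lastSuccessfulMove >= MAX_NO_MOVE_MILLIS;
    }

    public static void main(String[] args) {
        SourceGiveUpSketch s = new SourceGiveUpSketch();
        long t0 = System.currentTimeMillis();
        s.onMoveSucceeded(t0);
        System.out.println(s.shouldGiveUp(t0 + 60_000));     // 1 min: keep trying
        System.out.println(s.shouldGiveUp(t0 + 6 * 60_000)); // 6 min: give up
    }
}
```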
[jira] [Reopened] (HDFS-10745) Directly resolve paths into INodesInPath
[ https://issues.apache.org/jira/browse/HDFS-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10745: -- Sorry to reopen the JIRA, testing branch-2.7 patch. > Directly resolve paths into INodesInPath > > > Key: HDFS-10745 > URL: https://issues.apache.org/jira/browse/HDFS-10745 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HDFS-10745.2.patch, HDFS-10745.branch-2.patch, > HDFS-10745.patch > > > The intermediate resolution to a string, only to be decomposed by > {{INodesInPath}} back into a byte[][] can be eliminated by resolving directly > to an IIP. The IIP will contain the resolved path if required.
[jira] [Resolved] (HDFS-10859) TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs are flaky in branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-10859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-10859. -- Resolution: Duplicate Thanks for the pointer Xiao. HDFS-10716 indeed solves the problem. > TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs are flaky > in branch-2.7 > -- > > Key: HDFS-10859 > URL: https://issues.apache.org/jira/browse/HDFS-10859 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, test >Affects Versions: 2.7.4 >Reporter: Zhe Zhang >Priority: Minor > Attachments: testUnknownDatanodeSimple-failure.log >
[jira] [Reopened] (HDFS-10744) Internally optimize path component resolution
[ https://issues.apache.org/jira/browse/HDFS-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10744: -- Sorry for re-opening this. Triggering Jenkins for branch-2.7 patch. > Internally optimize path component resolution > - > > Key: HDFS-10744 > URL: https://issues.apache.org/jira/browse/HDFS-10744 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HDFS-10744-branch-2.7.patch, HDFS-10744.patch > > > {{FSDirectory}}'s path resolution currently uses a mixture of string & > byte[][] conversions, back to string, back to byte[][] for {{INodesInPath}}. > Internally all path component resolution should be byte[][]-based as the > precursor to instantiating an {{INodesInPath}} w/o the last 2 unnecessary > conversions.
[jira] [Reopened] (HDFS-10673) Optimize FSPermissionChecker's internal path usage
[ https://issues.apache.org/jira/browse/HDFS-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10673: -- Sorry to reopen the JIRA. I want to test the branch-2.7 patch on Jenkins. > Optimize FSPermissionChecker's internal path usage > -- > > Key: HDFS-10673 > URL: https://issues.apache.org/jira/browse/HDFS-10673 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-10673-branch-2.7.00.patch, HDFS-10673.1.patch, > HDFS-10673.2.patch, HDFS-10673.patch > > > The INodeAttributeProvider and AccessControlEnforcer features degrade > performance and generate excessive garbage even when neither is used. Main > issues: > # A byte[][] of components is unnecessarily created. Each path component > lookup converts a subrange of the byte[][] to a new String[] - then not used > by the default attribute provider. > # Subaccess checks are insanely expensive. The full path of every subdir is > created by walking up the inode tree, creating an INode[], building a string > by converting each inode's byte[] name to a string, etc., which will only be > used if there's an exception. > The expense of #1 should only be incurred when using the provider/enforcer > feature. For #2, paths should be created on-demand for exceptions.
[jira] [Created] (HDFS-10859) TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs fail in branch-2.7
Zhe Zhang created HDFS-10859: Summary: TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs fail in branch-2.7 Key: HDFS-10859 URL: https://issues.apache.org/jira/browse/HDFS-10859 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover, test Affects Versions: 2.7 Reporter: Zhe Zhang Priority: Minor
[jira] [Reopened] (HDFS-8818) Allow Balancer to run faster
[ https://issues.apache.org/jira/browse/HDFS-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8818: - Sorry to reopen this one. I think this is a valid improvement for branch-2.7 and I'm trying to backport it. I'll attach a branch-2.7 patch for Jenkins verification. > Allow Balancer to run faster > > > Key: HDFS-8818 > URL: https://issues.apache.org/jira/browse/HDFS-8818 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: h8818_20150723.patch, h8818_20150727.patch > > > The original design of Balancer intentionally makes it run slowly so > that the balancing activities won't affect normal cluster activities and > running jobs. > There are new use cases where a cluster admin may choose to balance the cluster > when the cluster load is low, or in a maintenance window, so we should > have an option to allow Balancer to run faster.
[jira] [Created] (HDFS-10854) Remove createStripedFile and addBlockToFile by creating real EC files
Zhe Zhang created HDFS-10854: Summary: Remove createStripedFile and addBlockToFile by creating real EC files Key: HDFS-10854 URL: https://issues.apache.org/jira/browse/HDFS-10854 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding, test Affects Versions: 3.0.0-alpha2 Reporter: Zhe Zhang {{DFSTestUtil#createStripedFile}} and {{addBlockToFile}} were developed before we completed the EC client. They were used to test the {{NameNode}} EC logic when the client was unable to really create/read/write EC files. They are causing confusion in other issues about the {{NameNode}}. For example, in one of the patches under HDFS-10301, {{testProcessOverReplicatedAndMissingStripedBlock}} fails because in the test we fake a block report from a DN with a randomly generated storage ID. The DN itself is never aware of that storage. This is not possible in a real production environment.
{code}
DatanodeStorage storage = new DatanodeStorage(UUID.randomUUID().toString());
StorageReceivedDeletedBlocks[] reports = DFSTestUtil
    .makeReportForReceivedBlock(block,
        ReceivedDeletedBlockInfo.BlockStatus.RECEIVING_BLOCK, storage);
for (StorageReceivedDeletedBlocks report : reports) {
  ns.processIncrementalBlockReport(dn.getDatanodeId(), report);
}
{code}
Now that we have a fully functional EC client, we should remove the old testing logic and use similar logic as non-EC tests (creating real files and emulating blocks missing / being corrupt).
[jira] [Reopened] (HDFS-10662) Optimize UTF8 string/byte conversions
[ https://issues.apache.org/jira/browse/HDFS-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10662: -- Sorry to reopen the JIRA. I'm backporting to branch-2.7 and it was quite messy. [~daryn] [~kihwal] I'd appreciate it if you could take a look. > Optimize UTF8 string/byte conversions > - > > Key: HDFS-10662 > URL: https://issues.apache.org/jira/browse/HDFS-10662 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HDFS-10662-branch-2.7.00.patch, HDFS-10662.patch, > HDFS-10662.patch.1 > > > String/byte conversions may take either a Charset instance or its canonical > name. One might think a Charset instance would be faster due to avoiding a > lookup and instantiation of a Charset, but it's not. The canonical string > name variants will cache the string encoder/decoder (obtained from a Charset) > resulting in better performance. > LOG4J2-935 describes a real-world performance boost. I micro-benched a > marginal runtime improvement on jdk 7/8. However for a 16 byte path, using > the canonical name generated 50% less garbage. For a 64 byte path, 25% of > the garbage. Given the sheer number of times that paths are (re)parsed, the > cost adds up quickly.
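The two conversion variants the JIRA compares look like this side by side. Per the JIRA's own measurements, on JDK 7/8 the canonical-name overloads cache the encoder/decoder and generate less garbage, while the Charset-instance overloads build a fresh decoder per call; both produce identical strings, so the sketch only demonstrates the equivalent APIs, not the performance delta.

```java
// Illustration of the two equivalent String/byte conversion APIs discussed
// in HDFS-10662. byName uses the canonical charset name (cached coder on
// JDK 7/8); byCharset passes a Charset instance (new decoder per call).
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8ConversionSketch {
    static String byName(byte[] b) throws UnsupportedEncodingException {
        return new String(b, "UTF-8");          // canonical-name variant
    }

    static String byCharset(byte[] b) {
        return new String(b, StandardCharsets.UTF_8); // Charset-instance variant
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] path = "/user/zhe/file".getBytes(StandardCharsets.UTF_8);
        // Same result either way; only allocation behavior differs.
        System.out.println(byName(path).equals(byCharset(path)));
    }
}
```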
[jira] [Reopened] (HDFS-10655) Fix path related byte array conversion bugs
[ https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10655: -- Sorry to reopen it; I want to backport it to branch-2.7 and have a full Jenkins run before doing that. > Fix path related byte array conversion bugs > --- > > Key: HDFS-10655 > URL: https://issues.apache.org/jira/browse/HDFS-10655 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, > HDFS-10655.patch > > > {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple > separators, nor does it handle relative paths correctly. > {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless > the specified range is the entire component array. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7
Zhe Zhang created HDFS-10809: Summary: getNumEncryptionZones causes NPE in branch-2.7 Key: HDFS-10809 URL: https://issues.apache.org/jira/browse/HDFS-10809 Project: Hadoop HDFS Issue Type: Bug Components: encryption, namenode Affects Versions: 2.7.4 Reporter: Zhe Zhang This bug was caused by the fact that HDFS-10458 was backported from trunk all the way down to branch-2.7, while HDFS-8721 initially only went down to branch-2.8. So from branch-2.8 and up, the commit order is HDFS-8721 -> HDFS-10458, but branch-2.7 has the reverse order; hence the inconsistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
Zhe Zhang created HDFS-10798: Summary: Make the threshold of reporting FSNamesystem lock contention configurable Key: HDFS-10798 URL: https://issues.apache.org/jira/browse/HDFS-10798 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is hard-coded to 1 second. On a busy system, reporting at this threshold might add too much overhead. We should make the threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
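A configurable threshold would presumably be exposed through {{hdfs-site.xml}}. The sketch below shows what such a setting might look like; the key name is an assumption for illustration, since the actual name would be chosen by the patch:

```xml
<!-- Hypothetical hdfs-site.xml entry; the actual key name introduced by the
     patch may differ. Raises the write-lock hold-time reporting threshold
     from 1 s to 5 s to cut logging overhead on a busy NameNode. -->
<property>
  <name>dfs.namenode.write-lock-reporting-threshold-ms</name>
  <value>5000</value>
</property>
```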
[jira] [Reopened] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-7933: - > fsck should also report decommissioning replicas. > -- > > Key: HDFS-7933 > URL: https://issues.apache.org/jira/browse/HDFS-7933 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-7933-branch-2.7.00.patch, HDFS-7933.00.patch, > HDFS-7933.01.patch, HDFS-7933.02.patch, HDFS-7933.03.patch > > > Fsck doesn't count replicas that are on decommissioning nodes. If a block has > all replicas on the decommissioning nodes, it will be marked as missing, > which is alarming for the admins, although the system will replicate them > before nodes are decommissioned. > Fsck output should also show decommissioning replicas along with the live > replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-9804) Allow long-running Balancer to login with keytab
[ https://issues.apache.org/jira/browse/HDFS-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-9804: - > Allow long-running Balancer to login with keytab > > > Key: HDFS-9804 > URL: https://issues.apache.org/jira/browse/HDFS-9804 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer & mover, security >Reporter: Xiao Chen >Assignee: Xiao Chen > Labels: supportability > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9804-branch-2.00.patch, HDFS-9804.01.patch, > HDFS-9804.02.patch, HDFS-9804.03.patch > > > From the discussion of HDFS-9698, it might be nice to allow the balancer to > run as a daemon and login from a keytab. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10680) TestUTF8 fails in branch-2.6
Zhe Zhang created HDFS-10680: Summary: TestUTF8 fails in branch-2.6 Key: HDFS-10680 URL: https://issues.apache.org/jira/browse/HDFS-10680 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Zhe Zhang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-10653) Optimize conversion from path string to components
[ https://issues.apache.org/jira/browse/HDFS-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-10653. -- Resolution: Fixed Local test-patch shows 2 failures, but those tests fail even without the patch. I'll create a JIRA to address them. Just committed the v03 branch-2.6 patch. Thanks Allen and Daryn for the review and comments. > Optimize conversion from path string to components > -- > > Key: HDFS-10653 > URL: https://issues.apache.org/jira/browse/HDFS-10653 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.6.5, 2.7.4 > > Attachments: HDFS-10653-branch-2.6.00.patch, > HDFS-10653-branch-2.6.01.patch, HDFS-10653-branch-2.6.02.patch, > HDFS-10653-branch-2.6.03.patch, HDFS-10653.patch > > > Converting a path String to a byte[][] currently requires an unnecessary > intermediate conversion from String to String[]. Removing this will reduce > excessive object allocation and byte copying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-10653) Optimize conversion from path string to components
[ https://issues.apache.org/jira/browse/HDFS-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10653: -- Sorry, have to reopen the issue otherwise Jenkins won't run. > Optimize conversion from path string to components > -- > > Key: HDFS-10653 > URL: https://issues.apache.org/jira/browse/HDFS-10653 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 2.7.4 > > Attachments: HDFS-10653-branch-2.6.00.patch, HDFS-10653.patch > > > Converting a path String to a byte[][] currently requires an unnecessary > intermediate conversion from String to String[]. Removing this will reduce > excessive object allocation and byte copying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-10534) NameNode WebUI should display DataNode usage rate with a certain percentile
[ https://issues.apache.org/jira/browse/HDFS-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-10534: -- > NameNode WebUI should display DataNode usage rate with a certain percentile > --- > > Key: HDFS-10534 > URL: https://issues.apache.org/jira/browse/HDFS-10534 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode, ui >Reporter: Zhe Zhang >Assignee: Kai Sasaki > Attachments: HDFS-10534.01.patch, HDFS-10534.02.patch, > HDFS-10534.03.patch, HDFS-10534.04.patch, HDFS-10534.05.patch, Screen Shot > 2016-06-23 at 6.25.50 AM.png > > > In addition to *Min/Median/Max*, another meaningful metric for cluster > balance is the DN usage rate at a certain percentile (e.g. 90 or 95). We should > add a config option, and another field on the NN WebUI, to display this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10534) NameNode WebUI should display DataNode usage rate with a certain percentile
Zhe Zhang created HDFS-10534: Summary: NameNode WebUI should display DataNode usage rate with a certain percentile Key: HDFS-10534 URL: https://issues.apache.org/jira/browse/HDFS-10534 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, ui Reporter: Zhe Zhang In addition to *Min/Median/Max*, another meaningful metric for cluster balance is the DN usage rate at a certain percentile (e.g. 90 or 95). We should add a config option, and another field on the NN WebUI, to display this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster
Zhe Zhang created HDFS-10458: Summary: getFileEncryptionInfo should return quickly for non-encrypted cluster Key: HDFS-10458 URL: https://issues.apache.org/jira/browse/HDFS-10458 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang {{FSDirectory#getFileEncryptionInfo}} always acquires the {{readLock}} and checks whether the path belongs to an EZ. For a busy system with potentially many listing operations, this could cause locking contention. I think we should add a method {{EncryptionZoneManager#hasEncryptionZone()}} that returns whether the system has any EZ. If there is no EZ at all, {{getFileEncryptionInfo}} should return null without taking the {{readLock}}. If {{hasEncryptionZone}} is only used in the above scenario, maybe it doesn't itself need a {{readLock}} -- if the system doesn't have any EZ when {{getFileEncryptionInfo}} is called on a path, the path cannot be encrypted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
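The proposed fast path can be sketched as follows. This is a hypothetical simplification: the class, field, and method names below are illustrative stand-ins for {{FSDirectory}} / {{EncryptionZoneManager}} internals, not the actual Hadoop API.

```java
// Hypothetical sketch of the proposed fast path. Names are illustrative
// stand-ins for FSDirectory / EncryptionZoneManager internals.
import java.util.concurrent.atomic.AtomicInteger;

public class EzFastPathSketch {
    // Maintained by the (hypothetical) EncryptionZoneManager as zones come and go.
    static volatile int numEncryptionZones = 0;
    // Counts read-lock acquisitions, purely for illustration.
    static final AtomicInteger readLockAcquisitions = new AtomicInteger();

    // Lock-free check: if no EZ exists anywhere, no path can be encrypted.
    static boolean hasEncryptionZone() {
        return numEncryptionZones > 0;
    }

    static Object getFileEncryptionInfo(String path) {
        if (!hasEncryptionZone()) {
            return null; // fast path: skip the readLock entirely
        }
        readLockAcquisitions.incrementAndGet(); // stands in for readLock()
        // ... real code would resolve the path, look up its EZ under the
        // lock, and then release the lock ...
        return null;
    }

    public static void main(String[] args) {
        getFileEncryptionInfo("/user/a/f");   // no EZ: lock never taken
        numEncryptionZones = 1;
        getFileEncryptionInfo("/user/a/f");   // EZ present: lock taken once
        System.out.println(readLockAcquisitions.get()); // prints 1
    }
}
```

The point of the sketch is that on a cluster with zero encryption zones, listing-heavy workloads never touch the lock at all.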
[jira] [Created] (HDFS-9844) Correct path creation in getTrashRoot to handle root dir
Zhe Zhang created HDFS-9844: --- Summary: Correct path creation in getTrashRoot to handle root dir Key: HDFS-9844 URL: https://issues.apache.org/jira/browse/HDFS-9844 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Blocker {code} if ((ez != null)) { return this.makeQualified( new Path(ez.getPath() + "/" + FileSystem.TRASH_PREFIX + dfs.ugi.getShortUserName())); } {code} This doesn't handle the root dir correctly. The unit test {{testRootDirEZTrash}} in the attached patch can reproduce the error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
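The root-directory problem can be seen with plain string concatenation: when the encryption zone is the root directory, {{ez.getPath()}} returns "/", so prepending another "/" yields a path starting with "//". A minimal, self-contained illustration (the constant value and user name are assumptions for the example, not the actual HDFS code):

```java
// Minimal illustration of the root-dir concatenation problem described above.
// TRASH_PREFIX and the user name are assumed values for the example.
public class RootTrashPathDemo {
    static final String TRASH_PREFIX = ".Trash";

    // Same shape as the quoted snippet: ezPath + "/" + prefix + "/" + user.
    static String trashRoot(String ezPath, String user) {
        return ezPath + "/" + TRASH_PREFIX + "/" + user;
    }

    public static void main(String[] args) {
        System.out.println(trashRoot("/data/ez", "alice")); // prints /data/ez/.Trash/alice
        System.out.println(trashRoot("/", "alice"));        // prints //.Trash/alice
    }
}
```

The second result begins with "//", which is not the intended "/.Trash/alice", so the root case needs special handling.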
[jira] [Created] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility
Zhe Zhang created HDFS-9799: --- Summary: Reimplement getCurrentTrashDir to remove incompatibility Key: HDFS-9799 URL: https://issues.apache.org/jira/browse/HDFS-9799 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Blocker HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by adding an IOException. This breaks other applications using this public API. This JIRA aims to reimplement the logic to safely handle the IOException within HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9785) Remove unused TrashPolicy#getInstance code
Zhe Zhang created HDFS-9785: --- Summary: Remove unused TrashPolicy#getInstance code Key: HDFS-9785 URL: https://issues.apache.org/jira/browse/HDFS-9785 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhe Zhang Priority: Minor A follow-on from HDFS-8831: now the {{getInstance}} API with Path is not used anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9172) Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-9172. - Resolution: Invalid > Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client > -- > > Key: HDFS-9172 > URL: https://issues.apache.org/jira/browse/HDFS-9172 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Zhe Zhang > > The idea of this jira is to move the striped stream related classes to > {{hadoop-hdfs-client}} project. This will help to be in sync with the > HDFS-6200 proposal. > - DFSStripedInputStream > - DFSStripedOutputStream > - StripedDataStreamer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9698) Long running Balancer should renew TGT
Zhe Zhang created HDFS-9698: --- Summary: Long running Balancer should renew TGT Key: HDFS-9698 URL: https://issues.apache.org/jira/browse/HDFS-9698 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover, security Affects Versions: 2.6.3 Reporter: Zhe Zhang Assignee: Zhe Zhang When the {{Balancer}} runs beyond the configured TGT lifetime, the current logic won't renew TGT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9688) Test the effect of nested encryption zones in HDFS downgrade
Zhe Zhang created HDFS-9688: --- Summary: Test the effect of nested encryption zones in HDFS downgrade Key: HDFS-9688 URL: https://issues.apache.org/jira/browse/HDFS-9688 Project: Hadoop HDFS Issue Type: Test Components: encryption, test Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9644) Update encryption documentation to reflect nested EZs
Zhe Zhang created HDFS-9644: --- Summary: Update encryption documentation to reflect nested EZs Key: HDFS-9644 URL: https://issues.apache.org/jira/browse/HDFS-9644 Project: Hadoop HDFS Issue Type: New Feature Components: documentation, encryption Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9576) HTrace: collect path/offset/length information on read and write operations
Zhe Zhang created HDFS-9576: --- Summary: HTrace: collect path/offset/length information on read and write operations Key: HDFS-9576 URL: https://issues.apache.org/jira/browse/HDFS-9576 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, tracing Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9578) Incorrect default value (typo) in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-9578. - Resolution: Duplicate Thanks for reporting this [~tianyin]. It has been fixed in trunk already. > Incorrect default value (typo) in hdfs-default.xml > -- > > Key: HDFS-9578 > URL: https://issues.apache.org/jira/browse/HDFS-9578 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Priority: Minor > > In {{hdfs-default.xml}}, the default value of > {{dfs.datanode.readahead.bytes}} is wrong. > The value should be 4MB (4 * 1024 * 1024) according to the code, i.e., > {{4194304}}, > while in hdfs-default.xml, it is > {{4193404}}. > > (src) > {code:title=DFSConfigKeys.java|borderStyle=solid} > 134 public static final String DFS_DATANODE_READAHEAD_BYTES_KEY = > "dfs.datanode.readahead.bytes"; > 135 public static final longDFS_DATANODE_READAHEAD_BYTES_DEFAULT = 4 * > 1024 * 1024; // 4MB > {code} > (hdfs-default.xml) > https://hadoop.apache.org/docs/r2.6.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7691) Handle hflush and hsync in the best optimal way possible during online Erasure encoding
[ https://issues.apache.org/jira/browse/HDFS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-7691. - Resolution: Duplicate Thanks Vinay for confirming. Let's track the hflush related efforts under HDFS-7661. > Handle hflush and hsync in the best optimal way possible during online > Erasure encoding > --- > > Key: HDFS-7691 > URL: https://issues.apache.org/jira/browse/HDFS-7691 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Vinayakumar B >Assignee: Vinayakumar B > > As mentioned in the design doc, hsync and hflush tend to make online erasure > encoding complex. > But these are critical features to ensure fault tolerance for some users. > These operations should be supported in the best way possible during online > erasure encoding to preserve fault tolerance. > This Jira is a placeholder for the task. How to solve this will be discussed > later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9529) Extend Erasure Code to support POWER Chip acceleration
[ https://issues.apache.org/jira/browse/HDFS-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-9529. - Resolution: Duplicate Thanks Qijun for confirming this. Resolving this one as dup. > Extend Erasure Code to support POWER Chip acceleration > -- > > Key: HDFS-9529 > URL: https://issues.apache.org/jira/browse/HDFS-9529 > Project: Hadoop HDFS > Issue Type: New Feature > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: wqijun >Assignee: wqijun > Fix For: 3.0.0 > > > Erasure Code is a very important feature in the new HDFS version. This JIRA will > focus on how to extend EC to support multiple types of EC acceleration via a C > library and other hardware methods, like GPU or FPGA. Compared with > Hadoop-11887, this JIRA will focus more on how to leverage the POWER Chip > capability to accelerate the EC calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool
[ https://issues.apache.org/jira/browse/HDFS-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-9496. - Resolution: Duplicate Thanks Rui for clarifying this. > Erasure coding: an erasure codec throughput benchmark tool > -- > > Key: HDFS-9496 > URL: https://issues.apache.org/jira/browse/HDFS-9496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Hui Zheng > > We need a tool which can help us decide/benchmark an Erasure Codec and schema. > Considering that HDFS-8968 has implemented an I/O throughput benchmark tool, maybe > we could simply add encode/decode operations to it or implement another tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9403) Erasure coding: some EC tests are missing timeout
Zhe Zhang created HDFS-9403: --- Summary: Erasure coding: some EC tests are missing timeout Key: HDFS-9403 URL: https://issues.apache.org/jira/browse/HDFS-9403 Project: Hadoop HDFS Issue Type: Sub-task Components: erasure-coding, test Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor The EC data writing pipeline is still being worked on, and bugs could cause tests to hang. We should add a timeout to all tests involving striped writing. I see at least the following: * {{TestErasureCodingPolicies}} * {{TestFileStatusWithECPolicy}} * {{TestDFSStripedOutputStream}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9405) When starting a file, NameNode should generate EDEK in a separate thread
Zhe Zhang created HDFS-9405: --- Summary: When starting a file, NameNode should generate EDEK in a separate thread Key: HDFS-9405 URL: https://issues.apache.org/jira/browse/HDFS-9405 Project: Hadoop HDFS Issue Type: Improvement Components: encryption, namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation to the key provider, which could be slow or cause timeout. It should be done as a separate thread so as to return a proper error message to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
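The idea of moving EDEK generation off the RPC handler thread can be sketched with a standard executor. This is a generic pattern under assumed names, not the actual NameNode code; the design eventually adopted in HDFS may differ:

```java
// Generic sketch: generate EDEKs on a background thread so a slow key
// provider doesn't block the caller. Names are illustrative, not Hadoop's.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncEdekSketch {
    // Single background worker (daemon so the JVM can exit); a real server
    // would size and manage this pool explicitly.
    static final ExecutorService edekPool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "edek-generator");
        t.setDaemon(true);
        return t;
    });

    // Stands in for the (possibly slow) round trip to the key provider.
    static String generateEncryptedDataEncryptionKey(String keyName) {
        return "edek-for-" + keyName;
    }

    // Run generation off the caller's thread and bound the wait, so the RPC
    // handler can return a clear error instead of hanging on a slow provider.
    static String generateWithTimeout(String keyName, long timeoutMs) throws Exception {
        Future<String> f = edekPool.submit(() -> generateEncryptedDataEncryptionKey(keyName));
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);
            throw new Exception("key provider timed out generating EDEK for " + keyName);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generateWithTimeout("zonekey", 1000)); // prints edek-for-zonekey
    }
}
```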
[jira] [Created] (HDFS-9386) Erasure coding: updateBlockForPipeline sometimes returns non-striped block for striped file
Zhe Zhang created HDFS-9386: --- Summary: Erasure coding: updateBlockForPipeline sometimes returns non-striped block for striped file Key: HDFS-9386 URL: https://issues.apache.org/jira/browse/HDFS-9386 Project: Hadoop HDFS Issue Type: Sub-task Components: erasure-coding Affects Versions: 3.0.0 Reporter: Zhe Zhang I've seen this bug a few times. The returned {{LocatedBlock}} from {{updateBlockForPipeline}} is sometimes not {{LocatedStripedBlock}}. However, {{FSNamesystem#bumpBlockGenerationStamp}} did return a {{LocatedStripedBlock}}. Maybe a bug in PB. I'm still debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9344) QUEUE_WITH_CORRUPT_BLOCKS is no longer needed
Zhe Zhang created HDFS-9344: --- Summary: QUEUE_WITH_CORRUPT_BLOCKS is no longer needed Key: HDFS-9344 URL: https://issues.apache.org/jira/browse/HDFS-9344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Priority: Minor After the change of HDFS-9205, the {{QUEUE_WITH_CORRUPT_BLOCKS}} queue in {{UnderReplicatedBlocks}} is no longer needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9344) QUEUE_WITH_CORRUPT_BLOCKS is no longer needed
[ https://issues.apache.org/jira/browse/HDFS-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-9344. - Resolution: Invalid Thanks Xiao! Good analysis! Closing this as invalid. > QUEUE_WITH_CORRUPT_BLOCKS is no longer needed > - > > Key: HDFS-9344 > URL: https://issues.apache.org/jira/browse/HDFS-9344 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen >Priority: Minor > > After the change of HDFS-9205, the {{QUEUE_WITH_CORRUPT_BLOCKS}} queue in > {{UnderReplicatedBlocks}} is no longer needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9329) TestBootstrapStandby#testRateThrottling is flaky because fsimage size is smaller than IO buffer size
Zhe Zhang created HDFS-9329: --- Summary: TestBootstrapStandby#testRateThrottling is flaky because fsimage size is smaller than IO buffer size Key: HDFS-9329 URL: https://issues.apache.org/jira/browse/HDFS-9329 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor {{testRateThrottling}} verifies that bootstrap transfer should timeout with a very small {{DFS_IMAGE_TRANSFER_BOOTSTRAP_STANDBY_RATE_KEY}} value. However, throttling on the image sender only happens after sending each IO buffer. Therefore, the test sometimes fails if the receiver receives the full fsimage (which is smaller than IO buffer size) before throttling begins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9280) Document NFS gateway export point parameter
Zhe Zhang created HDFS-9280: --- Summary: Document NFS gateway export point parameter Key: HDFS-9280 URL: https://issues.apache.org/jira/browse/HDFS-9280 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.7.1 Reporter: Zhe Zhang Priority: Trivial We should document the {{nfs.export.point}} configuration parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-7285. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 I just did a final {{git merge}} to sync with trunk and [~andrew.wang] helped push the HDFS-7285 branch to trunk. Resolving this JIRA now; let's keep working on follow-on tasks under HDFS-8031. Thanks very much for all contributors to EC phase I, as well as the helpful discussions in the community. > Erasure Coding Support inside HDFS > -- > > Key: HDFS-7285 > URL: https://issues.apache.org/jira/browse/HDFS-7285 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Weihua Jiang >Assignee: Zhe Zhang > Fix For: 3.0.0 > > Attachments: Compare-consolidated-20150824.diff, > Consolidated-20150707.patch, Consolidated-20150806.patch, > Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, > HDFS-7285-Consolidated-20150911.patch, HDFS-7285-initial-PoC.patch, > HDFS-7285-merge-consolidated-01.patch, > HDFS-7285-merge-consolidated-trunk-01.patch, > HDFS-7285-merge-consolidated.trunk.03.patch, > HDFS-7285-merge-consolidated.trunk.04.patch, > HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, > HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, > HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, > HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, > HDFSErasureCodingSystemTestPlan-20150824.pdf, > HDFSErasureCodingSystemTestReport-20150826.pdf, fsimage-analysis-20150105.pdf > > > Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing > data reliability, compared to the existing HDFS 3-replica approach. For > example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, > with storage overhead only being 40%. This makes EC a quite attractive > alternative for big data storage, particularly for cold data.
> Facebook had a related open source project called HDFS-RAID. It used to be > one of the contrib packages in HDFS but has been removed since Hadoop 2.0 > for maintenance reasons. The drawbacks are: 1) it is on top of HDFS and depends > on MapReduce to do encoding and decoding tasks; 2) it can only be used for > cold files that are intended not to be appended anymore; 3) the pure Java EC > coding implementation is extremely slow in practical use. Due to these, it > might not be a good idea to just bring HDFS-RAID back. > We (Intel and Cloudera) are working on a design to build EC into HDFS that > gets rid of any external dependencies, makes it self-contained and > independently maintained. This design lays the EC feature on top of the storage type > support and is designed to be compatible with existing HDFS features like caching, > snapshot, encryption, high availability, etc. This design will also > support different EC coding schemes, implementations and policies for > different deployment scenarios. By utilizing advanced libraries (e.g. the Intel > ISA-L library), an implementation can greatly improve the performance of EC > encoding/decoding and make the EC solution even more attractive. We will > post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive
Zhe Zhang created HDFS-9119: --- Summary: Discrepancy between edit log tailing interval and RPC timeout for transitionToActive Key: HDFS-9119 URL: https://issues.apache.org/jira/browse/HDFS-9119 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.7.1 Reporter: Zhe Zhang {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute. If active NameNode encounters very intensive metadata workload (in particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files and directories), the amount of updates accumulated in the 2 mins edit log tailing interval is hard for the standby NameNode to catch up in the 1 min timeout window. If that happens, the FailoverController will timeout and give up trying to transition the standby to active. The old ANN will resume adding more edits. When the SbNN finally finishes catching up the edits and tries to become active, it will crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
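The mismatch comes down to two settings. Assuming the standard HA configuration keys (defaults vary by release; the values below match the scenario described above):

```xml
<!-- Illustrative values matching the scenario above; key names are the
     standard HA settings, but defaults vary by release. -->
<!-- hdfs-site.xml: standby tails edits from the active every 120 s -->
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>120</value>
</property>
<!-- core-site.xml: failover controller gives transitionToActive only 60 s -->
<property>
  <name>ha.failover-controller.new-active.rpc-timeout.ms</name>
  <value>60000</value>
</property>
```

With a tailing interval twice the RPC timeout, a standby can accumulate more un-tailed edits than it can replay before the failover controller gives up.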
[jira] [Created] (HDFS-9097) Erasure coding: update EC command "-s" flag to "-p" when specifying policy
Zhe Zhang created HDFS-9097: --- Summary: Erasure coding: update EC command "-s" flag to "-p" when specifying policy Key: HDFS-9097 URL: https://issues.apache.org/jira/browse/HDFS-9097 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-8833 missed this update. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9098) Erasure coding: emulate race conditions among striped streamers in write pipeline
Zhe Zhang created HDFS-9098: --- Summary: Erasure coding: emulate race conditions among striped streamers in write pipeline Key: HDFS-9098 URL: https://issues.apache.org/jira/browse/HDFS-9098 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Apparently the interleaving of events among {{StripedDataStreamer}}s is very tricky to handle. [~walter.k.su] and [~jingzhao] have discussed several race conditions under HDFS-9040. Let's use FaultInjector to emulate different combinations of interleaved events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8373) Ec files can't be deleted into Trash because of that Trash isn't EC zone.
[ https://issues.apache.org/jira/browse/HDFS-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8373. - Resolution: Not A Problem With HDFS-8833 we should be able to delete EC files into Trash. > Ec files can't be deleted into Trash because of that Trash isn't EC zone. > - > > Key: HDFS-8373 > URL: https://issues.apache.org/jira/browse/HDFS-8373 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: GAO Rui >Assignee: Brahma Reddy Battula > Labels: EC > > When EC files were deleted, they would be moved into {{Trash}} directory. > But, EC files can only be placed under EC zone. So, EC files which have been > deleted can not be moved to {{Trash}} directory. > Problem could be solved by creating a EC zone(floder) inside {{Trash}} to > contain deleted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9079) Erasure coding: preallocate multiple generation stamps when creating striped blocks
Zhe Zhang created HDFS-9079: --- Summary: Erasure coding: preallocate multiple generation stamps when creating striped blocks Key: HDFS-9079 URL: https://issues.apache.org/jira/browse/HDFS-9079 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang A non-striped DataStreamer goes through the following steps in error handling: {code} 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on NN {code} To simplify the above we can preallocate GS when NN creates a new striped block group ({{FSN#createNewBlock}}). For each new striped block group we can reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we shouldn't try to further recover anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
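The reservation idea can be sketched as follows. This is an illustrative model under assumed names, not the actual {{FSNamesystem}} code; the reserved count follows the JIRA's reasoning that more than {{NUM_PARITY_BLOCKS}} failures are unrecoverable anyway:

```java
// Illustrative sketch of preallocating generation stamps per striped block
// group; names and structure are hypothetical, not FSNamesystem's.
import java.util.ArrayDeque;
import java.util.Deque;

public class GsReservationSketch {
    static final int NUM_PARITY_BLOCKS = 3;   // e.g. an RS(6,3) schema
    static long nextGenerationStamp = 1000;

    // On block-group creation, hand out one GS plus NUM_PARITY_BLOCKS reserved
    // GS's for future pipeline recoveries, saving the NN round trip (steps 1-3
    // above) each time an error is handled.
    static Deque<Long> allocateForBlockGroup() {
        Deque<Long> reserved = new ArrayDeque<>();
        for (int i = 0; i < 1 + NUM_PARITY_BLOCKS; i++) {
            reserved.add(nextGenerationStamp++);
        }
        return reserved;
    }

    public static void main(String[] args) {
        Deque<Long> gs = allocateForBlockGroup();
        long initial = gs.poll();     // GS used when the group is created
        System.out.println(initial);  // prints 1000
        System.out.println(gs.size()); // prints 3: recoveries possible without NN
    }
}
```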
[jira] [Created] (HDFS-9050) updatePipeline RPC call should only take new GS as input
Zhe Zhang created HDFS-9050: --- Summary: updatePipeline RPC call should only take new GS as input Key: HDFS-9050 URL: https://issues.apache.org/jira/browse/HDFS-9050 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang The only usage of the call is in {{DataStreamer#updatePipeline}}, where {{newBlock}} differs from the current {{block}} only in GS. Basically, the RPC call is not supposed to update the {{poolID}}, {{ID}}, or {{numBytes}} of the block on the NN.
[jira] [Created] (HDFS-8996) Remove {{scanEditLog}}
Zhe Zhang created HDFS-8996: --- Summary: Remove {{scanEditLog}} Key: HDFS-8996 URL: https://issues.apache.org/jira/browse/HDFS-8996 Project: Hadoop HDFS Issue Type: Bug Components: journal-node, namenode Affects Versions: 2.0.0-alpha Reporter: Zhe Zhang After HDFS-8965 is committed, {{scanEditLog}} will be identical to {{validateEditLog}} in {{EditLogInputStream}} and {{FSEditlogLoader}}. This is a placeholder for removing the redundant {{scanEditLog}} code.
[jira] [Resolved] (HDFS-8985) Restarted namenode suffers from block report storm
[ https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8985. - Resolution: Invalid Restarted namenode suffers from block report storm - Key: HDFS-8985 URL: https://issues.apache.org/jira/browse/HDFS-8985 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.1 Reporter: Zhihua Deng Priority: Trivial Labels: test
[jira] [Resolved] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone
[ https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8987. - Resolution: Fixed Erasure coding: MapReduce job failed when I set the / folder to the EC zone Key: HDFS-8987 URL: https://issues.apache.org/jira/browse/HDFS-8987 Project: Hadoop HDFS Issue Type: Sub-task Components: HDFS Affects Versions: 3.0.0 Reporter: Lifeng Wang Test progress is as follows: * For a new cluster, I format the namenode and then start the HDFS service. * After the HDFS service is started, there are no files in HDFS; I set the / folder as an EC zone, and the EC zone is created successfully. * Start the YARN and MR JobHistoryServer services. All the services start successfully. * Then run the hadoop example pi program, and it fails with the following exception. {noformat}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.UnsupportedActionException): Cannot set replication to a file with striped blocks
    at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetReplication(FSDirAttrOp.java:391)
    at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setReplication(FSDirAttrOp.java:151)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2231)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:682)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:445)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2171)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2165)
{noformat}
[jira] [Reopened] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone
[ https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8987: -
[jira] [Resolved] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone
[ https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8987. - Resolution: Duplicate
[jira] [Created] (HDFS-8982) Consolidate getFileReplication and getPreferredBlockReplication in INodeFile
Zhe Zhang created HDFS-8982: --- Summary: Consolidate getFileReplication and getPreferredBlockReplication in INodeFile Key: HDFS-8982 URL: https://issues.apache.org/jira/browse/HDFS-8982 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Currently {{INodeFile}} provides both {{getFileReplication}} and {{getPreferredBlockReplication}} interfaces. At the very least they should be renamed (e.g. {{getCurrentFileReplication}} and {{getMaxConfiguredFileReplication}}), with clearer Javadoc. I also suspect we are not using them correctly in all places right now.
[jira] [Reopened] (HDFS-8985) Restarted namenode suffers from block report storm
[ https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8985: - [~dengzh] Could you re-close the case with the correct Resolution? I guess it's Invalid? Restarted namenode suffers from block report storm - Key: HDFS-8985 URL: https://issues.apache.org/jira/browse/HDFS-8985 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.1 Reporter: Zhihua Deng Priority: Trivial Labels: test
[jira] [Reopened] (HDFS-8982) Consolidate getFileReplication and getPreferredBlockReplication in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8982: - bq. The so-called getPreferredBlockReplication() records the maximum replication factor of the file w.r.t. the current and all snapshot states of the file. [~wheat9] By calling it "so-called" I take it that you agree we should at least give it a better name (as suggested in the JIRA description). Reopening the issue based on that consensus. Consolidate getFileReplication and getPreferredBlockReplication in INodeFile Key: HDFS-8982 URL: https://issues.apache.org/jira/browse/HDFS-8982 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Currently {{INodeFile}} provides both {{getFileReplication}} and {{getPreferredBlockReplication}} interfaces. At the very least they should be renamed (e.g. {{getCurrentFileReplication}} and {{getMaxConfiguredFileReplication}}), with clearer Javadoc. I also suspect we are not using them correctly in all places right now.
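The distinction between the two getters can be sketched as follows. This is a simplified, hypothetical Python model (the class and snapshot representation are illustrative); the semantics follow the quoted comment above: one getter returns the file's current replication factor, the other the maximum over the current state and all snapshots.

```python
# Illustrative sketch of the proposed renaming; not the INodeFile API.

class INodeFileSketch:
    def __init__(self, replication, snapshot_replications=()):
        self.replication = replication                        # current setting
        self.snapshot_replications = list(snapshot_replications)

    def get_current_file_replication(self):
        """What getFileReplication returns: the file's current factor."""
        return self.replication

    def get_max_configured_file_replication(self):
        """What getPreferredBlockReplication returns: the max over the
        current state and all snapshots, so no snapshot loses replicas."""
        return max([self.replication] + self.snapshot_replications)


# A file set to replication 2 now, but captured in a snapshot at 5:
f = INodeFileSketch(replication=2, snapshot_replications=[3, 5])
assert f.get_current_file_replication() == 2
assert f.get_max_configured_file_replication() == 5
```

The clearer names make it obvious which callers (e.g. block-management code deciding how many replicas to keep) should use the maximum rather than the current value.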
[jira] [Created] (HDFS-8964) Provide max TxId when validating in-progress edit log files
Zhe Zhang created HDFS-8964: --- Summary: Provide max TxId when validating in-progress edit log files Key: HDFS-8964 URL: https://issues.apache.org/jira/browse/HDFS-8964 Project: Hadoop HDFS Issue Type: Bug Components: journal-node, namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang The NN/JN validates in-progress edit log files in multiple scenarios, via {{EditLogFile#validateLog}}. The method scans through the edit log file to find the last transaction ID. However, an in-progress edit log file could be actively written to, which creates a race condition and causes incorrect data to be read (data we later attempt to interpret as ops). Currently {{validateLog}} is used in 3 places: # NN {{getEditsFromTxid}} # JN {{getEditLogManifest}} # NN/JN {{recoverUnfinalizedSegments}} In the first two scenarios we should provide a maximum TxId up to which the in-progress file is validated. The 3rd scenario won't cause a race condition because only non-current in-progress edit log files are validated.
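The proposed fix can be sketched as a scan capped at a maximum transaction ID. This is a hypothetical model: the real {{EditLogFile#validateLog}} parses a binary segment, while here the log is a plain list of {{(txid, op)}} pairs for illustration.

```python
# Hypothetical sketch of capping edit-log validation at max_txid.
# The (txid, op) record format is illustrative, not the HDFS on-disk format.

def validate_log(records, max_txid=None):
    """Scan an edit log and return the last valid transaction ID seen.

    Capping the scan at max_txid avoids racing with a writer that is
    still appending to an in-progress segment.
    """
    last = None
    for txid, _op in records:
        if max_txid is not None and txid > max_txid:
            break  # anything past max_txid may be half-written; ignore it
        last = txid
    return last


# A reader asked for transactions up to 102 while a writer appends 103:
in_progress = [(100, "OP_ADD"), (101, "OP_MKDIR"), (102, "OP_CLOSE"),
               (103, "OP_PARTIALLY_WRITTEN")]
assert validate_log(in_progress, max_txid=102) == 102  # race avoided
assert validate_log(in_progress) == 103                # uncapped: racy
```

The cap turns "read whatever is on disk right now" into "read only what the caller already knows is committed", which is exactly what the first two call sites need.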
[jira] [Created] (HDFS-8928) Improvements for BlockUnderConstructionFeature: ReplicaUnderConstruction as a separate class and replicas as an array
Zhe Zhang created HDFS-8928: --- Summary: Improvements for BlockUnderConstructionFeature: ReplicaUnderConstruction as a separate class and replicas as an array Key: HDFS-8928 URL: https://issues.apache.org/jira/browse/HDFS-8928 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang
[jira] [Resolved] (HDFS-8918) Convert BlockUnderConstructionFeature#replicas from list to array
[ https://issues.apache.org/jira/browse/HDFS-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8918. - Resolution: Duplicate Convert BlockUnderConstructionFeature#replicas from list to array - Key: HDFS-8918 URL: https://issues.apache.org/jira/browse/HDFS-8918 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang {{BlockInfoUnderConstruction}} / {{BlockUnderConstructionFeature}} uses a List to store its {{replicas}}. To reduce memory usage, we can use an array instead.
[jira] [Created] (HDFS-8918) Convert BlockUnderConstructionFeature#replicas from list to array
Zhe Zhang created HDFS-8918: --- Summary: Convert BlockUnderConstructionFeature#replicas from list to array Key: HDFS-8918 URL: https://issues.apache.org/jira/browse/HDFS-8918 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang {{BlockInfoUnderConstruction}} / {{BlockUnderConstructionFeature}} uses a List to store its {{replicas}}. To reduce memory usage, we can use an array instead.
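The memory argument is the same one behind replacing Java's growable {{ArrayList}} with a plain array: a fixed-size container drops the wrapper object and its capacity bookkeeping. As a rough CPython analogue (tuple = fixed-size, list = growable; the exact byte counts are interpreter-specific), the fixed container is measurably smaller for the same elements:

```python
import sys

# Rough analogue of ArrayList-vs-array: a growable list carries extra
# bookkeeping (capacity field, over-allocation slack) that a fixed-size
# container does not. Per-block savings matter because the NameNode
# tracks millions of blocks.
replicas = ["dn1-storage", "dn2-storage", "dn3-storage"]
as_list = list(replicas)    # growable
as_array = tuple(replicas)  # fixed-size

assert sys.getsizeof(as_array) < sys.getsizeof(as_list)
```

The byte counts vary by Python version, but the ordering (fixed < growable) holds, which is the point the JIRA is making for the NameNode heap.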
[jira] [Resolved] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8835. - Resolution: Invalid HDFS-8801 has converted {{BlockInfoUC}} into a feature. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} into an interface with {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation.
[jira] [Created] (HDFS-8917) Cleanup BlockInfoUnderConstruction from comments and tests
Zhe Zhang created HDFS-8917: --- Summary: Cleanup BlockInfoUnderConstruction from comments and tests Key: HDFS-8917 URL: https://issues.apache.org/jira/browse/HDFS-8917 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.8.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-8801 eliminates the {{BlockInfoUnderConstruction}} class. This JIRA is a follow-on to clean up comments and tests that refer to the class.
[jira] [Created] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
Zhe Zhang created HDFS-8909: --- Summary: Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature.
[jira] [Created] (HDFS-8849) fsck should report number of missing blocks with replication factor 1
Zhe Zhang created HDFS-8849: --- Summary: fsck should report number of missing blocks with replication factor 1 Key: HDFS-8849 URL: https://issues.apache.org/jira/browse/HDFS-8849 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor HDFS-7165 added reporting of the number of missing blocks with replication factor 1 to {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same support, which is the aim of this JIRA.
[jira] [Resolved] (HDFS-8202) Improve end-to-end striping file test to add erasure recovery test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8202. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Target Version/s: HDFS-7285 +1 on the latest patch. I just committed to the branch. Thanks Xinwei for the contribution! Improve end-to-end striping file test to add erasure recovery test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Fix For: HDFS-7285 Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow on HDFS-8201 and add an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare, checking for recovery issues and verifying that erasure recovery works.
[jira] [Created] (HDFS-8846) Create edit log files with old layout version for upgrade testing
Zhe Zhang created HDFS-8846: --- Summary: Create edit log files with old layout version for upgrade testing Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with an old layout version, to test whether they are correctly handled during upgrades.
[jira] [Resolved] (HDFS-8839) Erasure Coding: client occasionally gets fewer block locations when some datanodes fail
[ https://issues.apache.org/jira/browse/HDFS-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8839. - Resolution: Duplicate Thanks Bo for identifying this. I think this is a duplicate of HDFS-8220. Erasure Coding: client occasionally gets fewer block locations when some datanodes fail --- Key: HDFS-8839 URL: https://issues.apache.org/jira/browse/HDFS-8839 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo 9 datanodes, writing two block groups. A datanode dies while writing the first block group. When the client retrieves the second block group from the namenode, the returned block group occasionally contains only 8 locations.
[jira] [Resolved] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8768. - Resolution: Duplicate Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG This is duplicated by [HDFS-8779]. For example, in the WebUI (usually namenode port 50070), one erasure-coded file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names on the datanodes, we believe the WebUI may have a problem displaying erasure-coded block groups.
[jira] [Created] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
Zhe Zhang created HDFS-8833: --- Summary: Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones, and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations, including renaming and nested configuration. Those limitations are valid in encryption for security reasons, but it doesn't make sense to carry them over to EC. This JIRA aims to store EC schema and cell size at the {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to the file header) as a follow-on. We should also disable changing the EC policy on a non-empty file / dir in the first phase.
[jira] [Created] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
Zhe Zhang created HDFS-8835: --- Summary: Convert BlockInfoUnderConstruction as an interface Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} into an interface with {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation.
[jira] [Resolved] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8728. - Resolution: Later Since HDFS-8499 is reopened, closing this one. We should revisit it after finalizing the HDFS-8499 discussion. Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch
[jira] [Reopened] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class
[ https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8499: - Refactor BlockInfo class hierarchy with static helper class --- Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, HDFS-8499.UCFeature.patch, HDFS-bistriped.patch In the HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk.
[jira] [Created] (HDFS-8806) Inconsistent metrics: number of missing blocks with replication factor 1 not properly cleared
Zhe Zhang created HDFS-8806: --- Summary: Inconsistent metrics: number of missing blocks with replication factor 1 not properly cleared Key: HDFS-8806 URL: https://issues.apache.org/jira/browse/HDFS-8806 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-7165 introduced a new metric for the _number of missing blocks with replication factor 1_. It is maintained as {{UnderReplicatedBlocks#corruptReplOneBlocks}}. However, that variable is not reset when the other {{UnderReplicatedBlocks}} state is cleared.
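The shape of this bug and its fix can be sketched with a minimal model. This is hypothetical Python, not the NameNode code: only the counter name comes from the JIRA, and the queue structure is simplified to a single list.

```python
# Minimal model of the HDFS-8806 bug: a clear() that empties the queues
# but forgets an associated counter leaves the metric permanently stale.

class UnderReplicatedBlocksSketch:
    def __init__(self):
        self.queues = []
        self.corrupt_repl_one_blocks = 0  # the metric from HDFS-7165

    def add_corrupt_repl_one(self, block):
        self.queues.append(block)
        self.corrupt_repl_one_blocks += 1

    def clear(self):
        self.queues.clear()
        # The fix: reset the counter together with the queues. Omitting
        # this line reproduces the inconsistency described in the JIRA.
        self.corrupt_repl_one_blocks = 0


q = UnderReplicatedBlocksSketch()
q.add_corrupt_repl_one("blk_123")
q.clear()
assert q.corrupt_repl_one_blocks == 0  # without the reset it would stay 1
```

The general lesson is that any derived counter must be reset in every code path that resets the data it summarizes.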
[jira] [Created] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
Zhe Zhang created HDFS-8801: --- Summary: Convert BlockInfoUnderConstruction as a feature Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Per discussion under HDFS-8499, with the erasure coding feature there would be 4 types of {{BlockInfo}} ({{complete+contiguous}}, {{complete+striped}}, {{UC+contiguous}}, {{UC+striped}}), which would form a multiple-inheritance hierarchy. We had the same challenge with {{INodeFile}}, and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}.
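The feature-composition idea can be sketched as follows. This is an illustrative Python model, not the HDFS class design: instead of four subclasses covering the complete/UC and contiguous/striped axes, the under-construction state is an optional feature object attached to either block type.

```python
# Sketch of composition over inheritance for the UC axis of BlockInfo.
# Names are illustrative; the real classes live in the NameNode.

class BlockUnderConstructionFeature:
    """State that only under-construction blocks need."""
    def __init__(self):
        self.replicas = []  # expected locations while under construction


class BlockInfo:
    def __init__(self, striped=False):
        self.striped = striped   # the contiguous/striped axis stays here
        self.uc_feature = None   # None => the block is complete

    def convert_to_under_construction(self):
        self.uc_feature = BlockUnderConstructionFeature()

    def complete(self):
        self.uc_feature = None

    def is_complete(self):
        return self.uc_feature is None


# Both striped and contiguous blocks reuse the same UC machinery,
# so no UC+striped / UC+contiguous subclasses are needed:
b = BlockInfo(striped=True)
assert b.is_complete()
b.convert_to_under_construction()
assert not b.is_complete()
b.complete()
assert b.is_complete()
```

With the UC state factored out as a feature, the two remaining axes no longer multiply into four classes, which is exactly the trick {{FileUnderConstructionFeature}} applied to {{INodeFile}}.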
[jira] [Created] (HDFS-8784) BlockInfo#numNodes should be numStorages
Zhe Zhang created HDFS-8784: --- Summary: BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang The method actually returns the number of storages holding a block.
[jira] [Created] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned
Zhe Zhang created HDFS-8786: --- Summary: Erasure coding: DataNode should transfer striped blocks before being decommissioned Key: HDFS-8786 URL: https://issues.apache.org/jira/browse/HDFS-8786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Per [discussion | https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] under HDFS-8697, it's too expensive to reconstruct block groups for decommissioning purposes.
[jira] [Created] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
Zhe Zhang created HDFS-8787: --- Summary: Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces.
[jira] [Created] (HDFS-8751) Remove setBlocks API from INodeFile and misc code cleanup
Zhe Zhang created HDFS-8751: --- Summary: Remove setBlocks API from INodeFile and misc code cleanup Key: HDFS-8751 URL: https://issues.apache.org/jira/browse/HDFS-8751 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang The public {{INodeFile#setBlocks}} API, when used outside {{INodeFile}}, is always called with {{null}}. Therefore we should replace it with a safer {{clearBlocks}} API. This JIRA also merges code cleanups from the HDFS-7285 branch.
[jira] [Resolved] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8497. - Resolution: Duplicate Thanks for the comment, Jing. Closing this as a duplicate of HDFS-8328. ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to a decode error in ErasureCodingWorker.
[jira] [Reopened] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reopened HDFS-8058: - Per discussion under HDFS-7285 and HDFS-8728, we should revisit the use of {{BlockInfoStriped}} and {{BlockInfoContiguous}} before merging into trunk. Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, its methods duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if a file is striped, the {{blocks}} field in INodeFile still occupies a reference's worth of memory. These are unnecessary, and we can use the same {{blocks}} to make the code clearer. I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory.