[jira] [Created] (HDFS-13983) TestOfflineImageViewer crashes on Windows
Vinayakumar B created HDFS-13983:
------------------------------------

Summary: TestOfflineImageViewer crashes on Windows
Key: HDFS-13983
URL: https://issues.apache.org/jira/browse/HDFS-13983
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B

TestOfflineImageViewer crashes on Windows because the OfflineImageViewer ReverseXML processor tries to delete the output file and re-create a stream that is already open. There are also unclosed RandomAccessFiles for the input files, which prevent those files from being deleted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
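The unclosed-RandomAccessFile half of this bug follows a general pattern. A minimal, non-Hadoop sketch (the class and method names are illustrative, not the OfflineImageViewer code): close the handle with try-with-resources before deleting, which Windows requires because it refuses to delete a file with an open handle.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch only: read an input file via RandomAccessFile, then
// delete it. The try-with-resources block guarantees the handle is closed
// before the delete, so the delete succeeds on Windows as well.
class RafCloseSketch {
    static boolean readThenDelete(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.read(); // consume some input
        } // handle is closed here, before the delete below
        return f.delete();
    }
}
```

With an explicit `raf.close()` missing (the pattern the report describes), the same delete fails on Windows while silently succeeding on most POSIX systems.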
[jira] [Created] (HDFS-13816) dfs.getQuotaUsage() throws NPE on non-existent dir instead of FileNotFoundException
Vinayakumar B created HDFS-13816:
------------------------------------

Summary: dfs.getQuotaUsage() throws NPE on non-existent dir instead of FileNotFoundException
Key: HDFS-13816
URL: https://issues.apache.org/jira/browse/HDFS-13816
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Vinayakumar B

{{dfs.getQuotaUsage()}} on a non-existent path should throw FileNotFoundException, but currently throws:

{noformat}
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getQuotaUsageInt(FSDirStatAndListingOp.java:573)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getQuotaUsage(FSDirStatAndListingOp.java:554)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getQuotaUsage(FSNamesystem.java:3221)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getQuotaUsage(NameNodeRpcServer.java:1404)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getQuotaUsage(ClientNamenodeProtocolServerSideTranslatorPB.java:1861)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
{noformat}
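A minimal sketch of the fix this report implies, using a plain map as a stand-in for the namespace (none of these names are the real FSDirStatAndListingOp internals): resolve the path first and throw FileNotFoundException when nothing is found, instead of dereferencing a null result.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a lookup that fails with FileNotFoundException for a
// missing path, rather than letting a null dereference surface as an NPE.
class QuotaUsageSketch {
    // stand-in for the namespace: path -> quota used
    static final Map<String, Long> NAMESPACE = new HashMap<>();

    static long getQuotaUsage(String path) throws FileNotFoundException {
        Long used = NAMESPACE.get(path);     // resolve the path
        if (used == null) {                  // non-existent directory
            throw new FileNotFoundException("Path does not exist: " + path);
        }
        return used;                         // safe to use now
    }
}
```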
[jira] [Created] (HDFS-13027) Handle NPE due to deleted blocks in race condition
Vinayakumar B created HDFS-13027:
------------------------------------

Summary: Handle NPE due to deleted blocks in race condition
Key: HDFS-13027
URL: https://issues.apache.org/jira/browse/HDFS-13027
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Vinayakumar B

Since file deletions and block removal from the BlocksMap are done under separate locks, calls to {{blockManager.getBlockCollection(block)}} can return null in a race and cause NPEs. Handle all possibilities of NPE due to this.
[jira] [Created] (HDFS-13011) Support replacing multiple nodes during pipeline recovery and append
Vinayakumar B created HDFS-13011:
------------------------------------

Summary: Support replacing multiple nodes during pipeline recovery and append
Key: HDFS-13011
URL: https://issues.apache.org/jira/browse/HDFS-13011
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Reporter: Vinayakumar B

During pipeline recovery, only one additional node is requested to replace the failed node. But if the initial pipeline size is less than the replication factor, extra nodes could be added during pipeline recovery to satisfy the replication during the write itself.
[jira] [Created] (HDFS-12212) Options.Rename.TO_TRASH is considered even when Options.Rename.NONE is specified
Vinayakumar B created HDFS-12212:
------------------------------------

Summary: Options.Rename.TO_TRASH is considered even when Options.Rename.NONE is specified
Key: HDFS-12212
URL: https://issues.apache.org/jira/browse/HDFS-12212
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Vinayakumar B

HDFS-8312 introduced {{Options.Rename.TO_TRASH}} to differentiate moves to trash from other renames for permission checks. But even when {{Options.Rename.NONE}} is passed, TO_TRASH is considered for the rename, and the wrong permissions are checked.
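The intended check can be sketched as follows (the class and enum below are illustrative, not Hadoop's real Options.Rename): trash-specific permission handling should fire only when TO_TRASH is explicitly present among the passed options.

```java
// Illustrative sketch of the option check this report asks for: only an
// explicit TO_TRASH among the rename options should trigger the trash
// permission path; Rename.NONE (or no options at all) must not.
class RenameOptionsSketch {
    enum Rename { NONE, OVERWRITE, TO_TRASH }

    static boolean isMoveToTrash(Rename... options) {
        for (Rename opt : options) {
            if (opt == Rename.TO_TRASH) {
                return true;   // explicit TO_TRASH: apply trash checks
            }
        }
        return false;          // NONE or anything else: normal rename checks
    }
}
```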
[jira] [Created] (HDFS-12157) Do fsyncDirectory(..) outside of FSDataset lock
Vinayakumar B created HDFS-12157:
------------------------------------

Summary: Do fsyncDirectory(..) outside of FSDataset lock
Key: HDFS-12157
URL: https://issues.apache.org/jira/browse/HDFS-12157
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Vinayakumar B
Priority: Critical
[jira] [Created] (HDFS-12120) Use new block for pre-RollingUpgrade files' append requests
Vinayakumar B created HDFS-12120:
------------------------------------

Summary: Use new block for pre-RollingUpgrade files' append requests
Key: HDFS-12120
URL: https://issues.apache.org/jira/browse/HDFS-12120
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B

After RollingUpgrade prepare, an append to a pre-RU file re-opens the same last block and makes changes to it (appending extra data, changing the genstamp, etc.). These changes to the block are not tracked on the Datanodes (either in trash or via hardlinks).

This creates a problem if RollingUpgrade rollback is called: since both the block's state and size have changed, the block will be marked corrupt after rollback. To avoid this, the first append to a pre-RU file can be forced to write to a new block.
[jira] [Created] (HDFS-11898) DFSClient#isHedgedReadsEnabled() should be per client flag
Vinayakumar B created HDFS-11898:
------------------------------------

Summary: DFSClient#isHedgedReadsEnabled() should be per client flag
Key: HDFS-11898
URL: https://issues.apache.org/jira/browse/HDFS-11898
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Reporter: Vinayakumar B
Assignee: Vinayakumar B

DFSClient#isHedgedReadsEnabled() returns a value based on the static {{HEDGED_READ_THREAD_POOL}}. Hence, if any client in the JVM has initialized this pool, all remaining clients' reads will go through hedged reads as well. This flag should be a per-client value.
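The bug pattern can be shown in isolation (the field and method names mirror the report, but this is not the real DFSClient): a static pool reference makes "hedged reads enabled" a JVM-wide property, so enabling it in one client flips it for every other client in the process; the fix consults a per-instance flag.

```java
// Illustrative sketch of the static-flag bug and the per-client fix.
class HedgedFlagSketch {
    private static Object HEDGED_READ_THREAD_POOL;   // shared across ALL clients
    private final boolean hedgedReadsEnabled;        // per-client fix

    HedgedFlagSketch(boolean enableHedgedReads) {
        this.hedgedReadsEnabled = enableHedgedReads;
        if (enableHedgedReads && HEDGED_READ_THREAD_POOL == null) {
            HEDGED_READ_THREAD_POOL = new Object(); // lazily "create" the pool
        }
    }

    // buggy check: any client that initialized the pool enables it for everyone
    boolean isHedgedReadsEnabledStatic() {
        return HEDGED_READ_THREAD_POOL != null;
    }

    // fixed check: each client consults only its own configuration
    boolean isHedgedReadsEnabled() {
        return hedgedReadsEnabled;
    }
}
```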
[jira] [Created] (HDFS-11889) Datanodes should skip reading replicas from cache during Rollback
Vinayakumar B created HDFS-11889:
------------------------------------

Summary: Datanodes should skip reading replicas from cache during Rollback
Key: HDFS-11889
URL: https://issues.apache.org/jira/browse/HDFS-11889
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, rolling upgrades
Reporter: Vinayakumar B
[jira] [Created] (HDFS-11875) Sort last block's locations for append
Vinayakumar B created HDFS-11875:
------------------------------------

Summary: Sort last block's locations for append
Key: HDFS-11875
URL: https://issues.apache.org/jira/browse/HDFS-11875
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B

The last block's locations are not sorted nearest-node-first, as is done for new block allocations. Sort the nodes before returning the last block for append.
[jira] [Created] (HDFS-11856) Ability to re-add Upgrading Nodes (remote) to pipeline for future pipeline updates
Vinayakumar B created HDFS-11856:
------------------------------------

Summary: Ability to re-add Upgrading Nodes (remote) to pipeline for future pipeline updates
Key: HDFS-11856
URL: https://issues.apache.org/jira/browse/HDFS-11856
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client, rolling upgrades
Affects Versions: 2.7.3
Reporter: Vinayakumar B
Assignee: Vinayakumar B

During a rolling upgrade, if a DN gets restarted it sends a special OOB_RESTART status on all streams opened for write.

1. Local clients will wait 30 seconds for the datanode to come back.
2. Remote clients will treat these nodes as bad nodes and continue with pipeline recovery and the write.

The restarted nodes are considered bad and excluded for the lifetime of the stream. In a small cluster where the total is only 3 nodes, each time a remote node restarts for the upgrade it gets excluded. So a stream that initially wrote to 3 nodes ends up writing to only one node, with no other nodes left to replace the excluded ones.
[jira] [Created] (HDFS-11710) hadoop-hdfs-native-client build fails in trunk after HDFS-11529
Vinayakumar B created HDFS-11710:
------------------------------------

Summary: hadoop-hdfs-native-client build fails in trunk after HDFS-11529
Key: HDFS-11710
URL: https://issues.apache.org/jira/browse/HDFS-11710
Project: Hadoop HDFS
Issue Type: Bug
Components: native
Affects Versions: 3.0.0-alpha3
Reporter: Vinayakumar B
Priority: Blocker

HDFS-11529 used 'hdfsThreadDestructor()' in jni_helper.c, but this function is implemented only in "posix/thread_local_storage.c", NOT in "windows/thread_local_storage.c". The build fails with the following errors:

{noformat}
[exec] hdfs.dir\RelWithDebInfo\thread_local_storage.obj /machine:x64 /debug
[exec]      Creating library D:/hadoop/work/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/bin/RelWithDebInfo/hdfs.lib and object D:/hadoop/work/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/bin/RelWithDebInfo/hdfs.exp
[exec] jni_helper.obj : error LNK2019: unresolved external symbol hdfsThreadDestructor referenced in function getJNIEnv [D:\hadoop\work\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\hdfs.vcxproj]
[exec] D:\hadoop\work\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\bin\RelWithDebInfo\hdfs.dll : fatal error LNK1120: 1 unresolved externals [D:\hadoop\work\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\hdfs.vcxproj]
{noformat}
[jira] [Created] (HDFS-11708) positional read will fail if replicas moved to different DNs after stream is opened
Vinayakumar B created HDFS-11708:
------------------------------------

Summary: positional read will fail if replicas moved to different DNs after stream is opened
Key: HDFS-11708
URL: https://issues.apache.org/jira/browse/HDFS-11708
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.7.3
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical

Scenario:
1. A file was written to DN1 and DN2 with RF=2.
2. A file stream was opened for read and kept open. Block locations are [DN1, DN2].
3. One of the replicas (DN2) was moved to another datanode (DN3) due to datanode death/balancing/etc.
4. The latest block locations in the NameNode are now DN1 and DN3.
5. DN1 went down, but is not yet detected as dead by the NameNode.
6. The client starts reading using the positional read API "read(pos, buf[], offset, length)", and the read fails.
[jira] [Created] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B created HDFS-11674:
------------------------------------

Summary: reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Key: HDFS-11674
URL: https://issues.apache.org/jira/browse/HDFS-11674
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Vinayakumar B
Assignee: Vinayakumar B

Scenario:
1. A 3-node cluster with "dfs.client.block.write.replace-datanode-on-failure.policy" set to DEFAULT. A block is written with x bytes of data.
2. One of the Datanodes, NOT the first DN, goes down.
3. The client tries to append data to the block and fails, since one DN is down.
4. The client calls recoverLease() on the file.
5. Recovery succeeds.

Issue:
1. The DNs the client connected to before encountering the mirror failure have reservedSpaceForReplicas incremented, BUT it is never decremented.
2. So, in the long run, all of a DN's space ends up in reservedSpaceForReplicas, resulting in OutOfSpace errors.
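The leak and its usual remedy can be sketched independently of Hadoop (the class and counter below are illustrative, not the real FsDatasetImpl accounting): release the reservation in a finally block so every exit path, including the failed-mirror append, gives the space back.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: space reserved for a replica must be released on
// *every* exit of the write, including the failure path this report
// describes (mirror down, append aborted, replica recovered).
class ReservedSpaceSketch {
    final AtomicLong reservedForReplicas = new AtomicLong();

    void append(long bytes, boolean mirrorUp) {
        reservedForReplicas.addAndGet(bytes);        // reserve before writing
        try {
            if (!mirrorUp) {
                throw new IllegalStateException("mirror is down");
            }
            // ... write the data ...
        } finally {
            reservedForReplicas.addAndGet(-bytes);   // always release
        }
    }
}
```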
[jira] [Created] (HDFS-11621) Rename 'seq.io.sort.mb' and 'seq.io.sort.factor' with prefix 'io.seqfile'
Vinayakumar B created HDFS-11621:
------------------------------------

Summary: Rename 'seq.io.sort.mb' and 'seq.io.sort.factor' with prefix 'io.seqfile'
Key: HDFS-11621
URL: https://issues.apache.org/jira/browse/HDFS-11621
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B

HADOOP-6801 introduced the new configs 'seq.io.sort.mb' and 'seq.io.sort.factor'. These can be renamed with the prefix 'io.seqfile' to be consistent with the other sequence-file-related configs.
[jira] [Created] (HDFS-11224) Lifeline message should be ignored for dead nodes
Vinayakumar B created HDFS-11224:
------------------------------------

Summary: Lifeline message should be ignored for dead nodes
Key: HDFS-11224
URL: https://issues.apache.org/jira/browse/HDFS-11224
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical

Lifeline messages should be ignored for dead nodes in the NameNode. Otherwise, cluster-level stats such as capacity, used, etc. will be doubled after re-registration of the node.
[jira] [Created] (HDFS-11213) FilterFileSystem should override rename(.., options) to take effect of Rename options called via FilterFileSystem implementations
Vinayakumar B created HDFS-11213:
------------------------------------

Summary: FilterFileSystem should override rename(.., options) to take effect of Rename options called via FilterFileSystem implementations
Key: HDFS-11213
URL: https://issues.apache.org/jira/browse/HDFS-11213
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B

HDFS-8312 added the Rename.TO_TRASH option to add a security check before moving files to trash. But FilterFileSystem implementations do not override this rename(.., options), so the default FileSystem implementation is used, in which the Rename.TO_TRASH option is not delegated to the NameNode.
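A self-contained model of the delegation problem (the classes below only mimic the FileSystem/FilterFileSystem relationship; they are not Hadoop code): without an override of the options-taking rename, a filter falls back to a base implementation and the options never reach the wrapped file system.

```java
// Illustrative sketch: the fix is for the filter to override the
// options-taking rename() and forward the options verbatim to the
// wrapped file system.
class FilterRenameSketch {
    enum Rename { NONE, TO_TRASH }

    static class BaseFs {
        Rename lastOptionSeen = Rename.NONE;
        void rename(String src, String dst, Rename... options) {
            lastOptionSeen = options.length > 0 ? options[0] : Rename.NONE;
        }
    }

    // the fix: the filter overrides rename(.., options) and delegates
    static class FilterFs extends BaseFs {
        final BaseFs inner;
        FilterFs(BaseFs inner) { this.inner = inner; }
        @Override
        void rename(String src, String dst, Rename... options) {
            inner.rename(src, dst, options);   // forward options verbatim
        }
    }
}
```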
[jira] [Created] (HDFS-11212) FilterFileSystem should override rename(.., options) to take effect of Rename options called via FilterFileSystem implementations
Vinayakumar B created HDFS-11212:
------------------------------------

Summary: FilterFileSystem should override rename(.., options) to take effect of Rename options called via FilterFileSystem implementations
Key: HDFS-11212
URL: https://issues.apache.org/jira/browse/HDFS-11212
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B

HDFS-8312 added the Rename.TO_TRASH option to add a security check before moving files to trash. But FilterFileSystem implementations do not override this rename(.., options), so the default FileSystem implementation is used, in which the Rename.TO_TRASH option is not delegated to the NameNode.
[jira] [Created] (HDFS-11098) Datanode in tests cannot start in Windows after HDFS-10368
Vinayakumar B created HDFS-11098:
------------------------------------

Summary: Datanode in tests cannot start in Windows after HDFS-10368
Key: HDFS-11098
URL: https://issues.apache.org/jira/browse/HDFS-11098
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.0.0-alpha2
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Blocker

After HDFS-10368, starting datanodes in MiniDFSCluster on Windows throws the exception below:

{noformat}
java.lang.IllegalArgumentException: URI: file:/D:/code/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1 is not in the expected format
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.<init>(StorageLocation.java:68)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.parse(StorageLocation.java:123)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getStorageLocations(DataNode.java:2561)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2545)
	at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1613)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:860)
	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:491)
	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:450)
{noformat}
[jira] [Created] (HDFS-11067) DFS#listStatusIterator(..) should throw FileNotFoundException if the directory deleted before fetching next batch of entries
Vinayakumar B created HDFS-11067:
------------------------------------

Summary: DFS#listStatusIterator(..) should throw FileNotFoundException if the directory deleted before fetching next batch of entries
Key: HDFS-11067
URL: https://issues.apache.org/jira/browse/HDFS-11067
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Reporter: Vinayakumar B
Assignee: Vinayakumar B

DFS#listStatusIterator(..) currently stops iterating silently when the directory is deleted before the next batch of entries is fetched. It should instead throw FileNotFoundException and let the user know the directory was deleted mid-iteration.
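The desired iterator behaviour can be sketched as follows (fetchNextBatch is a stand-in for the per-batch RPC that listStatusIterator issues; returning null models the directory having been deleted between batches):

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch: a batched listing iterator that surfaces
// FileNotFoundException when the directory vanishes mid-listing, rather
// than silently ending the iteration.
class BatchedListingSketch implements Iterator<String> {
    private final Supplier<List<String>> fetchNextBatch;
    private Iterator<String> batch = Collections.emptyIterator();

    BatchedListingSketch(Supplier<List<String>> fetchNextBatch) {
        this.fetchNextBatch = fetchNextBatch;
    }

    @Override public boolean hasNext() {
        if (!batch.hasNext()) {
            List<String> next = fetchNextBatch.get(); // null = dir was deleted
            if (next == null) {
                throw new UncheckedIOException(
                    new FileNotFoundException("Directory deleted mid-listing"));
            }
            batch = next.iterator();
        }
        return batch.hasNext();
    }

    @Override public String next() { return batch.next(); }
}
```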
[jira] [Created] (HDFS-10957) Retire BKJM from trunk
Vinayakumar B created HDFS-10957:
------------------------------------

Summary: Retire BKJM from trunk
Key: HDFS-10957
URL: https://issues.apache.org/jira/browse/HDFS-10957
Project: Hadoop HDFS
Issue Type: Task
Reporter: Vinayakumar B
Assignee: Vinayakumar B

Based on the discussion in the mailing list [here|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201607.mbox/%3c1288296033.5942327.1469727453010.javamail.ya...@mail.yahoo.com%3E], BKJM is no longer required, as QJM supports the required functionality with minimal deployment overhead. This jira is for tracking purposes.
[jira] [Created] (HDFS-10901) QJM should not consider stale/failed txn available in any one of JNs.
Vinayakumar B created HDFS-10901:
------------------------------------

Summary: QJM should not consider stale/failed txn available in any one of JNs.
Key: HDFS-10901
URL: https://issues.apache.org/jira/browse/HDFS-10901
Project: Hadoop HDFS
Issue Type: Bug
Components: qjm
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical

In one of our clusters we faced an issue where a NameNode restart failed due to a stale/failed txn available in one JN but not the others.

The scenario:
1. Full cluster restart.
2. The startLogSegment txn (195222) was synced on only one JN and failed on the others, because they were shutting down. On the others only the editlog file was created, without the txn synced, so after restart those files were marked as empty.
3. Cluster restarted. During failover, this new log segment missed recovery because this JN was slow in responding to the recovery call.
4. Recovery on the other JNs was successful, as there were no in-progress files.
5. editlog.openForWrite() detected that txn (195222) was already available, and failed the failover.

The same steps repeated until the stale editlog on that JN was manually deleted.

Since QJM is a quorum of JNs, a txn is considered successful only if it is written to a minimum quorum; otherwise it is failed. The same rule should be applied while selecting streams for reading: stale/failed txns available on fewer than a quorum of JNs should not be considered for reading.

HDFS-10519 does similar work, considering txns 'durable' based on 'committedTxnId'. But updating 'committedTxnId' on every flush with one more RPC seems problematic for performance.
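The proposed read-side rule can be sketched as a simple quorum check (the class and method are illustrative, not the real QJM selection logic): a txn counts as durable, and hence readable, only if at least a write quorum of JournalNodes actually has it.

```java
import java.util.List;

// Illustrative sketch: lastTxIdPerJn holds each JN's highest synced txn id;
// a txn present on fewer than (n/2 + 1) JNs is treated as failed and
// skipped when selecting streams for reading.
class QuorumTxnSketch {
    static boolean isDurable(List<Long> lastTxIdPerJn, long txId) {
        int quorum = lastTxIdPerJn.size() / 2 + 1;   // e.g. 2 of 3 JNs
        int count = 0;
        for (long last : lastTxIdPerJn) {
            if (last >= txId) {
                count++;
            }
        }
        return count >= quorum;
    }
}
```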
[jira] [Created] (HDFS-10902) QJM should not consider stale/failed txn available in any one of JNs.
Vinayakumar B created HDFS-10902:
------------------------------------

Summary: QJM should not consider stale/failed txn available in any one of JNs.
Key: HDFS-10902
URL: https://issues.apache.org/jira/browse/HDFS-10902
Project: Hadoop HDFS
Issue Type: Bug
Components: qjm
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical

In one of our clusters we faced an issue where a NameNode restart failed due to a stale/failed txn available in one JN but not the others.

The scenario:
1. Full cluster restart.
2. The startLogSegment txn (195222) was synced on only one JN and failed on the others, because they were shutting down. On the others only the editlog file was created, without the txn synced, so after restart those files were marked as empty.
3. Cluster restarted. During failover, this new log segment missed recovery because this JN was slow in responding to the recovery call.
4. Recovery on the other JNs was successful, as there were no in-progress files.
5. editlog.openForWrite() detected that txn (195222) was already available, and failed the failover.

The same steps repeated until the stale editlog on that JN was manually deleted.

Since QJM is a quorum of JNs, a txn is considered successful only if it is written to a minimum quorum; otherwise it is failed. The same rule should be applied while selecting streams for reading: stale/failed txns available on fewer than a quorum of JNs should not be considered for reading.

HDFS-10519 does similar work, considering txns 'durable' based on 'committedTxnId'. But updating 'committedTxnId' on every flush with one more RPC seems problematic for performance.
[jira] [Created] (HDFS-10570) Netty-all jar should be first in class path while running tests in eclipse
Vinayakumar B created HDFS-10570:
------------------------------------

Summary: Netty-all jar should be first in class path while running tests in eclipse
Key: HDFS-10570
URL: https://issues.apache.org/jira/browse/HDFS-10570
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor

While debugging tests in Eclipse, the DN HTTP URL cannot be accessed. WebHdfs tests also cannot run in Eclipse, because classes are loaded from the old-version netty jars instead of the netty-all jar.
[jira] [Created] (HDFS-10384) MiniDFSCluster doesn't support multiple HTTPS server instances
Vinayakumar B created HDFS-10384:
------------------------------------

Summary: MiniDFSCluster doesn't support multiple HTTPS server instances
Key: HDFS-10384
URL: https://issues.apache.org/jira/browse/HDFS-10384
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B

MiniDFSCluster does not support multiple instances of HTTPS servers, because it does not use an ephemeral port for the HTTPS addresses of the NameNode and DataNode. So the second instance gets a PortAlreadyInUse exception.
[jira] [Created] (HDFS-10274) Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode
Vinayakumar B created HDFS-10274:
------------------------------------

Summary: Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode
Key: HDFS-10274
URL: https://issues.apache.org/jira/browse/HDFS-10274
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B

To reduce the number of methods in the Namesystem interface, and for a cleaner refactor, it is better to move {{isInStartupSafeMode()}} to BlockManager and BlockManagerSafeMode, as most of the callers are in BlockManager. That removes one more interface overhead.
[jira] [Created] (HDFS-10273) Remove duplicate logSync() and log message in FSN#enterSafemode()
Vinayakumar B created HDFS-10273:
------------------------------------

Summary: Remove duplicate logSync() and log message in FSN#enterSafemode()
Key: HDFS-10273
URL: https://issues.apache.org/jira/browse/HDFS-10273
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor

Remove the duplicate logSync() call and log message in FSN#enterSafemode():

{code:title=FSN#enterSafemode(..)}
// Before Editlog is in OpenForWrite mode, editLogStream will be null. So,
// logSyncAll call can be called only when Edlitlog is in OpenForWrite mode
if (isEditlogOpenForWrite) {
  getEditLog().logSyncAll();
}
setManualAndResourceLowSafeMode(!resourcesLow, resourcesLow);
NameNode.stateChangeLog.info("STATE* Safe mode is ON.\n" + getSafeModeTip());
if (isEditlogOpenForWrite) {
  getEditLog().logSyncAll();
}
NameNode.stateChangeLog.info("STATE* Safe mode is ON" + getSafeModeTip());
{code}
[jira] [Resolved] (HDFS-10252) Is DataNode aware of the name of the file that it is going to store?
[ https://issues.apache.org/jira/browse/HDFS-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B resolved HDFS-10252.
----------------------------------
Resolution: Invalid

> Is DataNode aware of the name of the file that it is going to store?
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10252
>                 URL: https://issues.apache.org/jira/browse/HDFS-10252
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: datanode, namenode
>            Reporter: Dimitrios Sarigiannis
>            Priority: Minor
>
> I am going through the HDFS Namenode and Datanode code and I am trying to see if the DataNode is aware of the names of the files that are stored in it (and other metadata as well).
> Assuming that we have the simplest case:
> 1 NameNode
> 1 DataNode
> 1 single machine running HDFS with replication factor 1.
> Considering the way HDFS works, a use case could be:
> A client requests to write a file from local to HDFS (for example: "hdfs dfs -put file /file").
> He first communicates with the NameNode and gets where this file should be stored. Then, after receiving an answer, he requests the DataNode to store that file.
> (At this point I am going to be a little more specific about the code.) The DataNode has a DataXceiverServer class which runs and waits for requests. When a request comes, it starts a DataXceiver thread and tries to serve that request. What I would like to know is whether, at that specific point, the DataNode knows the name of the file that it is going to store. I spent hours of debugging but I could not find it. Is it somewhere there, or does only the NameNode know the name of that file?
[jira] [Created] (HDFS-9952) Expose FSNamesystem lock wait time as metrics
Vinayakumar B created HDFS-9952:
---------------------------------

Summary: Expose FSNamesystem lock wait time as metrics
Key: HDFS-9952
URL: https://issues.apache.org/jira/browse/HDFS-9952
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B

Expose FSNamesystem's readLock() and writeLock() wait times as metrics.
[jira] [Resolved] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B resolved HDFS-8578.
---------------------------------
Resolution: Fixed

Fixed the compilation in branch-2.7.

> On upgrade, Datanode should process all storage/data dirs in parallel
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>             Fix For: 2.7.3
>
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch,
> HDFS-8578-04.patch, HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch,
> HDFS-8578-08.patch, HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch,
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, HDFS-8578-15.patch,
> HDFS-8578-16.patch, HDFS-8578-17.patch, HDFS-8578-branch-2.6.0.patch,
> HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch,
> HDFS-8578-branch-2.7-003.patch, h8578_20151210.patch, h8578_20151211.patch,
> h8578_20151211b.patch, h8578_20151212.patch, h8578_20151213.patch,
> h8578_20160117.patch, h8578_20160128.patch, h8578_20160128b.patch,
> h8578_20160216.patch, h8578_20160218-branch-2.7-addendum.patch,
> h8578_20160218.patch
>
> Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 minutes to process a single storage dir; then a datanode which has ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
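The sequential loop quoted above can be parallelized along these lines (a sketch only; doTransition here is a trivial stand-in for BlockPoolSliceStorage's real per-directory upgrade work, and the pool size is an assumed cap): submit one task per storage dir and wait for all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of processing storage dirs in parallel instead of
// sequentially: one task per dir, then wait on every Future so any
// per-dir failure is propagated.
class ParallelUpgradeSketch {
    static List<String> upgradeAll(List<String> storageDirs) {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.min(storageDirs.size(), 8));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : storageDirs) {
                futures.add(pool.submit(() -> doTransition(dir)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());   // blocks; rethrows task failures
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // stand-in for the real per-storage-dir upgrade work
    static String doTransition(String dir) {
        return dir + ":upgraded";
    }
}
```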
[jira] [Resolved] (HDFS-9783) DataNode deadlocks after running hdfs mover
[ https://issues.apache.org/jira/browse/HDFS-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B resolved HDFS-9783.
---------------------------------
Resolution: Duplicate

Resolving as duplicate. Feel free to re-open if the fix in HDFS-9661 doesn't work for you.

> DataNode deadlocks after running hdfs mover
> -------------------------------------------
>
>                 Key: HDFS-9783
>                 URL: https://issues.apache.org/jira/browse/HDFS-9783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: David Watzke
>         Attachments: datanode.stack.txt
>
> We're running CDH 5.4.4 (with Hadoop 2.6.0). Recently we added 800T of ARCHIVE storage, marked some data (16T * 2 repl. factor) as "COLD" and ran the hdfs mover to enforce the storage policy.
> After a few minutes, the mover hangs because one or more datanodes hang as well. Please check out the deadlock revealed by jstack. Also, here's what appeared in the DN log:
> 2016-02-09 15:58:15,676 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Not able to copy block 1157230686 to /5.45.60.142:40144 because threads quota is exceeded.
[jira] [Created] (HDFS-9776) Truncate block recovery will never happen if the Namenode goes down immediately after truncate
Vinayakumar B created HDFS-9776: --- Summary: Truncate block recovery will never happen if the Namenode goes down immediately after truncate Key: HDFS-9776 URL: https://issues.apache.org/jira/browse/HDFS-9776 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Initial analysis of the recent test failure in {{TestHAAppend#testMultipleAppendsDuringCatchupTailing}} [here|https://builds.apache.org/job/PreCommit-HDFS-Build/14420/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestHAAppend/testMultipleAppendsDuringCatchupTailing/] has found that if the Active NameNode goes down immediately after the truncate operation, but before the BlockRecovery command is sent to the datanode, then this block will never be truncated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9536) OOM errors during parallel upgrade to Block-ID based layout
Vinayakumar B created HDFS-9536: --- Summary: OOM errors during parallel upgrade to Block-ID based layout Key: HDFS-9536 URL: https://issues.apache.org/jira/browse/HDFS-9536 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B This is a follow-up jira for the OOM errors observed during the parallel upgrade to the Block-ID based datanode layout using the HDFS-8578 fix. More clues [here|https://issues.apache.org/jira/browse/HDFS-8578?focusedCommentId=15042012&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15042012] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9127) Re-replication for files with enough replicas in single rack
[ https://issues.apache.org/jira/browse/HDFS-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-9127. - Resolution: Invalid This problem doesn't exist in the latest code. Feel free to re-open if found again. > Re-replication for files with enough replicas in single rack > > > Key: HDFS-9127 > URL: https://issues.apache.org/jira/browse/HDFS-9127 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > Found while debugging testcases in HDFS-8647 > *Scenario:* > === > Start a cluster with a single rack with three DNs > Write a file with RF=3 > Add two nodes with different racks > As per the block placement policy ([Rack > Awareness|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html]) > at least one replica needs to be replicated to the newly added rack. But it is not > happening, because of the following reason. > {color:blue} > When the cluster was a single rack, a block is removed from > {{neededReplications}} once it has 3 replicas. > Later, after adding the new rack, only the replications present > in {{neededReplications}} will happen. > So for the blocks which already have enough replicas, new rack replications > will not take place. > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9411) HDFS ZoneLabel support
Vinayakumar B created HDFS-9411: --- Summary: HDFS ZoneLabel support Key: HDFS-9411 URL: https://issues.apache.org/jira/browse/HDFS-9411 Project: Hadoop HDFS Issue Type: New Feature Reporter: Vinayakumar B Assignee: Vinayakumar B HDFS currently stores data blocks on different datanodes chosen by the BlockPlacement Policy. These datanodes are random within the scope (local-rack/different-rack/nodegroup) of the network topology. In a multi-tenant scenario (a tenant can be a user/service), blocks of any tenant can be on any datanodes. Depending on the applications of different tenants, a datanode might sometimes get busy, slowing down another tenant's application. It would be better if admins had a provision to logically divide the cluster among multiple tenants. ZONE_LABELS can logically divide the cluster datanodes into multiple Zones. High level design doc to follow soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9139) Enable parallel JUnit tests for HDFS Pre-commit
Vinayakumar B created HDFS-9139: --- Summary: Enable parallel JUnit tests for HDFS Pre-commit Key: HDFS-9139 URL: https://issues.apache.org/jira/browse/HDFS-9139 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Forked from HADOOP-11984, With the initial and significant work from [~cnauroth], this Jira is to track and support parallel tests' run for HDFS Precommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9049) Make Datanode Netty reverse proxy port to be configurable
Vinayakumar B created HDFS-9049: --- Summary: Make Datanode Netty reverse proxy port to be configurable Key: HDFS-9049 URL: https://issues.apache.org/jira/browse/HDFS-9049 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Vinayakumar B Assignee: Vinayakumar B In DatanodeHttpServer.java, Netty is used as a reverse proxy, but it starts on a random port bound to localhost. This port can be made configurable for better deployments. {code} HttpServer2.Builder builder = new HttpServer2.Builder() .setName("datanode") .setConf(confForInfoServer) .setACL(new AccessControlList(conf.get(DFS_ADMIN, " "))) .hostName(getHostnameForSpnegoPrincipal(confForInfoServer)) .addEndpoint(URI.create("http://localhost:0")) .setFindPort(true); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
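For context on why the proxy port is random: binding to port 0 asks the OS to pick a free ephemeral port, whereas a configured positive port pins the listener to a known, deployable address. The minimal sketch below illustrates that behavior with a plain `ServerSocket`; it is not the Netty/HttpServer2 code from DatanodeHttpServer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortBinding {
    // Port 0 asks the OS for any free ephemeral port; a configured
    // positive port pins the listener to a fixed, predictable address.
    public static int bind(int requestedPort) throws IOException {
        try (ServerSocket ss = new ServerSocket(requestedPort, 50,
                InetAddress.getLoopbackAddress())) {
            return ss.getLocalPort(); // the port that was actually bound
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("random port: " + bind(0));
    }
}
```

A configurable port would simply feed a value read from the configuration (instead of the literal 0) into the endpoint URI.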
[jira] [Created] (HDFS-8976) Create HTML5 cluster webconsole for federated cluster
Vinayakumar B created HDFS-8976: --- Summary: Create HTML5 cluster webconsole for federated cluster Key: HDFS-8976 URL: https://issues.apache.org/jira/browse/HDFS-8976 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.7.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Since the old jsp variant of the cluster web console is no longer present from 2.7 onwards, there is a need for an HTML5 web console for an overview of the overall cluster. The 2.7.1 docs say to check the web console as below {noformat}Similar to the Namenode status web page, when using federation a Cluster Web Console is available to monitor the federated cluster at http://any_nn_host:port/dfsclusterhealth.jsp. Any Namenode in the cluster can be used to access this web page.{noformat} But this page is no longer present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8882) Use datablocks, parityblocks and cell size from ec zone
Vinayakumar B created HDFS-8882: --- Summary: Use datablocks, parityblocks and cell size from ec zone Key: HDFS-8882 URL: https://issues.apache.org/jira/browse/HDFS-8882 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B As part of earlier development, constants were used for datablocks, parity blocks and cellsize. Now all these are available in ec zone. Use from there and stop using constant values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8822) Add SSD storagepolicy tests in TestBlockStoragePolicy#testDefaultPolicies
Vinayakumar B created HDFS-8822: --- Summary: Add SSD storagepolicy tests in TestBlockStoragePolicy#testDefaultPolicies Key: HDFS-8822 URL: https://issues.apache.org/jira/browse/HDFS-8822 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Add tests for storage policies ALLSSD and ONESSD in {{TestBlockStoragePolicy#testDefaultPolicies(..)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8811) Move BlockStoragePolicy name's constants from HdfsServerConstants.java to HdfsConstants.java
Vinayakumar B created HDFS-8811: --- Summary: Move BlockStoragePolicy name's constants from HdfsServerConstants.java to HdfsConstants.java Key: HDFS-8811 URL: https://issues.apache.org/jira/browse/HDFS-8811 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Currently {{HdfsServerConstants.java}} has the following constants, {code} String HOT_STORAGE_POLICY_NAME = "HOT"; String WARM_STORAGE_POLICY_NAME = "WARM"; String COLD_STORAGE_POLICY_NAME = "COLD";{code} and {{HdfsConstants.java}} has the following {code} public static final String MEMORY_STORAGE_POLICY_NAME = "LAZY_PERSIST"; public static final String ALLSSD_STORAGE_POLICY_NAME = "ALL_SSD"; public static final String ONESSD_STORAGE_POLICY_NAME = "ONE_SSD";{code} It would be better to move all of these to one place, HdfsConstants.java, which client APIs could also access since it is present in the hdfs-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8703) Merge refactor of DFSInputStream from ErasureCoding branch
Vinayakumar B created HDFS-8703: --- Summary: Merge refactor of DFSInputStream from ErasureCoding branch Key: HDFS-8703 URL: https://issues.apache.org/jira/browse/HDFS-8703 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B There were some refactors done in DFSInputStream for the support of ErasureCoding in branch HDFS-7285. These refactors are generic and applicable to the current trunk. This Jira targets to merge them back to trunk to reduce the size of the final merge patch for the branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8605) Merge Refactor of DFSOutputStream from HDFS-7285 branch
Vinayakumar B created HDFS-8605: --- Summary: Merge Refactor of DFSOutputStream from HDFS-7285 branch Key: HDFS-8605 URL: https://issues.apache.org/jira/browse/HDFS-8605 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Merging the refactor of DFSOutputStream from the HDFS-7285 branch. This will make things easier while merging changes from trunk periodically to HDFS-7285. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8571) Fix TestErasureCodingCli test
Vinayakumar B created HDFS-8571: --- Summary: Fix TestErasureCodingCli test Key: HDFS-8571 URL: https://issues.apache.org/jira/browse/HDFS-8571 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B TestErasureCodingCli test is failing due to changes done in HDFS-8556. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8556) Erasure Coding: Fix usage of 'createZone'
Vinayakumar B created HDFS-8556: --- Summary: Erasure Coding: Fix usage of 'createZone' Key: HDFS-8556 URL: https://issues.apache.org/jira/browse/HDFS-8556 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor The current usage for '-createZone' does not include 'cellSize' in the syntax, but it shows in the detailed options. Add 'cellSize' to the one-line usage as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8505) Truncate should not be success when Truncate Size and Current Size are equal.
[ https://issues.apache.org/jira/browse/HDFS-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8505. - Resolution: Not A Problem Resolved as 'Not a Problem' as suggested. Truncate should not be success when Truncate Size and Current Size are equal. - Key: HDFS-8505 URL: https://issues.apache.org/jira/browse/HDFS-8505 Project: Hadoop HDFS Issue Type: Bug Reporter: Archana T Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-8505.patch Truncate should not be success when Truncate Size and Current Size are equal. $ ./hdfs dfs -cat /file abcdefgh $ ./hdfs dfs -truncate -w 2 /file Waiting for /file ... Truncated /file to length: 2 $ ./hdfs dfs -cat /file ab {color:red} $ ./hdfs dfs -truncate -w 2 /file Truncated /file to length: 2 {color} $ ./hdfs dfs -cat /file ab Expecting to throw Truncate Error: -truncate: Cannot truncate to a larger file size. Current size: 2, truncate size: 2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-8505) Truncate should not be success when Truncate Size and Current Size are equal.
[ https://issues.apache.org/jira/browse/HDFS-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B reopened HDFS-8505: - Truncate should not be success when Truncate Size and Current Size are equal. - Key: HDFS-8505 URL: https://issues.apache.org/jira/browse/HDFS-8505 Project: Hadoop HDFS Issue Type: Bug Reporter: Archana T Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-8505.patch Truncate should not be success when Truncate Size and Current Size are equal. $ ./hdfs dfs -cat /file abcdefgh $ ./hdfs dfs -truncate -w 2 /file Waiting for /file ... Truncated /file to length: 2 $ ./hdfs dfs -cat /file ab {color:red} $ ./hdfs dfs -truncate -w 2 /file Truncated /file to length: 2 {color} $ ./hdfs dfs -cat /file ab Expecting to throw Truncate Error: -truncate: Cannot truncate to a larger file size. Current size: 2, truncate size: 2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8524) Add tests for StartupProgress with using MiniDFSCluster
Vinayakumar B created HDFS-8524: --- Summary: Add tests for StartupProgress with using MiniDFSCluster Key: HDFS-8524 URL: https://issues.apache.org/jira/browse/HDFS-8524 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Current tests in {{TestStartupProgress}} are just unit tests. Need to test real usages to avoid issues like HDFS-8470. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8466) Refactor BlockInfoContiguous and fix NPE in TestBlockInfo#testCopyConstructor()
Vinayakumar B created HDFS-8466: --- Summary: Refactor BlockInfoContiguous and fix NPE in TestBlockInfo#testCopyConstructor() Key: HDFS-8466 URL: https://issues.apache.org/jira/browse/HDFS-8466 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Vinayakumar B Assignee: Vinayakumar B HDFS-7716 refactored BlockInfoContiguous.java. Since then, TestBlockInfo#testCopyConstructor(..) fails with an NPE. Along with fixing the test failure, some of the code can be refactored to re-use code from BlockInfo.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8468) 2 RPC calls for every file read in DFSClient#open(..) resulting in double Audit log entries
Vinayakumar B created HDFS-8468: --- Summary: 2 RPC calls for every file read in DFSClient#open(..) resulting in double Audit log entries Key: HDFS-8468 URL: https://issues.apache.org/jira/browse/HDFS-8468 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B In the HDFS-7285 branch, to determine whether a file is striped or not and to get the schema for the file, 2 RPCs are done to the Namenode. This results in double audit logs for every file read, for both striped and non-striped files. This will have a major impact on the size of the audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
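One way to avoid the second RPC is to return the file status and the EC schema in a single response, so one call (and one audit-log entry) covers both. The sketch below is only an illustration of that idea with made-up types and a toy audit counter; it is not the actual HDFS client/namenode API.

```java
public class CombinedOpen {
    // Illustrative stand-ins for the real HDFS types.
    public record FileStatus(String path, boolean striped) {}
    public record OpenResult(FileStatus status, String ecSchema) {}

    public static int auditEntries = 0;

    // One RPC resolves the file and, if striped, its schema together,
    // so the namenode writes a single audit log entry per open.
    public static OpenResult open(String path) {
        auditEntries++;
        boolean striped = path.endsWith(".ec"); // toy striped-file marker
        return new OpenResult(new FileStatus(path, striped),
                              striped ? "RS-6-3" : null);
    }

    public static void main(String[] args) {
        open("/data/file.ec");
        System.out.println(auditEntries); // one audit entry, one RPC
    }
}
```

With two separate RPCs, every open produces two audit entries; combining the lookups halves both the RPC count and the audit-log volume.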
[jira] [Created] (HDFS-8408) Revisit and refactor ErasureCodingInfo
Vinayakumar B created HDFS-8408: --- Summary: Revisit and refactor ErasureCodingInfo Key: HDFS-8408 URL: https://issues.apache.org/jira/browse/HDFS-8408 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B As mentioned in HDFS-8375 [here|https://issues.apache.org/jira/browse/HDFS-8375?focusedCommentId=14544618&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14544618] {{ErasureCodingInfo}} needs a revisit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8369) TestHdfsConfigFields is placed in wrong dir, introducing compile error
[ https://issues.apache.org/jira/browse/HDFS-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8369. - Resolution: Duplicate TestHdfsConfigFields is placed in wrong dir, introducing compile error -- Key: HDFS-8369 URL: https://issues.apache.org/jira/browse/HDFS-8369 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-8369-01.patch HDFS-7559 has introduced a test file {{TestHdfsConfigFields}} which was committed in package {{org.apache.hadoop.tools}}. But the package declaration inside the file is {{org.apache.hadoop.hdfs.tools}}. Surprisingly, this does not give any compile errors in the maven build, but Eclipse catches it. So move {{TestHdfsConfigFields}} to the correct package {{org.apache.hadoop.hdfs.tools}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8369) TestHdfsConfigFields is placed in wrong dir, introducing compile error
Vinayakumar B created HDFS-8369: --- Summary: TestHdfsConfigFields is placed in wrong dir, introducing compile error Key: HDFS-8369 URL: https://issues.apache.org/jira/browse/HDFS-8369 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B HDFS-7559 has introduced a test file {{TestHdfsConfigFields}} which was committed in package {{org.apache.hadoop.tools}}. But the package declaration inside the file is {{org.apache.hadoop.hdfs.tools}}. Surprisingly, this does not give any compile errors in the maven build, but Eclipse catches it. So move {{TestHdfsConfigFields}} to the correct package {{org.apache.hadoop.hdfs.tools}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8374) Remove chunkSize from ECSchema as its not required for coders
Vinayakumar B created HDFS-8374: --- Summary: Remove chunkSize from ECSchema as its not required for coders Key: HDFS-8374 URL: https://issues.apache.org/jira/browse/HDFS-8374 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Remove {{chunkSize}} from ECSchema as discussed [here|https://issues.apache.org/jira/browse/HDFS-8347?focusedCommentId=14539108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14539108] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8375) Add cellSize as an XAttr to ECZone
Vinayakumar B created HDFS-8375: --- Summary: Add cellSize as an XAttr to ECZone Key: HDFS-8375 URL: https://issues.apache.org/jira/browse/HDFS-8375 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Add {{cellSize}} as an XAttr for ECZone, as discussed [here|https://issues.apache.org/jira/browse/HDFS-8347?focusedCommentId=14539108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14539108] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7688) Client side api/config changes to support online encoding
[ https://issues.apache.org/jira/browse/HDFS-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7688. - Resolution: Duplicate Client side api/config changes to support online encoding - Key: HDFS-7688 URL: https://issues.apache.org/jira/browse/HDFS-7688 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Vinayakumar B Assignee: Vinayakumar B This Jira targets to handle Client side API changes to support directly erasure encoding with striped blocks from the client side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B reopened HDFS-7916: - Assignee: Rushabh S Shah (was: Vinayakumar B) Assigning to you. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7916-01.patch If any bad block is found, then the BPServiceActor for the StandbyNode will retry reporting it infinitely. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
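The general fix pattern for such a loop is to bound the retries: give up (and log) after a maximum number of attempts instead of re-queuing the failed report forever. Below is a self-contained sketch of that pattern; `retryReport` and its attempt limit are illustrative, not taken from the actual BPServiceActor patch.

```java
import java.util.function.Supplier;

public class BoundedRetry {
    // Retry an action that may fail transiently, but never loop forever:
    // after maxAttempts failures the error is surfaced instead of re-queued.
    public static boolean retryReport(Supplier<Boolean> reportAction, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (reportAction.get()) {
                return true; // report reached the namenode
            }
            // a real implementation would also back off between attempts
        }
        return false; // give up; the caller logs and drops the action
    }

    public static void main(String[] args) {
        // A standby namenode that never accepts the report.
        System.out.println(retryReport(() -> false, 5)); // prints false
    }
}
```

The key property is that a report the standby node will never accept stops consuming the actor's queue after a fixed number of attempts.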
[jira] [Resolved] (HDFS-8230) Erasure Coding: Ignore DatanodeProtocol#DNA_ERASURE_CODING_RECOVERY commands from standbynode if any
[ https://issues.apache.org/jira/browse/HDFS-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8230. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Committed to HDFS-7285 branch. Thanks [~umamaheswararao] for review Erasure Coding: Ignore DatanodeProtocol#DNA_ERASURE_CODING_RECOVERY commands from standbynode if any Key: HDFS-8230 URL: https://issues.apache.org/jira/browse/HDFS-8230 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Priority: Minor Fix For: HDFS-7285 Attachments: HDFS-8230-01.patch Let's ignore the DNA_ERASURE_CODING_RECOVERY command at the DN if it's coming from the standby namenode. Ideally we should not get these commands from the standby node. Since we already handle ignoring commands from the standby node, we can add this one to be ignored as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8189) ClientProtocol#createErasureCodingZone API was wrongly annotated as Idempotent
[ https://issues.apache.org/jira/browse/HDFS-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8189. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Committed to HDFS-7285 branch. Thanks [~umamaheswararao] for the review. ClientProtocol#createErasureCodingZone API was wrongly annotated as Idempotent -- Key: HDFS-8189 URL: https://issues.apache.org/jira/browse/HDFS-8189 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-8189-01.patch Currently createErasureCodingZone is annotated as Idempotent, but it should be annotated as @AtMostOnce since we handle retries via the retryCache. {code} @Idempotent public void createErasureCodingZone(String src, ECSchema schema) throws IOException; {code} It will fail to create the zone if it is already a zone, so we simply cannot retry by ignoring the previous call's success. That is why we were already using the retryCache for handling this situation. {code} if (getECSchema(srcIIP) != null) { throw new IOException("Directory " + src + " is already in an " + "erasure coding zone."); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
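The reason @AtMostOnce matters is the retry cache: a retried non-idempotent call must be served the cached result of its first execution rather than re-running and failing. A minimal sketch of that pattern follows; the class and method names here are illustrative, not the actual FSNamesystem/RetryCache implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class SimpleRetryCache {
    private final Map<Long, Object> cache = new ConcurrentHashMap<>();

    // Execute a non-idempotent operation at most once per RPC callId:
    // a retried RPC with the same callId gets the cached first result
    // instead of re-running (and failing with "already a zone").
    public Object executeOnce(long callId, Supplier<Object> operation) {
        return cache.computeIfAbsent(callId, id -> operation.get());
    }

    public static void main(String[] args) {
        SimpleRetryCache rc = new SimpleRetryCache();
        final int[] runs = {0};
        Supplier<Object> createZone = () -> { runs[0]++; return "created"; };
        rc.executeOnce(42L, createZone); // first call executes
        rc.executeOnce(42L, createZone); // retry is served from the cache
        System.out.println(runs[0]);     // prints 1
    }
}
```

An @Idempotent annotation would instead let the RPC layer blindly re-invoke the method, which is exactly what breaks for createErasureCodingZone.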
[jira] [Resolved] (HDFS-8231) StackTrace displayed at client while QuotaByStorageType exceeds
[ https://issues.apache.org/jira/browse/HDFS-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8231. - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed StackTrace displayed at client while QuotaByStorageType exceeds --- Key: HDFS-8231 URL: https://issues.apache.org/jira/browse/HDFS-8231 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: J.Andreina Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8231.00.patch, HDFS-8231.1.patch, HDFS-8231.2.patch StackTrace displayed at client while QuotaByStorageType exceeds. With reference to HDFS-2360, feel better to fix this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8181) createErasureCodingZone sets retryCache state as false always
[ https://issues.apache.org/jira/browse/HDFS-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8181. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed createErasureCodingZone sets retryCache state as false always - Key: HDFS-8181 URL: https://issues.apache.org/jira/browse/HDFS-8181 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: HDFS-7285 Attachments: HDFS-8181-0.patch Currently createErasureCodingZone sets the RetryCache state as false always. Once the operation succeeds it should pass true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8146) Protobuf changes for BlockECRecoveryCommand and its fields for making it ready for transfer to DN
[ https://issues.apache.org/jira/browse/HDFS-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8146. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Committed to branch Protobuf changes for BlockECRecoveryCommand and its fields for making it ready for transfer to DN -- Key: HDFS-8146 URL: https://issues.apache.org/jira/browse/HDFS-8146 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: HDFS-7285 Attachments: HDFS-8146-1.patch, HDFS-8146-2.patch, HDFS-8146.0.patch As part of working on HDFS-8137, we need to prepare BlockECRecoveryCommand, BlockECRecoveryInfo, DatanodeStorageInfo, DatanodeDescriptor (we can use DatanodeInfo for proto transfer) to be ready in proto format for transferring them in the command. Since all this code should be straightforward, and to have a better focused review on the core part, I propose to separate this part into this JIRA. First I will prepare all these supporting classes protobuf-ready and then transfer them to the DN as part of HDFS-8137 by including ECSchema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8123) Erasure Coding: Better to move EC related proto messages to a separate erasurecoding proto file
[ https://issues.apache.org/jira/browse/HDFS-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8123. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed committed to branch Thanks [~rakeshr] for the contribution. Thanks [~drankye] for the review. Erasure Coding: Better to move EC related proto messages to a separate erasurecoding proto file --- Key: HDFS-8123 URL: https://issues.apache.org/jira/browse/HDFS-8123 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Fix For: HDFS-7285 Attachments: HDFS-8123-001.patch, HDFS-8123-002.patch, HDFS-8123-003.patch, HDFS-8123-004.patch, HDFS-8123-005.patch While reviewing the code I've noticed EC related proto messages are getting added into {{hdfs.proto}}. IMHO, for better maintainability of the erasure code feature, its good to move this to a separate {{erasurecode.proto}} file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7349) Support DFS command for the EC encoding
[ https://issues.apache.org/jira/browse/HDFS-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7349. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Thanks [~drankye] , [~drankye] and [~rakeshr] for reviews. Committed to branch. Support DFS command for the EC encoding --- Key: HDFS-7349 URL: https://issues.apache.org/jira/browse/HDFS-7349 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-7349-001.patch, HDFS-7349-002.patch, HDFS-7349-003.patch, HDFS-7349-004.patch, HDFS-7349-005.patch, HDFS-7349-006.patch Support implementation of the following commands {noformat}Usage: hdfs erasurecode [generic options] [-createZone [-s schemaName] path] [-getZoneInfo path] [-help [cmd ...]] [-listSchemas] [-usage [cmd ...]]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8023) Erasure Coding: retrieve erasure coding schema for a file from NameNode
[ https://issues.apache.org/jira/browse/HDFS-8023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8023. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Thanks [~drankye] and [~jingzhao] for reviews. Committed to HDFS-7285 branch. Erasure Coding: retrieve erasure coding schema for a file from NameNode -- Key: HDFS-8023 URL: https://issues.apache.org/jira/browse/HDFS-8023 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-8023-01.patch, HDFS-8023-02.patch NameNode needs to provide an RPC call for clients and tools to retrieve the erasure coding schema for a file from the NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8090) Erasure Coding: Add RPC to client-namenode to list all ECSchemas loaded in Namenode.
Vinayakumar B created HDFS-8090: --- Summary: Erasure Coding: Add RPC to client-namenode to list all ECSchemas loaded in Namenode. Key: HDFS-8090 URL: https://issues.apache.org/jira/browse/HDFS-8090 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B ECSchemas will be configured and loaded only at the Namenode to avoid conflicts. Client has to specify one of these schemas during creation of ecZones. So, add an RPC to ClientProtocol to get all ECSchemas loaded at namenode, so that client can choose only any one of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8027) Update CHANGES-HDFS-7285.txt with branch commits
Vinayakumar B created HDFS-8027: --- Summary: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8027) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits
[ https://issues.apache.org/jira/browse/HDFS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8027. - Resolution: Fixed Fix Version/s: HDFS-7285 Committed to HDFS-7285 branch, Committed directly as this is only CHANGES-HDFS-7285.txt update. Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-8027-01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
Vinayakumar B created HDFS-7916: --- Summary: 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical If any bad block is found, then the BPServiceActor for the StandbyNode will retry reporting it infinitely. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'
Vinayakumar B created HDFS-7832: --- Summary: Show 'Last Modified' in Namenode's 'Browse Filesystem' Key: HDFS-7832 URL: https://issues.apache.org/jira/browse/HDFS-7832 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B The new UI no longer shows the last-modified time for a path while browsing. This could be added to make the filesystem browser more useful.
[jira] [Reopened] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B reopened HDFS-7414: - Namenode got shutdown and can't recover where edit update might be missed - Key: HDFS-7414 URL: https://issues.apache.org/jira/browse/HDFS-7414 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1, 2.6.0, 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Blocker Fix For: 2.7.0 Scenario: A MapReduce job was running. CPU usage crossed 190% on the DataNode, the machine became slow, and the following exception was seen. *The exact root cause was not found, but under such high CPU usage an edit-log update might have been missed. This needs more digging; does anyone have any thoughts?* {noformat} 2014-11-20 05:01:18,430 | ERROR | main | Encountered exception on operation CloseOp [length=0, inodeId=0, path=/outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025, replication=2, mtime=1416409309023, atime=1416409290816, blockSize=67108864, blocks=[blk_1073766144_25321, blk_1073766154_25331, blk_1073766160_25337], permissions=mapred:supergroup:rw-r--r--, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=162982] | org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805) at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:741) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:724) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1387) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1459) 2014-11-20 05:01:18,654 | WARN | main | Encountered exception loading fsimage | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:642) java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:741) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:724) at
[jira] [Resolved] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7414. - Resolution: Duplicate Fix Version/s: (was: 2.7.0) Resolving as duplicate Namenode got shutdown and can't recover where edit update might be missed - Key: HDFS-7414 URL: https://issues.apache.org/jira/browse/HDFS-7414 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1, 2.6.0, 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Blocker Scenario: A MapReduce job was running. CPU usage crossed 190% on the DataNode, the machine became slow, and the following exception was seen. *The exact root cause was not found, but under such high CPU usage an edit-log update might have been missed. This needs more digging; does anyone have any thoughts?* {noformat} 2014-11-20 05:01:18,430 | ERROR | main | Encountered exception on operation CloseOp [length=0, inodeId=0, path=/outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025, replication=2, mtime=1416409309023, atime=1416409290816, blockSize=67108864, blocks=[blk_1073766144_25321, blk_1073766154_25331, blk_1073766160_25337], permissions=mapred:supergroup:rw-r--r--, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=162982] | org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133) at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:741) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:724) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1387) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1459) 2014-11-20 05:01:18,654 | WARN | main | Encountered exception loading fsimage | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:642) java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:741) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:724) at
[jira] [Created] (HDFS-7703) Support favouredNodes for the append for new blocks
Vinayakumar B created HDFS-7703: --- Summary: Support favouredNodes for the append for new blocks Key: HDFS-7703 URL: https://issues.apache.org/jira/browse/HDFS-7703 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Currently favoredNodes is supported only for new file creation, where the nodes apply to all blocks of the file. The same support should be available when a file is opened for append. Even if the original file was not written with favored nodes, the favoredNodes passed to append would be used for the new blocks only.
[jira] [Resolved] (HDFS-7690) Avoid Block movement in Balancer and Mover for the erasure encoded blocks
[ https://issues.apache.org/jira/browse/HDFS-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7690. - Resolution: Duplicate Thanks [~szetszwo] for the pointer. I missed it. Resolving as duplicate. Avoid Block movement in Balancer and Mover for the erasure encoded blocks - Key: HDFS-7690 URL: https://issues.apache.org/jira/browse/HDFS-7690 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B As the striped design notes, it would be more fault tolerant if the striped blocks reside on different nodes across different racks. But the Balancer and Mover may break this by moving the encoded blocks.
[jira] [Created] (HDFS-7689) Add periodic checker to find the corrupted EC blocks/files
Vinayakumar B created HDFS-7689: --- Summary: Add periodic checker to find the corrupted EC blocks/files Key: HDFS-7689 URL: https://issues.apache.org/jira/browse/HDFS-7689 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Add a periodic checker, similar to *ReplicationMonitor*, to monitor EC files/blocks for corruption or missing blocks and schedule them for recovery/correction.
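A checker of this kind is essentially a scheduled scan loop. The sketch below is a hypothetical illustration of the ReplicationMonitor-style pattern; the class and method names are invented, and the real scan logic is elided:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a ReplicationMonitor-style periodic checker:
// scan for corrupt/missing EC blocks at a fixed interval and queue
// them for recovery.
public class EcBlockMonitor {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger scans = new AtomicInteger();

    void scanAndScheduleRecovery() {
        scans.incrementAndGet();
        // real code would walk the block map, find EC groups with
        // missing/corrupt internal blocks, and enqueue recovery work
    }

    public void start(long intervalMs) {
        scheduler.scheduleWithFixedDelay(
            this::scanAndScheduleRecovery, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```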
[jira] [Created] (HDFS-7688) Client side api/config changes to support online encoding
Vinayakumar B created HDFS-7688: --- Summary: Client side api/config changes to support online encoding Key: HDFS-7688 URL: https://issues.apache.org/jira/browse/HDFS-7688 Project: Hadoop HDFS Issue Type: Sub-task Components: dfsclient Reporter: Vinayakumar B Assignee: Vinayakumar B This Jira covers the client-side API changes needed to support erasure encoding with striped blocks directly from the client.
[jira] [Resolved] (HDFS-7117) Not all datanodes are displayed on the namenode http tab
[ https://issues.apache.org/jira/browse/HDFS-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7117. - Resolution: Invalid Fix Version/s: (was: 2.4.0) This issue is not present in the current trunk code. Resolving as invalid. Feel free to re-open if required. Not all datanodes are displayed on the namenode http tab Key: HDFS-7117 URL: https://issues.apache.org/jira/browse/HDFS-7117 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jean-Baptiste Onofré On a single machine, I have three fake nodes (each node uses a different dfs.datanode.address, dfs.datanode.ipc.address, and dfs.datanode.http.address) - node1 starts the namenode and a datanode - node2 starts a datanode - node3 starts a datanode In the namenode http console, on the overview, I can see 3 live nodes: {code} http://localhost:50070/dfshealth.html#tab-overview {code} but, when clicking on the Live Nodes: {code} http://localhost:50070/dfshealth.html#tab-datanode {code} I can see only one node row.
[jira] [Created] (HDFS-7583) Fix findbug in TransferFsImage.java
Vinayakumar B created HDFS-7583: --- Summary: Fix findbug in TransferFsImage.java Key: HDFS-7583 URL: https://issues.apache.org/jira/browse/HDFS-7583 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Fix the following findbugs warning appearing in recent Jenkins runs: {noformat}Exceptional return value of java.io.File.delete() ignored in org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage In method org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) Called method java.io.File.delete() At TransferFsImage.java:[line 577]{noformat}
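The warning is about an ignored boolean return value. A minimal sketch of the kind of fix, assuming the patch simply reports when a delete fails; the method name mirrors TransferFsImage.deleteTmpFiles, the rest is illustrative:

```java
import java.io.File;
import java.util.List;

// Sketch: check File.delete()'s return value instead of ignoring it,
// which is what the RV_RETURN_VALUE_IGNORED_BAD_PRACTICE warning asks for.
public class DeleteTmpFiles {
    public static void deleteTmpFiles(List<File> files) {
        for (File file : files) {
            if (!file.delete()) {
                // real code would use the class logger; stderr keeps this
                // sketch self-contained
                System.err.println(
                    "Warning: deleting tmp image file " + file + " failed.");
            }
        }
    }
}
```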
[jira] [Created] (HDFS-7582) Limit the number of default ACL entries to Half of maximum entries (16)
Vinayakumar B created HDFS-7582: --- Summary: Limit the number of default ACL entries to Half of maximum entries (16) Key: HDFS-7582 URL: https://issues.apache.org/jira/browse/HDFS-7582 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B The current ACL limit applies only to the total number of entries. But the number of default entries on a directory can exceed half of the maximum, i.e. 16. In that case, only files can be created under the parent directory; their ACLs are inherited from the parent's default entries. When a sub-directory is created, however, its total number of entries exceeds the maximum allowed, because a sub-directory copies both the inherited access ACLs and the default entries, and so directory creation fails. It would therefore be better to restrict the number of default entries to 16.
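The proposed check could look like the following hypothetical sketch, which counts entries whose text form starts with "default:" and rejects ACLs having more than half the maximum; the entry representation and names are assumptions, not the actual HDFS ACL code:

```java
import java.util.List;

// Sketch of the proposed validation: cap the number of *default* ACL
// entries at half the overall maximum (32 total -> 16 default), so that
// a child directory (which copies access + default entries) cannot
// exceed the total limit.
public class AclLimitCheck {
    static final int MAX_ENTRIES = 32;

    public static void checkDefaultEntryLimit(List<String> entries) {
        long defaults = entries.stream()
            .filter(e -> e.startsWith("default:"))
            .count();
        if (defaults > MAX_ENTRIES / 2) {
            throw new IllegalArgumentException(
                "Invalid ACL: only " + (MAX_ENTRIES / 2)
                + " default entries are allowed, found " + defaults);
        }
    }
}
```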
[jira] [Created] (HDFS-7560) ACLs removed by removeDefaultAcl() will be back after NameNode restart/failover
Vinayakumar B created HDFS-7560: --- Summary: ACLs removed by removeDefaultAcl() will be back after NameNode restart/failover Key: HDFS-7560 URL: https://issues.apache.org/jira/browse/HDFS-7560 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.1 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Default ACLs removed using {{removeDefaultAcl()}} will come back after a NameNode restart/failover.
[jira] [Created] (HDFS-7481) Add ACL indicator to the Permission Denied exception.
Vinayakumar B created HDFS-7481: --- Summary: Add ACL indicator to the Permission Denied exception. Key: HDFS-7481 URL: https://issues.apache.org/jira/browse/HDFS-7481 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor As mentioned in a comment on HDFS-7454, add an ACL indicator similar to the one in ls output.
[jira] [Created] (HDFS-7456) De-duplicate AclFeature instances with same AclEntries to reduce memory footprint of NameNode
Vinayakumar B created HDFS-7456: --- Summary: De-duplicate AclFeature instances with same AclEntries to reduce memory footprint of NameNode Key: HDFS-7456 URL: https://issues.apache.org/jira/browse/HDFS-7456 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B As discussed in HDFS-7454 [here|https://issues.apache.org/jira/browse/HDFS-7454?focusedCommentId=14229454&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14229454], de-duplication of {{AclFeature}} helps in reducing the memory footprint of the NameNode.
[jira] [Created] (HDFS-7454) Implement Global ACL Set for memory optimization in NameNode
Vinayakumar B created HDFS-7454: --- Summary: Implement Global ACL Set for memory optimization in NameNode Key: HDFS-7454 URL: https://issues.apache.org/jira/browse/HDFS-7454 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B HDFS-5620 indicated that a GlobalAclSet containing unique {{AclFeature}} instances could be de-duplicated to save memory in the NameNode, but it was not implemented at that time. This Jira re-proposes the same implementation, along with de-duplication of unique {{AclEntry}} objects across all ACLs. One simple use case: a MapReduce user's home directory with a set of default ACLs, under which many other files/directories are created when jobs run. Here all the default ACLs of the parent directory are duplicated until those ACLs are explicitly deleted. With de-duplication, only one object will be in memory for the same entry across all ACLs of all files/directories.
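The de-duplication proposed here amounts to interning: keep one canonical instance per distinct value and reference-count it so unused entries can be evicted. A self-contained, hypothetical sketch of that pattern (not the actual HDFS implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a global de-duplication set: hand out one shared canonical
// copy of each distinct value, with a reference count so the last
// removeReference() evicts the entry. Hypothetical stand-in for the
// AclFeature/AclEntry de-duplication.
public class GlobalSet<T> {
    private final Map<T, T> canonical = new HashMap<>();
    private final Map<T, Integer> refCount = new HashMap<>();

    public synchronized T addReference(T value) {
        T existing = canonical.putIfAbsent(value, value);
        T result = existing != null ? existing : value;
        refCount.merge(result, 1, Integer::sum);
        return result;                       // shared canonical instance
    }

    public synchronized void removeReference(T value) {
        Integer count = refCount.get(value);
        if (count == null) return;
        if (count == 1) {                    // last reference: evict
            refCount.remove(value);
            canonical.remove(value);
        } else {
            refCount.put(value, count - 1);
        }
    }

    public synchronized int size() {
        return canonical.size();
    }
}
```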
[jira] [Resolved] (HDFS-7451) Namenode HA failover happens very frequently from active to standby
[ https://issues.apache.org/jira/browse/HDFS-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7451. - Resolution: Not a Problem Namenode HA failover happens very frequently from active to standby --- Key: HDFS-7451 URL: https://issues.apache.org/jira/browse/HDFS-7451 Project: Hadoop HDFS Issue Type: Bug Reporter: LAXMAN KUMAR SAHOO Assignee: LAXMAN KUMAR SAHOO We have two NameNodes with HA enabled. For the last couple of days we have been observing that failover happens very frequently from active to standby. Below are the log details of the active NameNode when the failover happens. Is there any fix to get rid of this issue? Namenode logs: {code} 2014-11-25 22:24:02,020 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 10.2.16.214:40751: output error 2014-11-25 22:24:02,020 INFO org.apache.hadoop.ipc.Server: IPC Server handler 23 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2195) at org.apache.hadoop.ipc.Server.access$2000(Server.java:110) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:979) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1045) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1798) 2014-11-25 22:24:10,631 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /sda/dfs/namenode/current/edits_inprogress_01643676954 - /sda/dfs/namenode/current/edits_01643676954-01643677390 2014-11-25 22:24:10,631 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Closing java.lang.Exception at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.close(IPCLoggerChannel.java:182) at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.close(AsyncLoggerSet.java:102) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.close(QuorumJournalManager.java:446) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.close(JournalSet.java:107) at org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:222) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:347) at org.apache.hadoop.hdfs.server.namenode.JournalSet.close(JournalSet.java:219) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.close(FSEditLog.java:308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.stopActiveServices(FSNamesystem.java:939) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.stopActiveServices(NameNode.java:1365) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.exitState(ActiveState.java:70) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.setState(ActiveState.java:52) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToStandby(NameNode.java:1278) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToStandby(NameNodeRpcServer.java:1046) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToStandby(HAServiceProtocolServerSideTranslatorPB.java:119) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3635) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) 2014-11-25 22:24:10,632 
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state 2014-11-25 22:24:10,633 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node at dc1-had03-m002.dc01.revsci.net/10.2.16.92:8020 every 120 seconds. 2014-11-25 22:24:10,634 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread... Checkpointing active NN at
[jira] [Reopened] (HDFS-7451) Namenode HA failover happens very frequently from active to standby
[ https://issues.apache.org/jira/browse/HDFS-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B reopened HDFS-7451: - Namenode HA failover happens very frequently from active to standby --- Key: HDFS-7451 URL: https://issues.apache.org/jira/browse/HDFS-7451 Project: Hadoop HDFS Issue Type: Bug Reporter: LAXMAN KUMAR SAHOO Assignee: LAXMAN KUMAR SAHOO We have two NameNodes with HA enabled. For the last couple of days we have been observing that failover happens very frequently from active to standby. Below are the log details of the active NameNode when the failover happens. Is there any fix to get rid of this issue? Namenode logs: {code} 2014-11-25 22:24:02,020 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 10.2.16.214:40751: output error 2014-11-25 22:24:02,020 INFO org.apache.hadoop.ipc.Server: IPC Server handler 23 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2195) at org.apache.hadoop.ipc.Server.access$2000(Server.java:110) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:979) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1045) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1798) 2014-11-25 22:24:10,631 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /sda/dfs/namenode/current/edits_inprogress_01643676954 - /sda/dfs/namenode/current/edits_01643676954-01643677390 2014-11-25 22:24:10,631 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Closing java.lang.Exception at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.close(IPCLoggerChannel.java:182) at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.close(AsyncLoggerSet.java:102) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.close(QuorumJournalManager.java:446) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.close(JournalSet.java:107) at org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:222) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:347) at org.apache.hadoop.hdfs.server.namenode.JournalSet.close(JournalSet.java:219) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.close(FSEditLog.java:308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.stopActiveServices(FSNamesystem.java:939) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.stopActiveServices(NameNode.java:1365) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.exitState(ActiveState.java:70) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.setState(ActiveState.java:52) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToStandby(NameNode.java:1278) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToStandby(NameNodeRpcServer.java:1046) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToStandby(HAServiceProtocolServerSideTranslatorPB.java:119) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3635) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) 2014-11-25 22:24:10,632 
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state 2014-11-25 22:24:10,633 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node at dc1-had03-m002.dc01.revsci.net/10.2.16.92:8020 every 120 seconds. 2014-11-25 22:24:10,634 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread... Checkpointing active NN at dc1-had03-m002.dc01.revsci.net:50070 Serving checkpoints at
[jira] [Created] (HDFS-7410) Support CreateFlags for append()
Vinayakumar B created HDFS-7410: --- Summary: Support CreateFlags for append() Key: HDFS-7410 URL: https://issues.apache.org/jira/browse/HDFS-7410 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Vinayakumar B Assignee: Vinayakumar B Current FileSystem APIs include CreateFlag only for the create() API, and some of these flags (e.g. SYNC_BLOCK) are client-side only and are not stored in the file's metadata, so the append() operation does not know about them. It would be good to support these features for append too. Compatibility: one more overloaded append API needs to be added to carry the flags, keeping the current API as is.
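The compatibility approach described, keeping the existing signature and adding an overload, can be sketched as below; the class, enum, and flag names are illustrative stand-ins, not the real FileSystem/CreateFlag API:

```java
import java.util.EnumSet;

// Sketch of the compatibility pattern: keep the existing append(path)
// signature and add an overload that carries flag-style options.
public abstract class SketchFileSystem {
    public enum Flag { APPEND, SYNC_BLOCK, NEW_BLOCK }

    // existing API, kept as-is for compatibility; delegates to the overload
    public Object append(String path) {
        return append(path, EnumSet.of(Flag.APPEND));
    }

    // new overload carrying the client-side flags
    public abstract Object append(String path, EnumSet<Flag> flags);
}
```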
[jira] [Created] (HDFS-7384) 'getfacl' command and 'getAclStatus' output should be in sync
Vinayakumar B created HDFS-7384: --- Summary: 'getfacl' command and 'getAclStatus' output should be in sync Key: HDFS-7384 URL: https://issues.apache.org/jira/browse/HDFS-7384 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B The *getfacl* command prints all entries, including basic and extended entries, the mask entry, and effective permissions. But the *getAclStatus* FileSystem API returns only the extended ACL entries set by the user; it includes neither the mask entry nor the effective permissions. To benefit clients using the API, it would be better to include the 'mask' entry and effective permissions in the returned list of entries.
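For named-user, group, and named-group entries, POSIX-style effective permissions are the entry's permissions ANDed with the mask entry. A small illustrative sketch using rwx bits (r=4, w=2, x=1):

```java
// Effective-permission computation for POSIX-style ACLs: the effective
// rights of a masked entry are (entry permissions & mask permissions).
public class EffectivePerm {
    public static int effective(int entryPerm, int maskPerm) {
        return entryPerm & maskPerm;
    }

    // render a 3-bit rwx value the way getfacl would print it
    public static String toRwx(int perm) {
        return ((perm & 4) != 0 ? "r" : "-")
             + ((perm & 2) != 0 ? "w" : "-")
             + ((perm & 1) != 0 ? "x" : "-");
    }
}
```

For example, an entry `user:foo:rwx` under a mask of `r-x` has effective permissions `r-x`.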
[jira] [Created] (HDFS-7349) Support DFS command for the EC encoding
Vinayakumar B created HDFS-7349: --- Summary: Support DFS command for the EC encoding Key: HDFS-7349 URL: https://issues.apache.org/jira/browse/HDFS-7349 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Support implementation of the following commands: *hdfs dfs -convertToEC path* converts all blocks under the path to EC form (if not already in EC form, and if they can be coded); *hdfs dfs -convertToRep path* converts all blocks under the path back to replicated form.
[jira] [Reopened] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B reopened HDFS-6590: - NullPointerException was generated in getBlockLocalPathInfo when datanode restarts -- Key: HDFS-6590 URL: https://issues.apache.org/jira/browse/HDFS-6590 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Guo Ruijing 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup block reader for Block: [block pool ID: BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on Datanode: sdw3(172.28.1.3). RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc remote exception java.lang.NullPointerException, java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014) at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112) at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
[jira] [Resolved] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-6590. - Resolution: Duplicate Closing as duplicate NullPointerException was generated in getBlockLocalPathInfo when datanode restarts -- Key: HDFS-6590 URL: https://issues.apache.org/jira/browse/HDFS-6590 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Guo Ruijing 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup block reader for Block: [block pool ID: BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on Datanode: sdw3(172.28.1.3). RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc remote exception java.lang.NullPointerException, java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014) at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112) at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
[jira] [Created] (HDFS-7098) Support polling for more data from a write-in-progress file using same DFSInputStream
Vinayakumar B created HDFS-7098: --- Summary: Support polling for more data from a write-in-progress file using same DFSInputStream Key: HDFS-7098 URL: https://issues.apache.org/jira/browse/HDFS-7098 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Currently, DFSInputStream can read only up to the bytes that had been written by the time it was opened for reading, but more bytes can be written to the same file later. If the client expects more data to become available, it should be able to poll continuously for more data without re-opening the stream. One simple example use case for this is tailing log files. LocalFileSystem already supports this.
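The pattern the report asks for — keeping one open stream and polling for bytes appended after the current offset — can be sketched against a plain local file with java.io. This is only an illustration of the polling idea; the class and method names (TailPoll, readNewBytes) are hypothetical and are not the DFSInputStream API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TailPoll {
    // Return any bytes appended after 'offset' without re-opening the file.
    static String readNewBytes(RandomAccessFile raf, long offset) throws IOException {
        long len = raf.length();
        if (len <= offset) {
            return "";               // no new data yet; caller may sleep and retry
        }
        byte[] buf = new byte[(int) (len - offset)];
        raf.seek(offset);
        raf.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("tail", ".log");
        Files.writeString(p, "first\n");
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
            long offset = 0;
            String s1 = readNewBytes(raf, offset);
            offset += s1.length();
            // another writer appends while our reader stays open
            Files.writeString(p, "second\n", StandardOpenOption.APPEND);
            String s2 = readNewBytes(raf, offset);
            System.out.print(s1 + s2);
        }
        Files.deleteIfExists(p);
    }
}
```

A real HDFS client would additionally need to refresh the file length from the NameNode, which is part of what makes supporting this in DFSInputStream non-trivial.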
[jira] [Created] (HDFS-7099) Support polling for more data from a write-in-progress file using same DFSInputStream
Vinayakumar B created HDFS-7099: --- Summary: Support polling for more data from a write-in-progress file using same DFSInputStream Key: HDFS-7099 URL: https://issues.apache.org/jira/browse/HDFS-7099 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Currently, DFSInputStream can read only up to the bytes that had been written by the time it was opened for reading, but more bytes can be written to the same file later. If the client expects more data to become available, it should be able to poll continuously for more data without re-opening the stream. One simple example use case for this is tailing log files. LocalFileSystem already supports this.
[jira] [Resolved] (HDFS-7098) Support polling for more data from a write-in-progress file using same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7098. - Resolution: Duplicate Duplicate created due to a slow internet connection. Support polling for more data from a write-in-progress file using same DFSInputStream - Key: HDFS-7098 URL: https://issues.apache.org/jira/browse/HDFS-7098 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Currently, DFSInputStream can read only up to the bytes that had been written by the time it was opened for reading, but more bytes can be written to the same file later. If the client expects more data to become available, it should be able to poll continuously for more data without re-opening the stream. One simple example use case for this is tailing log files. LocalFileSystem already supports this.
[jira] [Resolved] (HDFS-7099) Support polling for more data from a write-in-progress file using same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-7099. - Resolution: Duplicate The same discussion is going on in HDFS-6633, so resolving this as a duplicate. Support polling for more data from a write-in-progress file using same DFSInputStream - Key: HDFS-7099 URL: https://issues.apache.org/jira/browse/HDFS-7099 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Currently, DFSInputStream can read only up to the bytes that had been written by the time it was opened for reading, but more bytes can be written to the same file later. If the client expects more data to become available, it should be able to poll continuously for more data without re-opening the stream. One simple example use case for this is tailing log files. LocalFileSystem already supports this.
[jira] [Resolved] (HDFS-3586) Blocks are not getting replicated even when DNs are available
[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-3586. - Resolution: Duplicate Resolving as duplicate Blocks are not getting replicated even when DNs are available. Key: HDFS-3586 URL: https://issues.apache.org/jira/browse/HDFS-3586 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-3586-analysis.txt Scenario: = Started four DNs (say DN1, DN2, DN3 and DN4) and wrote files with RF=3. A pipeline was formed with DN1-DN2-DN3. Since DN3's network is very slow, it is not able to send acks. The pipeline is then re-formed with DN1-DN2-DN4, but DN4's network is also slow. Finally, commitBlockSynchronization happened successfully to DN1 and DN2. The block is present on all four DNs (in finalized state on two DNs and rbw state on the other two). Now the NN asks for replication to DN3 and DN4, but it keeps failing since replicas are already present in their RBW dirs.
[jira] [Created] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
Vinayakumar B created HDFS-6995: --- Summary: Block should be placed in the client's 'rack-local' node if 'client-local' node is not available Key: HDFS-6995 URL: https://issues.apache.org/jira/browse/HDFS-6995 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B The HDFS cluster is rack aware. The client is on a node that is not a datanode, but the same rack contains one or more datanodes. In this case, first preference should be given to selecting a 'rack-local' node. Currently, since no node in the clusterMap corresponds to the client's location, the block placement policy chooses a *random* node as the local node and proceeds with further placements.
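The preference order the report argues for (client-local node, then a rack-local node, then random) can be sketched as a toy chooser. This is a deliberately simplified illustration, not Hadoop's BlockPlacementPolicyDefault; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

public class RackLocalChooser {
    // Pick the first replica target: prefer the client's own host if it runs
    // a datanode, then any datanode on the client's rack, then a random node.
    static String chooseFirstReplica(String clientHost, String clientRack,
                                     Map<String, String> nodeToRack, Random rnd) {
        if (nodeToRack.containsKey(clientHost)) {
            return clientHost;                        // client-local datanode
        }
        Optional<String> rackLocal = nodeToRack.entrySet().stream()
                .filter(e -> e.getValue().equals(clientRack))
                .map(Map.Entry::getKey)
                .findFirst();
        if (rackLocal.isPresent()) {
            return rackLocal.get();                   // rack-local datanode
        }
        List<String> nodes = List.copyOf(nodeToRack.keySet());
        return nodes.get(rnd.nextInt(nodes.size()));  // fall back to random
    }
}
```

The bug described above corresponds to skipping the rack-local step entirely and jumping straight to the random fallback whenever the client host is not itself a datanode.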
[jira] [Created] (HDFS-6752) Avoid Address bind errors in TestDatanodeConfig#testMemlockLimit
Vinayakumar B created HDFS-6752: --- Summary: Avoid Address bind errors in TestDatanodeConfig#testMemlockLimit Key: HDFS-6752 URL: https://issues.apache.org/jira/browse/HDFS-6752 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B The above test failed due to an address bind exception. Fix: set the HTTP port to an ephemeral port in the Configuration. {noformat}java.net.BindException: Port in use: 0.0.0.0:50075 at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:853) at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:794) at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:473) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:856) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:379) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2046) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1933) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1980) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1970) at org.apache.hadoop.hdfs.TestDatanodeConfig.testMemlockLimit(TestDatanodeConfig.java:146){noformat}
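The fix relies on the standard sockets behavior that binding to port 0 asks the OS for any free ephemeral port, so two tests can never collide on a hard-coded port like 50075. A minimal, Hadoop-free sketch of that mechanism (the class name is illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Binding to port 0 lets the OS assign an unused ephemeral port;
    // getLocalPort() then reports the port actually chosen.
    static int bindEphemeral() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        }
    }
}
```

In the test itself this presumably amounts to configuring the datanode HTTP address with port 0 (e.g. something like conf.set("dfs.datanode.http.address", "localhost:0")) instead of the default 0.0.0.0:50075.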
[jira] [Created] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
Vinayakumar B created HDFS-6714: --- Summary: TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster Key: HDFS-6714 URL: https://issues.apache.org/jira/browse/HDFS-6714 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor TestBlocksScheduledCounter#testBlocksScheduledCounter() should shut down the cluster after the test. Otherwise it can lead to errors on Windows while running non-forked tests.
[jira] [Resolved] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
[ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-3752. - Resolution: Duplicate An option '-skipSharedEditsCheck' has been added to BootstrapStandby in HDFS-4120 to solve this. Resolving this as a duplicate. BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM --- Key: HDFS-3752 URL: https://issues.apache.org/jira/browse/HDFS-3752 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0-alpha Reporter: Vinayakumar B Assignee: Rakesh R Attachments: HDFS-3752-testcase.patch 1. Do {{saveNameSpace}} on the ANN node by entering safemode. 2. On another new node, install a standby NN and do BOOTSTRAPSTANDBY. 3. Now the standby NN will not be able to copy the fsimage_txid from the ANN. This is because the SNN is not able to find the next txid (txid+1) in shared storage. Just after {{saveNameSpace}}, shared storage will have a new log segment containing only the START_LOG_SEGMENT edit op, and BookKeeper will not be able to read the last entry from an in-progress ledger.
[jira] [Created] (HDFS-6693) TestDFSAdminWithHA fails on Windows
Vinayakumar B created HDFS-6693: --- Summary: TestDFSAdminWithHA fails on Windows Key: HDFS-6693 URL: https://issues.apache.org/jira/browse/HDFS-6693 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B TestDFSAdminWithHA fails on Windows for multiple reasons: 1. Assertions fail because the expected output uses only '\n', whereas on Windows the line separator is '\r\n'. 2. The miniDFSCluster is not shut down after each test.
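The first failure mode above is the classic portable-line-ending bug: building expected strings with a literal "\n" breaks on Windows. A minimal sketch of the usual fix, constructing expected output from System.lineSeparator() (the helper name is illustrative, not from the actual patch):

```java
public class LineSepDemo {
    // Join expected lines with the platform separator instead of a
    // hard-coded "\n", so the same assertion holds on Windows (\r\n)
    // and on Unix (\n).
    static String expectedLines(String... lines) {
        String sep = System.lineSeparator();
        return String.join(sep, lines) + sep;
    }
}
```

A test would then compare captured command output against expectedLines("line1", "line2") rather than "line1\nline2\n".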