[jira] [Created] (HDFS-17400) Expose metrics for inode ChildrenList size
Srinivasu Majeti created HDFS-17400:
---------------------------------------

             Summary: Expose metrics for inode ChildrenList size
                 Key: HDFS-17400
                 URL: https://issues.apache.org/jira/browse/HDFS-17400
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 3.1.1
            Reporter: Srinivasu Majeti

A very common scenario: customer jobs fail when writing into a directory "x" because the number of files in "x" has reached the limit configured by dfs.namenode.fs-limits.max-directory-items. Example:

The directory item limit of /tmp is exceeded: limit=1048576 items=1048576

I think we need to expose a new metric in NameNodeMetrics listing the paths that exceed 90% of dfs.namenode.fs-limits.max-directory-items. However, recomputing a path's size and removing it from the metric on every delete is costly. Should we consider letting the SNN handle this from updateCountForQuota? updateCountForQuota already runs periodically on the SNN, so CM could query the SNN and alert users when this path list is non-empty.

The relevant check today is FSDirectory#verifyMaxDirItems:

{code:java}
/**
 * Verify children size for fs limit.
 *
 * @throws MaxDirectoryItemsExceededException too many children.
 */
void verifyMaxDirItems(INodeDirectory parent, String parentPath)
    throws MaxDirectoryItemsExceededException {
  final int count = parent.getChildrenList(CURRENT_STATE_ID).size();
  if (count >= maxDirItems) {
    final MaxDirectoryItemsExceededException e =
        new MaxDirectoryItemsExceededException(parentPath, maxDirItems, count);
    if (namesystem.isImageLoaded()) {
      throw e;
    } else {
      // Do not throw if edits log is still being processed
      NameNode.LOG.error("FSDirectory.verifyMaxDirItems: "
          + e.getLocalizedMessage());
    }
  }
}
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
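As a sketch of the 90% threshold idea, a tracker like the following could maintain the set of near-limit paths. This is a hypothetical illustration only: the class, its methods, and the update hook are invented for the sketch and are not Hadoop's NameNodeMetrics API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: track directories whose child count has crossed
// 90% of dfs.namenode.fs-limits.max-directory-items.
class NearLimitDirTracker {
    private final int maxDirItems;
    private final Set<String> nearLimitPaths = new TreeSet<>();

    NearLimitDirTracker(int maxDirItems) {
        this.maxDirItems = maxDirItems;
    }

    // Record the current child count for a path; returns true when the
    // path is at or above the 90% threshold.
    boolean update(String path, int childCount) {
        boolean near = childCount >= (long) maxDirItems * 9 / 10;
        if (near) {
            nearLimitPaths.add(path);
        } else {
            // Dropping below the threshold removes the path again,
            // which is the per-delete cost the ticket worries about.
            nearLimitPaths.remove(path);
        }
        return near;
    }

    Set<String> getNearLimitPaths() {
        return nearLimitPaths;
    }
}
```

An SNN-side hook (e.g. from updateCountForQuota, as proposed above) could call update() per directory, and a management tool could poll getNearLimitPaths() and alert whenever the set is non-empty.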
[jira] [Created] (HDFS-17399) Ensure atomic transactions when snapshot manager is facing OS resource limit issues
Srinivasu Majeti created HDFS-17399:
---------------------------------------

             Summary: Ensure atomic transactions when snapshot manager is facing OS resource limit issues
                 Key: HDFS-17399
                 URL: https://issues.apache.org/jira/browse/HDFS-17399
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 3.1.1
            Reporter: Srinivasu Majeti

One of our customers is facing 'resource' issues (max number of processes) on at least one of the NameNodes. As a result, snapshot creation failed on the 14th on host02:

{code:java}
2023-05-14 10:41:28,233 WARN org.apache.hadoop.ipc.Server: IPC Server handler 22 on 8020, call Call#11 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.createSnapshot from xx.xxx.xx.xxx:59442
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:803)
        at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
        at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
        at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:140)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodeWithLeases(LeaseManager.java:246)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.addSnapshot(DirectorySnapshottableFeature.java:211)
        at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addSnapshot(INodeDirectory.java:288)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:463)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.createSnapshot(FSDirSnapshotOp.java:110)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSnapshot(FSNamesystem.java:6767)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createSnapshot(NameNodeRpcServer.java:1871)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.createSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1273)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNameno
{code}

{code:java}
host02 log (NN log)
2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://host03.amd.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true, http://host02.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true' to transaction ID 1623400203
2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://host01.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true' to transaction ID 1623400203
2023-05-14 10:42:50,011 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/user/user1, snapshotName=distcp-1546382661--205240459-new, RpcClientId=31353569-0e2e-4272-9acf-a6b71f51242c, RpcCallId=18]
org.apache.hadoop.hdfs.protocol.SnapshotException: Cannot delete snapshot distcp-1546382661--205240459-new from path /user/user1: the snapshot does not exist.
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:260)
        at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:296)
{code}

We then identified the wrong records in the edit log and fixed them manually:

{code:java}
The edit causing the problem is "edits_01623400203-01623402627" and contains 38626 lines when converted to XML format.
On further investigation, we discovered 602 transactions attempting to delete a snapshot "distcp-1546382661--205240459-new" which does not exist:

OP_DELETE_SNAPSHOT 1623401061 /user/user1 distcp-1546382661--205240459-new 31353569-0e2e-4272-9acf-a6b71f51242c 1864

Each transaction consists of the above 10 lines, a total of 6020 lines that need to be removed from the original 38626 lines. The number of lines after correction is 38626 - 6020 = 32606.
{code}

We are raising this ticket to discuss how to address this corner case instead of manually correcting edit logs; Hadoop should have a defensive mechanism here, but it is currently missing.
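One shape the missing defensive mechanism could take: during edit-log replay, skip (and log) an OP_DELETE_SNAPSHOT whose snapshot no longer exists, instead of aborting the NameNode. The toy model below uses invented class names and is not Hadoop's FSEditLogLoader:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of defensive OP_DELETE_SNAPSHOT replay (names are invented).
class SnapshotEditReplayer {
    private final Set<String> snapshots = new HashSet<>();

    void applyCreate(String name) {
        snapshots.add(name);
    }

    // Returns true if the snapshot existed and was removed; a delete of a
    // missing snapshot is logged and skipped instead of failing the replay.
    boolean applyDelete(String name) {
        if (!snapshots.remove(name)) {
            System.err.println(
                "Skipping OP_DELETE_SNAPSHOT for missing snapshot: " + name);
            return false;
        }
        return true;
    }
}
```

Whether silently skipping is safe in all cases (vs. failing fast on genuine corruption) is exactly the trade-off worth discussing on this ticket.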
[jira] [Created] (HDFS-17349) evictWriters command does not seem to work effectively
Srinivasu Majeti created HDFS-17349:
---------------------------------------

             Summary: evictWriters command does not seem to work effectively
                 Key: HDFS-17349
                 URL: https://issues.apache.org/jira/browse/HDFS-17349
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Srinivasu Majeti

After running {{evictWriters}} on DataNodes while decommissioning was in progress, we noticed the messages below being logged. That means {{evictWriters}} was successfully issued to the DataNode and it tried to interrupt all xceivers.

{code:java}
2023-11-29 16:37:18,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Evicting all writers.
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Stopped the writer: NioInetPeer(Socket[addr=/10.4.33.104,port=42982,localport=9866])
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Stopped the writer: NioInetPeer(Socket[addr=/10.4.35.105,port=43300,localport=9866])
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Stopped the writer: NioInetPeer(Socket[addr=/10.4.33.60,port=59978,localport=9866])
{code}

Even after seeing "Stopped the writer: NioInetPeer(Socket[addr=/10.4.35.105", we still see open files not released from 10.4.35.105, and the decommission did not progress.

{code:java}
$ hdfs dfsadmin -listOpenFiles -blockingDecommission -path=/
Client Host    Client Name                              Open File Path
10.4.35.105    DFSClient_NONMAPREDUCE_-211169064_96     /warehouse/tablespace/managed/hive/sys.db/query_data/date=2023-11-28/hive_3162c2fd-cdd0-47f4-979c-d1c3263bfc86_1
10.4.35.149    DFSClient_NONMAPREDUCE_1084942995_59     /warehouse/tablespace/managed/hive/sys.db/query_data/date=2023-11-28/hive_2360faef-7894-41d9-a13c-57d70593583e_1
{code}

We may need to report whether evictWriters actually succeeded or failed by checking the real status of the xceivers.
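A sketch of the reporting asked for above: after issuing evictWriters, poll the active writer count and tell the caller whether eviction actually took effect. The IntSupplier stands in for a real query of the DataNode's writer xceiver count; the class and method names are invented for illustration.

```java
import java.util.function.IntSupplier;

// Illustrative sketch: verify that evictWriters actually stopped the
// writers instead of assuming success.
class EvictionVerifier {
    static boolean verifyEvicted(IntSupplier activeWriterCount, int attempts) {
        for (int i = 0; i < attempts; i++) {
            // A real implementation would pause between polls.
            if (activeWriterCount.getAsInt() == 0) {
                return true;  // all writer xceivers are gone
            }
        }
        return false;  // writers still active: surface this to the admin
    }
}
```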
[jira] [Created] (HDFS-17336) Provide an option to enable/disable considering space used by .Trash folder for user quota computation
Srinivasu Majeti created HDFS-17336:
---------------------------------------

             Summary: Provide an option to enable/disable considering space used by .Trash folder for user quota computation
                 Key: HDFS-17336
                 URL: https://issues.apache.org/jira/browse/HDFS-17336
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
    Affects Versions: 3.1.4
            Reporter: Srinivasu Majeti
[jira] [Created] (HDFS-17323) Uncontrolled fsimage size due to snapshot diff meta for file deletions
Srinivasu Majeti created HDFS-17323:
---------------------------------------

             Summary: Uncontrolled fsimage size due to snapshot diff meta for file deletions
                 Key: HDFS-17323
                 URL: https://issues.apache.org/jira/browse/HDFS-17323
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
    Affects Versions: 3.1.1
            Reporter: Srinivasu Majeti

We have seen quite a few customer cases where the fsimage size increased drastically because of the snapshot metadata stored for fileDiff entries. Below is an example of fsimage metadata storing the entire inode info after deleting a file. I am not sure of any reason why the entire inode metadata needs to be stored in the fileDiff entry when there is no change to the actual inode metadata and it is just a file delete operation. The fileDiff entry for inode 1860467 seems redundant for a simple file delete operation.

{code:java}
431860465DIRECTORYs31704197935903hdfs:supergroup:0755-1-1
441860465DIRECTORYs41704197951829hdfs:supergroup:0755-1-1
1860467FILEfile1317041979173151704197917031134217728hdfs:supergroup:06441074008442267653418
1860467file1043
186046721474836460
1860467143418file1317041979173151704197917031134217728hdfs:supergroup:06440
{code}
[jira] [Created] (HDFS-16215) File read fails with CannotObtainBlockLengthException after Namenode is restarted
Srinivasu Majeti created HDFS-16215:
---------------------------------------

             Summary: File read fails with CannotObtainBlockLengthException after Namenode is restarted
                 Key: HDFS-16215
                 URL: https://issues.apache.org/jira/browse/HDFS-16215
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.3.1, 3.2.2
            Reporter: Srinivasu Majeti

When a file is being written by a first client and fsck shows OPENFORWRITE, and an HDFS outage happens and the cluster is brought back up, the first client is disconnected; when a new client then tries to open the file, we see "Cannot obtain block length for" as shown below.

{code:java}
/tmp/hosts7 134217728 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE: OK
0. BP-1958960150-172.25.40.87-1628677864204:blk_1073745252_4430 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK], DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK], DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK]]

Under Construction Block:
1. BP-1958960150-172.25.40.87-1628677864204:blk_1073745253_4431 len=0 Expected_repl=3 [DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK], DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK], DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK]]

[root@c1265-node2 ~]# hdfs dfs -get /tmp/hosts7
get: Cannot obtain block length for LocatedBlock{BP-1958960150-172.25.40.87-1628677864204:blk_1073745253_4431; getBlockSize()=0; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK], DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK], DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK]]}
{code}

*Exception trace from the logs:*

{code:java}
Exception in thread "main" org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1958960150-172.25.40.87-1628677864204:blk_1073742720_1896; getBlockSize()=0; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.25.33.140:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK], DatanodeInfoWithStorage[172.25.40.87:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK], DatanodeInfoWithStorage[172.25.36.17:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK]]}
        at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:363)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:201)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:185)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1006)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:312)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:324)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:949)
{code}
[jira] [Created] (HDFS-16148) Snapshots: An option to find how much space would be freed up on deletion of a snapshot
Srinivasu Majeti created HDFS-16148:
---------------------------------------

             Summary: Snapshots: An option to find how much space would be freed up on deletion of a snapshot
                 Key: HDFS-16148
                 URL: https://issues.apache.org/jira/browse/HDFS-16148
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
            Reporter: Srinivasu Majeti
            Assignee: Shashikant Banerjee

We have been seeing a lot of large clusters with many snapshots left around, not cleaned up on time, accumulating fsimage size, heap memory, etc. When one wants to clean them up, there is no easy way today to know how much space would be reclaimed before deleting a snapshot. It would be very ideal and user-friendly to introduce a switch/option for the du/count commands under hdfs that gives a clear picture of how much DFS space would be reclaimed after deleting a snapshot.
[jira] [Created] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff
Srinivasu Majeti created HDFS-15916:
---------------------------------------

             Summary: Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff
                 Key: HDFS-15916
                 URL: https://issues.apache.org/jira/browse/HDFS-15916
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Srinivasu Majeti

When using the distcp diff option between two snapshots from a Hadoop 3 cluster to a Hadoop 2 cluster, we get the exception below. This seems to break backward compatibility, due to the introduction of the new API getSnapshotDiffReportListing.

{code:java}
hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown method getSnapshotDiffReportListing called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
{code}
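A client-side pattern that would restore compatibility is to try the newer listing RPC and fall back to the legacy call when the remote NameNode reports an unknown method. The sketch below is illustrative only: it uses plain Suppliers and UnsupportedOperationException where real Hadoop code would work against ClientProtocol and an RpcNoSuchMethodException unwrapped from the RemoteException.

```java
import java.util.function.Supplier;

// Illustrative fallback: prefer the new API, degrade to the legacy one
// when the remote side does not implement it.
class SnapshotDiffFallback {
    static <T> T callWithFallback(Supplier<T> newApi, Supplier<T> legacyApi) {
        try {
            return newApi.get();
        } catch (UnsupportedOperationException e) {
            // Stand-in for catching RpcNoSuchMethodException from a
            // Hadoop 2.x NameNode that lacks getSnapshotDiffReportListing.
            return legacyApi.get();
        }
    }
}
```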
[jira] [Created] (HDFS-15770) fsck to support printing snapshot name for missing blocks from snapshot only files
Srinivasu Majeti created HDFS-15770:
---------------------------------------

             Summary: fsck to support printing snapshot name for missing blocks from snapshot only files
                 Key: HDFS-15770
                 URL: https://issues.apache.org/jira/browse/HDFS-15770
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs
    Affects Versions: 3.3.0, 3.2.0
            Reporter: Srinivasu Majeti

Today, when a block id belonging to an older snapshot differs from that of the corresponding live file, and the block from the snapshot file is missing, fsck reports the block id as missing against the live file but gives no clue whether that block id belongs to the live file or only to a specific snapshot file path (the case where the file in the snapshot and the live file differ because of an overwrite). It would be nice for fsck to show a flag in its output indicating whether the block id is missing from the live file or from a snapshot file when using the -includeSnapshots option.
[jira] [Created] (HDFS-15729) Show progress of Balancer in Namenode UI
Srinivasu Majeti created HDFS-15729:
---------------------------------------

             Summary: Show progress of Balancer in Namenode UI
                 Key: HDFS-15729
                 URL: https://issues.apache.org/jira/browse/HDFS-15729
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: balancer & mover
    Affects Versions: 3.1.4
            Reporter: Srinivasu Majeti

It would be nice to have tracking of the Balancer process in the NameNode UI, to show whether it is running and what its current progress is. This would be similar to the NameNode startup progress display.
[jira] [Created] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path
Srinivasu Majeti created HDFS-15446:
---------------------------------------

             Summary: CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path
                 Key: HDFS-15446
                 URL: https://issues.apache.org/jira/browse/HDFS-15446
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.2.0, 3.3.0
            Reporter: Srinivasu Majeti
            Assignee: Stephen O'Donnell

After allowing snapshot creation for a path, say /app-logs, if we try to create a snapshot on /.reserved/raw/app-logs, the snapshot creation succeeds. But later, when the Standby NameNode is restarted and tries to load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby NameNode shuts down with the exception "java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs". Here are the steps to reproduce:

{code:java}
# hdfs dfs -ls /.reserved/raw/
Found 15 items
drwxrwxrwt   - yarn hadoop 0 2020-06-29 10:27 /.reserved/raw/app-logs
drwxr-xr-x   - hive hadoop 0 2020-06-29 10:29 /.reserved/raw/prod
++
[root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
Allowing snapshot on /app-logs succeeded
[root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
Allowing snapshot on /prod succeeded
++
# hdfs lsSnapshottableDir
drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
++
[root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
{code}

The exception we see in the Standby NameNode while loading the snapshot creation edit record:

{code:java}
2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs
        at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
{code}
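The failure suggests the raw-prefixed path is recorded verbatim in the edit log and later looked up literally. One possible direction is to normalize the /.reserved/raw prefix before resolving the snapshottable root; the helper below is a hypothetical sketch of that normalization, not the actual fix:

```java
// Hypothetical sketch: strip the /.reserved/raw prefix so that
// /.reserved/raw/app-logs and /app-logs resolve to the same directory.
class RawPathNormalizer {
    private static final String RAW_PREFIX = "/.reserved/raw";

    static String normalize(String path) {
        if (path.equals(RAW_PREFIX)) {
            return "/";
        }
        if (path.startsWith(RAW_PREFIX + "/")) {
            return path.substring(RAW_PREFIX.length());
        }
        return path;  // not a raw path: leave untouched
    }
}
```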
[jira] [Created] (HDFS-15370) listStatus and getFileStatus behave inconsistently in the case of the ViewFs implementation
Srinivasu Majeti created HDFS-15370:
---------------------------------------

             Summary: listStatus and getFileStatus behave inconsistently in the case of the ViewFs implementation
                 Key: HDFS-15370
                 URL: https://issues.apache.org/jira/browse/HDFS-15370
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.1.0, 3.0.0
            Reporter: Srinivasu Majeti

The listStatus implementation in ViewFs and getFileStatus do not return consistent values for the same element.

{code:java}
[hdfs@c3121-node2 ~]$ /usr/jdk64/jdk1.8.0_112/bin/java -cp `hadoop classpath`:./hdfs-append-1.0-SNAPSHOT.jar LauncherGetFileStatus "/"
FileStatus of viewfs://c3121/testme21may isDirectory:false
FileStatus of viewfs://c3121/tmp isDirectory:false
FileStatus of viewfs://c3121/foo isDirectory:false
FileStatus of viewfs://c3121/tmp21may isDirectory:false
FileStatus of viewfs://c3121/testme isDirectory:false
FileStatus of viewfs://c3121/testme2 isDirectory:false <--- returns false
FileStatus of / isDirectory:true
[hdfs@c3121-node2 ~]$ /usr/jdk64/jdk1.8.0_112/bin/java -cp `hadoop classpath`:./hdfs-append-1.0-SNAPSHOT.jar LauncherGetFileStatus /testme2
FileStatus of viewfs://c3121/testme2/dist-copynativelibs.sh isDirectory:false
FileStatus of viewfs://c3121/testme2/newfolder isDirectory:true
FileStatus of /testme2 isDirectory:true <--- returns true
[hdfs@c3121-node2 ~]$
{code}
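The invariant being violated can be stated compactly: for every child returned by listStatus(parent), getFileStatus(child) must report the same isDirectory flag. The checker below models the two code paths as maps of path to isDirectory; it is a self-contained illustration, not ViewFs code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal model of the listStatus vs getFileStatus consistency invariant.
class ViewFsConsistencyCheck {
    static List<String> findMismatches(Map<String, Boolean> fromListStatus,
                                       Map<String, Boolean> fromGetFileStatus) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : fromListStatus.entrySet()) {
            Boolean direct = fromGetFileStatus.get(e.getKey());
            if (direct != null && !direct.equals(e.getValue())) {
                mismatches.add(e.getKey());  // the two views disagree
            }
        }
        return mismatches;
    }
}
```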
[jira] [Created] (HDFS-15142) Support for -D <key>=<value> in Ozone CLI
Srinivasu Majeti created HDFS-15142:
---------------------------------------

             Summary: Support for -D <key>=<value> in Ozone CLI
                 Key: HDFS-15142
                 URL: https://issues.apache.org/jira/browse/HDFS-15142
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ozone
            Reporter: Srinivasu Majeti

Support for -D <key>=<value> in the Ozone CLI, similar to HDFS, to override any server-side config.
[jira] [Created] (HDFS-15141) Support for getFileChecksum
Srinivasu Majeti created HDFS-15141:
---------------------------------------

             Summary: Support for getFileChecksum
                 Key: HDFS-15141
                 URL: https://issues.apache.org/jira/browse/HDFS-15141
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ozone
            Reporter: Srinivasu Majeti

Support for getFileChecksum() and any other way to help distcp avoid copying duplicate files even when the length is the same as that in remote storage (cloud copy to S3). Checksum calculations of local Ozone files should preferably match whatever S3 is already doing/returning.
[jira] [Created] (HDFS-14859) Prevent unnecessary evaluation of costly operation getNumLiveDataNodes when dfs.namenode.safemode.min.datanodes is not zero
Srinivasu Majeti created HDFS-14859:
---------------------------------------

             Summary: Prevent unnecessary evaluation of costly operation getNumLiveDataNodes when dfs.namenode.safemode.min.datanodes is not zero
                 Key: HDFS-14859
                 URL: https://issues.apache.org/jira/browse/HDFS-14859
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.1.0, 3.3.0, 3.1.4
            Reporter: Srinivasu Majeti

There have been improvements like HDFS-14171 and HDFS-14632 for the performance issue caused by getNumLiveDataNodes calls per block. The improvement has only been done for the case where the dfs.namenode.safemode.min.datanodes parameter is set to 0:

{code:java}
private boolean areThresholdsMet() {
  assert namesystem.hasWriteLock();
-  int datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
+  // Calculating the number of live datanodes is time-consuming
+  // in large clusters. Skip it when datanodeThreshold is zero.
+  int datanodeNum = 0;
+  if (datanodeThreshold > 0) {
+    datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
+  }
  synchronized (this) {
    return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
  }
}
{code}

I feel the above logic still causes unnecessary evaluations of getNumLiveDataNodes when the dfs.namenode.safemode.min.datanodes parameter is set > 0, even though blockSafe >= blockThreshold is false for most of the NameNode startup safe mode. We could do something like the below to avoid this, short-circuiting so that getNumLiveDataNodes is only evaluated once the block threshold has been met (note the parentheses around the conditional expression, which are required for && to short-circuit before the ternary applies):

{code:java}
private boolean areThresholdsMet() {
  assert namesystem.hasWriteLock();
  synchronized (this) {
    return blockSafe >= blockThreshold
        && (datanodeThreshold > 0
            ? blockManager.getDatanodeManager().getNumLiveDataNodes()
                >= datanodeThreshold
            : true);
  }
}
{code}
[jira] [Created] (HDFS-14605) Note missing on expunge command description for encrypted zones
Srinivasu Majeti created HDFS-14605:
---------------------------------------

             Summary: Note missing on expunge command description for encrypted zones
                 Key: HDFS-14605
                 URL: https://issues.apache.org/jira/browse/HDFS-14605
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.1.0, 3.0.0, 2.7.5, 2.7.3
            Reporter: Srinivasu Majeti
             Fix For: 3.1.0, 3.0.0, 2.7.5, 2.7.3

The expunge command is supported for both encrypted and non-encrypted HDFS paths. The operation initially needs to discover/list all such paths. Listing/discovering encryption zone paths is only supported for the superuser, and the expunge command misleads us by printing the message below, though it is only a warning. We could add a note to the expunge command description saying that the command supports encryption zone paths only when run as the superuser, and that it will continue listing and performing the operation for all non-encrypted HDFS paths.

{code:java}
19/06/25 08:30:13 WARN hdfs.DFSClient: Cannot get all encrypted trash roots
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Access denied for user ambari-qa. Superuser privilege is required
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:130)
{code}
[jira] [Created] (HDFS-14323) Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path
Srinivasu Majeti created HDFS-14323:
---------------------------------------

             Summary: Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path
                 Key: HDFS-14323
                 URL: https://issues.apache.org/jira/browse/HDFS-14323
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
    Affects Versions: 3.2.0
            Reporter: Srinivasu Majeti

There was an enhancement to allow semicolons in source/target URLs for the distcp use case as part of HDFS-13176, and a backward compatibility fix as part of HDFS-13582. There still seems to be an issue when triggering distcp from a 3.x cluster to pull webhdfs data from a 2.x Hadoop cluster. We might need to adjust the existing fix as described below, by checking whether the URL is already encoded or not. That fixes it.

{code:java}
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
index 5936603c34a..dc790286aff 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
@@ -609,7 +609,10 @@ URL toUrl(final HttpOpParam.Op op, final Path fspath,
     boolean pathAlreadyEncoded = false;
     try {
       fspathUriDecoded = URLDecoder.decode(fspathUri.getPath(), "UTF-8");
-      pathAlreadyEncoded = true;
+      if (!fspathUri.getPath().equals(fspathUriDecoded)) {
+        pathAlreadyEncoded = true;
+      }
     } catch (IllegalArgumentException ex) {
       LOG.trace("Cannot decode URL encoded file", ex);
     }
{code}
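The essence of the proposed diff is: treat a path as already URL-encoded only when decoding actually changes it. A self-contained version of that check (a sketch mirroring the diff's logic, not the WebHdfsFileSystem code itself):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch of the proposed check: a path counts as "already encoded" only
// if URL-decoding it produces a different string.
class EncodedPathCheck {
    static boolean isAlreadyEncoded(String path) {
        try {
            String decoded = URLDecoder.decode(path, "UTF-8");
            return !path.equals(decoded);
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // Cannot decode: treat as not encoded, as the original code does
            return false;
        }
    }
}
```

With this check, a plain path such as /dir/plain-file decodes to itself and is not mistaken for an encoded one, which is the bug the ticket describes.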