[jira] [Created] (HDFS-17400) Expose metrics for inode ChildrenList size

2024-02-27 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-17400:
---

 Summary: Expose metrics for inode ChildrenList size
 Key: HDFS-17400
 URL: https://issues.apache.org/jira/browse/HDFS-17400
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfs
Affects Versions: 3.1.1
Reporter: Srinivasu Majeti


A very common scenario: customer jobs fail when writing into a directory "x" because the number of items in "x" has reached the limit configured by dfs.namenode.fs-limits.max-directory-items.

Example:
The directory item limit of /tmp is exceeded: limit=1048576 items=1048576

I think we need to expose new metrics in "NameNodeMetrics" that track paths exceeding 90% of dfs.namenode.fs-limits.max-directory-items. However, recomputing the children count and removing paths from the metrics on every delete would be costly.

So, should we consider letting the SNN handle this from updateCountForQuota? In any case, updateCountForQuota already runs periodically on the SNN, so CM could query the SNN and alert users whenever this path list is non-empty.

FSDirectory#verifyMaxDirItems.
{code:java}
  /**
   * Verify children size for fs limit.
   *
   * @throws MaxDirectoryItemsExceededException too many children.
   */
  void verifyMaxDirItems(INodeDirectory parent, String parentPath)
      throws MaxDirectoryItemsExceededException {
    final int count = parent.getChildrenList(CURRENT_STATE_ID).size();
    if (count >= maxDirItems) {
      final MaxDirectoryItemsExceededException e
          = new MaxDirectoryItemsExceededException(parentPath, maxDirItems,
              count);
      if (namesystem.isImageLoaded()) {
        throw e;
      } else {
        // Do not throw if edits log is still being processed
        NameNode.LOG.error("FSDirectory.verifyMaxDirItems: "
            + e.getLocalizedMessage());
      }
    }
  }
{code}
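A minimal, self-contained sketch of the kind of tracking being proposed, purely for illustration: the class name, the 90% watermark handling, and the record() hook are assumptions rather than existing NameNode code; a real change would hang off verifyMaxDirItems/updateCountForQuota and feed a NameNodeMetrics gauge.
{code:java}
// Illustrative sketch only; none of these names exist in Hadoop today.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class NearLimitDirectoryTracker {
  private final int maxDirItems;                     // dfs.namenode.fs-limits.max-directory-items
  private final double thresholdFraction;            // e.g. 0.9 for the proposed 90% watermark
  private final Set<String> nearLimitPaths = ConcurrentHashMap.newKeySet();

  NearLimitDirectoryTracker(int maxDirItems, double thresholdFraction) {
    this.maxDirItems = maxDirItems;
    this.thresholdFraction = thresholdFraction;
  }

  /** Called with a directory path and its current children-list size. */
  void record(String parentPath, int childCount) {
    if (childCount >= (int) (maxDirItems * thresholdFraction)) {
      nearLimitPaths.add(parentPath);                // candidate for a NameNodeMetrics gauge
    } else {
      nearLimitPaths.remove(parentPath);             // drop it once it falls back under the watermark
    }
  }

  /** Value a metrics gauge (or a CM query against the SNN) could expose. */
  int nearLimitPathCount() {
    return nearLimitPaths.size();
  }
}
{code}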
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-17399) Ensure atomic transactions when snapshot manager is facing OS resource limit issues

2024-02-27 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-17399:
---

 Summary: Ensure atomic transactions when snapshot manager is 
facing OS resource limit issues
 Key: HDFS-17399
 URL: https://issues.apache.org/jira/browse/HDFS-17399
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.1.1
Reporter: Srinivasu Majeti


One of our customers is hitting OS resource limits (maximum number of processes) on at least one of the NameNodes.

{code:java}
host02: As a result, Snapshot creation failed on 14th:

2023-05-14 10:41:28,233 WARN org.apache.hadoop.ipc.Server: IPC Server handler 22 on 8020,
call Call#11 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.createSnapshot from xx.xxx.xx.xxx:59442
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:803)
        at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
        at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
        at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:140)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodeWithLeases(LeaseManager.java:246)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.addSnapshot(DirectorySnapshottableFeature.java:211)
        at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addSnapshot(INodeDirectory.java:288)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:463)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.createSnapshot(FSDirSnapshotOp.java:110)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSnapshot(FSNamesystem.java:6767)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createSnapshot(NameNodeRpcServer.java:1871)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.createSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1273)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNameno
{code}

{code:java}
host02 log (NN log)

2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream
'http://host03.amd.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true,
http://host02.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true'
to transaction ID 1623400203
2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream
'http://host01.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true'
to transaction ID 1623400203
2023-05-14 10:42:50,011 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation
DeleteSnapshotOp [snapshotRoot=/user/user1, snapshotName=distcp-1546382661--205240459-new,
RpcClientId=31353569-0e2e-4272-9acf-a6b71f51242c, RpcCallId=18]
org.apache.hadoop.hdfs.protocol.SnapshotException: Cannot delete snapshot distcp-1546382661--205240459-new
from path /user/user1: the snapshot does not exist.
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:260)
        at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:296)
{code}
We then identified the bad records in the edit log and fixed them manually:
{code:java}
The edit file causing the problem is "edits_01623400203-01623402627",
which contains 38626 lines when converted to XML format. On further investigation,
we discovered 602 transactions attempting to delete a snapshot
"distcp-1546382661--205240459-new" which does not exist:

OP_DELETE_SNAPSHOT
1623401061 /user/user1 distcp-1546382661--205240459-new
31353569-0e2e-4272-9acf-a6b71f51242c 1864

Each transaction consists of the above 10 lines (in XML form), a total of 6020 lines
that needed to be removed from the original 38626 lines. The number of lines after
correction is 38626 - 6020 = 32606.
{code}
Raising this ticket to discuss how to address this corner case without manually 
correcting edit logs; Hadoop currently lacks a defensive mechanism for it.
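As one possible direction, here is a minimal, self-contained sketch of an idempotent replay of delete-snapshot operations, using an in-memory stand-in for snapshot state. None of these class or method names are actual Hadoop APIs; a real change would live in FSEditLogLoader/SnapshotManager.
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SnapshotReplaySketch {
  // snapshot root path -> set of snapshot names (stand-in for NameNode state)
  private final Map<String, Set<String>> snapshots = new HashMap<>();

  void replayCreate(String root, String name) {
    snapshots.computeIfAbsent(root, r -> new HashSet<>()).add(name);
  }

  void replayDelete(String root, String name) {
    Set<String> names = snapshots.get(root);
    if (names == null || !names.remove(name)) {
      // Defensive path: during edit replay, a delete of a snapshot that was
      // never durably created (e.g. its creation failed mid-way on an OOM)
      // is logged and skipped instead of aborting NameNode startup.
      System.err.println("Skipping delete of missing snapshot " + name
          + " under " + root);
    }
  }
}
{code}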



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-17349) evictWriters command does not seem to work effectively

2024-01-22 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-17349:
---

 Summary: evictWriters command does not seem to work effectively
 Key: HDFS-17349
 URL: https://issues.apache.org/jira/browse/HDFS-17349
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Srinivasu Majeti


After running {{evictWriters}} on DataNodes while decommissioning was in progress, we 
noticed the messages below being logged. That means {{evictWriters}} was successfully 
issued to the DataNode, and it tried to interrupt all xceivers.
{code:java}
2023-11-29 16:37:18,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Evicting all writers.
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Stopped the writer: 
NioInetPeer(Socket[addr=/10.4.33.104,port=42982,localport=9866])
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Stopped the writer: 
NioInetPeer(Socket[addr=/10.4.35.105,port=43300,localport=9866])
2023-11-29 16:37:18,600 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Stopped the writer: 
NioInetPeer(Socket[addr=/10.4.33.60,port=59978,localport=9866]){code}
Even after we see "Stopped the writer: NioInetPeer(Socket[addr=/10.4.35.105", we still 
see open files from 10.4.35.105 that are not released, and decommissioning does not 
progress.
{code:java}
$ hdfs dfsadmin -listOpenFiles -blockingDecommission -path=/
Client Host Client Name Open File Path
10.4.35.105 DFSClient_NONMAPREDUCE_-211169064_96 
/warehouse/tablespace/managed/hive/sys.db/query_data/date=2023-11-28/hive_3162c2fd-cdd0-47f4-979c-d1c3263bfc86_1
10.4.35.149 DFSClient_NONMAPREDUCE_1084942995_59 
/warehouse/tablespace/managed/hive/sys.db/query_data/date=2023-11-28/hive_2360faef-7894-41d9-a13c-57d70593583e_1{code}

We may need to report whether evictWriters actually succeeded or failed by checking 
the real status of the xceivers, as sketched below.
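A self-contained sketch of that verification idea, purely illustrative: DataNodeWriters and the two methods on it are assumptions standing in for the DataNode's xceiver bookkeeping, not real Hadoop types.
{code:java}
interface DataNodeWriters {
  void interruptAllWriteXceivers();   // what evictWriters effectively does today
  int countActiveWriteXceivers();     // what a status check would need to read
}

class EvictWritersVerifier {
  /** Returns the number of writers still active after the timeout (0 = success). */
  static int evictAndVerify(DataNodeWriters dn, long timeoutMs)
      throws InterruptedException {
    dn.interruptAllWriteXceivers();
    long deadline = System.currentTimeMillis() + timeoutMs;
    int remaining = dn.countActiveWriteXceivers();
    while (remaining > 0 && System.currentTimeMillis() < deadline) {
      Thread.sleep(500);              // give interrupted xceiver threads time to exit
      remaining = dn.countActiveWriteXceivers();
    }
    return remaining;                 // non-zero means eviction was not fully effective
  }
}
{code}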



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-17336) Provide an option to enable/disable considering space used by .Trash folder for user quota computation

2024-01-10 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-17336:
---

 Summary: Provide an option to enable/disable considering space 
used by .Trash folder for user quota computation
 Key: HDFS-17336
 URL: https://issues.apache.org/jira/browse/HDFS-17336
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.1.4
Reporter: Srinivasu Majeti






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-17323) Uncontrolled fsimage size due to snapshot diff meta for file deletions

2024-01-04 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-17323:
---

 Summary: Uncontrolled fsimage size due to snapshot diff meta for 
file deletions
 Key: HDFS-17323
 URL: https://issues.apache.org/jira/browse/HDFS-17323
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.1.1
Reporter: Srinivasu Majeti


We have seen quite a few customer cases where the fsimage size increased drastically 
because of snapshot metadata stored for fileDiff entries. Below is an example of 
fsimage metadata storing the entire inode info after a file was deleted. I'm not aware 
of any restriction that requires the entire inode metadata to be stored in the fileDiff 
entry when the actual inode metadata has not changed and the operation is just a file 
delete.

The fileDiff entry for inode 1860467 seems redundant for a simple file delete 
operation.
{code:java}
431860465DIRECTORYs31704197935903hdfs:supergroup:0755-1-1
441860465DIRECTORYs41704197951829hdfs:supergroup:0755-1-1

1860467FILEfile1317041979173151704197917031134217728hdfs:supergroup:06441074008442267653418

1860467file1043
186046721474836460

1860467143418file1317041979173151704197917031134217728hdfs:supergroup:06440

{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16215) File read fails with CannotObtainBlockLengthException after Namenode is restarted

2021-09-07 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-16215:
---

 Summary: File read fails with CannotObtainBlockLengthException 
after Namenode is restarted
 Key: HDFS-16215
 URL: https://issues.apache.org/jira/browse/HDFS-16215
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.1, 3.2.2
Reporter: Srinivasu Majeti


When a file is being written by a first client (fsck shows it as OPENFORWRITE), an 
HDFS outage happens and the cluster is brought back up, the first client is 
disconnected, and a new client then tries to open the file, we see "Cannot obtain 
block length for" as shown below.

{code:java}
/tmp/hosts7 134217728 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:  OK
0. BP-1958960150-172.25.40.87-1628677864204:blk_1073745252_4430 len=134217728 Live_repl=3
  [DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK],
   DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK],
   DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK]]

Under Construction Block:
1. BP-1958960150-172.25.40.87-1628677864204:blk_1073745253_4431 len=0 Expected_repl=3
  [DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK],
   DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK],
   DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK]]

[root@c1265-node2 ~]# hdfs dfs -get /tmp/hosts7
get: Cannot obtain block length for
LocatedBlock{BP-1958960150-172.25.40.87-1628677864204:blk_1073745253_4431;
getBlockSize()=0; corrupt=false; offset=134217728;
locs=[DatanodeInfoWithStorage[172.25.40.70:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK],
DatanodeInfoWithStorage[172.25.33.132:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK],
DatanodeInfoWithStorage[172.25.36.14:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK]]}
{code}

*Exception trace from the logs:*
{code:java}
Exception in thread "main" org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for
LocatedBlock{BP-1958960150-172.25.40.87-1628677864204:blk_1073742720_1896;
getBlockSize()=0; corrupt=false; offset=134217728;
locs=[DatanodeInfoWithStorage[172.25.33.140:9866,DS-92e75140-d066-4ab5-b250-dbfd329289c5,DISK],
DatanodeInfoWithStorage[172.25.40.87:9866,DS-1e280bcd-a2ce-4320-9ebb-33fc903d3a47,DISK],
DatanodeInfoWithStorage[172.25.36.17:9866,DS-6357ab37-84ae-4c7c-8794-fef905bcde05,DISK]]}
        at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:363)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:201)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:185)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1006)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:312)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:324)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:949)
{code}
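For context, a hedged client-side sketch of a mitigation (not a fix for the underlying bug): if open() fails because the last block's length cannot be determined, trigger lease recovery on the file and retry once. DistributedFileSystem#recoverLease and CannotObtainBlockLengthException are existing APIs; the single retry and the sleep are purely illustrative.
{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.CannotObtainBlockLengthException;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class OpenWithLeaseRecovery {
  static FSDataInputStream open(DistributedFileSystem dfs, Path path)
      throws Exception {
    try {
      return dfs.open(path);
    } catch (CannotObtainBlockLengthException e) {
      // Ask the NameNode to recover the abandoned lease of the old writer,
      // then retry the open once.
      dfs.recoverLease(path);
      Thread.sleep(5000);             // crude wait for block recovery to complete
      return dfs.open(path);
    }
  }
}
{code}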




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16148) Snapshots: An option to find how much space would be freed up on deletion of a snapshot

2021-07-30 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-16148:
---

 Summary: Snapshots: An option to find how much space would be 
freed up on deletion of a snapshot
 Key: HDFS-16148
 URL: https://issues.apache.org/jira/browse/HDFS-16148
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Srinivasu Majeti
Assignee: Shashikant Banerjee


We have been seeing many large clusters with lots of snapshots that are not cleaned up 
on time, bloating the fsimage, heap memory, etc. When one wants to clean them up, there 
is no easy way today to know how much space would be reclaimed before deleting a 
snapshot. It would be ideal and user-friendly to introduce a switch/option for the hdfs 
du/count commands that gives a clear picture of how much DFS space would be reclaimed 
after deleting the snapshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15916:
---

 Summary: Backward compatibility - Distcp fails from Hadoop 3 to 
Hadoop 2 for snapshotdiff
 Key: HDFS-15916
 URL: https://issues.apache.org/jira/browse/HDFS-15916
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Srinivasu Majeti


It looks like when using the distcp -diff option between two snapshots from a Hadoop 3 
cluster to a Hadoop 2 cluster, we get the exception below; the new 
getSnapshotDiffReportListing API appears to break backward compatibility.
{code:java}
hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
Unknown method getSnapshotDiffReportListing called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
{code}
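A hedged, self-contained sketch of one compatibility approach: try the newer listing-based diff call and fall back to the legacy snapshot-diff RPC when the remote NameNode rejects it as an unknown method. SnapshotDiffClient and its two methods are illustrative stand-ins, not the actual DistCp/DFS client API; RemoteException and RpcNoSuchMethodException are the real Hadoop classes seen in the error above.
{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.RpcNoSuchMethodException;

interface SnapshotDiffClient {
  String listingBasedDiff(String path, String from, String to) throws IOException; // newer RPC
  String legacyDiff(String path, String from, String to) throws IOException;       // Hadoop 2 era RPC
}

class SnapshotDiffCompat {
  static String diff(SnapshotDiffClient client, String path, String from, String to)
      throws IOException {
    try {
      return client.listingBasedDiff(path, from, to);
    } catch (RemoteException re) {
      // A Hadoop 2 NameNode answers the unknown RPC with RpcNoSuchMethodException;
      // fall back to the older diff call instead of failing the distcp.
      if (RpcNoSuchMethodException.class.getName().equals(re.getClassName())) {
        return client.legacyDiff(path, from, to);
      }
      throw re;
    }
  }
}
{code}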
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15770) fsck to support printing snapshot name for missing blocks from snapshot only files

2021-01-11 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15770:
---

 Summary: fsck to support printing snapshot name for missing blocks 
from snapshot only files
 Key: HDFS-15770
 URL: https://issues.apache.org/jira/browse/HDFS-15770
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.3.0, 3.2.0
Reporter: Srinivasu Majeti


Today, when block IDs belonging to an older snapshot differ from those of the 
corresponding live file and a block from the snapshot file is missing, fsck reports a 
missing block ID against the live file, with no clue as to whether that block ID 
belongs to the live file or only to a specific snapshot file path (this happens when 
the file in the snapshot and the live file differ, e.g. after an overwrite). It would 
be nice for the fsck output to flag whether the block ID is missing from the live file 
or from a snapshot file when using the -includeSnapshots option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15729) Show progress of Balancer in Namenode UI

2020-12-12 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15729:
---

 Summary: Show progress of Balancer in Namenode UI
 Key: HDFS-15729
 URL: https://issues.apache.org/jira/browse/HDFS-15729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Affects Versions: 3.1.4
Reporter: Srinivasu Majeti


It would be nice to track the Balancer process in the Namenode UI, showing whether it 
is running and what its current progress/status is. This would be similar to the 
Namenode startup progress display.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-06-29 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15446:
---

 Summary: CreateSnapshotOp fails during edit log loading for 
/.reserved/raw/path with error java.io.FileNotFoundException: Directory does 
not exist: /.reserved/raw/path 
 Key: HDFS-15446
 URL: https://issues.apache.org/jira/browse/HDFS-15446
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.2.0, 3.3.0
Reporter: Srinivasu Majeti
Assignee: Stephen O'Donnell


After allowing snapshot creation for a path, say /app-logs, creating a snapshot on 
/.reserved/raw/app-logs succeeds. But later, when the Standby Namenode is restarted 
and tries to load the OP_CREATE_SNAPSHOT edit record, it fails and the Standby 
Namenode shuts down with the exception "java.io.FileNotFoundException: Directory does 
not exist: /.reserved/raw/app-logs".

Here are the steps to reproduce:
{code:java}
# hdfs dfs -ls /.reserved/raw/
Found 15 items
drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 /.reserved/raw/app-logs
drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
++
[root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
Allowing snapshot on /app-logs succeeded
[root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
Allowing snapshot on /prod succeeded
++
# hdfs lsSnapshottableDir
drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
++
[root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
{code}
The exception we see in the Standby Namenode while loading the snapshot-creation edit 
record:
{code:java}
2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
Failed to start namenode.
java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
{code}
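One possible direction, sketched under the assumption that normalizing the snapshot root is acceptable: strip the /.reserved/raw prefix so the path recorded in (and replayed from) the edit log is the real snapshottable directory. This is purely illustrative and not the actual fix; RawPathNormalizer is not a Hadoop class.
{code:java}
class RawPathNormalizer {
  private static final String RAW_PREFIX = "/.reserved/raw";

  /** "/.reserved/raw/app-logs" -> "/app-logs"; other paths are returned unchanged. */
  static String stripRawPrefix(String path) {
    if (path.equals(RAW_PREFIX)) {
      return "/";
    }
    if (path.startsWith(RAW_PREFIX + "/")) {
      return path.substring(RAW_PREFIX.length());
    }
    return path;
  }
}
{code}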



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15370) listStatus and getFileStatus behave inconsistent in the case of ViewFs implementation

2020-05-21 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15370:
---

 Summary: listStatus and getFileStatus behave inconsistent in the 
case of ViewFs implementation
 Key: HDFS-15370
 URL: https://issues.apache.org/jira/browse/HDFS-15370
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.0, 3.0.0
Reporter: Srinivasu Majeti


The listStatus and getFileStatus implementations in ViewFs do not return consistent 
values for the same element.
{code:java}
[hdfs@c3121-node2 ~]$ /usr/jdk64/jdk1.8.0_112/bin/java -cp `hadoop 
classpath`:./hdfs-append-1.0-SNAPSHOT.jar LauncherGetFileStatus "/"
FileStatus of viewfs://c3121/testme21may isDirectory:false
FileStatus of viewfs://c3121/tmp isDirectory:false
FileStatus of viewfs://c3121/foo isDirectory:false
FileStatus of viewfs://c3121/tmp21may isDirectory:false
FileStatus of viewfs://c3121/testme isDirectory:false
FileStatus of viewfs://c3121/testme2 isDirectory:false <--- returns false
FileStatus of / isDirectory:true
[hdfs@c3121-node2 ~]$ /usr/jdk64/jdk1.8.0_112/bin/java -cp `hadoop 
classpath`:./hdfs-append-1.0-SNAPSHOT.jar LauncherGetFileStatus /testme2
FileStatus of viewfs://c3121/testme2/dist-copynativelibs.sh isDirectory:false
FileStatus of viewfs://c3121/testme2/newfolder isDirectory:true
FileStatus of /testme2 isDirectory:true <--- returns true
[hdfs@c3121-node2 ~]$ {code}
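For reference, a small self-contained check along the lines of the LauncherGetFileStatus test program used above (that program itself is not shown here): for each child returned by listStatus, compare its isDirectory() flag with what getFileStatus reports for the same path. Only standard FileSystem APIs are used; the class name is illustrative.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ListVsGetStatusCheck {
  public static void main(String[] args) throws Exception {
    // Resolves to viewfs:// when fs.defaultFS points at a ViewFs mount table.
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus listed : fs.listStatus(new Path(args[0]))) {
      boolean viaGet = fs.getFileStatus(listed.getPath()).isDirectory();
      if (listed.isDirectory() != viaGet) {
        System.out.println("Mismatch for " + listed.getPath()
            + ": listStatus says isDirectory=" + listed.isDirectory()
            + ", getFileStatus says isDirectory=" + viaGet);
      }
    }
  }
}
{code}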



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15142) Support for -D <key>=<value> in Ozone CLI

2020-01-23 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15142:
---

 Summary: Support for -D <key>=<value> in Ozone CLI
 Key: HDFS-15142
 URL: https://issues.apache.org/jira/browse/HDFS-15142
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ozone
Reporter: Srinivasu Majeti


Support for -D <key>=<value> in the Ozone CLI, similar to HDFS, to override any 
server-side config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15141) Support for getFileChecksum

2020-01-22 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15141:
---

 Summary: Support for getFileChecksum
 Key: HDFS-15141
 URL: https://issues.apache.org/jira/browse/HDFS-15141
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ozone
Reporter: Srinivasu Majeti


Support getFileChecksum(), or any other way to help distcp avoid copying duplicate 
files when the length already matches that of the remote storage (cloud copy to S3). 
Checksum calculations for local Ozone files should preferably match whatever S3 is 
already computing/returning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-14859) Prevent Un-necessary evaluation of costly operation getNumLiveDataNodes when dfs.namenode.safemode.min.datanodes is not zero

2019-09-20 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-14859:
---

 Summary: Prevent Un-necessary evaluation of costly operation 
getNumLiveDataNodes when dfs.namenode.safemode.min.datanodes is not zero
 Key: HDFS-14859
 URL: https://issues.apache.org/jira/browse/HDFS-14859
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.0, 3.3.0, 3.1.4
Reporter: Srinivasu Majeti


There have been improvements like HDFS-14171 and HDFS-14632 for the performance issue 
caused by getNumLiveDataNodes being called per block. However, the improvement only 
covers the case where the dfs.namenode.safemode.min.datanodes parameter is set to 0:
{code:java}
  private boolean areThresholdsMet() {
    assert namesystem.hasWriteLock();
-   int datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
+   // Calculating the number of live datanodes is time-consuming
+   // in large clusters. Skip it when datanodeThreshold is zero.
+   int datanodeNum = 0;
+   if (datanodeThreshold > 0) {
+     datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
+   }
    synchronized (this) {
      return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
    }
  }
{code}
 
I feel the above logic still causes unnecessary evaluations of getNumLiveDataNodes 
when the dfs.namenode.safemode.min.datanodes parameter is set > 0, even though 
"blockSafe >= blockThreshold" is false most of the time during NameNode startup safe 
mode. We could do something like the following to avoid this:

{code:java}
  private boolean areThresholdsMet() {
    assert namesystem.hasWriteLock();
    synchronized (this) {
      // Only look up the live datanode count when a datanode threshold is configured,
      // and only after the cheaper block-threshold check has passed.
      return blockSafe >= blockThreshold
          && (datanodeThreshold > 0
              ? blockManager.getDatanodeManager().getNumLiveDataNodes() >= datanodeThreshold
              : true);
    }
  }
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-14605) Note missing on expunge command description for encrypted zones

2019-06-25 Thread Srinivasu Majeti (JIRA)
Srinivasu Majeti created HDFS-14605:
---

 Summary: Note missing on expunge command description for encrypted 
zones
 Key: HDFS-14605
 URL: https://issues.apache.org/jira/browse/HDFS-14605
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.0, 3.0.0, 2.7.5, 2.7.3
Reporter: Srinivasu Majeti
 Fix For: 3.1.0, 3.0.0, 2.7.5, 2.7.3


The expunge command is supported for both encrypted and non-encrypted HDFS paths. The 
operation first needs to discover/list all such paths, but listing/discovering 
encryption zone paths is only allowed for the superuser, so expunge misleads users by 
printing the message below (even though it is only a warning). We could add a note to 
the expunge command description saying that the command handles encryption zone paths 
only when run as the superuser, and that it will still list and process all 
non-encrypted HDFS paths.

{code:java}
19/06/25 08:30:13 WARN hdfs.DFSClient: Cannot get all encrypted trash roots
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Access denied for user ambari-qa. Superuser privilege is required
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:130)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-14323) Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path

2019-02-28 Thread Srinivasu Majeti (JIRA)
Srinivasu Majeti created HDFS-14323:
---

 Summary: Distcp fails in Hadoop 3.x when 2.x source webhdfs url 
has special characters in hdfs file path
 Key: HDFS-14323
 URL: https://issues.apache.org/jira/browse/HDFS-14323
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.2.0
Reporter: Srinivasu Majeti


There was an enhancement to allow semicolons in source/target URLs for the distcp use 
case as part of HDFS-13176, and a backward-compatibility fix as part of HDFS-13582. 
Still, there seems to be an issue when triggering distcp from a 3.x cluster to pull 
webhdfs data from a 2.x Hadoop cluster. We may need to adjust the existing fix as 
described below by checking whether the URL is already encoded or not; that resolves 
it.

{code:java}
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
index 5936603c34a..dc790286aff 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
@@ -609,7 +609,10 @@ URL toUrl(final HttpOpParam.Op op, final Path fspath,
     boolean pathAlreadyEncoded = false;
     try {
       fspathUriDecoded = URLDecoder.decode(fspathUri.getPath(), "UTF-8");
-      pathAlreadyEncoded = true;
+      if (!fspathUri.getPath().equals(fspathUriDecoded))
+      {
+        pathAlreadyEncoded = true;
+      }
     } catch (IllegalArgumentException ex) {
       LOG.trace("Cannot decode URL encoded file", ex);
     }
{code}
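As a small, self-contained illustration of why the check in the diff helps (the example paths are assumptions, not taken from the report): decoding changes the string only when it was actually URL-encoded, so comparing the decoded form with the original distinguishes already-encoded paths from plain ones that merely contain special characters such as ';'.
{code:java}
import java.net.URLDecoder;

class EncodedPathCheck {
  public static void main(String[] args) throws Exception {
    for (String p : new String[] {"/tmp/a;b", "/tmp/a%3Bb"}) {
      String decoded = URLDecoder.decode(p, "UTF-8");
      boolean alreadyEncoded = !p.equals(decoded);
      // "/tmp/a;b"   -> alreadyEncoded=false (plain path, must still be encoded)
      // "/tmp/a%3Bb" -> alreadyEncoded=true  (already encoded, must not be re-encoded)
      System.out.println(p + " -> alreadyEncoded=" + alreadyEncoded);
    }
  }
}
{code}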



 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org