[jira] [Commented] (HDFS-16019) HDFS: Inode CheckPoint

2021-05-10 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341836#comment-17341836
 ] 

Wei-Chiu Chuang commented on HDFS-16019:


Sounds very similar to what NameNode Analytics provides? [~zero45] See also HDFS-15763.

> HDFS: Inode CheckPoint 
> ---
>
> Key: HDFS-16019
> URL: https://issues.apache.org/jira/browse/HDFS-16019
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: zhu
>Assignee: zhu
>Priority: Major
>
> *Background*
> The OIV IMAGE analysis tool has brought us many benefits, such as file size 
> distribution, cold/hot data, and abnormal-growth directory analysis. But in 
> my opinion it is too slow, especially for a big IMAGE.
> After Hadoop 2.3, the format of the IMAGE changed. The OIV tool must load 
> the entire IMAGE into memory to output the inode information in a text 
> format. For a large IMAGE, this process takes a long time, consumes more 
> resources, and requires a machine with a large amount of memory for the 
> analysis.
> HDFS does provide the dfs.namenode.legacy-oiv-image.dir parameter to 
> produce the old-format IMAGE through a CheckPoint. Parsing the old IMAGE 
> does not require many resources, but we still need to parse the IMAGE again 
> through the hdfs oiv_legacy command to get the text information of the 
> Inodes, which is relatively time-consuming.
> *Solution*
> We can have the standby node periodically checkpoint the Inodes and 
> serialize them in text form. For output, different FileSystems can be used 
> according to the configuration, such as the local file system or the HDFS 
> file system.
> The advantage of writing to the HDFS file system is that we can analyze the 
> Inodes directly through Spark/Hive. I think the block information 
> corresponding to an Inode may not be of much use; the size of the file and 
> the number of replicas are more useful to us.
> In addition, sequential output of the Inodes is not necessary. We can speed 
> up the CheckPoint by partitioning the serialized Inodes into different 
> output files: use a producer thread to put Inodes in a queue, and multiple 
> consumer threads to drain the queue and write to the partition files. The 
> output files can also be compressed to reduce disk IO.
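To make the partitioned, multi-threaded serialization idea concrete, here is a 
minimal sketch using plain JDK classes rather than HDFS types. The file names, 
the one-line record format, and the poison-pill shutdown are illustrative 
assumptions, not the proposed implementation:

{code:java}
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPOutputStream;

public class PartitionedInodeWriter {
  // An empty line marks end-of-input; real inode records here are never empty.
  private static final String POISON = "";

  public static void main(String[] args) throws Exception {
    int partitions = 4;
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
    ExecutorService consumers = Executors.newFixedThreadPool(partitions);

    // Consumers: each owns one gzip-compressed partition file, so no two
    // threads ever write to the same stream.
    for (int p = 0; p < partitions; p++) {
      final int part = p;
      consumers.submit(() -> {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
            new GZIPOutputStream(
                new FileOutputStream("inodes-part-" + part + ".gz"))))) {
          String line;
          while (!(line = queue.take()).equals(POISON)) {
            out.write(line);
            out.newLine();
          }
        }
        return null;
      });
    }

    // Producer: serialize each inode as one "path,length,replication" line
    // (toy records standing in for the real serialized Inodes).
    for (String inode : List.of("/a,1024,3", "/b,2048,2", "/c,4096,3")) {
      queue.put(inode);
    }
    for (int p = 0; p < partitions; p++) {
      queue.put(POISON); // one shutdown marker per consumer
    }
    consumers.shutdown();
  }
}
{code}

Because each consumer owns exactly one partition file, the writes need no 
locking, and compression happens off the producer's critical path.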






[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-05-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16001:
---
Priority: Blocker  (was: Major)

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Blocker
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> {noformat}
> java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 
> 17
> {noformat}
> Seems like there is a corrupt record in {{editsStored}} file.






[jira] [Resolved] (HDFS-15995) Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-05-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15995.

Fix Version/s: 3.4.0
   Resolution: Done

> Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating 
> hadoop
> ---
>
> Key: HDFS-15995
> URL: https://issues.apache.org/jira/browse/HDFS-15995
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.4.0
>
>
> As discussed in the mailing list:
> {quote}
> In HDFS-15624 (fix the function of setting quota by storage type), a new 
> layout version was added:
> NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
> This was added for 3.4.0 (trunk).
> However, there's another jira,
> HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0):
> SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for 
> snapshot");
> where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The 
> patch got stalled a while ago and I'm trying to commit it in preparation for 
> the 3.3.1 release.
> However, both new layout versions conflict because they intend to use the 
> same new version id. We can't release 3.3.1 without HDFS-15566, but we can't 
> use layout id -66 because of HDFS-15624.
> I propose:
> revert HDFS-15624 (NVDIMM_SUPPORT),
> commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME),
> re-work HDFS-15624 but with layout version id -67
> {quote}
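For illustration, a simplified stand-in for the layout-version feature enum 
(the real NameNodeLayoutVersion.Feature constructor takes more arguments; this 
sketch only shows why the ids must differ): the layout version id is what gets 
written to disk, so two features sharing -66 would be indistinguishable.

{code:java}
// Simplified stand-in, not the actual Hadoop source. Layout ids are negative
// and strictly decreasing as features are added.
enum LayoutFeatureSketch {
  SNAPSHOT_MODIFICATION_TIME(-66, "Support modification time for snapshot"),
  NVDIMM_SUPPORT(-67, "Support NVDIMM storage type"); // reworked from -66 to -67

  private final int layoutVersion;
  private final String description;

  LayoutFeatureSketch(int layoutVersion, String description) {
    this.layoutVersion = layoutVersion;
    this.description = description;
  }

  int getLayoutVersion() {
    return layoutVersion;
  }
}
{code}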






[jira] [Resolved] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-28 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15624.

Resolution: Fixed

The updated patch was committed. Thanks Ayush for the help!

>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting the quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade.
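For illustration, a hedged sketch (with made-up enum layouts, not the real 
StorageType ordering) of why persisting ordinal() values is fragile: inserting 
a new constant shifts the ordinals of every later constant, so records keyed 
by ordinal resolve to the wrong storage type after an upgrade.

{code:java}
// Hypothetical before/after enums; the real StorageType layout differs.
enum StorageTypeV1 { RAM_DISK, SSD, DISK, ARCHIVE }
enum StorageTypeV2 { RAM_DISK, NVDIMM, SSD, DISK, ARCHIVE }

public class OrdinalPitfall {
  public static void main(String[] args) {
    int persisted = StorageTypeV1.SSD.ordinal(); // 1, written by the old version
    // After the upgrade, the same index points at a different constant:
    System.out.println(StorageTypeV2.values()[persisted]); // prints NVDIMM
  }
}
{code}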






[jira] [Created] (HDFS-16002) TestJournalNodeRespectsBindHostKeys#testHttpsBindHostKey very flaky

2021-04-28 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16002:
--

 Summary: TestJournalNodeRespectsBindHostKeys#testHttpsBindHostKey 
very flaky
 Key: HDFS-16002
 URL: https://issues.apache.org/jira/browse/HDFS-16002
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


This test appears to be failing a lot lately. I suspect it has to do with the 
new change to support reloading HttpServer2 certificates, but I've not looked 
into it.
{noformat}
Stacktrace
java.lang.NullPointerException
at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
at java.nio.file.Paths.get(Paths.java:84)
at 
org.apache.hadoop.http.HttpServer2$Builder.makeConfigurationChangeMonitor(HttpServer2.java:609)
at 
org.apache.hadoop.http.HttpServer2$Builder.createHttpsChannelConnector(HttpServer2.java:592)
at 
org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:518)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:81)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:238)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:120)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:47)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster$Builder.build(MiniJournalCluster.java:79)
at 
org.apache.hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys.testHttpsBindHostKey(TestJournalNodeRespectsBindHostKeys.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{noformat}
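A guess at the failure mode, sketched with plain JDK calls (the real code is 
in HttpServer2#makeConfigurationChangeMonitor): java.nio.file.Paths.get throws 
NullPointerException for a null argument, so an unset keystore location in the 
test configuration would produce exactly this stack trace.

{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

public class NullPathGuard {
  // Hypothetical guard: skip the change monitor instead of NPE-ing.
  static Path monitoredPath(String configuredLocation) {
    if (configuredLocation == null) {
      return null;
    }
    return Paths.get(configuredLocation);
  }

  public static void main(String[] args) {
    System.out.println(monitoredPath(null));            // null, no exception
    System.out.println(monitoredPath("/tmp/keystore")); // /tmp/keystore
  }
}
{code}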






[jira] [Updated] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15982:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and removed only after the trash interval 
> has elapsed. Currently, the data is removed from the system directly (this 
> behavior should be the same as the CLI command).
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.
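For reference, a minimal sketch of the CLI-side behavior this asks webhdfs to 
mirror, using the existing org.apache.hadoop.fs.Trash helper (the path is 
hypothetical and this is not the proposed patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashThenDelete {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/user/alice/data"); // hypothetical path

    // Try the user's trash first; fall back to a direct delete only if the
    // move did not happen (e.g. trash is disabled via fs.trash.interval=0).
    if (!Trash.moveToAppropriateTrash(fs, target, conf)) {
      fs.delete(target, true);
    }
  }
}
{code}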






[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15788:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>
> Correct the statement for pmem cache to reflect cache persistence support.






[jira] [Updated] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13916:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.patch
>
>
> [~ljain] has worked on HDFS-13052 to make DistCp with SnapshotDiff possible 
> over WebHdfsFileSystem. However, the patch does not modify the actual Java 
> class used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS.
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.
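A hedged sketch of the relaxed check being proposed (signatures simplified; 
the real check lives in DistCpSync#preSyncCheck): accept WebHdfsFileSystem 
wherever DistributedFileSystem is accepted for snapshot-diff-based sync.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.web.WebHdfsFileSystem;

public class SnapshotDiffCheck {
  static void preSyncCheck(FileSystem srcFs, FileSystem tgtFs)
      throws IOException {
    if (!supportsSnapshotDiff(srcFs) || !supportsSnapshotDiff(tgtFs)) {
      throw new IllegalArgumentException(
          "Snapshot diff requires DistributedFileSystem or WebHdfsFileSystem");
    }
  }

  static boolean supportsSnapshotDiff(FileSystem fs) {
    return fs instanceof DistributedFileSystem
        || fs instanceof WebHdfsFileSystem;
  }
}
{code}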






[jira] [Updated] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0

2021-04-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15566:
---
Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~brahmareddy]!!

> NN restart fails after RollingUpgrade from  3.1.3/3.2.1 to 3.3.0
> 
>
> Key: HDFS-15566
> URL: https://issues.apache.org/jira/browse/HDFS-15566
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch, 
> HDFS-15566-003.patch
>
>
> * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, 
> it fails while replaying edit logs.
>  * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* 
> bits to the editLog transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the *modification time* bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> {noformat}
> 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361
> 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. | 
> NameNode.java:1751
> java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>  at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606)
>  at java.lang.String.valueOf(String.java:2994)
>  at java.lang.StringBuilder.append(StringBuilder.java:131)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:959)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat}
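The parsing pattern at fault, in a hedged sketch (class and helper names are 
stand-ins, not the committed fix): a field added under a newer layout version 
must be read conditionally on the version recorded in the edit log segment, 
otherwise every field that follows is parsed from the wrong offset.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative stand-in for an edit-log op reader.
class DeleteSnapshotOpSketch {
  long mtime;          // field added by SNAPSHOT_MODIFICATION_TIME
  String snapshotName;

  void readFields(DataInputStream in, int logVersion) throws IOException {
    // Hypothetical feature check; the real code consults NameNodeLayoutVersion.
    // Newer layouts have more-negative ids.
    boolean hasMtime = logVersion <= -66;
    if (hasMtime) {
      mtime = in.readLong(); // skipping this misaligns everything below
    }
    snapshotName = in.readUTF();
  }
}
{code}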






[jira] [Commented] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0

2021-04-25 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331714#comment-17331714
 ] 

Wei-Chiu Chuang commented on HDFS-15566:


+1 committing 003.

> NN restart fails after RollingUpgrade from  3.1.3/3.2.1 to 3.3.0
> 
>
> Key: HDFS-15566
> URL: https://issues.apache.org/jira/browse/HDFS-15566
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch, 
> HDFS-15566-003.patch
>
>
> * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, 
> it fails while replaying edit logs.
>  * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* 
> bits to the editLog transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the *modification time* bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> {noformat}
> 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361
> 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. | 
> NameNode.java:1751
> java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>  at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606)
>  at java.lang.String.valueOf(String.java:2994)
>  at java.lang.StringBuilder.append(StringBuilder.java:131)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:959)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat}






[jira] [Work started] (HDFS-15995) Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15995 started by Wei-Chiu Chuang.
--
> Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating 
> hadoop
> ---
>
> Key: HDFS-15995
> URL: https://issues.apache.org/jira/browse/HDFS-15995
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> As discussed in the mailing list:
> {quote}
> In HDFS-15624 (fix the function of setting quota by storage type), a new 
> layout version was added:
> NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
> This was added for 3.4.0 (trunk).
> However, there's another jira,
> HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0):
> SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for 
> snapshot");
> where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The 
> patch got stalled a while ago and I'm trying to commit it in preparation for 
> the 3.3.1 release.
> However, both new layout versions conflict because they intend to use the 
> same new version id. We can't release 3.3.1 without HDFS-15566, but we can't 
> use layout id -66 because of HDFS-15624.
> I propose:
> revert HDFS-15624 (NVDIMM_SUPPORT),
> commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME),
> re-work HDFS-15624 but with layout version id -67
> {quote}






[jira] [Commented] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-04-25 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331521#comment-17331521
 ] 

Wei-Chiu Chuang commented on HDFS-15982:


So, essentially, we change the default behavior of the webhdfs delete 
operation.

This is a big incompatible change. If we think this should be part of 3.4.0 
despite the risk to our compatibility guarantee (which I think makes sense, 
given how many times I was involved in accidental data deletion), I think it 
can be part of 3.3.1. We traditionally regard 3.3.0 as not production-ready, 
so making an incompatible change in 3.3.1 is probably justifiable. 

 

You will want to make sure to add a release note. Also, we should consider 
making the delete behavior of DistributedFileSystem consistent with 
webhdfs/httpfs (i.e. do not skip trash by default)

 

Thoughts?

 

(BTW, thanks for involving me. I suspect it's going to break some of our 
applications/integration tests, so having a little more preparedness is good)

 

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and removed only after the trash interval 
> has elapsed. Currently, the data is removed from the system directly (this 
> behavior should be the same as the CLI command).
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.






[jira] [Created] (HDFS-15995) Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-23 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15995:
--

 Summary: Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem 
after updating hadoop
 Key: HDFS-15995
 URL: https://issues.apache.org/jira/browse/HDFS-15995
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


As discussed in the mailing list:

{quote}
In HDFS-15624 (fix the function of setting quota by storage type), a new 
layout version was added:
NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
This was added for 3.4.0 (trunk)

However, there's another jira,
HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0):
SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for snapshot");

where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The 
patch got stalled a while ago and I'm trying to commit it in preparation for 
the 3.3.1 release.

However, both new layout versions conflict because they intend to use the same 
new version id. We can't release 3.3.1 without HDFS-15566, but we can't use 
layout id -66 because of HDFS-15624.

I propose:
revert HDFS-15624 (NVDIMM_SUPPORT),
commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME),
re-work HDFS-15624 but with layout version id -67
{quote}






[jira] [Comment Edited] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329956#comment-17329956
 ] 

Wei-Chiu Chuang edited comment on HDFS-15624 at 4/23/21, 8:29 AM:
--

Hi, looks like we have a problem here. I'm going to reopen this issue.
For details, please see my discussion thread: 
https://lists.apache.org/thread.html/rbdd58fda1b528c345713f902c6a659fa1fc8671cbf67f59fc31e25ee%40%3Chdfs-dev.hadoop.apache.org%3E

{quote}
In HDFS-15624 (fix the function of setting quota by storage type), a new
layout version was added:
NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
This was added for 3.4.0 (trunk)

However, there's another jira,
HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0):
SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for
snapshot");

where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The
patch got stalled a while ago and I'm trying to commit it in preparation for
the 3.3.1 release.

However, both new layout versions conflict because they intend to use the
same new version id. We can't release 3.3.1 without HDFS-15566, but we can't
use layout id -66 because of HDFS-15624.

I propose:
revert HDFS-15624 (NVDIMM_SUPPORT),
commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME),
re-work HDFS-15624 but with layout version id -67
{quote}


was (Author: jojochuang):
Hi, looks like we have a problem here. I'm going to reopen this issue.
For details, please see my discussion thread: 
https://lists.apache.org/thread.html/rbdd58fda1b528c345713f902c6a659fa1fc8671cbf67f59fc31e25ee%40%3Chdfs-dev.hadoop.apache.org%3E

>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting the quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade.






[jira] [Reopened] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-15624:


>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting the quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade.






[jira] [Commented] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-22 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329956#comment-17329956
 ] 

Wei-Chiu Chuang commented on HDFS-15624:


Hi, looks like we have a problem here. I'm going to reopen this issue.
For details, please see my discussion thread: 
https://lists.apache.org/thread.html/rbdd58fda1b528c345713f902c6a659fa1fc8671cbf67f59fc31e25ee%40%3Chdfs-dev.hadoop.apache.org%3E

>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting the quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade.






[jira] [Commented] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0

2021-04-22 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329955#comment-17329955
 ] 

Wei-Chiu Chuang commented on HDFS-15566:


HDFS-15624 blocks this jira. We need to find a solution.

> NN restart fails after RollingUpgrade from  3.1.3/3.2.1 to 3.3.0
> 
>
> Key: HDFS-15566
> URL: https://issues.apache.org/jira/browse/HDFS-15566
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch, 
> HDFS-15566-003.patch
>
>
> * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, 
> it fails while replaying edit logs.
>  * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* 
> bits to the editLog transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the *modification time* bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> {noformat}
> 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361
> 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. | 
> NameNode.java:1751
> java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>  at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606)
>  at java.lang.String.valueOf(String.java:2994)
>  at java.lang.StringBuilder.append(StringBuilder.java:131)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:959)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat}






[jira] [Commented] (HDFS-15850) Superuser actions should be reported to external enforcers

2021-04-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327041#comment-17327041
 ] 

Wei-Chiu Chuang commented on HDFS-15850:


We should get HADOOP-17079 to branch-3.3 too. I'll look into that one.

> Superuser actions should be reported to external enforcers
> --
>
> Key: HDFS-15850
> URL: https://issues.apache.org/jira/browse/HDFS-15850
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: security
>Affects Versions: 3.3.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15850.branch-3.3.001.patch, HDFS-15850.v1.patch, 
> HDFS-15850.v2.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, HDFS superuser checks and actions are not reported to external 
> enforcers like Ranger, so the audit reports provided by such external 
> enforcers are incomplete and missing the superuser actions. To fix this, add 
> a new method to "AccessControlEnforcer" for all superuser checks.
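A hedged sketch of the kind of hook proposed here (names and signatures are 
assumptions; the real interface is INodeAttributeProvider.AccessControlEnforcer 
with a much larger checkPermission signature): a dedicated superuser callback 
would let Ranger-style plugins audit those checks too.

{code:java}
import org.apache.hadoop.security.AccessControlException;

interface AccessControlEnforcerSketch {
  // Existing style of per-path check, heavily simplified here.
  void checkPermission(String user, String path) throws AccessControlException;

  // Proposed addition (hypothetical signature): report superuser checks so
  // that external audit logs are complete.
  void checkSuperUserPermission(String operationName, String user)
      throws AccessControlException;
}
{code}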






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325054#comment-17325054
 ] 

Wei-Chiu Chuang commented on HDFS-15796:


[~Daniel Ma] ping. Please let us know more details. Meanwhile, I updated the 
target version to 3.4.0.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
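A generic illustration of the failure pattern (not the BlockManager code): 
mutating an ArrayList while iterating it throws 
ConcurrentModificationException, and iterating over a snapshot copy is one way 
to avoid it.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CmeDemo {
  public static void main(String[] args) {
    List<String> blocks = new ArrayList<>(List.of("b1", "b2", "b3"));

    // Unsafe: removing during iteration throws ConcurrentModificationException:
    // for (String b : blocks) { blocks.remove(b); }

    // Safe: iterate a snapshot while mutating the original list.
    for (String b : new ArrayList<>(blocks)) {
      blocks.remove(b);
    }
    System.out.println(blocks); // []
  }
}
{code}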






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15796:
---
Target Version/s: 3.4.0  (was: 3.3.1)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15796:
---
Fix Version/s: (was: 3.1.1)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15965) Please upgrade the log4j dependency to log4j2

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15965:
---
Fix Version/s: (was: 3.4.0)
   (was: 3.3.0)

> Please upgrade the log4j dependency to log4j2
> -
>
> Key: HDFS-15965
> URL: https://issues.apache.org/jira/browse/HDFS-15965
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
>
> The log4j dependency used by hadoop-common is currently version 1.2.17. 
> Our fortify scan picked up a couple of issues with this dependency. Please 
> update it to the latest version of the log4j2 dependencies:
> <dependency>
>  <groupId>org.apache.logging.log4j</groupId>
>  <artifactId>log4j-api</artifactId>
>  <version>2.14.1</version>
> </dependency>
> <dependency>
>  <groupId>org.apache.logging.log4j</groupId>
>  <artifactId>log4j-core</artifactId>
>  <version>2.14.1</version>
> </dependency>
>  
> The slf4j dependency will need to be updated as well after you upgrade log4j 
> to log4j2.






[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15964:
---
Target Version/s: 3.3.1, 3.4.0  (was: 3.3.0, 3.4.0)

> Please update the okhttp version to 4.9.1
> -
>
> Key: HDFS-15964
> URL: https://issues.apache.org/jira/browse/HDFS-15964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, dfsclient, security
>Affects Versions: 3.3.0
>Reporter: helen huang
>Priority: Major
>
> Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan 
> flagged two issues with this version. Please update it to the latest (it is 
> okhttp3 4.9.1 at this point). Thanks!
> <dependency>
>  <groupId>com.squareup.okhttp3</groupId>
>  <artifactId>okhttp</artifactId>
>  <version>4.9.1</version>
> </dependency>






[jira] [Updated] (HDFS-15965) Please upgrade the log4j dependency to log4j2

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15965:
---
Target Version/s: 3.3.1, 3.4.0  (was: 3.3.0, 3.4.0)

> Please upgrade the log4j dependency to log4j2
> -
>
> Key: HDFS-15965
> URL: https://issues.apache.org/jira/browse/HDFS-15965
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
>
> The log4j dependency used by hadoop-common is currently version 1.2.17. 
> Our fortify scan picked up a couple of issues with this dependency. Please 
> update it to the latest version of the log4j2 dependencies:
> <dependency>
>  <groupId>org.apache.logging.log4j</groupId>
>  <artifactId>log4j-api</artifactId>
>  <version>2.14.1</version>
> </dependency>
> <dependency>
>  <groupId>org.apache.logging.log4j</groupId>
>  <artifactId>log4j-core</artifactId>
>  <version>2.14.1</version>
> </dependency>
>  
> The slf4j dependency will need to be updated as well after you upgrade log4j 
> to log4j2.






[jira] [Updated] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15561:
---
Target Version/s: 3.3.1  (was: 3.3.0)

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE:
> {code:java}
> 2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: null2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: nulljava.lang.IllegalArgumentException: 
> java.net.UnknownHostException: null at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123) 
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) 
> at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)Caused by: 
> java.net.UnknownHostException: null ... 14 more
> {code}
>  






[jira] [Commented] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324645#comment-17324645
 ] 

Wei-Chiu Chuang commented on HDFS-15561:


[~fengnanli] are you taking care of the PR? I'll go ahead and update the target version.

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE:
> {code:java}
> 2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: null2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: nulljava.lang.IllegalArgumentException: 
> java.net.UnknownHostException: null at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123) 
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) 
> at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)Caused by: 
> java.net.UnknownHostException: null ... 14 more
> {code}
>  






[jira] [Updated] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15561:
---
Fix Version/s: (was: 3.3.1)

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE:
> {code:java}
> 2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: null2020-09-08 19:41:14,989 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: 
> Unexpected exception while communicating with null:null: 
> java.net.UnknownHostException: nulljava.lang.IllegalArgumentException: 
> java.net.UnknownHostException: null at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123) 
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) 
> at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)Caused by: 
> java.net.UnknownHostException: null ... 14 more
> {code}
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15796:
---
Target Version/s: 3.3.1  (was: 3.3.0)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Fix For: 3.1.1
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Commented] (HDFS-15985) Incorrect sorting will cause failure to load an FsImage file

2021-04-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322701#comment-17322701
 ] 

Wei-Chiu Chuang commented on HDFS-15985:


[~sodonnell] 

> Incorrect sorting will cause failure to load an FsImage file
> 
>
> Key: HDFS-15985
> URL: https://issues.apache.org/jira/browse/HDFS-15985
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After introducing HDFS-14617 or HDFS-14771, the following error pops up when 
> loading an fsimage file:
> 2021-04-15 17:21:17,868 [293072]-INFO [main:FSImage@784]-Planning to load 
> image: 
> FSImageFile(file=//hadoop/hdfs/namenode/current/fsimage_0, 
> cpktTxId=0)
> 2021-04-15 17:25:53,288 [568492]-INFO 
> [main:FSImageFormatPBINode$Loader@229]-Loading 725097952 INodes.
> 2021-04-15 17:25:53,289 [568493]-ERROR [main:FSImage@730]-Failed to load 
> image from 
> FSImageFile(file=//hadoop/hdfs/namenode/current/fsimage_0, 
> cpktTxId=0)
> java.lang.IllegalStateException: GLOBAL: serial number 3 does not exist
> at 
> org.apache.hadoop.hdfs.server.namenode.SerialNumberMap.get(SerialNumberMap.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.SerialNumberManager.getString(SerialNumberManager.java:121)
> at 
> org.apache.hadoop.hdfs.server.namenode.SerialNumberManager.getString(SerialNumberManager.java:125)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields$PermissionStatusFormat.toPermissionStatus(INodeWithAdditionalFields.java:86)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadPermission(FSImageFormatPBINode.java:93)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeFile(FSImageFormatPBINode.java:303)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:280)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:237)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:237)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:176)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:937)
> It was found that this anomaly was related to sorting, as follows:
> ArrayList<FileSummary.Section> sections = Lists.newArrayList(summary
>     .getSectionsList());
> Collections.sort(sections, new Comparator<FileSummary.Section>() {
>   @Override
>   public int compare(FileSummary.Section s1, FileSummary.Section s2) {
>     SectionName n1 = SectionName.fromString(s1.getName());
>     SectionName n2 = SectionName.fromString(s2.getName());
>     if (n1 == null) {
>       return n2 == null ? 0 : -1;
>     } else if (n2 == null) {
>       return -1;
>     } else {
>       return n1.ordinal() - n2.ordinal();
>     }
>   }
> });
> When n1 != null and n2 == null, the comparator returns -1 instead of 1, 
> which breaks the sort order.
> When loading sections, the correct loading order is:
> NS_INFO -> STRING_TABLE -> INODE
> With the incorrect sorting, the loading order becomes:
> INODE -> NS_INFO -> STRING_TABLE
> This fails because loading the INODE section depends on the STRING_TABLE 
> section.
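
A sketch of one possible correction (an assumption for illustration, not
necessarily the committed patch): the two null branches must be symmetric so
that unknown (null) sections sort consistently before known ones.

{code:java}
Collections.sort(sections, new Comparator<FileSummary.Section>() {
  @Override
  public int compare(FileSummary.Section s1, FileSummary.Section s2) {
    SectionName n1 = SectionName.fromString(s1.getName());
    SectionName n2 = SectionName.fromString(s2.getName());
    if (n1 == null) {
      return n2 == null ? 0 : -1;  // unknown sections sort first
    } else if (n2 == null) {
      return 1;                    // was -1: the asymmetry broke the order
    } else {
      return n1.ordinal() - n2.ordinal();
    }
  }
});
{code}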



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-14 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320983#comment-17320983
 ] 

Wei-Chiu Chuang commented on HDFS-15957:


[~kihwal] [~daryn] [~ahussein] any ideas about this async edit logger bug?

> The ignored IOException in the RPC response sent by FSEditLogAsync can cause 
> the HDFS client to hang
> 
>
> Key: HDFS-15957
> URL: https://issues.apache.org/jira/browse/HDFS-15957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
> Attachments: fsshell.txt, namenode.txt, reproduce.patch, 
> secondnamenode.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>     In FSEditLogAsync, the RpcEdit notification in line 248 could be skipped, 
> because the possible exception (e.g., IOException) thrown in line 365 is 
> always ignored.
>  
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
> class FSEditLogAsync extends FSEditLog implements Runnable {
>   // ...
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit();
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit();
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId());
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);   // line 
> 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
>   // the calling rpc thread will return immediately from logSync but the
>   // rpc response will not be sent until the edit is durable.
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();// line 
> 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
>   }
> }
> {code}
>     The `call.sendResponse()` may throw an IOException. According to the 
> comment there (“don’t care if not sent”), this exception is neither handled 
> nor logged. However, we suspect that some of the RPC responses sent there 
> may be critical, and there should be some retry mechanism.
>     We try to introduce a single IOException in line 365, and find that the 
> HDFS client (e.g., `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`) may get 
> stuck forever (hang for >30min without any log). We can reproduce this 
> symptom in multiple ways. One of the simplest ways of reproduction is shown 
> as follows:
>  # Start a new empty HDFS cluster (1 namenode, 2 datanodes) with the default 
> configuration.
>  # Generate a 15MB file for testing, e.g. by `fallocate -l 15M foo.txt`.
>  # Run the HDFS client `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`.
>  # When line 365 is invoked the third time (it is invoked 6 times in total in 
> this experiment), inject an IOException there. (A patch for injecting the 
> exception this way is attached to reproduce the issue)
>     Then the client hangs forever, without any log. If we run `bin/hdfs dfs 
> -ls /` to check the file status, we cannot see the expected 15MB `/1.txt` 
> file.
>     The jstack of the HDFS client shows that there is an RPC call infinitely 
> waiting.
> {code:java}
> "Thread-6" #18 daemon prio=5 os_prio=0 tid=0x7f9cd5295800 nid=0x26b9 in 
> Object.wait() [0x7f9ca354f000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00071e709610> (a org.apache.hadoop.ipc.Client$Call)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
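
One conceivable mitigation, sketched under the assumption that a dropped
response should at least be visible in the log (LOG and call come from the
quoted class; this is not the committed fix):

{code:java}
@Override
public void logSyncNotify(RuntimeException syncEx) {
  try {
    if (syncEx == null) {
      call.sendResponse();
    } else {
      call.abortResponse(syncEx);
    }
  } catch (Exception e) {
    // Previously swallowed; logging keeps a dropped RPC response visible,
    // so a hung client can be traced back to this failure.
    LOG.warn("Failed to deliver edit log sync notification for {}", call, e);
  }
}
{code}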

[jira] [Updated] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode

2021-04-13 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15815:
---
Fix Version/s: 3.2.3

>  if required storageType are unavailable, log the failed reason during 
> choosing Datanode
> 
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, 
> HDFS-15815.003.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For easier debugging, if the required storage types are unavailable, log the 
> failure reason "NO_REQUIRED_STORAGE_TYPE" when choosing a DataNode.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode

2021-04-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15815:
---
Fix Version/s: 3.3.1

>  if required storageType are unavailable, log the failed reason during 
> choosing Datanode
> 
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, 
> HDFS-15815.003.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For easier debugging, if the required storage types are unavailable, log the 
> failure reason "NO_REQUIRED_STORAGE_TYPE" when choosing a DataNode.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.1.5

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they 
> find corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple 
> and effective way to verify EC reconstruction correctness on the DataNode 
> at each reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from 
> inputs [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding 
> d0 from [d1, d2, d3, d4, d5, p1], and comparing the original and decoded 
> data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with 
> high probability. The task will then also fail and be retried by the 
> NameNode. The next reconstruction will succeed if the condition that 
> triggered the failure is gone.
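
The verify-by-re-decode idea can be shown with a runnable toy example; a
single XOR parity (a degenerate erasure code) stands in for RS-6-3 here
purely for brevity:

{code:java}
import java.util.Arrays;

public class VerifyReconstructionDemo {
  // XOR of equal-length blocks: the parity function of a trivial 2+1 code.
  static byte[] xor(byte[]... blocks) {
    byte[] out = new byte[blocks[0].length];
    for (byte[] b : blocks) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= b[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3}, d1 = {4, 5, 6};
    byte[] p0 = xor(d0, d1);          // parity block

    byte[] rebuiltD1 = xor(d0, p0);   // "reconstruction": recover lost d1

    // Verification: decode d0 back from the rebuilt d1 and the parity,
    // then compare against the original d0, mirroring the proposal above.
    byte[] checkD0 = xor(rebuiltD1, p0);
    if (!Arrays.equals(checkD0, d0)) {
      throw new IllegalStateException("EC reconstruction validation failed");
    }
    System.out.println("reconstruction verified");
  }
}
{code}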



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.2.3

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they 
> find corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple 
> and effective way to verify EC reconstruction correctness on the DataNode 
> at each reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from 
> inputs [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding 
> d0 from [d1, d2, d3, d4, d5, p1], and comparing the original and decoded 
> data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with 
> high probability. The task will then also fail and be retried by the 
> NameNode. The next reconstruction will succeed if the condition that 
> triggered the failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15887) Make LogRoll and TailEdits execute in parallel

2021-04-08 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317645#comment-17317645
 ] 

Wei-Chiu Chuang commented on HDFS-15887:


Not an expert here, but makes sense to me.

> Make LogRoll and TailEdits execute in parallel
> --
>
> Key: HDFS-15887
> URL: https://issues.apache.org/jira/browse/HDFS-15887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: edit_files.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the EditLogTailer class, LogRoll and TailEdits are executed on a single 
> thread, and when a checkpoint occurs it competes with TailEdits for a lock 
> (FSNamesystem#cpLock).
> A checkpoint usually takes a long time to execute, so the edit log files 
> generated in the meantime become relatively large.
> For an actual example (see the attached edit_files.jpg), the 
> StandbyCheckpointer log shows:
> 2021-03-11 09:18:42,513 [769071096]-INFO [Standby State 
> Checkpointer:StandbyCheckpointer$CheckpointerThread@335]-Triggering 
> checkpoint because there have been 5142154 txns since the last checkpoint, 
> which exceeds the configured threshold 100
> Loading an edit log with a large amount of data takes longer, so we should 
> keep edit log sizes as even as possible, which is good for the operation of 
> the system.
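
A minimal sketch of the decoupling idea (the scheduling shown is an
assumption for illustration, not the committed patch; triggerActiveLogRoll
and doTailEdits are stubs standing in for the EditLogTailer operations):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SplitTailerSketch {
  static void triggerActiveLogRoll() { /* ask the active NN to roll edits */ }
  static void doTailEdits()          { /* load newly finalized segments */ }

  public static void main(String[] args) {
    // Two independent schedules: rolling the active NN's edit log no longer
    // waits behind a tailing pass that is blocked on the checkpoint lock.
    ScheduledExecutorService roll = Executors.newSingleThreadScheduledExecutor();
    ScheduledExecutorService tail = Executors.newSingleThreadScheduledExecutor();
    roll.scheduleWithFixedDelay(SplitTailerSketch::triggerActiveLogRoll,
        0, 120_000, TimeUnit.MILLISECONDS);   // roll period
    tail.scheduleWithFixedDelay(SplitTailerSketch::doTailEdits,
        0, 60_000, TimeUnit.MILLISECONDS);    // tail period
  }
}
{code}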



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15243) Add an option to prevent sub-directories of protected directories from deletion

2021-04-08 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15243:
---
Fix Version/s: 3.3.1

> Add an option to prevent sub-directories of protected directories from 
> deletion
> ---
>
> Key: HDFS-15243
> URL: https://issues.apache.org/jira/browse/HDFS-15243
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.1.1
>Reporter: liuyanyu
>Assignee: liuyanyu
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15243.001.patch, HDFS-15243.002.patch, 
> HDFS-15243.003.patch, HDFS-15243.004.patch, HDFS-15243.005.patch, 
> HDFS-15243.006.patch, image-2020-03-28-09-23-31-335.png
>
>
> HDFS-8983 added fs.protected.directories to support protected directories 
> on the NameNode. But as I tested, when a parent directory (e.g. /testA) is 
> set as a protected directory, a child directory (e.g. /testA/testB) can 
> still be deleted or renamed. We usually protect a directory mainly to 
> protect the data under it, so I think a child directory should not be 
> deleted or renamed if its parent is a protected directory.
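
For reference, a minimal hdfs-site.xml sketch of the existing setting this
issue extends (the path is a placeholder taken from the example above):

{noformat}
<property>
  <!-- Comma-separated list of directories that cannot be deleted or renamed.
       This issue proposes an option so their sub-directories are covered
       as well. -->
  <name>fs.protected.directories</name>
  <value>/testA</value>
</property>
{noformat}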



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-08 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.3.1

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they 
> find corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple 
> and effective way to verify EC reconstruction correctness on the DataNode 
> at each reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from 
> inputs [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding 
> d0 from [d1, d2, d3, d4, d5, p1], and comparing the original and decoded 
> data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with 
> high probability. The task will then also fail and be retried by the 
> NameNode. The next reconstruction will succeed if the condition that 
> triggered the failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-05 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315247#comment-17315247
 ] 

Wei-Chiu Chuang commented on HDFS-15759:


This is a great tool. If no objections I intend to backport it to lower 
branches.

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they 
> find corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple 
> and effective way to verify EC reconstruction correctness on the DataNode 
> at each reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from 
> inputs [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding 
> d0 from [d1, d2, d3, d4, d5, p1], and comparing the original and decoded 
> data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with 
> high probability. The task will then also fail and be retried by the 
> NameNode. The next reconstruction will succeed if the condition that 
> triggered the failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15316:
---
Fix Version/s: 3.3.1

> Deletion failure should not remove directory from snapshottables
> 
>
> Key: HDFS-15316
> URL: https://issues.apache.org/jira/browse/HDFS-15316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15316.001.patch, HDFS-15316.002.patch
>
>
> If deleting a directory does not succeed, we still remove the directory 
> from the snapshottables list.
> This makes the system inconsistent: we can still create snapshots, but 
> snapshot diff throws "Directory is not snapshottable".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15351) Blocks scheduled count was wrong on truncate

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15351:
---
Fix Version/s: 3.3.1

> Blocks scheduled count was wrong on truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from the reconstruction queue.
> On removing the blocks from pending reconstruction, we need to decrement 
> the blocks scheduled count.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15362:
---
Fix Version/s: 3.3.1

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15362.001.patch, HDFS-15362.002.patch
>
>
> FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses a list to collect 
> blocks:
> {code:java}
> List<BlockInfo> allBlocks = new ArrayList<>();
> if (file.getBlocks() != null) {
>   allBlocks.addAll(Arrays.asList(file.getBlocks()));
> }
> {code}
> INodeFile#storagespaceConsumedContiguous collects all distinct blocks in a 
> set:
> {code:java}
> // Collect all distinct blocks
> Set<BlockInfo> allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
> DiffList<FileDiff> diffs = sf.getDiffs().asList();
> for (FileDiff diff : diffs) {
>   BlockInfo[] diffBlocks = diff.getBlocks();
>   if (diffBlocks != null) {
>     allBlocks.addAll(Arrays.asList(diffBlocks));
>   }
> }
> {code}
> But on updating the reclaim context we subtract one from the other, so a 
> wrong quota value can be recorded:
> {code:java}
> QuotaCounts current = file.storagespaceConsumed(bsp);
> reclaimContext.quotaDelta().add(oldCounts.subtract(current));
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15196:
---
Fix Version/s: 3.3.1

> RBF: RouterRpcServer getListing cannot list large dirs correctly
> 
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, 
> HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, 
> HDFS-15196.005.patch, HDFS-15196.006.patch, HDFS-15196.007.patch, 
> HDFS-15196.008.patch, HDFS-15196.009.patch, HDFS-15196.010.patch, 
> HDFS-15196.011.patch, HDFS-15196.012.patch, HDFS-15196.013.patch, 
> HDFS-15196.014.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from the destination ns + paths
>  # Append mount points for the dir to be listed
> In the case of a large dir which is bigger than DFSConfigKeys.DFS_LIST_LIMIT 
> (with default value 1k), batch listing will be used and startAfter will be 
> used to define the boundary of each batch listing. However, step 2 here 
> will add existing mount points, which will mess up the boundary of the 
> batch, thus making the next batch's startAfter wrong.
> The initial fix was to append the mount points only when no more batch 
> queries are necessary, but this would break the order of returned entries. 
> Therefore more complex logic was added to make sure the order is kept. At 
> the same time the remainingEntries variable inside DirectoryListing is also 
> updated to include the remaining mount points.
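
A minimal sketch of the ordering constraint (mergeBatch is a hypothetical
helper for illustration; the real patch works on HdfsFileStatus entries):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ListingMergeSketch {
  // nsBatch: one sorted batch returned by the downstream namespace.
  // mounts: the sorted mount-point names under the same directory.
  // Only mount points sorting at or before the batch's last entry may be
  // merged into this batch; merging later ones would corrupt the startAfter
  // boundary used to fetch the next batch.
  static List<String> mergeBatch(List<String> nsBatch, List<String> mounts) {
    String boundary = nsBatch.isEmpty() ? null
        : nsBatch.get(nsBatch.size() - 1);
    List<String> merged = new ArrayList<>(nsBatch);
    for (String m : mounts) {
      if (boundary == null || m.compareTo(boundary) <= 0) {
        merged.add(m);        // falls inside this batch's range
      }                       // otherwise: counted in remainingEntries
    }
    merged.sort(String::compareTo);
    return merged;
  }
}
{code}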



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15252) HttpFS: setWorkingDirectory should not accept invalid paths

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15252:
---
Fix Version/s: 3.3.1

> HttpFS: setWorkingDirectory should not accept invalid paths
> ---
>
> Key: HDFS-15252
> URL: https://issues.apache.org/jira/browse/HDFS-15252
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15252.001.patch, HDFS-15252.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15300) RBF: updateActiveNamenode() is invalid when RPC address is IP

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15300:
---
Fix Version/s: 3.3.1

> RBF: updateActiveNamenode() is invalid when RPC address is IP
> -
>
> Key: HDFS-15300
> URL: https://issues.apache.org/jira/browse/HDFS-15300
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15300-001.patch, HDFS-15300-002.patch
>
>
> ActiveNamenodeResolver#updateActiveNamenode has no effect when the RPC 
> address is given as ip:port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15198:
---
Fix Version/s: 3.3.1

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> HDFS-13443 makes mount table cache updates immediate: the router that 
> receives the update refreshes its own mount table cache right away, and 
> then refreshes the other routers' caches via the refreshMountTableEntries 
> RPC. But in secure mode the other routers cannot be refreshed. The 
> initiating router's log shows an error like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15591) RBF: Fix webHdfs file display error

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15591:
---
Fix Version/s: 3.3.1

> RBF: Fix webHdfs file display error
> ---
>
> Key: HDFS-15591
> URL: https://issues.apache.org/jira/browse/HDFS-15591
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15591-001.patch, HDFS-15591-002.patch, 
> HDFS-15591-003.patch, HDFS-15591-004.patch, RBF_Browse_Directory.png, 
> RBF_Browse_Directory_PostFix.png, after-1.jpg, after-2.jpg, before-1.jpg, 
> before-2.jpg
>
>
> When a path mounted by the router does not exist on the NN, the router 
> creates a virtual folder with the mount name, but the "browse the file 
> system" display over HTTP is wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15510) RBF: Quota and Content Summary was not correct in Multiple Destinations

2021-04-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15510:
---
Fix Version/s: 3.3.1

> RBF: Quota and Content Summary was not correct in Multiple Destinations
> ---
>
> Key: HDFS-15510
> URL: https://issues.apache.org/jira/browse/HDFS-15510
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
> Attachments: 15510.png, HDFS-15510.001.patch, HDFS-15510.002.patch, 
> HDFS-15510.003.patch, HDFS-15510.004.patch, HDFS-15510.005.patch, 
> HDFS-15510.006.patch
>
>
> Steps:
> *) Create a mount entry with multiple destinations (say 2).
> *) Set the NS quota to 10 for the mount entry via the dfsrouteradmin 
> command; the content summary on the mount entry shows the NS quota as 20.
> *) Create 10 files through the router; on creating the 11th file, an NS 
> Quota Exceeded exception is thrown.
> Although the content summary shows the NS quota as 20, we are not able to 
> create 20 files.
>  
> The problem here is that the router stores the mount entry's NS quota as 
> 10, but sets an NS quota of 10 on each of the name services, so the content 
> summary on the mount entry aggregates both name services and reports the NS 
> quota as 20.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15945) DataNodes with zero capacity and zero blocks should be decommissioned immediately

2021-04-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313736#comment-17313736
 ] 

Wei-Chiu Chuang commented on HDFS-15945:


Thanks a lot for reporting the issue. Make sense to me. 

> DataNodes with zero capacity and zero blocks should be decommissioned 
> immediately
> -
>
> Key: HDFS-15945
> URL: https://issues.apache.org/jira/browse/HDFS-15945
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When there is a storage problem, for example, DataNode capacity and block 
> count sometimes become zero.
>  When we tried to decommission those DataNodes, we ran into an issue where 
> the decommission did not complete because the NameNode had not received 
> their first block report.
> {noformat}
> INFO  blockmanagement.DatanodeAdminManager 
> (DatanodeAdminManager.java:startDecommission(183)) - Starting decommission of 
> 127.0.0.1:58343 
> [DISK]DS-a29de094-2b19-4834-8318-76cda3bd86bf:NORMAL:127.0.0.1:58343 with 0 
> blocks
> INFO  blockmanagement.BlockManager 
> (BlockManager.java:isNodeHealthyForDecommissionOrMaintenance(4587)) - Node 
> 127.0.0.1:58343 hasn't sent its first block report.
> INFO  blockmanagement.DatanodeAdminDefaultMonitor 
> (DatanodeAdminDefaultMonitor.java:check(258)) - Node 127.0.0.1:58343 isn't 
> healthy. It needs to replicate 0 more blocks. Decommission In Progress is 
> still in progress.
> {noformat}
> To make matters worse, even if we stopped these DataNodes afterward, they 
> remained in a dead & decommissioning state until the NameNode restarted.
> I think those DataNodes should be decommissioned immediately even if the 
> NameNode hasn't received their first block report.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS

2021-04-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15266:
---
Fix Version/s: 3.3.1

> Add missing DFSOps Statistics in WebHDFS
> 
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15266-01.patch, HDFS-15266-02.patch
>
>
> A couple of operations do not increment the read/write op counts in 
> DFSOpsCountStatistics, for example getStoragePolicy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15265) HttpFS: validate content-type in HttpFSUtils

2021-04-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15265:
---
Fix Version/s: 3.3.1

> HttpFS: validate content-type in HttpFSUtils
> 
>
> Key: HDFS-15265
> URL: https://issues.apache.org/jira/browse/HDFS-15265
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15265.001.patch, HDFS-15265.002.patch
>
>
> Validate that the content-type in HttpFSUtils is JSON.
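
A minimal sketch of the idea (illustrative only; the exact check in
HttpFSUtils may differ):

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;

public class ContentTypeCheck {
  // Reject a response whose content type is not JSON before parsing it,
  // instead of failing later with an opaque parse error.
  static void validateJsonContentType(HttpURLConnection conn)
      throws IOException {
    String contentType = conn.getContentType();
    if (contentType == null
        || !contentType.toLowerCase().startsWith("application/json")) {
      throw new IOException("Expected JSON response but got content type ["
          + contentType + "]");
    }
  }
}
{code}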



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2021-04-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15332:
---
Fix Version/s: 3.3.1

> Quota Space consumed was wrong in truncate with Snapshots
> -
>
> Key: HDFS-15332
> URL: https://issues.apache.org/jira/browse/HDFS-15332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15332.001.patch
>
>
> On calculating space quota usage:
> {code:java}
> if (file.getBlocks() != null) {
>   allBlocks.addAll(Arrays.asList(file.getBlocks()));
> }
> if (removed.getBlocks() != null) {
>   allBlocks.addAll(Arrays.asList(removed.getBlocks()));
> }
> for (BlockInfo b : allBlocks) {
> {code}
> we missed the blocks in the file snapshot feature's diffs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2021-04-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15246:
---
Fix Version/s: 3.3.1

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch, HDFS-15246.003.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15667) Audit log record the unexpected allowed result when delete called

2021-04-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15667:
---
Fix Version/s: 3.3.1

> Audit log record the unexpected allowed result when delete called
> -
>
> Key: HDFS-15667
> URL: https://issues.apache.org/jira/browse/HDFS-15667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.0, 3.2.1
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I hit this issue when running rm on the root directory; for removing a 
> non-root, non-empty directory, toRemovedBlocks isn't null and its 
> toDeleteList size is 0.
>  !screenshot-1.png! 
> When is null returned?
> From this screenshot, we can see that if fileRemoved = -1, then 
> toRemovedBlocks = null.
>  !screenshot-2.png! 
> And when deleteAllowed(iip) returns false, fileRemoved can be -1:
> {code:java}
>  private static boolean deleteAllowed(final INodesInPath iip) {
> if (iip.length() < 1 || iip.getLastINode() == null) {
>   if (NameNode.stateChangeLog.isDebugEnabled()) {
> NameNode.stateChangeLog.debug(
> "DIR* FSDirectory.unprotectedDelete: failed to remove "
> + iip.getPath() + " because it does not exist");
>   }
>   return false;
> } else if (iip.length() == 1) { // src is the root
>   NameNode.stateChangeLog.warn(
>   "DIR* FSDirectory.unprotectedDelete: failed to remove " +
>   iip.getPath() + " because the root is not allowed to be 
> deleted");
>   return false;
> }
> return true;
>   }
> {code}
> From the code of deleteAllowed, we can see that when src is the root, it 
> returns false.
> So without this PR, when I execute *bin/hdfs dfs -rm -r /*,
> I see a confusing audit log line like the following:
> 2020-11-05 14:32:53,420 INFO  FSNamesystem.audit 
> (FSNamesystem.java:logAuditMessage(8102)) - allowed=true



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14383) Compute datanode load based on StoragePolicy

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14383:
---
Fix Version/s: 3.3.1

> Compute datanode load based on StoragePolicy
> 
>
> Key: HDFS-14383
> URL: https://issues.apache.org/jira/browse/HDFS-14383
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.3, 3.1.2
>Reporter: Karthik Palanisamy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14383-01.patch, HDFS-14383-02.patch
>
>
> Datanode load check logic needs to be changed because the existing 
> computation does not consider the StoragePolicy.
> DatanodeManager#getInServiceXceiverAverage:
> {code}
> public double getInServiceXceiverAverage() {
>   double avgLoad = 0;
>   final int nodes = getNumDatanodesInService();
>   if (nodes != 0) {
>     final int xceivers = heartbeatManager
>         .getInServiceXceiverCount();
>     avgLoad = (double) xceivers / nodes;
>   }
>   return avgLoad;
> }
> {code}
>  
> For example: with 10 nodes (HOT) averaging 50 xceivers and 90 nodes (COLD) 
> averaging 10 xceivers, the threshold calculated by the NN is 28 (((500 + 
> 900)/100)*2), which means those 10 nodes (the whole HOT tier) become 
> unavailable while the COLD tier nodes are barely in use. Turning this check 
> off helps to mitigate the issue; however, 
> dfs.namenode.replication.considerLoad helps to "balance" the load of the 
> DNs, and turning it off can lead to situations where specific DNs are 
> "overloaded".
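
A minimal sketch of a storage-type-aware variant (an assumption for
illustration; the methods taking a StorageType parameter are hypothetical):
average the xceiver count only over nodes that carry the required storage
type.

{code:java}
// Hypothetical per-storage-type load average: only nodes that actually
// hold the required storage type contribute to the threshold, so a small
// HOT tier is no longer judged against the idle COLD tier's average.
public double getInServiceXceiverAverage(StorageType t) {
  final int nodes = getNumDatanodesInService(t);   // nodes carrying type t
  if (nodes == 0) {
    return 0;
  }
  final int xceivers = heartbeatManager.getInServiceXceiverCount(t);
  return (double) xceivers / nodes;
}
{code}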



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15253:
---
Fix Version/s: 3.3.1

> Set default throttle value on dfs.image.transfer.bandwidthPerSec
> 
>
> Key: HDFS-15253
> URL: https://issues.apache.org/jira/browse/HDFS-15253
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage 
> transfers during checkpoints can use the maximum available bandwidth. I 
> think we should throttle this. Many users have experienced NameNode 
> failover when transferring a large image (e.g. >25 GB) along with fsimage 
> replication on dfs.namenode.name.dir.
> Suggested settings:
> dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s)
> dfs.namenode.checkpoint.txns=200 (the default is 1M, which is good for 
> avoiding frequent checkpoints; however, the default checkpoint runs once 
> every 6 hours)
>  
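
A minimal hdfs-site.xml sketch of the suggested throttle (the value is taken
from the description above):

{noformat}
<property>
  <!-- Cap fsimage transfer bandwidth at 50 MB/s instead of the unthrottled
       default of 0. -->
  <name>dfs.image.transfer.bandwidthPerSec</name>
  <value>52428800</value>
</property>
{noformat}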



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15506) [JDK 11] Fix javadoc errors in hadoop-hdfs module

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15506:
---
Fix Version/s: 3.3.1

> [JDK 11] Fix javadoc errors in hadoop-hdfs module
> -
>
> Key: HDFS-15506
> URL: https://issues.apache.org/jira/browse/HDFS-15506
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15506.001.patch, HDFS-15506.002.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:43:
>  error: self-closing element not allowed
> [ERROR]  * 
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:682:
>  error: malformed HTML
> [ERROR]* a NameNode per second. Values <= 0 disable throttling. This 
> affects
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java:1780:
>  error: exception not thrown: java.io.FileNotFoundException
> [ERROR]* @throws FileNotFoundException
> [ERROR]  ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java:176:
>  error: @param name not found
> [ERROR]* @param mtime The snapshot creation time set by Time.now().
> [ERROR] ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:2187:
>  error: exception not thrown: java.lang.Exception
> [ERROR]* @exception Exception if the filesystem does not exist.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a0c16f0408a623e798dd7df29fbddf82
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15508:
---
Fix Version/s: 3.3.1

> [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
> -
>
> Key: HDFS-15508
> URL: https://issues.apache.org/jira/browse/HDFS-15508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15508.01.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21:
>  error: reference not found
> [ERROR]  * Implementations should extend {@link 
> AbstractDelegationTokenSecretManager}.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15331) Remove invalid exclusions that minicluster dependency on HDFS

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15331:
---
Fix Version/s: 3.3.1

> Remove invalid exclusions that minicluster dependency on HDFS
> -
>
> Key: HDFS-15331
> URL: https://issues.apache.org/jira/browse/HDFS-15331
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Ozone has split into independent repo, but the invalid exclusions (kubernetes 
> client) that minicluster dependency on HDFS is kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15507) [JDK 11] Fix javadoc errors in hadoop-hdfs-client module

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15507:
---
Fix Version/s: 3.3.1

> [JDK 11] Fix javadoc errors in hadoop-hdfs-client module
> 
>
> Key: HDFS-15507
> URL: https://issues.apache.org/jira/browse/HDFS-15507
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15507.001.patch, HDFS-15507.002.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:32:
>  error: self-closing element not allowed
> [ERROR]  * 
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1245:
>  error: unexpected text
> [ERROR]* Same as {@link #create(String, FsPermission, EnumSet, boolean, 
> short, long,
> [ERROR]  ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java:161:
>  error: reference not found
> [ERROR]* {@link HdfsConstants#LEASE_HARDLIMIT_PERIOD hard limit}. Until 
> the
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/7ab1c48a9bd7a0fdb11fa82eb04874d5
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15651) Client could not obtain block when DN CommandProcessingThread exit

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15651:
---
Fix Version/s: 3.3.1

> Client could not obtain block when DN CommandProcessingThread exit
> --
>
> Key: HDFS-15651
> URL: https://issues.apache.org/jira/browse/HDFS-15651
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15651.001.patch, HDFS-15651.002.patch, 
> HDFS-15651.patch
>
>
> In our cluster, we applied the HDFS-14997 improvement.
>  We found one case where CommandProcessingThread exits due to an OOM error. 
> The OOM error was caused by an abnormal application running on this DN 
> node.
> {noformat}
> 2020-10-18 10:27:12,604 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Command processor 
> encountered fatal exception and exit.
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2005)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:617)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1247)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.access$1000(BPServiceActor.java:1194)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread$3.run(BPServiceActor.java:1299)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1221)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1208)
> {noformat}
> The main point here is that a crashed CommandProcessingThread has a very 
> bad impact: none of the NN response commands get processed on the DN side.
> We enabled block tokens for data access, but the DN command 
> DNA_ACCESSKEYUPDATE was not processed in time by the DN. We then saw lots of 
> SASL errors due to key expiration in the DN log:
> {noformat}
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=xxx, keyId=xx, 
> userId=xxx, blockPoolId=, blockId=xxx, access modes=[READ]), since the 
> required block key (keyID=xxx) doesn't exist.]
> {noformat}
>  
> On the client side, our users received lots of 'could not obtain 
> block' errors with BlockMissingException.
> CommandProcessingThread is a critical thread; it should always be running.
> {code:java}
>   /**
>* CommandProcessingThread that process commands asynchronously.
>*/
>   class CommandProcessingThread extends Thread {
> private final BPServiceActor actor;
> private final BlockingQueue queue;
> ...
> @Override
> public void run() {
>   try {
> processQueue();
>   } catch (Throwable t) {
> LOG.error("{} encountered fatal exception and exit.", getName(), t);  
>  <=== should not exit this thread
>   }
> }
> {code}
> Once an unexpected error happens, better handling would be to either:
>  * catch the exception, deal with the error appropriately, and let 
> processQueue continue to run (see the sketch below)
>  or
>  * exit the DN process so that an admin can investigate
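> A minimal sketch of the first option, assuming a hypothetical retry loop and 
> the existing shouldServiceRun flag (illustrative only, not the actual patch):
> {code:java}
> @Override
> public void run() {
>   // Keep the command processor alive across unexpected failures; a dead
>   // CommandProcessingThread silently drops every NN command afterwards.
>   while (shouldServiceRun && !Thread.currentThread().isInterrupted()) {
>     try {
>       processQueue();
>     } catch (Throwable t) {
>       LOG.error("{} encountered an unexpected exception, retrying.",
>           getName(), t);
>       try {
>         Thread.sleep(1000); // brief back-off before retrying
>       } catch (InterruptedException ie) {
>         Thread.currentThread().interrupt();
>         return;
>       }
>     }
>   }
> }
> {code}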



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec

2021-03-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15253:
---
Release Note: The configuration dfs.image.transfer.bandwidthPerSec which 
defines the maximum bandwidth available for fsimage transfer is changed from 0 
(meaning no throttle at all) to 50MB/s.

> Set default throttle value on dfs.image.transfer.bandwidthPerSec
> 
>
> Key: HDFS-15253
> URL: https://issues.apache.org/jira/browse/HDFS-15253
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage 
> transfers during checkpoint can use the maximum available bandwidth. I think 
> we should throttle this. Many users have experienced namenode failover when 
> transferring a large image (e.g. >25 GB) along with fsimage replication on 
> dfs.namenode.name.dir.
> Thought to set:
> dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s)
> dfs.namenode.checkpoint.txns=200 (the default is 1M, which is good for 
> avoiding frequent checkpoints; the default checkpoint runs once every 6 hours)
>  
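> As a rough illustration, the throttle would be set like this in hdfs-site.xml 
> (value taken from the description above; tune for your environment):
> {code:xml}
> <property>
>   <name>dfs.image.transfer.bandwidthPerSec</name>
>   <!-- 50 MB/s; the previous default of 0 meant no throttling at all -->
>   <value>52428800</value>
> </property>
> {code}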



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15924) Log4j will cause Server handler blocked when audit log boom.

2021-03-30 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311345#comment-17311345
 ] 

Wei-Chiu Chuang commented on HDFS-15924:


Check this out: HDFS-15720. The additional properties it supports may give you 
a little relief.

> Log4j will cause Server handler blocked when audit log boom.
> 
>
> Key: HDFS-15924
> URL: https://issues.apache.org/jira/browse/HDFS-15924
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Qi Zhu
>Priority: Major
> Attachments: image-2021-03-26-16-18-03-341.png, 
> image-2021-03-26-16-19-42-165.png
>
>
> !image-2021-03-26-16-18-03-341.png|width=707,height=234!
> !image-2021-03-26-16-19-42-165.png|width=824,height=198!
> The threads blocked during an audit-log surge are shown above.
> As in [https://dzone.com/articles/log4j-thread-deadlock-case], this seems to 
> be the same case under heavy load. Should we upgrade to Log4j2, or is there 
> anything else we can do to improve heavy audit logging?
>  
> {code:java}
>  /**
>  Call the appenders in the hierrachy starting at
>  this.  If no appenders could be found, emit a
>  warning.
>  This method calls all the appenders inherited from the
>  hierarchy circumventing any evaluation of whether to log or not
>  to log the particular log request.
>  @param event the event to log.  */
> public void callAppenders(LoggingEvent event) {
> int writes = 0;
> for(Category c = this; c != null; c=c.parent) {
>   // Protected against simultaneous call to addAppender, 
> removeAppender,...
>   synchronized(c) {
> if(c.aai != null) {
> writes += c.aai.appendLoopOnAppenders(event);
> }
> if(!c.additive) {
> break;
> }
>   }
> }
> if(writes == 0) {
>   repository.emitNoAppenderWarning(this);
> }
>   }{code}
> The Log4j code above uses a global synchronized block, which is what causes 
> this to happen.
> cc [~weichiu] [~hexiaoqiao] [~ayushtkn]  [~shv] [~ferhui]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15933) distcp geo hdfs site is not working

2021-03-29 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311138#comment-17311138
 ] 

Wei-Chiu Chuang commented on HDFS-15933:


This is not an HDFS bug. Please use user@hadoop for questions like this.

That said, please check your firewall configuration and make sure connections 
can be made across the network without problems.
For distcp, you will need to specify the remote cluster's NameNodes and logical 
service id in core-site.xml / hdfs-site.xml, along the lines of the sketch below.
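For example, a minimal sketch of publishing the remote cluster's logical 
service id in the local hdfs-site.xml (all host and service names below are 
placeholders):
{code:xml}
<property>
  <name>dfs.nameservices</name>
  <value>localcluster,remotecluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.remotecluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.remotecluster.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.remotecluster.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.remotecluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}
With that in place, distcp can address the remote cluster as 
hdfs://remotecluster/path without hardcoding a single NameNode host.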

> distcp geo hdfs site is not working
> ---
>
> Key: HDFS-15933
> URL: https://issues.apache.org/jira/browse/HDFS-15933
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.1.4
> Environment: Linux Redhat - RHEL 8,0
>Reporter: Suresh
>Priority: Blocker
> Attachments: GEO_Cluster.jpg
>
>
> I am facing some issues deploying HDFS on a Docker swarm network across geo 
> sites.
> The “distcp” command running in an HDFS datanode container is not able to 
> pull data from one site to another site (which runs on another Docker swarm 
> network). It expects the Docker swarm HDFS datanodes to be in a host network 
> configuration rather than an overlay (with IP forwarding) or bridge network.
> We want to know what a Docker-based HDFS deployment should look like, and how 
> the distcp command would be invoked across Docker instances sitting at 
> different geos.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15456) TestExternalStoragePolicySatisfier fails intermittently

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15456:
---
Fix Version/s: 3.3.1

> TestExternalStoragePolicySatisfier fails intermittently
> ---
>
> Key: HDFS-15456
> URL: https://issues.apache.org/jira/browse/HDFS-15456
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available, test
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestExternalStoragePolicySatisfier}} frequently times out on Hadoop trunk:
> {code:bash}
> [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 421.443 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
> [ERROR] 
> testChooseInSameDatanodeWithONESSDShouldNotChooseIfNoSpace(org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier)
>   Time elapsed: 43.983 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-07-07 07:51:10,267
> "IPC Server handler 4 on default port 44933" daemon prio=5 tid=1138 
> timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:307)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2918)
> "ForkJoinPool-2-worker-19" daemon prio=5 tid=235 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
> at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> "refreshUsed-/home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/sourcedir/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data1/current/BP-912129709-172.17.0.2-1594151429636"
>  daemon prio=5 tid=1217 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:205)
> at java.lang.Thread.run(Thread.java:748)
> "Socket Reader #1 for port 0" daemon prio=5 tid=1192 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
> at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1273)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1252)
> "pool-90-thread-1"  prio=5 tid=1069 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 2 on default port 37995" daemon prio=5 tid=1169 
> timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlock

[jira] [Updated] (HDFS-14546) Document block placement policies

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14546:
---
Fix Version/s: 3.3.1

> Document block placement policies
> -
>
> Key: HDFS-14546
> URL: https://issues.apache.org/jira/browse/HDFS-14546
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Amithsha
>Priority: Major
>  Labels: documentation
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14546-01.patch, HDFS-14546-02.patch, 
> HDFS-14546-03.patch, HDFS-14546-04.patch, HDFS-14546-05.patch, 
> HDFS-14546-06.patch, HDFS-14546-07.patch, HDFS-14546-08.patch, 
> HDFS-14546-09.patch, HdfsDesign.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, all the documentation refers to the default block placement policy.
> However, over time there have been new policies:
> * BlockPlacementPolicyRackFaultTolerant (HDFS-7891)
> * BlockPlacementPolicyWithNodeGroup (HDFS-3601)
> * BlockPlacementPolicyWithUpgradeDomain (HDFS-9006)
> We should update the documentation to cover them, explaining their 
> particularities and probably how to set up each one of them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15380) RBF: Could not fetch real remote IP in RouterWebHdfsMethods

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15380:
---
Fix Version/s: 3.3.1

> RBF: Could not fetch real remote IP in RouterWebHdfsMethods
> ---
>
> Key: HDFS-15380
> URL: https://issues.apache.org/jira/browse/HDFS-15380
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: router, webhdfs
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15380.001.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> We plan to add an audit log for the HDFS router, so we fetch the remote IP 
> via Server.getRemoteIp(), but the result is "localhost/127.0.0.1".
>   
>  "REMOTE_ADDRESS" in RouterWebHdfsMethods.java is a ThreadLocal field, set in 
> the constructor RouterWebHdfsMethods() and in init(). When we later call 
> Server.getRemoteIp() to fetch the remote IP, the handling thread has changed, 
> so the ThreadLocal field "REMOTE_ADDRESS" is null and resolves to 
> "localhost/127.0.0.1" via InetAddress.getByName().
>   
>  So we can change the field "REMOTE_ADDRESS" to a plain String value, just 
> like NamenodeWebHdfsMethods does (see the sketch after the log below).
>   
> I logged the thread name and the value of "REMOTE_ADDRESS"; the log is shown 
> below:
> {code:java}
> 2020-05-27 19:15:18,797 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:(138)) - RouterWebHdfsMethods 
> REMOTE_ADDRESS: 14.39.39.28, current thread: qtp476579021-1090
> 2020-05-27 19:15:18,827 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:init(150)) - init REMOTE_ADDRESS: 14.39.39.28, 
> current thread: qtp476579021-1090
> 2020-05-27 19:15:18,836 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:getRemoteAddr(170)) - getRemoteAddr 
> REMOTE_ADDRESS: null, current thread: IPC Server handler 75 on 
> 2020-05-27 19:15:18,837 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:getRemoteAddr(170)) - getRemoteAddr 
> REMOTE_ADDRESS: null, current thread: IPC Server handler 75 on 
> 2020-05-27 19:15:18,883 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:reset(164)) - reset REMOTE_ADDRESS: null, current 
> thread: IPC Server handler 75 on 
> {code}
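> A minimal sketch of the proposed change, loosely mirroring what 
> NamenodeWebHdfsMethods does (illustrative, not the exact patch):
> {code:java}
> // Before: a ThreadLocal, which is null once the request is handled by a
> // different thread than the one that constructed the resource.
> // private static final ThreadLocal<String> REMOTE_ADDRESS = new ThreadLocal<>();
>
> // After: a plain per-request field, captured once from the servlet request,
> // so later reads no longer depend on which thread is running.
> private String remoteAddr;
>
> public RouterWebHdfsMethods(@Context HttpServletRequest request) {
>   this.remoteAddr = JspHelper.getRemoteAddr(request);
> }
> {code}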



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15610) Reduce datanode upgrade/hardlink thread

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15610:
---
Fix Version/s: 3.3.1

> Reduce datanode upgrade/hardlink thread
> ---
>
> Key: HDFS-15610
> URL: https://issues.apache.org/jira/browse/HDFS-15610
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 3.1.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is kernel overhead during a datanode upgrade. On a datanode with 
> millions of blocks and 10+ disks, the block-layout migration becomes very 
> expensive during its hardlink operation. Slowness is observed when running 
> with many hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, 
> default is 12 threads per disk), and the migration runs for 2+ hours, i.e. 
> 10*12=120 threads for 10 disks.
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> The same test was run twice with ~95% consistent results (only a few seconds 
> of difference on each iteration). Using 6 threads is faster than 12 because 
> of the per-thread overhead. A config sketch follows below.
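> As a rough sketch, the thread count can be lowered in hdfs-site.xml before 
> the upgrade (value based on the small test above; validate for your hardware):
> {code:xml}
> <property>
>   <name>dfs.datanode.block.id.layout.upgrade.threads</name>
>   <!-- default is 12 per disk; 6 was faster in the test above -->
>   <value>6</value>
> </property>
> {code}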



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15275:
---
Fix Version/s: 3.3.1

> HttpFS: Response of Create was not correct with noredirect and data are true
> 
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15275.001.patch, HDFS-15275.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12288) Fix DataNode's xceiver count calculation

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12288:
---
Fix Version/s: 3.3.1

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch, HDFS-12288.008.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN 
> for choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only accounts for the number of DataXceiver threads currently running 
> and thus represents the load on the DN much better.
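> A minimal sketch of the difference, with a hypothetical counter analogous to 
> DataNodeMetrics.dataNodeActiveXceiversCount (illustrative, not the patch):
> {code:java}
> import java.util.concurrent.atomic.AtomicInteger;
>
> class XceiverCountSketch {
>   // activeCount() is documented as an estimate and counts every live
>   // thread in the group, not just xceivers that are doing work.
>   private final ThreadGroup threadGroup = new ThreadGroup("dataXceiverServer");
>   // Explicit counter: incremented/decremented around the actual work.
>   private final AtomicInteger activeXceivers = new AtomicInteger();
>
>   int roughCount()   { return threadGroup.activeCount(); } // over-counts
>   int preciseCount() { return activeXceivers.get(); }      // running only
>
>   void runXceiver(Runnable work) {
>     activeXceivers.incrementAndGet();
>     try {
>       work.run();
>     } finally {
>       activeXceivers.decrementAndGet();
>     }
>   }
> }
> {code}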



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307692#comment-17307692
 ] 

Wei-Chiu Chuang commented on HDFS-15160:


+1 to backport.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6

2021-03-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307657#comment-17307657
 ] 

Wei-Chiu Chuang commented on HDFS-15660:


Sorry, I am still confused. Shouldn't this get cherry-picked to 3.1, 3.2 and 3.3?

> StorageTypeProto is not compatiable between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, after the NN was upgraded to 3.1.3 while the DNs were still on 
> 2.6, we found that when Hive called the getContentSummary method, the client 
> and server were not compatible because Hadoop 3 added the new PROVIDED 
> storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65392)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65331)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.

[jira] [Updated] (HDFS-15249) ThrottledAsyncChecker is not thread-safe.

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15249:
---
Fix Version/s: 3.3.1

> ThrottledAsyncChecker is not thread-safe.
> -
>
> Key: HDFS-15249
> URL: https://issues.apache.org/jira/browse/HDFS-15249
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> ThrottledAsyncChecker should be thread-safe because it can be used by 
> multiple threads when we have multiple namespaces.
> *checksInProgress* and *completedChecks* are a HashMap and a WeakHashMap 
> respectively, which are not thread-safe. So we need to wrap every access to 
> them in a synchronized block, as in the sketch below.
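> A minimal sketch of the kind of guard needed (names simplified from the real 
> class; illustrative only):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.WeakHashMap;
> import java.util.concurrent.Future;
>
> class ThrottledAsyncCheckerSketch<K, V> {
>   private final Map<K, Future<V>> checksInProgress = new HashMap<>();
>   private final Map<K, V> completedChecks = new WeakHashMap<>();
>
>   // Every read and write is guarded by the same monitor, since neither
>   // map is safe for concurrent use by multiple namespaces' threads.
>   synchronized boolean isCheckInProgress(K target) {
>     return checksInProgress.containsKey(target);
>   }
>
>   synchronized void recordResult(K target, V result) {
>     checksInProgress.remove(target);
>     completedChecks.put(target, result);
>   }
> }
> {code}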



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15743) Fix -Pdist build failure of hadoop-hdfs-native-client

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15743:
---
Fix Version/s: 3.3.1

> Fix -Pdist build failure of hadoop-hdfs-native-client
> -
>
> Key: HDFS-15743
> URL: https://issues.apache.org/jira/browse/HDFS-15743
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [INFO] --- exec-maven-plugin:1.3.1:exec (pre-dist) @ 
> hadoop-hdfs-native-client ---
> tar: ./*: Cannot stat: No such file or directory
> tar: Exiting with failure status due to previous errors
> Checking to bundle with:
> bundleoption=false, liboption=snappy.lib, pattern=libsnappy. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=zstd.lib, pattern=libzstd. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=openssl.lib, pattern=libcrypto. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=isal.lib, pattern=libisal. libdir=
> Checking to bundle with:
> bundleoption=, liboption=pmdk.lib, pattern=pmdk libdir=
> Bundling bin files failed
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15610) Reduce datanode upgrade/hardlink thread

2021-03-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307589#comment-17307589
 ] 

Wei-Chiu Chuang commented on HDFS-15610:


I think this is quite an important improvement which stabilizes the upgrade 
experience. I'll cherry-pick it to the lower branches.

> Reduce datanode upgrade/hardlink thread
> ---
>
> Key: HDFS-15610
> URL: https://issues.apache.org/jira/browse/HDFS-15610
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 3.1.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is kernel overhead during a datanode upgrade. On a datanode with 
> millions of blocks and 10+ disks, the block-layout migration becomes very 
> expensive during its hardlink operation. Slowness is observed when running 
> with many hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, 
> default is 12 threads per disk), and the migration runs for 2+ hours, i.e. 
> 10*12=120 threads for 10 disks.
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> The same test was run twice with ~95% consistent results (only a few seconds 
> of difference on each iteration). Using 6 threads is faster than 12 because 
> of the per-thread overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15910) Replace bzero with explicit_bzero for better safety

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15910:
---
Fix Version/s: 3.3.1

> Replace bzero with explicit_bzero for better safety
> ---
>
> Key: HDFS-15910
> URL: https://issues.apache.org/jira/browse/HDFS-15910
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.2.2
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It is better to always use explicit_bzero since it guarantees that the buffer 
> will be cleared irrespective of the compiler optimizations - 
> https://man7.org/linux/man-pages/man3/bzero.3.html.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15868) Possible Resource Leak in EditLogFileOutputStream

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15868:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in EditLogFileOutputStream
> -
>
> Key: HDFS-15868
> URL: https://issues.apache.org/jira/browse/HDFS-15868
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/1f1a1ef52df896a2b66b16f5bbc17aa39b1a1dd7/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java#L91].
>  If an I/O error occurs at line 91, rp remains open since the exception isn't 
> caught locally, and there is no way for any caller to close the 
> RandomAccessFile.
>  I'll submit a pull request to fix it.
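> The usual fix pattern, as a hedged sketch (the actual pull request may differ):
> {code:java}
> import java.io.File;
> import java.io.IOException;
> import java.io.RandomAccessFile;
>
> class PreallocateSketch {
>   static RandomAccessFile open(File name, long size) throws IOException {
>     RandomAccessFile rp = new RandomAccessFile(name, "rw");
>     try {
>       rp.setLength(size); // a subsequent I/O call that can fail
>       return rp;
>     } catch (IOException | RuntimeException e) {
>       rp.close();         // close before propagating, so rp never leaks
>       throw e;
>     }
>   }
> }
> {code}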



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15908) Possible Resource Leak in org.apache.hadoop.hdfs.qjournal.server.Journal

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15908:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in org.apache.hadoop.hdfs.qjournal.server.Journal
> 
>
> Key: HDFS-15908
> URL: https://issues.apache.org/jira/browse/HDFS-15908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/cd44e917d0b331a2d1e1fa63fdd498eac01ae323/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java#L266].
>  The call to close on {{storage}} at line 267 can throw an exception. If it 
> occurs, then {{committedTxnId}} and {{curSegment}} are never closed.
> I'll submit a pull request to fix it.
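> A hedged sketch of the fix pattern, assuming Hadoop's IOUtils helper (the 
> actual pull request may differ):
> {code:java}
> import org.apache.hadoop.io.IOUtils;
>
> // In Journal#close(): close each resource independently, so an exception
> // thrown by storage.close() cannot leak committedTxnId or curSegment.
> // cleanupWithLogger logs and swallows per-resource close failures.
> IOUtils.cleanupWithLogger(LOG, committedTxnId, curSegment, storage);
> {code}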



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15809:
---
Fix Version/s: 3.3.1

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --
>
> Key: HDFS-15809
> URL: https://issues.apache.org/jira/browse/HDFS-15809
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch, 
> HDFS-15809.003.patch, HDFS-15809.004.patch, HDFS-15809.005.patch, 
> HDFS-15809.006.patch, HDFS-15809.007.patch
>
>
> We found that the dead node detector might never remove alive nodes from the 
> dead node set in a big cluster. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the 
> deadNodesProbeQueue because the queue's length limit is 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the 
> dead node set and adds 30 nodes to the deadNodesProbeQueue. But the order is 
> the same as last time, so the 30 nodes that have already been probed are 
> added to the queue again.
>  # Steps 3 and 4 repeat, but we always add the first 30 nodes from the dead 
> set. If they are all dead, then the live nodes behind them can never be 
> recovered (see the sketch below).
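> One illustrative way out, as a hedged sketch (the probeInProgress set is a 
> hypothetical name, not the actual patch):
> {code:java}
> void checkDeadNodes() {
>   synchronized (deadNodes) {
>     for (DatanodeInfo dn : deadNodes.values()) {
>       // Skip nodes already queued or being probed, so each pass walks
>       // further through the dead set instead of re-offering the same
>       // leading entries every time.
>       if (probeInProgress.contains(dn) || deadNodesProbeQueue.contains(dn)) {
>         continue;
>       }
>       if (!deadNodesProbeQueue.offer(dn)) {
>         break; // queue full; the rest get their turn in a later round
>       }
>       probeInProgress.add(dn);
>     }
>   }
> }
> {code}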



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15806) DeadNodeDetector should close all the threads when it is closed.

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15806:
---
Fix Version/s: 3.3.1

> DeadNodeDetector should close all the threads when it is closed.
> 
>
> Key: HDFS-15806
> URL: https://issues.apache.org/jira/browse/HDFS-15806
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15806.001.patch
>
>
> The DeadNodeDetector doesn't close all of its threads when it is closed. This 
> Jira tries to fix that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15791) Possible Resource Leak in FSImageFormatProtobuf

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15791:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in FSImageFormatProtobuf
> ---
>
> Key: HDFS-15791
> URL: https://issues.apache.org/jira/browse/HDFS-15791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L271].
>  If an I/O error occurs at line 
> [273|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L273]
>  or 
> [277|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L277],
>  {{fin}} remains open since the exception isn't caught locally, and there is 
> no way for any caller to close the FileInputStream
> I'll submit a pull request to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15551) Tiny Improve for DeadNode detector

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15551:
---
Fix Version/s: 3.3.1

> Tiny Improve for DeadNode detector
> --
>
> Key: HDFS-15551
> URL: https://issues.apache.org/jira/browse/HDFS-15551
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.3.0
>Reporter: dark_num
>Assignee: imbajin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> # Add or improve some logs around adding local & global dead nodes
>  # Improve the logic
>  # Fix typos



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15661) The DeadNodeDetector shouldn't be shared by different DFSClients.

2021-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15661:
---
Fix Version/s: 3.3.1

> The DeadNodeDetector shouldn't be shared by different DFSClients.
> -
>
> Key: HDFS-15661
> URL: https://issues.apache.org/jira/browse/HDFS-15661
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15661.001.patch, HDFS-15661.002.patch, 
> HDFS-15661.003.patch, HDFS-15661.004.patch, HDFS-15661.005.patch
>
>
> Currently the DeadNodeDetector is a member of ClientContext. That means it is 
> shared by many different DFSClients. When one DFSClient.close() is invoked, 
> the DeadNodeDetector thread is interrupted, which impacts the other DFSClients.
> From the original design of HDFS-13571 we can see the DeadNodeDetector is 
> supposed to share the dead nodes of many input streams from the same client. 
> We should make the DeadNodeDetector a member of DFSClient instead of 
> ClientContext. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-18 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304646#comment-17304646
 ] 

Wei-Chiu Chuang commented on HDFS-15901:


We have some users running 1000+ node scale clusters, but I don't watch those 
clusters every day. I am honestly not the best person to ask for opinions when 
it comes to extreme-scale clusters. 

[~hexiaoqiao] or [~ferhui] may have better ideas.

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes and we restart the NameNode 
> service, all DataNodes send a full block report to the NameNode. During 
> SafeMode, some DataNodes may send their block reports to the NameNode 
> multiple times, which takes up too many RPCs and is in fact unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases the NameNode will stay in Safe Mode forever.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.
> 2021-03-13 18:36:08,145 [29220972] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2021-03-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298434#comment-17298434
 ] 

Wei-Chiu Chuang commented on HDFS-15719:


I'm inclined toward the longer timeout now.

It was initially set to 1s to make the KMS client expire connections quickly, 
so that it wouldn't accumulate too many connections. However, we more recently 
realized the entire issue was caused by the Jetty client not reusing 
connections.

HADOOP-15813 changes the Jetty client behavior so that the KMS client reuses 
connections. The longer timeout value should work (better) now.

> [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket 
> timeout
> -
>
> Key: HDFS-15719
> URL: https://issues.apache.org/jira/browse/HDFS-15719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After Hadoop 3, we migrated Jetty 6 to Jetty 9. It was implemented in 
> HADOOP-10075.
> However, HADOOP-10075 erroneously set the HttpServer2 socket idle timeout too 
> low.
> We replaced SelectChannelConnector.setLowResourceMaxIdleTime() with 
> ServerConnector.setIdleTimeout() but they aren't the same.
> Essentially, the HttpServer2's idle timeout was the default timeout set by 
> Jetty 6, which is 200 seconds. After Hadoop 3, the idle timeout is set to 10 
> seconds, which is unreasonable for JN. If NameNodes try to download a big 
> edit log from JournalNodes (say a few hundred MB), it is likely to exceed 10 
> seconds. When that happens, both NNs crash and there's no workaround unless 
> you apply the patch in HADOOP-15696 to add a config switch for the idle 
> timeout. Fortunately, it doesn't happen a lot.
> Proposal: bump the idle timeout default to 200 seconds to match the behavior 
> in Jetty 6. (Jetty 9 reduces the default idle timeout to 30 seconds, which is 
> not suitable for JN.) A sketch of the Jetty knob follows below.
> Other things to consider:
> 1. The fsck servlet? (Somehow I suspect this is related to the socket timeout 
> reported in HDFS-7175.)
> 2. WebHDFS, HttpFS? --> we've also received reports that WebHDFS can time 
> out, so having a longer timeout makes sense here.
> 3. KMS? Will the longer timeout cause more lingering sockets?
> Thanks [~zhenshan.wen] for the discussion.
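> For reference, the Jetty 9 knob in question, as a minimal sketch (the real 
> change would go through HttpServer2's builder):
> {code:java}
> import org.eclipse.jetty.server.Server;
> import org.eclipse.jetty.server.ServerConnector;
>
> Server server = new Server();
> ServerConnector connector = new ServerConnector(server);
> // Milliseconds; 200s matches the old Jetty 6 default instead of the
> // 10s that a multi-hundred-MB edit log download can easily exceed.
> connector.setIdleTimeout(200_000L);
> server.addConnector(connector);
> {code}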



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14013) Skip any credentials stored in HDFS when starting ZKFC

2021-03-01 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292703#comment-17292703
 ] 

Wei-Chiu Chuang commented on HDFS-14013:


+1 thank you!

> Skip any credentials stored in HDFS when starting ZKFC
> --
>
> Key: HDFS-14013
> URL: https://issues.apache.org/jira/browse/HDFS-14013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Krzysztof Adamski
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: zkfc
> Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log
>
>
> HADOOP-15157 added the ability to use a jceks credential provider to store 
> the Zookeeper credentials needed by the Failover Controller to connect to 
> Zookeeper.
> By default, if any provider is specified in 
> hadoop.security.credential.provider.path it will be checked to see if it 
> holds the required information; otherwise the traditional way of getting the 
> login will be used.
> hadoop.security.credential.provider.path can hold a list of credential 
> providers and if there is an error reading any of them, the exception bubbles 
> up and causes the ZKFC to fail. The intent of HADOOP-15157 is to have a local 
> jceks file for the FC credentials, but if there is another provider stored in 
> HDFS (eg S3A credentials), then it will fail to be read and cause the FC to 
> fail.
> Other components which use credential providers (eg S3A, ABFS etc) explicitly 
> disallow storing the credentials in the same type of filesystem. Ie, S3A 
> cannot use providers stored in S3. To avoid this sort of circular dependency, 
> any such credentials are removed from the list before they are used.
> The Failover Controller should do the same, and ensure it does not try to 
> read any credentials stored in HDFS, as it will never be able to do so until 
> HDFS is fully started (see the sketch below).
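> A hedged sketch of that exclusion, reusing the helper the filesystem clients 
> already rely on (the final patch may differ):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
> import org.apache.hadoop.security.ProviderUtils;
>
> // Drop any credential providers that live inside HDFS itself before the
> // ZKFC reads its ZooKeeper auth, mirroring what S3A/ABFS already do.
> Configuration localConf = ProviderUtils.excludeIncompatibleCredentialProviders(
>     conf, DistributedFileSystem.class);
> {code}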
> For reference, the stack logged when the FC meets this problem is:
>   
> {code:java}
> 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController 
> (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting 
> due to earlier exception java.io.IOException: Configuration problem with 
> provider path. 2018-10-22 08:17:09,252 DEBUG util.ExitUtil 
> (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: 
> Configuration problem with provider path. 1: java.io.IOException: 
> Configuration problem with provider path. at 
> org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265) at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199)
>  Caused by: java.io.IOException: Configuration problem with provider path.    
>  at 
> org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
>  at 
> org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282) 
> at 
> org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732) 
> at 
> org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:360) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)  
>    at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
>  Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProt

[jira] [Commented] (HDFS-15210) EC : File write hanged when DN is shutdown by admin command.

2021-02-25 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291358#comment-17291358
 ] 

Wei-Chiu Chuang commented on HDFS-15210:


Cherry-picked to the lower branches, branch-3.3 through branch-3.1.


> EC : File write hanged when DN is shutdown by admin command.
> 
>
> Key: HDFS-15210
> URL: https://issues.apache.org/jira/browse/HDFS-15210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15210.001.patch, HDFS-15210.002.patch, 
> HDFS-15210.003.patch, dump.txt
>
>
> EC Blocks : blk_-9223372036854291632_10668910, 
> blk_-9223372036854291631_10668910, blk_-9223372036854291630_10668910, 
> blk_-9223372036854291629_10668910, blk_-9223372036854291628_10668910
>  
> Two block DN restarted : blk_-9223372036854291630_10668910 & 
> blk_-9223372036854291632_10668910
> {code:java}
> 2020-03-03 18:12:17,074 DEBUG hdfs.DataStreamer: DFSClient seqno: -2 reply: 
> OOB_RESTART downstreamAckTimeNanos: 0 flag: 8
> 2020-03-03 18:13:39,469 DEBUG hdfs.DataStreamer: DFSClient seqno: -2 reply: 
> OOB_RESTART downstreamAckTimeNanos: 0 flag: 8 {code}
>  
> Restarted streams are stuck in below stacktrace :
> {code}
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.take(DFSStripedOutputStream.java:110)
>  at 
> org.apache.hadoop.hdfs.StripedDataStreamer.setupPipelineInternal(StripedDataStreamer.java:140)
>  at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1540)
>  at 
> org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1276)
>  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:669) at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15210) EC : File write hanged when DN is shutdown by admin command.

2021-02-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15210:
---
Fix Version/s: 3.2.3
   3.1.5
   3.3.1

> EC : File write hanged when DN is shutdown by admin command.
> 
>
> Key: HDFS-15210
> URL: https://issues.apache.org/jira/browse/HDFS-15210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15210.001.patch, HDFS-15210.002.patch, 
> HDFS-15210.003.patch, dump.txt
>
>
> EC Blocks : blk_-9223372036854291632_10668910, 
> blk_-9223372036854291631_10668910, blk_-9223372036854291630_10668910, 
> blk_-9223372036854291629_10668910, blk_-9223372036854291628_10668910
>  
> The DataNodes for two of the blocks were restarted: blk_-9223372036854291630_10668910 and 
> blk_-9223372036854291632_10668910
> {code:java}
> 2020-03-03 18:12:17,074 DEBUG hdfs.DataStreamer: DFSClient seqno: -2 reply: 
> OOB_RESTART downstreamAckTimeNanos: 0 flag: 8
> 2020-03-03 18:13:39,469 DEBUG hdfs.DataStreamer: DFSClient seqno: -2 reply: 
> OOB_RESTART downstreamAckTimeNanos: 0 flag: 8 {code}
>  
> The restarted streams are stuck with the stack trace below:
> {code}
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.take(DFSStripedOutputStream.java:110)
>  at 
> org.apache.hadoop.hdfs.StripedDataStreamer.setupPipelineInternal(StripedDataStreamer.java:140)
>  at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1540)
>  at 
> org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1276)
>  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:669) at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14013) HDFS ZKFC on standby NN not starting with credentials stored in hdfs

2021-02-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289817#comment-17289817
 ] 

Wei-Chiu Chuang commented on HDFS-14013:


Other than that, the fix LGTM.

> HDFS ZKFC on standby NN not starting with credentials stored in hdfs
> 
>
> Key: HDFS-14013
> URL: https://issues.apache.org/jira/browse/HDFS-14013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Krzysztof Adamski
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: zkfc
> Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log
>
>
> HDFS ZKFailoverController does not start correctly on the standby NameNode when 
> the credential provider is stored in HDFS. Removing the credential provider entry 
> from core-site helps. See the full exception stack attached.
> It looks like it only checks the credentials against the NameNode on the same host 
> instead of redirecting to the active one. It may make sense to delay the credential 
> check until after the active NameNode is elected, and to redirect to the active 
> NameNode as well.
>  
>  
> {code:java}
> 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController 
> (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting 
> due to earlier exception java.io.IOException: Configuration problem with 
> provider path. 2018-10-22 08:17:09,252 DEBUG util.ExitUtil 
> (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: 
> Configuration problem with provider path. 1: java.io.IOException: 
> Configuration problem with provider path. at 
> org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265) at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199)
>  Caused by: java.io.IOException: Configuration problem with provider path.    
>  at 
> org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
>  at 
> org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282) 
> at 
> org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732) 
> at 
> org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:360) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)  
>    at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
>  Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:966)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 
>

[jira] [Commented] (HDFS-14013) HDFS ZKFC on standby NN not starting with credentials stored in hdfs

2021-02-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289782#comment-17289782
 ] 

Wei-Chiu Chuang commented on HDFS-14013:


I read the description and wondered why one would use HDFS to store the credential 
file for ZKFC. Now that I've read the patch, I realize the intent is to exclude HDFS 
paths during ZKFC startup. Can we change the Jira summary to make that clearer?
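
For context, a minimal sketch of the exclusion pattern used elsewhere in Hadoop (e.g. by the DFSClient); whether the ZKFC patch takes exactly this shape is an assumption. ProviderUtils.excludeIncompatibleCredentialProviders strips credential-provider URIs that live on the FileSystem being started, so ZKFC would no longer try to read a jceks file from an HDFS whose NameNodes are not yet active:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.security.ProviderUtils;

public class ZkfcCredentialSketch {
  // Drop hdfs:// entries from hadoop.security.credential.provider.path
  // before ZKFC resolves its ZooKeeper auth secrets, avoiding the
  // StandbyException chicken-and-egg problem at startup.
  static Configuration localProvidersOnly(Configuration conf)
      throws IOException {
    return ProviderUtils.excludeIncompatibleCredentialProviders(
        conf, DistributedFileSystem.class);
  }
}
{code}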

> HDFS ZKFC on standby NN not starting with credentials stored in hdfs
> 
>
> Key: HDFS-14013
> URL: https://issues.apache.org/jira/browse/HDFS-14013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Krzysztof Adamski
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: zkfc
> Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log
>
>
> HDFS ZKFailoverController does not start correctly on the standby NameNode when 
> the credential provider is stored in HDFS. Removing the credential provider entry 
> from core-site helps. See the full exception stack attached.
> It looks like it only checks the credentials against the NameNode on the same host 
> instead of redirecting to the active one. It may make sense to delay the credential 
> check until after the active NameNode is elected, and to redirect to the active 
> NameNode as well.
>  
>  
> {code:java}
> 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController 
> (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting 
> due to earlier exception java.io.IOException: Configuration problem with 
> provider path. 2018-10-22 08:17:09,252 DEBUG util.ExitUtil 
> (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: 
> Configuration problem with provider path. 1: java.io.IOException: 
> Configuration problem with provider path. at 
> org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265) at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199)
>  Caused by: java.io.IOException: Configuration problem with provider path.    
>  at 
> org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
>  at 
> org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282) 
> at 
> org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732) 
> at 
> org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
>  at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:360) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>  at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)  
>    at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
>  Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:966)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.securit

[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2021-02-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289615#comment-17289615
 ] 

Wei-Chiu Chuang commented on HDFS-15422:


I think the patch looks reasonable. I am +1.

bq. // TODO: Pretty confident this should be s/storedBlock/block below,
I had stared at this line years ago and wondered why it was never updated :)
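
A minimal sketch of the shape the TODO hints at (types simplified and hypothetical; the real change lives in the BlockManager queueing path): queue the block exactly as the DataNode reported it, rather than splicing in fields from the stored replica:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

public class IbrQueueSketch {
  static final class ReportedBlock {
    final long blockId;
    final long genStamp;
    final long numBytes;
    ReportedBlock(long blockId, long genStamp, long numBytes) {
      this.blockId = blockId;
      this.genStamp = genStamp;
      this.numBytes = numBytes;
    }
  }

  private final Queue<ReportedBlock> pendingDNMessages = new ArrayDeque<>();

  // Buggy shape: splicing the stored replica's size into the queued entry
  // means the namenode later compares the DataNode's real length against
  // stale metadata and flags SIZE_MISMATCH corruption after failover.
  void queueBuggy(ReportedBlock reported, ReportedBlock stored) {
    pendingDNMessages.add(new ReportedBlock(
        reported.blockId, reported.genStamp, stored.numBytes)); // stale size
  }

  // Fixed shape: queue exactly what the DataNode reported.
  void queueFixed(ReportedBlock reported) {
    pendingDNMessages.add(reported);
  }
}
{code}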

> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Stephen O'Donnell
>Priority: Critical
> Attachments: HDFS-15422-branch-2.10.001.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is replaced with the existing stored 
> information. This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting missing 
> blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had been 
> appended, and the sizes were actually correct on the datanodes. Upon further 
> investigation, it was determined that the namenode was queueing IBRs with 
> altered information.
> Although it sounds bad, I am not making it a blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15734) [READ] DirectoryScanner#scan need not check StorageType.PROVIDED

2021-02-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15734:
---
Fix Version/s: (was: 3.40)
   3.2.3
   3.1.5
   3.4.0
   3.3.1

> [READ] DirectoryScanner#scan need not check StorageType.PROVIDED
> 
>
> Key: HDFS-15734
> URL: https://issues.apache.org/jira/browse/HDFS-15734
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since https://issues.apache.org/jira/browse/HDFS-12777 there is no PROVIDED 
> storage in the volume report, so we don't need to check for it in 
> DirectoryScanner#scan.
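
A minimal sketch of the simplification, assuming the guard looked roughly like this (paraphrased with stand-in types, not the verbatim DirectoryScanner code):

{code:java}
import java.util.Arrays;
import java.util.List;

public class ScanSketch {
  enum StorageType { DISK, SSD, PROVIDED }

  // Old shape: skip PROVIDED volumes per scan pass. Since HDFS-12777 the
  // volume report never contains PROVIDED storage, so the guard is dead
  // code and the loop body can run unconditionally.
  static int countScannable(List<StorageType> volumes) {
    int scanned = 0;
    for (StorageType t : volumes) {
      // if (t == StorageType.PROVIDED) continue;  // no longer needed
      scanned++;
    }
    return scanned;
  }

  public static void main(String[] args) {
    System.out.println(countScannable(
        Arrays.asList(StorageType.DISK, StorageType.SSD))); // 2
  }
}
{code}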



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15734) [READ] DirectoryScanner#scan need not check StorageType.PROVIDED

2021-02-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15734.

Fix Version/s: 3.40
   Resolution: Fixed

> [READ] DirectoryScanner#scan need not check StorageType.PROVIDED
> 
>
> Key: HDFS-15734
> URL: https://issues.apache.org/jira/browse/HDFS-15734
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.40
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since https://issues.apache.org/jira/browse/HDFS-12777 there is no PROVIDED 
> storage in the volume report, so we don't need to check for it in 
> DirectoryScanner#scan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15843) [libhdfs++] Make write cross platform

2021-02-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15843:
---
Summary: [libhdfs++] Make write cross platform  (was: Make write cross 
platform)

> [libhdfs++] Make write cross platform
> -
>
> Key: HDFS-15843
> URL: https://issues.apache.org/jira/browse/HDFS-15843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.2.2
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We're currently using the *write* function from unistd.h, which isn't 
> cross-platform. We need to replace it with *std::cout.write*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15826) Solve the problem of incorrect progress of delegation tokens when loading FsImage

2021-02-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15826.

Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks! 

> Solve the problem of incorrect progress of delegation tokens when loading 
> FsImage
> -
>
> Key: HDFS-15826
> URL: https://issues.apache.org/jira/browse/HDFS-15826
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: 2.jpg, in_ progress.jpg
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When loading the FsImage, if delegation token information is included, the 
> progress bar on the UI already shows 100% while the delegation tokens are 
> still being processed, which is incorrect.
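
A minimal sketch of the counting pattern, assuming the fix uses the NameNode's StartupProgress step API the same way the inode loader does (treat the exact call sites, and the hypothetical loadToken() stand-in, as assumptions):

{code:java}
import org.apache.hadoop.hdfs.server.namenode.NameNode;
import org.apache.hadoop.hdfs.server.namenode.startupprogress.Phase;
import org.apache.hadoop.hdfs.server.namenode.startupprogress.StartupProgress;
import org.apache.hadoop.hdfs.server.namenode.startupprogress.StartupProgress.Counter;
import org.apache.hadoop.hdfs.server.namenode.startupprogress.Step;
import org.apache.hadoop.hdfs.server.namenode.startupprogress.StepType;

public class TokenProgressSketch {
  // Count each token as it is deserialized so the UI reaches 100% only
  // when the work is actually done.
  static void loadTokens(int numTokens) {
    StartupProgress prog = NameNode.getStartupProgress();
    Step step = new Step(StepType.DELEGATION_TOKENS);
    prog.beginStep(Phase.LOADING_FSIMAGE, step);
    prog.setTotal(Phase.LOADING_FSIMAGE, step, numTokens);
    Counter counter = prog.getCounter(Phase.LOADING_FSIMAGE, step);
    for (int i = 0; i < numTokens; i++) {
      loadToken();          // hypothetical per-token deserialization
      counter.increment();  // progress advances with the actual work
    }
    prog.endStep(Phase.LOADING_FSIMAGE, step);
  }

  private static void loadToken() { /* hypothetical stand-in */ }
}
{code}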



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15824) Update to enable TLS >=1.2 as default secure protocols

2021-02-06 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280339#comment-17280339
 ] 

Wei-Chiu Chuang edited comment on HDFS-15824 at 2/7/21, 12:09 AM:
--

Thanks for reporting the issue. I'm pretty sure we use TLS1.2 by default in the 
latest version. What version did you check?
https://github.com/apache/hadoop/blob/6b5d9e2334bec199518e580d4a2863c26518efcb/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ssl/SSLFactory.java#L75


was (Author: jojochuang):
Thanks for reporting the issue. I'm pretty sure we use TLS1.2 by default in the 
latest version.
https://github.com/apache/hadoop/blob/6b5d9e2334bec199518e580d4a2863c26518efcb/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ssl/SSLFactory.java#L75

> Update to enable TLS >=1.2 as default secure protocols 
> ---
>
> Key: HDFS-15824
> URL: https://issues.apache.org/jira/browse/HDFS-15824
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/hdfsproxy
>Reporter: Vicky Zhang
>Priority: Major
>
> In file 
> src/contrib/hdfsproxy/src/java/org/apache/hadoop/hdfsproxy/ProxyUtil.java, 
> line 125, the SSL protocol is used in the statement: SSLContext sc = 
> SSLContext.getInstance("SSL");
> *Impact:* 
> An SSL DDoS attack targets the SSL handshake protocol, either by sending 
> worthless data to the SSL server, which results in connection issues for 
> legitimate users, or by abusing the SSL handshake protocol itself.
> *Suggestions:*
> Upgrade the implementation to "TLS", and configure the https.protocols JVM 
> option to include TLSv1.2:
> *Useful links:*
> [https://blogs.oracle.com/java-platform-group/diagnosing-tls,-ssl,-and-https]
> [https://www.appmarq.com/public/tqi,1039002,CWE-319-Avoid-using-Deprecated-SSL-protocols-to-secure-connection]
> *Please share with us your opinions/comments if there are any:*
> Is the bug report helpful?
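
A minimal sketch of the suggested change using only standard JSSE APIs (the ProxyUtil surroundings are paraphrased, not quoted): request TLS explicitly instead of "SSL", and optionally pin the protocol list JVM-wide for HttpsURLConnection clients:

{code:java}
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;

public class TlsContextSketch {
  static SSLContext newTlsContext() throws Exception {
    // "SSL" can negotiate deprecated protocol versions; request TLS 1.2
    // explicitly so the handshake never falls back to SSLv3.
    SSLContext sc = SSLContext.getInstance("TLSv1.2");
    sc.init(null, null, null);  // default key/trust managers and RNG
    return sc;
  }

  public static void main(String[] args) throws Exception {
    // JVM-wide alternative for HttpsURLConnection-based clients.
    System.setProperty("https.protocols", "TLSv1.2,TLSv1.3");
    HttpsURLConnection.setDefaultSSLSocketFactory(
        newTlsContext().getSocketFactory());
  }
}
{code}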



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15824) Update to enable TLS >=1.2 as default secure protocols

2021-02-06 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280339#comment-17280339
 ] 

Wei-Chiu Chuang commented on HDFS-15824:


Thanks for reporting the issue. I'm pretty sure we use TLS1.2 by default in the 
latest version.
https://github.com/apache/hadoop/blob/6b5d9e2334bec199518e580d4a2863c26518efcb/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ssl/SSLFactory.java#L75

> Update to enable TLS >=1.2 as default secure protocols 
> ---
>
> Key: HDFS-15824
> URL: https://issues.apache.org/jira/browse/HDFS-15824
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/hdfsproxy
>Reporter: Vicky Zhang
>Priority: Major
>
> In file 
> src/contrib/hdfsproxy/src/java/org/apache/hadoop/hdfsproxy/ProxyUtil.java, 
> line 125, the SSL protocol is used in the statement: SSLContext sc = 
> SSLContext.getInstance("SSL");
> *Impact:* 
> An SSL DDoS attack targets the SSL handshake protocol, either by sending 
> worthless data to the SSL server, which results in connection issues for 
> legitimate users, or by abusing the SSL handshake protocol itself.
> *Suggestions:*
> Upgrade the implementation to "TLS", and configure the https.protocols JVM 
> option to include TLSv1.2:
> *Useful links:*
> [https://blogs.oracle.com/java-platform-group/diagnosing-tls,-ssl,-and-https]
> [https://www.appmarq.com/public/tqi,1039002,CWE-319-Avoid-using-Deprecated-SSL-protocols-to-secure-connection]
> *Please share with us your opinions/comments if there are any:*
> Is the bug report helpful?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15801) Backport HDFS-14582 to branch-2.10 (Failed to start DN with ArithmeticException when NULL checksum used)

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15801.

Fix Version/s: 2.10.2
   Resolution: Fixed

Thanks. This is merged.

> Backport HDFS-14582 to branch-2.10 (Failed to start DN with 
> ArithmeticException when NULL checksum used)
> 
>
> Key: HDFS-15801
> URL: https://issues.apache.org/jira/browse/HDFS-15801
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.10.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In HDFS-14582, the error message is clearer, as follows:
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.validateIntegrityAndSetLength(BlockPoolSlice.java:823)
> at 
> {code}
> But in branch-2.10.1, the exception message is omitted as follows:
> {code:java}
> 2021-01-29 14:20:30,694 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(204)) 
> - Caught exception while adding replicas from /mnt/disk/0/hdfs/data/current. 
> Will throw later.
> java.io.IOException: Failed to start sub tasks to add replica in replica map 
> :java.lang.ArithmeticExceptionjava.io.IOException: Failed to start sub tasks 
> to add replica in replica map :java.lang.ArithmeticException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:434)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:930)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:196)
> {code}
> The specific error message is omitted, making it harder to find the root 
> cause.
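
For reference, a minimal sketch of the guard being backported (shape assumed from the HDFS-14582 description; the real change is in BlockPoolSlice#validateIntegrityAndSetLength): a NULL checksum has a per-chunk checksum size of 0, so deriving the valid data length from the meta file length must not divide by it:

{code:java}
public class NullChecksumSketch {
  // With DataChecksum type NULL, checksumSize == 0; the old length math
  // divided the meta-file payload by it and threw "/ by zero".
  static long validLength(long blockFileLen, long metaFileLen,
      int bytesPerChecksum, int checksumSize, int headerLen) {
    if (checksumSize <= 0) {
      // Nothing to verify against; trust the block file length.
      return blockFileLen;
    }
    long numChunks = (metaFileLen - headerLen) / checksumSize;
    return Math.min(blockFileLen, numChunks * (long) bytesPerChecksum);
  }
}
{code}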



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15801) Backport HDFS-14582 to branch-2.10 (Failed to start DN with ArithmeticException when NULL checksum used)

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-15801:
--

Assignee: Janus Chow

> Backport HDFS-14582 to branch-2.10 (Failed to start DN with 
> ArithmeticException when NULL checksum used)
> 
>
> Key: HDFS-15801
> URL: https://issues.apache.org/jira/browse/HDFS-15801
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In HDFS-14582, the error message is clearer, as follows:
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.validateIntegrityAndSetLength(BlockPoolSlice.java:823)
> at 
> {code}
> But in branch-2.10.1, the exception message is omitted as follows:
> {code:java}
> 2021-01-29 14:20:30,694 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(204)) 
> - Caught exception while adding replicas from /mnt/disk/0/hdfs/data/current. 
> Will throw later.
> java.io.IOException: Failed to start sub tasks to add replica in replica map 
> :java.lang.ArithmeticExceptionjava.io.IOException: Failed to start sub tasks 
> to add replica in replica map :java.lang.ArithmeticException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:434)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:930)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:196)
> {code}
> The specific error message is omitted, making it harder to find the root 
> cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15791) Possible Resource Leak in FSImageFormatProtobuf

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15791.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Possible Resource Leak in FSImageFormatProtobuf
> ---
>
> Key: HDFS-15791
> URL: https://issues.apache.org/jira/browse/HDFS-15791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L271].
>  If an I/O error occurs at line 
> [273|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L273]
>  or 
> [277|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L277],
>  {{fin}} remains open since the exception isn't caught locally, and there is 
> no way for any caller to close the FileInputStream.
> I'll submit a pull request to fix it.
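
A minimal sketch of the fix shape (assumed; the actual patch is against FSImageFormatProtobuf): try-with-resources guarantees the stream closes on every path, including when the loader throws:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class LoadSketch {
  // Before: "FileInputStream fin = new FileInputStream(file);" with the
  // close at the end of the method, so an IOException thrown in between
  // leaked the descriptor. The resource block below closes both handles
  // whether or not the body throws.
  static void load(File file) throws IOException {
    try (FileInputStream fin = new FileInputStream(file);
         RandomAccessFile raFile = new RandomAccessFile(file, "r")) {
      // ... compute the MD5 digest over fin and run the section loaders ...
    }
  }
}
{code}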



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15791) Possible Resource Leak in FSImageFormatProtobuf

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-15791:
--

Assignee: Narges Shadab

> Possible Resource Leak in FSImageFormatProtobuf
> ---
>
> Key: HDFS-15791
> URL: https://issues.apache.org/jira/browse/HDFS-15791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L271].
>  If an I/O error occurs at line 
> [273|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L273]
>  or 
> [277|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L277],
>  {{fin}} remains open since the exception isn't caught locally, and there is 
> no way for any caller to close the FileInputStream.
> I'll submit a pull request to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


