[jira] [Created] (HDFS-16557) BootstrapStandby failed because of checking Gap for inprogress EditLogInputStream

2022-04-22 Thread tomscut (Jira)
tomscut created HDFS-16557:
--

 Summary: BootstrapStandby failed because of checking Gap for 
inprogress EditLogInputStream
 Key: HDFS-16557
 URL: https://issues.apache.org/jira/browse/HDFS-16557
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


The lastTxId of an in-progress EditLogInputStream isn't necessarily 
HdfsServerConstants.INVALID_TXID. We can determine its status directly via 
EditLogInputStream#isInProgress.

For example, during bootstrapStandby, an in-progress EditLogInputStream can be 
misjudged, resulting in a gap-check failure that causes bootstrapStandby to 
fail. A sketch of the intended gap check follows.
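
A minimal sketch of the intended check, written as a stand-alone helper over 
the selected streams (the helper shape and messages are illustrative, not the 
exact BootstrapStandby code):
{code:java}
// Rely on isInProgress() instead of comparing lastTxId with
// HdfsServerConstants.INVALID_TXID to detect in-progress segments.
static void checkForGaps(List<EditLogInputStream> streams, long fromTxId)
    throws IOException {
  long expected = fromTxId;
  for (EditLogInputStream elis : streams) {
    if (elis.isInProgress()) {
      // An in-progress segment may report a real lastTxId, so it must
      // not be treated as finalized; skip the strict gap check here.
      continue;
    }
    if (elis.getFirstTxId() != expected) {
      throw new IOException("Gap in transactions: expected txid "
          + expected + " but got " + elis.getFirstTxId());
    }
    expected = elis.getLastTxId() + 1;
  }
} {code}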



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16552) Fix NPE for BlockManager

2022-04-21 Thread tomscut (Jira)
tomscut created HDFS-16552:
--

 Summary: Fix NPE for BlockManager
 Key: HDFS-16552
 URL: https://issues.apache.org/jira/browse/HDFS-16552
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


There is an NPE in BlockManager when running 
TestBlockManager#testSkipReconstructionWithManyBusyNodes2, because 
NameNodeMetrics is not initialized in this unit test.

 

For the related CI log, see 
[this|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt].

 
{code:java}
[ERROR] Tests run: 34, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.088 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager
[ERROR] testSkipReconstructionWithManyBusyNodes2(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager)  Time elapsed: 2.783 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.scheduleReconstruction(BlockManager.java:2171)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSkipReconstructionWithManyBusyNodes2(TestBlockManager.java:947)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code}
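
A minimal sketch of one possible fix, assuming the NPE comes from 
dereferencing the uninitialized NameNodeMetrics (the metric call below is 
illustrative, not the actual statement at BlockManager.java:2171; the 
alternative fix is to initialize the metrics in the test setup):
{code:java}
// Guard the metrics update so unit tests without initialized metrics
// do not hit an NPE.
NameNodeMetrics metrics = NameNode.getNameNodeMetrics();
if (metrics != null) {
  metrics.incrSuccessfulReReplications(); // illustrative counter
} {code}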
 

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash

2022-04-20 Thread tomscut (Jira)
tomscut created HDFS-16550:
--

 Summary: [SBN read] Improper cache-size for journal node may cause 
cluster crash
 Key: HDFS-16550
 URL: https://issues.apache.org/jira/browse/HDFS-16550
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-04-21-09-54-29-751.png, 
image-2022-04-21-09-54-57-111.png

When we introduced SBN Read, we encountered an issue while upgrading the 
JournalNodes.

Cluster Info: 
*Active: nn0*
*Standby: nn1*

1. Rolling restart of the journal nodes. (related config: 
dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G)

2. The cluster runs for a while.

3. The active namenode (nn0) shut down because of "Timed out waiting 120000ms 
for a quorum of nodes to respond".

4. nn1 was transitioned to the active state.

5. The new active namenode (nn1) also shut down because of "Timed out waiting 
120000ms for a quorum of nodes to respond".

6. The cluster crashed.
 

Related code:
{code:java}
JournaledEditsCache(Configuration conf) {
  capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
      DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
  if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
    Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
        "maximum JVM memory is only %d bytes. It is recommended that you " +
        "decrease the cache size or increase the heap size.",
        capacity, Runtime.getRuntime().maxMemory()));
  }
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
} {code}
Currently, *dfs.journalnode.edit-cache-size.bytes* can be set to a larger size 
than the memory requested by the process. If 
*dfs.journalnode.edit-cache-size.bytes > 0.9 * 
Runtime.getRuntime().maxMemory()*, only a warn log is printed during 
journalnode startup. This can easily be overlooked by users. However, after 
the cluster has been running for a period of time, it is likely to cause the 
cluster to crash.

!image-2022-04-21-09-54-57-111.png|width=1227,height=57!

IMO, when *dfs.journalnode.edit-cache-size.bytes > threshold * 
Runtime.getRuntime().maxMemory()*, we should throw an exception and 
fail fast, giving users a clear hint to update the related configuration. 
A sketch follows.
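
A minimal sketch of the proposed fast-fail check, assuming it replaces the 
warn in the JournaledEditsCache constructor (the threshold value is an 
assumption):
{code:java}
// Fail fast at startup instead of only warning, so a dangerously large
// cache capacity is caught before the cluster degrades at runtime.
static void validateCacheCapacity(long capacity) {
  final double threshold = 0.5; // assumed safety threshold
  long maxMemory = Runtime.getRuntime().maxMemory();
  if (capacity > threshold * maxMemory) {
    throw new IllegalArgumentException(String.format(
        "Cache capacity is set at %d bytes but maximum JVM memory is only "
            + "%d bytes. Decrease the cache size or increase the heap size.",
        capacity, maxMemory));
  }
} {code}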



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2

2022-04-20 Thread tomscut (Jira)
tomscut created HDFS-16548:
--

 Summary: Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
 Key: HDFS-16548
 URL: https://issues.apache.org/jira/browse/HDFS-16548
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut


 
{code:java}
[ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 143.701 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
[ERROR] testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots)  Time elapsed: 6.606 s  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<1>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state

2022-04-19 Thread tomscut (Jira)
tomscut created HDFS-16547:
--

 Summary: [SBN read] Namenode in safe mode should not be transfer 
to observer state
 Key: HDFS-16547
 URL: https://issues.apache.org/jira/browse/HDFS-16547
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Currently, when a Namenode is in safe mode (during startup, or after entering 
safe mode manually), we can transfer it to the Observer state by command. This 
Observer node may then receive many requests and throw a SafemodeException, 
causing unnecessary failover on the client.

So a Namenode in safe mode should not be transferred to the Observer state. 
A sketch of the proposed guard follows.
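
A minimal sketch of the guard, assuming it sits in the HA transition handler 
for the Observer state (placement and exception type are assumptions):
{code:java}
// Refuse the transition while the NameNode is still in safe mode.
if (namesystem.isInSafeMode()) {
  throw new ServiceFailedException(
      "NameNode is in safe mode; cannot transition to Observer state.");
} {code}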



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16527) Add global timeout rule for TestRouterDistCpProcedure

2022-03-31 Thread tomscut (Jira)
tomscut created HDFS-16527:
--

 Summary: Add global timeout rule for TestRouterDistCpProcedure
 Key: HDFS-16527
 URL: https://issues.apache.org/jira/browse/HDFS-16527
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


As [Ayush Saxena|https://github.com/ayushtkn] mentioned 
[here|https://github.com/apache/hadoop/pull/4009#pullrequestreview-925554297], 
TestRouterDistCpProcedure failed many times because of timeouts. I will add a 
global timeout rule for it, which makes the timeout easy to set; a sketch is 
below.
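
For reference, a global timeout in JUnit 4 can be declared as a class-level 
rule; the value below is only a placeholder, not the one from the patch:
{code:java}
// Applies to every test method in TestRouterDistCpProcedure.
@Rule
public Timeout globalTimeout = Timeout.seconds(180); // assumed value
{code}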



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16513) [SBN read] Observer Namenode does not trigger the edits rolling of active Namenode

2022-03-20 Thread tomscut (Jira)
tomscut created HDFS-16513:
--

 Summary: [SBN read] Observer Namenode does not trigger the edits 
rolling of active Namenode
 Key: HDFS-16513
 URL: https://issues.apache.org/jira/browse/HDFS-16513
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


To avoid frequent edits rolling, we should prevent the OBN from triggering the 
edit-log rolling of the active Namenode. 

It is sufficient to retain only the triggering from the SNN and the automatic 
rolling of the ANN. A sketch of the condition follows. 
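
A minimal sketch of the idea, assuming the check lives in the edit log 
tailer's roll-trigger logic (the isObserver flag is an assumed name; the two 
methods are the existing tailer hooks):
{code:java}
// Only a Standby (not an Observer) asks the Active NN to roll its log.
if (tooLongSinceLastLoad() && !isObserver) {
  triggerActiveLogRoll();
} {code}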



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16507) Purged edit logs which is in process

2022-03-15 Thread tomscut (Jira)
tomscut created HDFS-16507:
--

 Summary: Purged edit logs which is in process
 Key: HDFS-16507
 URL: https://issues.apache.org/jira/browse/HDFS-16507
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut


We introduced the Standby Read functionality in branch-3.1.0, but found a FATAL 
exception. It looks like edit logs that are still in progress are being purged.

According to the analysis, I suspect that the edit log to be purged was not 
finalized normally.

I post some key logs for your reference:

1. ANN. Create editlog, edits_InProgresS_00024207987.

 
{code:java}
2022-03-15 17:24:52,558 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegment(1394)) - Starting log segment at 24207987
2022-03-15 17:24:52,609 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegment(1423)) - Ending log segment at 24207987
2022-03-15 17:24:52,610 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegmentAndWriteHeaderTxn(1432)) - logEdit at 24207987
2022-03-15 17:24:52,624 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegmentAndWriteHeaderTxn(1434)) - logSync at 24207987 {code}
2. SNN. Checkpoint.

25892513 + 1 - 1000000 = 24892514
dfs.namenode.num.extra.edits.retained=1000000

 
{code:java}
2022-03-15 17:28:02,640 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(443)) - Triggering checkpoint because there have been 1189661 txns since the last checkpoint, which exceeds the configured threshold 2
2022-03-15 17:28:02,648 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(188)) - Edits file ByteStringEditLog[27082175, 27082606], ByteStringEditLog[27082175, 27082606], ByteStringEditLog[27082175, 27082606] of size 60008 edits # 432 loaded in 0 seconds
2022-03-15 17:28:02,649 INFO  namenode.FSImage (FSImage.java:saveNamespace(1121)) - Save namespace ...
2022-03-15 17:28:02,650 INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(718)) - Saving image file /data/hadoop/hdfs/namenode/current/fsimage.ckpt_00027082606 using no compression
2022-03-15 17:28:03,180 INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(722)) - Image file /data/hadoop/hdfs/namenode/current/fsimage.ckpt_00027082606 of size 17885002 bytes saved in 0 seconds .
2022-03-15 17:28:03,183 INFO  namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:getImageTxIdToRetain(211)) - Going to retain 2 images with txid >= 25892513
2022-03-15 17:28:03,183 INFO  namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(233)) - Purging old image FSImageFile(file=/data/hadoop/hdfs/namenode/current/fsimage_00024794305, cpktTxId=00024794305)
2022-03-15 17:28:03,188 INFO  namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeOldStorage(169)) - purgeLogsFrom: 24892514
2022-03-15 17:28:03,282 INFO  namenode.TransferFsImage (TransferFsImage.java:copyFileToStream(396)) - Sending fileName: /data/hadoop/hdfs/namenode/current/fsimage_00027082606, fileSize: 17885002. Sent total: 17885002 bytes. Size of last segment intended to send: -1 bytes.
2022-03-15 17:28:03,536 INFO  namenode.TransferFsImage (TransferFsImage.java:uploadImageFromStorage(240)) - Uploaded image with txid 27082606 to namenode at http://sg-test-ambari-nn1.bigdata.bigo.inner:50070 in 0.343 seconds
2022-03-15 17:28:03,640 INFO  namenode.TransferFsImage (TransferFsImage.java:copyFileToStream(396)) - Sending fileName: /data/hadoop/hdfs/namenode/current/fsimage_00027082606, fileSize: 17885002. Sent total: 17885002 bytes. Size of last segment intended to send: -1 bytes.
2022-03-15 17:28:03,684 INFO  namenode.TransferFsImage (TransferFsImage.java:uploadImageFromStorage(240)) - Uploaded image with txid 27082606 to namenode at http://sg-test-ambari-dn1.bigdata.bigo.inner:50070 in 0.148 seconds
2022-03-15 17:28:03,748 INFO  namenode.TransferFsImage (TransferFsImage.java:copyFileToStream(396)) - Sending fileName: /data/hadoop/hdfs/namenode/current/fsimage_00027082606, fileSize: 17885002. Sent total: 17885002 bytes. Size of last segment intended to send: -1 bytes.
2022-03-15 17:28:03,798 INFO  namenode.TransferFsImage (TransferFsImage.java:uploadImageFromStorage(240)) - Uploaded image with txid 27082606 to namenode at http://sg-test-ambari-dn2.bigdata.bigo.inner:50070 in 0.113 seconds
2022-03-15 17:28:03,798 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(482)) - Checkpoint finished successfully. {code}
3. ANN. Purge edit logs.

25892513 + 1 - 1000000 = 24892514
dfs.namenode.num.extra.edits.retained=1000000
{code:java}
2022-03-15 17:28:03,515 INFO  namenode.NNStorageRetentionManager
{code}

[jira] [Created] (HDFS-16506) Unit tests failed because of OutOfMemoryError

2022-03-14 Thread tomscut (Jira)
tomscut created HDFS-16506:
--

 Summary: Unit tests failed because of OutOfMemoryError
 Key: HDFS-16506
 URL: https://issues.apache.org/jira/browse/HDFS-16506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut


Unit tests failed because of OutOfMemoryError.

An example: 
[OutOfMemoryError|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4009/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt].
{code:java}
[ERROR] Tests run: 32, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 95.727 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped
[ERROR] testGetBlockInfo[4: ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5]](org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped)  Time elapsed: 15.831 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at io.netty.util.concurrent.ThreadPerTaskExecutor.execute(ThreadPerTaskExecutor.java:32)
    at io.netty.util.internal.ThreadExecutorMap$1.execute(ThreadExecutorMap.java:57)
    at io.netty.util.concurrent.SingleThreadEventExecutor.doStartThread(SingleThreadEventExecutor.java:975)
    at io.netty.util.concurrent.SingleThreadEventExecutor.ensureThreadStarted(SingleThreadEventExecutor.java:958)
    at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:660)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:163)
    at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:70)
    at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:346)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2348)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2166)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2156)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2135)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2109)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2102)
    at org.apache.hadoop.hdfs.MiniDFSCluster.close(MiniDFSCluster.java:3479)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped.testGetBlockInfo(TestBlockInfoStriped.java:257)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16505) Setting safemode should not be interrupted by abnormal nodes

2022-03-14 Thread tomscut (Jira)
tomscut created HDFS-16505:
--

 Summary: Setting safemode should not be interrupted by abnormal 
nodes
 Key: HDFS-16505
 URL: https://issues.apache.org/jira/browse/HDFS-16505
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-03-15-09-29-36-538.png, 
image-2022-03-15-09-29-44-430.png

Setting safemode should not be interrupted by abnormal nodes. 

For example, we have four namenodes configured in the following order:
NS1 -> active
NS2 -> standby
NS3 -> observer
NS4 -> observer.

When the NS1 process exits, setting the safemode state of NS2, NS3, and NS4 
fails. Similarly, when the NS2 process exits, only the safemode state of NS1 
can be set successfully. A sketch of the intended behavior follows.
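
A minimal sketch of the idea, assuming the admin command iterates over 
per-NameNode proxies (the proxies variable is an assumed name):
{code:java}
// Continue past unreachable NameNodes instead of aborting on the first
// failure, so the healthy nodes still get their safemode state set.
for (ClientProtocol proxy : proxies) {
  try {
    proxy.setSafeMode(SafeModeAction.SAFEMODE_ENTER, false);
  } catch (IOException e) {
    System.err.println("Failed to set safemode on one NameNode: " + e);
  }
} {code}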

 

When the NS1 process exits:

Before the change:

!image-2022-03-15-09-29-36-538.png|width=1145,height=97!

After the change:

!image-2022-03-15-09-29-44-430.png|width=1104,height=119!

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16503) Should verify whether the path name is valid in the WebHDFS

2022-03-13 Thread tomscut (Jira)
tomscut created HDFS-16503:
--

 Summary: Should verify whether the path name is valid in the 
WebHDFS
 Key: HDFS-16503
 URL: https://issues.apache.org/jira/browse/HDFS-16503
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-03-14-09-35-49-860.png

When creating a file using WebHDFS, there are two main steps:
1. Obtain the location of the Datanode to be written.
2. Put the file to this location.

Currently *NameNodeRpcServer* verifies that the path name is valid, but 
*NamenodeWebHdfsMethods* and *RouterWebHdfsMethods* do not.

So if we use an invalid path, the first step returns success, but the second 
step throws an {*}InvalidPathException{*}. We should also do this validation in 
WebHDFS, to be consistent with NameNodeRpcServer.

!image-2022-03-14-09-35-49-860.png|width=548,height=164!

The affected WebHDFS operations are CREATE, APPEND, OPEN, and GETFILECHECKSUM. 
So we can add a DFSUtil.isValidName check to redirectURI in 
*NamenodeWebHdfsMethods* and *RouterWebHdfsMethods*, as sketched below.
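
A minimal sketch of the check, assuming it is placed at the start of the 
redirect logic (DFSUtil.isValidName is the existing helper mentioned above):
{code:java}
// Reject invalid paths up front, matching NameNodeRpcServer behavior.
if (!DFSUtil.isValidName(path)) {
  throw new InvalidPathException(path); // "Invalid path name <path>"
} {code}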



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16499) [SPS]: Should not start indefinitely while another SPS process is running

2022-03-09 Thread tomscut (Jira)
tomscut created HDFS-16499:
--

 Summary: [SPS]: Should not start indefinitely while another SPS 
process is running
 Key: HDFS-16499
 URL: https://issues.apache.org/jira/browse/HDFS-16499
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: tomscut
Assignee: tomscut


Normally, we can only start one SPS process at a time. Currently, if one 
process is already running and another one is started, the second retries 
indefinitely. I think, in this case, it should exit immediately.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16498) Fix NPE for checkBlockReportLease

2022-03-09 Thread tomscut (Jira)
tomscut created HDFS-16498:
--

 Summary: Fix NPE for checkBlockReportLease
 Key: HDFS-16498
 URL: https://issues.apache.org/jira/browse/HDFS-16498
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


During a restart of the Namenode, a Datanode that has not yet registered may 
trigger an FBR, which causes an NPE.
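
A minimal sketch of a defensive check, assuming the NPE comes from a null 
DatanodeDescriptor for the unregistered node (placement and message are 
assumptions):
{code:java}
// Fail with a clear error instead of an NPE when the reporting
// DataNode has not registered yet.
DatanodeDescriptor node = datanodeManager.getDatanode(nodeReg);
if (node == null) {
  throw new IOException("Unregistered DataNode " + nodeReg
      + "; rejecting block report lease check.");
} {code}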



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16488) [SPS]: Expose metrics to JMX for external SPS

2022-02-26 Thread tomscut (Jira)
tomscut created HDFS-16488:
--

 Summary: [SPS]: Expose metrics to JMX for external SPS
 Key: HDFS-16488
 URL: https://issues.apache.org/jira/browse/HDFS-16488
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: tomscut
Assignee: tomscut


Currently, external SPS has no monitoring metrics. We do not know how many 
blocks are waiting to be processed, how many blocks are waiting to be retried, 
or how many blocks have been migrated.

We can expose these metrics via JMX for easy collection and display by 
monitoring systems.

For example, in our cluster we exposed these metrics to JMX, collected them 
with JMX-Exporter into Prometheus, and finally displayed them in Grafana. 
A sketch of the approach follows.
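
A minimal sketch using the Hadoop metrics2 framework, which publishes 
registered sources to JMX; the class and metric names below are assumptions, 
not the actual patch:
{code:java}
@Metrics(about = "External SPS metrics", context = "dfs")
public class ExternalSPSMetrics {
  // Gauges track in-flight work; the counter tracks completed migrations.
  @Metric("Blocks waiting to be processed")
  MutableGaugeLong pendingBlocks;
  @Metric("Blocks waiting to be retried")
  MutableGaugeLong retryBlocks;
  @Metric("Blocks already migrated")
  MutableCounterLong migratedBlocks;

  public static ExternalSPSMetrics create() {
    return DefaultMetricsSystem.instance().register("ExternalSPSMetrics",
        "Metrics for the external SPS", new ExternalSPSMetrics());
  }
} {code}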



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS

2022-02-22 Thread tomscut (Jira)
tomscut created HDFS-16477:
--

 Summary: [SPS]: Add metric PendingSPSPaths for getting the number 
of paths to be processed by SPS
 Key: HDFS-16477
 URL: https://issues.apache.org/jira/browse/HDFS-16477
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: tomscut
Assignee: tomscut


Currently, we have no idea how many paths are waiting to be processed when 
using the SPS feature. We should add a PendingSPSPaths metric to the NameNode 
for getting the number of paths to be processed by SPS.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16460) [SPS]: Handle failure retries for moving tasks

2022-02-19 Thread tomscut (Jira)
tomscut created HDFS-16460:
--

 Summary: [SPS]: Handle failure retries for moving tasks
 Key: HDFS-16460
 URL: https://issues.apache.org/jira/browse/HDFS-16460
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: tomscut
Assignee: tomscut


Handle failure retries for moving tasks. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16458) [SPS]: Fix bug for unit test of reconfiguring SPS mode

2022-02-18 Thread tomscut (Jira)
tomscut created HDFS-16458:
--

 Summary: [SPS]: Fix bug for unit test of reconfiguring SPS mode
 Key: HDFS-16458
 URL: https://issues.apache.org/jira/browse/HDFS-16458
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: tomscut
Assignee: tomscut


In TestNameNodeReconfigure#verifySPSEnabled, the assertEquals call compared a 
value (isSPSRunning) with itself.

In addition, after the *internal SPS* was removed, the *spsService daemon* no 
longer starts within StoragePolicySatisfyManager. I think the relevant code 
can be removed to simplify things.

IMO, after reconfiguring the SPS mode, we just need to confirm that the mode 
is correct and whether spsManager is null. A sketch of the corrected assertion 
follows.
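
A minimal sketch of what the corrected check could look like (variable and 
accessor names are assumptions based on the description):
{code:java}
// Compare the expected mode against the actual manager state instead
// of comparing a value with itself.
StoragePolicySatisfyManager spsManager =
    namesystem.getBlockManager().getSPSManager();
assertEquals(expectedMode, spsManager == null
    ? StoragePolicySatisfierMode.NONE : spsManager.getMode());
{code}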



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16446) Consider ioutils of disk when choosing volume

2022-02-04 Thread tomscut (Jira)
tomscut created HDFS-16446:
--

 Summary: Consider ioutils of disk when choosing volume
 Key: HDFS-16446
 URL: https://issues.apache.org/jira/browse/HDFS-16446
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-02-05-09-50-12-241.png

Consider the disk I/O utilization (ioutil) when choosing a volume.

The principle is as follows:

!image-2022-02-05-09-50-12-241.png|width=309,height=159!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16444) Show start time of JournalNode on Web

2022-01-28 Thread tomscut (Jira)
tomscut created HDFS-16444:
--

 Summary: Show start time of JournalNode on Web
 Key: HDFS-16444
 URL: https://issues.apache.org/jira/browse/HDFS-16444
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-01-29-08-09-42-544.png, 
image-2022-01-29-08-09-53-734.png

Show start time of JournalNode on Web.

Before:

!image-2022-01-29-08-09-42-544.png|width=379,height=98!

After:

!image-2022-01-29-08-09-53-734.png|width=378,height=118!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16438) Avoid holding read locks for a long time when scanDatanodeStorage

2022-01-25 Thread tomscut (Jira)
tomscut created HDFS-16438:
--

 Summary: Avoid holding read locks for a long time when 
scanDatanodeStorage
 Key: HDFS-16438
 URL: https://issues.apache.org/jira/browse/HDFS-16438
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2022-01-25-23-18-30-275.png

During decommissioning, if {*}DatanodeAdminBackoffMonitor{*} is used, there 
is a heavy operation: {*}scanDatanodeStorage{*}. If the number of blocks on a 
storage is large (more than 5 million) and GC performance is also poor, it 
may hold the *read lock* for a long time, so we should optimize it. A sketch 
of one mitigation follows.
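
A minimal sketch of one mitigation, assuming the scan can safely drop and 
re-acquire the namesystem read lock between bounded chunks of work (the chunk 
size and the per-storage scan step are assumptions):
{code:java}
int processed = 0;
namesystem.readLock();
try {
  for (DatanodeStorageInfo storage : storages) {
    scanStorage(storage); // assumed per-storage scan step
    if (++processed % 1000 == 0) {
      // Briefly release the lock so writers are not starved.
      namesystem.readUnlock();
      namesystem.readLock();
    }
  }
} finally {
  namesystem.readUnlock();
} {code}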

 

!image-2022-01-25-23-18-30-275.png|width=764,height=193!

 
{code:java}
2021-12-22 07:49:01,279 INFO  namenode.FSNamesystem (FSNamesystemLock.java:readUnlock(220)) - FSNamesystem scanDatanodeStorage read lock held for 5491 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:222)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1641)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.scanDatanodeStorage(DatanodeAdminBackoffMonitor.java:646)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.checkForCompletedNodes(DatanodeAdminBackoffMonitor.java:417)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.check(DatanodeAdminBackoffMonitor.java:300)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.run(DatanodeAdminBackoffMonitor.java:201)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
    Number of suppressed read-lock reports: 0
    Longest read-lock held interval: 5491 {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16435) Remove no need TODO comment for ObserverReadProxyProvider

2022-01-23 Thread tomscut (Jira)
tomscut created HDFS-16435:
--

 Summary: Remove no need TODO comment for ObserverReadProxyProvider
 Key: HDFS-16435
 URL: https://issues.apache.org/jira/browse/HDFS-16435
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Based on the discussion in 
[HDFS-13923|https://issues.apache.org/jira/browse/HDFS-13923], we don't think 
we need to add a configuration to turn observer reads on/off.

So I suggest removing the TODO comment, which is no longer needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16434) Add operation name to read/write lock for remaining operations

2022-01-22 Thread tomscut (Jira)
tomscut created HDFS-16434:
--

 Summary: Add operation name to read/write lock for remaining 
operations
 Key: HDFS-16434
 URL: https://issues.apache.org/jira/browse/HDFS-16434
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


In [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872], we added 
operation names to the read and write locks. However, many operations have 
still not been covered. When analyzing operations that hold locks for a long 
time, we can currently only identify the specific methods through stack 
traces. I suggest completing the remaining operations to facilitate later 
performance optimization; the pattern is sketched below.
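
For reference, the pattern introduced by HDFS-10872 passes an operation name 
to the unlock call so long lock holds are attributed in the log (the operation 
name here is illustrative):
{code:java}
namesystem.readLock();
try {
  // ... perform the read-only operation ...
} finally {
  namesystem.readUnlock("getDatanodeStorageReport"); // illustrative op name
} {code}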



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped

2022-01-13 Thread tomscut (Jira)
tomscut created HDFS-16427:
--

 Summary: Add debug log for 
BlockManager#chooseExcessRedundancyStriped
 Key: HDFS-16427
 URL: https://issues.apache.org/jira/browse/HDFS-16427
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


While solving [HDFS-16420|https://issues.apache.org/jira/browse/HDFS-16420], 
we added some debug logs to BlockManager#chooseExcessRedundancyStriped, which 
also proved necessary.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16413) Reconfig dfs usage parameters for datanode

2022-01-05 Thread tomscut (Jira)
tomscut created HDFS-16413:
--

 Summary: Reconfig dfs usage parameters for datanode
 Key: HDFS-16413
 URL: https://issues.apache.org/jira/browse/HDFS-16413
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut


Reconfig dfs usage parameters for datanode.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16404) Fix typo for CachingGetSpaceUsed

2021-12-31 Thread tomscut (Jira)
tomscut created HDFS-16404:
--

 Summary: Fix typo for CachingGetSpaceUsed
 Key: HDFS-16404
 URL: https://issues.apache.org/jira/browse/HDFS-16404
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


Fix typo for CachingGetSpaceUsed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16402) HeartbeatManager may cause incorrect stats

2021-12-28 Thread tomscut (Jira)
tomscut created HDFS-16402:
--

 Summary: HeartbeatManager may cause incorrect stats
 Key: HDFS-16402
 URL: https://issues.apache.org/jira/browse/HDFS-16402
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


After reconfiguring {*}dfs.datanode.data.dir{*}, we found that the stats on 
the Namenode Web UI became negative and there were many NPEs in the namenode 
logs. That problem was solved by 
[HDFS-14042|https://issues.apache.org/jira/browse/HDFS-14042].

However, if HeartbeatManager#updateHeartbeat or 
HeartbeatManager#updateLifeline throws another exception, the stats can still 
become inconsistent. We should ensure that the stats.subtract() and 
stats.add() pair is transactional, as sketched below.
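
A minimal sketch of the idea (the updateHeartbeat argument list is 
abbreviated; the try/finally structure is the point):
{code:java}
stats.subtract(node);
try {
  node.updateHeartbeat(reports, cacheCapacity, cacheUsed,
      xceiverCount, failedVolumes, volumeFailureSummary);
} finally {
  // Always re-add the node so an exception cannot leave the aggregate
  // stats permanently decremented (and thus negative).
  stats.add(node);
} {code}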



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16400) Reconfig DataXceiver parameters for datanode

2021-12-27 Thread tomscut (Jira)
tomscut created HDFS-16400:
--

 Summary: Reconfig DataXceiver parameters for datanode
 Key: HDFS-16400
 URL: https://issues.apache.org/jira/browse/HDFS-16400
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16399) Reconfig cache report parameters for datanode

2021-12-27 Thread tomscut (Jira)
tomscut created HDFS-16399:
--

 Summary: Reconfig cache report parameters for datanode
 Key: HDFS-16399
 URL: https://issues.apache.org/jira/browse/HDFS-16399
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16398) Reconfig block report parameters for datanode

2021-12-27 Thread tomscut (Jira)
tomscut created HDFS-16398:
--

 Summary: Reconfig block report parameters for datanode
 Key: HDFS-16398
 URL: https://issues.apache.org/jira/browse/HDFS-16398
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16397) Reconfig slow disk parameters for datanode

2021-12-25 Thread tomscut (Jira)
tomscut created HDFS-16397:
--

 Summary: Reconfig slow disk parameters for datanode
 Key: HDFS-16397
 URL: https://issues.apache.org/jira/browse/HDFS-16397
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut


In large clusters, a rolling restart of the datanodes takes a long time. We 
can make the slow-peer and slow-disk parameters in the datanode reconfigurable 
to facilitate cluster operation and maintenance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16396) Reconfig slow peer parameters for datanode

2021-12-25 Thread tomscut (Jira)
tomscut created HDFS-16396:
--

 Summary: Reconfig slow peer parameters for datanode
 Key: HDFS-16396
 URL: https://issues.apache.org/jira/browse/HDFS-16396
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut


In large clusters, a rolling restart of the datanodes takes a long time. We 
can make the slow-peer and slow-disk parameters in the datanode reconfigurable 
to facilitate cluster operation and maintenance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16379) Reset fullBlockReportLeaseId after any exceptions

2021-12-10 Thread tomscut (Jira)
tomscut created HDFS-16379:
--

 Summary: Reset fullBlockReportLeaseId after any exceptions
 Key: HDFS-16379
 URL: https://issues.apache.org/jira/browse/HDFS-16379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


Recently we encountered FBR-related problems in the production environment, 
which were solved by introducing HDFS-12914 and HDFS-14314.

But a situation like this may still occur:

1. The DN gets a *fullBlockReportLeaseId* via a heartbeat.

2. The DN triggers a block report, but some exception occurs (this may be 
rare, but it can happen), and the DN then retries multiple times {*}without 
resetting the lease ID{*}, because the lease ID is currently reset only on 
success.

3. After a while, the exception clears, but the lease ID has expired. *Since 
the NN does not throw an exception after the lease expires, the DN considers 
the block report successful.* So the block report was not actually processed 
this time and has to wait until the next round.

Therefore, {*}should we consider resetting the fullBlockReportLeaseId in a 
finally block{*}? The advantage is that lease expiration can be avoided. The 
downside is that each heartbeat will request a new fullBlockReportLeaseId 
while the exception persists, but I think this cost is negligible. A sketch 
follows.
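
A minimal sketch of the proposal on the DataNode side (the surrounding 
BPServiceActor structure is assumed):
{code:java}
try {
  success = blockReport(fullBlockReportLeaseId);
} finally {
  // Reset unconditionally, so a failed report cannot silently reuse an
  // expired lease; the next heartbeat will request a fresh one.
  fullBlockReportLeaseId = 0;
} {code}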



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16378) Add datanode address to BlockReportLeaseManager logs

2021-12-10 Thread tomscut (Jira)
tomscut created HDFS-16378:
--

 Summary: Add datanode address to BlockReportLeaseManager logs
 Key: HDFS-16378
 URL: https://issues.apache.org/jira/browse/HDFS-16378
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-12-11-09-58-59-494.png

We should add the datanode address to the BlockReportLeaseManager logs, 
because the datanode UUID alone is not convenient for tracking.

!image-2021-12-11-09-58-59-494.png|width=643,height=152!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16377) Should CheckNotNull before access FsDatasetSpi

2021-12-10 Thread tomscut (Jira)
tomscut created HDFS-16377:
--

 Summary: Should CheckNotNull before access FsDatasetSpi
 Key: HDFS-16377
 URL: https://issues.apache.org/jira/browse/HDFS-16377
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-12-10-19-19-22-957.png, 
image-2021-12-10-19-20-58-022.png

When starting the DN, we found an NPE in the starting DN's log, as follows:

!image-2021-12-10-19-19-22-957.png|width=909,height=126!

The logs of the upstream DN are as follows:

!image-2021-12-10-19-20-58-022.png|width=905,height=239!

This is mainly because *FsDatasetSpi* has not been initialized at the time of 
access. 

I noticed that checkNotNull is already done in two methods 
({*}DataNode#getBlockLocalPathInfo{*} and {*}DataNode#getVolumeInfo{*}). We 
should add it to the other places (interfaces that clients and other DNs can 
access directly) so that a clear message is attached when the exception is 
thrown.

That way, the client and the upstream DN know that FsDatasetSpi has not been 
initialized, rather than being left unaware of the specific cause of the NPE. 
A sketch follows.
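
A minimal sketch of the guard, assuming data is the DataNode's FsDatasetSpi 
field, mirroring the two methods mentioned above:
{code:java}
// Throw a descriptive exception instead of a bare NPE when the dataset
// has not finished initializing.
Preconditions.checkNotNull(data,
    "Storage not yet initialized: FsDatasetSpi is null.");
{code}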



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16376) Expose metrics of NodeNotChosenReason to JMX

2021-12-09 Thread tomscut (Jira)
tomscut created HDFS-16376:
--

 Summary: Expose metrics of NodeNotChosenReason to JMX
 Key: HDFS-16376
 URL: https://issues.apache.org/jira/browse/HDFS-16376
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-12-09-23-48-42-865.png

In our cluster, we can see logs for nodes that are not chosen, but it is hard 
to tell from the logs what percentage each reason accounts for. It is best to 
add relevant metrics to monitor the entire cluster.

!image-2021-12-09-23-48-42-865.png|width=517,height=187!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16375) The FBR lease ID should be exposed to the log

2021-12-08 Thread tomscut (Jira)
tomscut created HDFS-16375:
--

 Summary: The FBR lease ID should be exposed to the log
 Key: HDFS-16375
 URL: https://issues.apache.org/jira/browse/HDFS-16375
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Our Hadoop version is 3.1.0. We encountered HDFS-12914 and HDFS-14314 in the 
production environment.

When locating the problem, the *fullBrLeaseId* was not exposed in the log, 
which caused some difficulty. We should expose it in the log.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16371) Exclude slow disks when choosing volume

2021-12-03 Thread tomscut (Jira)
tomscut created HDFS-16371:
--

 Summary: Exclude slow disks when choosing volume
 Key: HDFS-16371
 URL: https://issues.apache.org/jira/browse/HDFS-16371
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Currently, the datanode can detect slow disks. When choosing a volume, we can 
exclude these slow disks according to some rules. This prevents slow disks 
from affecting the throughput of the whole datanode.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16370) Fix assert message for BlockInfo

2021-12-03 Thread tomscut (Jira)
tomscut created HDFS-16370:
--

 Summary: Fix assert message for BlockInfo
 Key: HDFS-16370
 URL: https://issues.apache.org/jira/browse/HDFS-16370
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


In both BlockInfo#getPrevious and BlockInfo#getNext, the assert message is 
wrong. This may cause misunderstanding and needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16361) Fix log format for QueryCommand

2021-11-28 Thread tomscut (Jira)
tomscut created HDFS-16361:
--

 Summary: Fix log format for QueryCommand
 Key: HDFS-16361
 URL: https://issues.apache.org/jira/browse/HDFS-16361
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


Fix log format for QueryCommand.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16359) RBF: RouterRpcServer#invokeAtAvailableNs does not take effect when retrying

2021-11-27 Thread tomscut (Jira)
tomscut created HDFS-16359:
--

 Summary: RBF: RouterRpcServer#invokeAtAvailableNs does not take 
effect when retrying
 Key: HDFS-16359
 URL: https://issues.apache.org/jira/browse/HDFS-16359
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


RouterRpcServer#invokeAtAvailableNs does not take effect when retrying.

The original code of RouterRpcServer#getNameSpaceInfo looks like this:
{code:java}
private Set<FederationNamespaceInfo> getNameSpaceInfo(String nsId) {
  Set<FederationNamespaceInfo> namespaceInfos = new HashSet<>();
  for (FederationNamespaceInfo ns : namespaceInfos) {
    if (!nsId.equals(ns.getNameserviceId())) {
      namespaceInfos.add(ns);
    }
  }
  return namespaceInfos;
}  {code}
Note that the loop iterates over the freshly created, empty namespaceInfos 
set instead of the full set of subclusters, so the method always returns an 
empty set and the retry never reaches another namespace.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16344) Improve DirectoryScanner.Stats#toString

2021-11-21 Thread tomscut (Jira)
tomscut created HDFS-16344:
--

 Summary: Improve DirectoryScanner.Stats#toString
 Key: HDFS-16344
 URL: https://issues.apache.org/jira/browse/HDFS-16344
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Improve DirectoryScanner.Stats#toString.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16339) Show the threshold when mover threads quota is exceeded

2021-11-19 Thread tomscut (Jira)
tomscut created HDFS-16339:
--

 Summary: Show the threshold when mover threads quota is exceeded
 Key: HDFS-16339
 URL: https://issues.apache.org/jira/browse/HDFS-16339
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-11-20-12-32-55-167.png

Show the threshold when mover threads quota is exceeded in 
DataXceiver#replaceBlock and DataXceiver#copyBlock.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16337) Show start time of Datanode on Web

2021-11-18 Thread tomscut (Jira)
tomscut created HDFS-16337:
--

 Summary: Show start time of Datanode on Web
 Key: HDFS-16337
 URL: https://issues.apache.org/jira/browse/HDFS-16337
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-11-19-08-55-58-343.png

Show _start time_ of Datanode on Web.

!image-2021-11-19-08-55-58-343.png|width=540,height=155!

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16335) Fix HDFSCommands.md

2021-11-18 Thread tomscut (Jira)
tomscut created HDFS-16335:
--

 Summary: Fix HDFSCommands.md
 Key: HDFS-16335
 URL: https://issues.apache.org/jira/browse/HDFS-16335
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Fix HDFSCommands.md.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2021-11-17 Thread tomscut (Jira)
tomscut created HDFS-16331:
--

 Summary: Make dfs.blockreport.intervalMsec reconfigurable
 Key: HDFS-16331
 URL: https://issues.apache.org/jira/browse/HDFS-16331
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-11-18-09-33-24-236.png, 
image-2021-11-18-09-35-35-400.png

We have a cold-data cluster, which stores data with an EC policy. There are 24 
fast disks on each node and each disk is 7 TB. 

Recently, many nodes have more than 10 million blocks, and the FBR interval is 
6h by default. Frequent FBRs put great pressure on the NN.

!image-2021-11-18-09-35-35-400.png|width=491,height=337!

!image-2021-11-18-09-33-24-236.png|width=912,height=256!

We want to increase the FBR interval, but that currently requires a rolling 
restart of the DNs, which is a very heavy operation. In this scenario, it is 
necessary to make _dfs.blockreport.intervalMsec_ reconfigurable, as sketched 
below.
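
A minimal sketch of the DataNode-side reconfiguration hook (the setter name is 
an assumption; the overall shape follows Hadoop's ReconfigurableBase pattern):
{code:java}
@Override
public String reconfigurePropertyImpl(String property, String newVal)
    throws ReconfigurationException {
  if (DFS_BLOCKREPORT_INTERVAL_MSEC_KEY.equals(property)) {
    long interval = (newVal == null)
        ? DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT
        : Long.parseLong(newVal);
    dnConf.setBlockReportInterval(interval); // assumed setter
    return String.valueOf(interval);
  }
  throw new ReconfigurationException(property, newVal,
      getConf().get(property));
} {code}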



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16329) Fix log format for BlockManager

2021-11-17 Thread tomscut (Jira)
tomscut created HDFS-16329:
--

 Summary: Fix log format for BlockManager
 Key: HDFS-16329
 URL: https://issues.apache.org/jira/browse/HDFS-16329
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


Fix log format for BlockManager.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16327) Change dfs.namenode.max.slowpeer.collect.nodes to a proportional value

2021-11-15 Thread tomscut (Jira)
tomscut created HDFS-16327:
--

 Summary: Change dfs.namenode.max.slowpeer.collect.nodes to a 
proportional value
 Key: HDFS-16327
 URL: https://issues.apache.org/jira/browse/HDFS-16327
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Currently, dfs.namenode.max.slowpeer.collect.nodes is a fixed value, but it 
often needs to be changed as the cluster size changes. We can change it to a 
proportional value and make it reconfigurable. See 
[HDFS-15879|https://issues.apache.org/jira/browse/HDFS-15879].

And dfs.datanode.max.disks.to.report can be changed similarly. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16326) Simplify the code for DiskBalancer

2021-11-15 Thread tomscut (Jira)
tomscut created HDFS-16326:
--

 Summary: Simplify the code for DiskBalancer
 Key: HDFS-16326
 URL: https://issues.apache.org/jira/browse/HDFS-16326
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Simplify the code for DiskBalancer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-12 Thread tomscut (Jira)
tomscut created HDFS-16319:
--

 Summary: Add metrics doc for ReadLockLongHoldCount and 
WriteLockLongHoldCount
 Key: HDFS-16319
 URL: https://issues.apache.org/jira/browse/HDFS-16319
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
[HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16315) Add metrics related to Transfer and NativeCopy to DataNode

2021-11-10 Thread tomscut (Jira)
tomscut created HDFS-16315:
--

 Summary: Add metrics related to Transfer and NativeCopy to DataNode
 Key: HDFS-16315
 URL: https://issues.apache.org/jira/browse/HDFS-16315
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-11-11-08-26-33-074.png

Datanodes already have Read, Write, Sync and Flush metrics. We should add 
NativeCopy and Transfer as well.

Here is a partial look after the change:

!image-2021-11-11-08-26-33-074.png|width=205,height=235!
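
Roughly, the new counters could follow the metrics2 pattern already used for
the read/write rates; a sketch under that assumption (class and field names
are illustrative, not the committed ones):
{code:java}
// Sketch only: new per-volume rates in the style of the existing
// read/write metrics. Names here are illustrative.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;
import org.apache.hadoop.util.Time;

@Metrics(name = "DataNodeVolume", context = "dfs")
class VolumeIoRatesSketch {
  @Metric("Rate and latency of native-copy operations")
  MutableRate nativeCopyIoRate;

  @Metric("Rate and latency of transfer operations")
  MutableRate transferIoRate;

  // Called from the I/O path when a transfer completes.
  void addTransferIo(long beginMonotonicMs) {
    transferIoRate.add(Time.monotonicNow() - beginMonotonicMs);
  }
}
{code}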



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16312) Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents

2021-11-09 Thread tomscut (Jira)
tomscut created HDFS-16312:
--

 Summary: Fix typo for DataNodeVolumeMetrics and 
ProfilingFileIoEvents
 Key: HDFS-16312
 URL: https://issues.apache.org/jira/browse/HDFS-16312
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16311) Metric metadataOperationRate calculation error in DataNodeVolumeMetrics

2021-11-09 Thread tomscut (Jira)
tomscut created HDFS-16311:
--

 Summary: Metric metadataOperationRate calculation error in 
DataNodeVolumeMetrics
 Key: HDFS-16311
 URL: https://issues.apache.org/jira/browse/HDFS-16311
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-11-09-20-22-26-828.png

The metric metadataOperationRate is calculated incorrectly in
DataNodeVolumeMetrics#addFileIoError, causing MetadataOperationRateAvgTime to
become very large in some cases.

!image-2021-11-09-20-22-26-828.png|width=450,height=205!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16310) RBF: Add client port to CallerContext for Router

2021-11-09 Thread tomscut (Jira)
tomscut created HDFS-16310:
--

 Summary: RBF: Add client port to CallerContext for Router
 Key: HDFS-16310
 URL: https://issues.apache.org/jira/browse/HDFS-16310
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


In [HDFS-16266|https://issues.apache.org/jira/browse/HDFS-16266] we mentioned
adding the client port to the CallerContext of the Router.
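
A rough sketch of what setting the context in the Router handler might look
like (a Server.getRemotePort() helper is assumed, and the "clientPort" tag
name is illustrative):
{code:java}
// Sketch only: append the client port to the CallerContext before the
// Router forwards the call.
import org.apache.hadoop.ipc.CallerContext;
import org.apache.hadoop.ipc.Server;

CallerContext current = CallerContext.getCurrent();
String origin = (current == null) ? "" : current.getContext() + ",";
CallerContext.setCurrent(new CallerContext.Builder(
    origin + "clientPort:" + Server.getRemotePort()).build());
{code}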



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16299) Fix bug for TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics

2021-11-04 Thread tomscut (Jira)
tomscut created HDFS-16299:
--

 Summary: Fix bug for 
TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics
 Key: HDFS-16299
 URL: https://issues.apache.org/jira/browse/HDFS-16299
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Fix bug for TestDataNodeVolumeMetrics#verifyDataNodeVolumeMetrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16298) Improve error msg for BlockMissingException

2021-11-04 Thread tomscut (Jira)
tomscut created HDFS-16298:
--

 Summary: Improve error msg for BlockMissingException
 Key: HDFS-16298
 URL: https://issues.apache.org/jira/browse/HDFS-16298
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


When the client fails to obtain a block, a BlockMissingException is thrown. To
make such issues easier to analyze, we can add the relevant block location
information to the error message here.
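
For illustration only, the enriched message might look roughly like this
(variable names and message wording are hypothetical, not the committed
change):
{code:java}
// Hypothetical example; blk is a LocatedBlock, src the file path.
String msg = "Could not obtain block: " + blk
    + " file=" + src
    + " known locations: " + Arrays.toString(blk.getLocations())
    + " dead nodes: " + deadNodes.keySet();
throw new BlockMissingException(src, msg, blk.getStartOffset());
{code}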



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16281) Fix flaky unit tests that fail due to timeout

2021-10-21 Thread tomscut (Jira)
tomscut created HDFS-16281:
--

 Summary: Fix flaky unit tests that fail due to timeout
 Key: HDFS-16281
 URL: https://issues.apache.org/jira/browse/HDFS-16281
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


I found that this unit test *_TestViewFileSystemOverloadSchemeWithHdfsScheme_* 
failed several times due to timeout. Can we change the timeout for some methods 
from _*3s*_ to *_30s_* to be consistent with the other methods?

 

 
{code:java}
[ERROR] Tests run: 19, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 65.39 s <<< FAILURE! - in org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
[ERROR] testNflyRepair(org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS)  Time elapsed: 4.132 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 3000 milliseconds
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1577)
    at org.apache.hadoop.ipc.Client.call(Client.java:1535)
    at org.apache.hadoop.ipc.Client.call(Client.java:1432)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy26.setTimes(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:1059)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
    at com.sun.proxy.$Proxy27.setTimes(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:2658)
    at org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1978)
    at org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1975)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1988)
    at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:542)
    at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.setTimes(ChRootedFileSystem.java:328)
    at org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.commit(NflyFSystem.java:439)
    at org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.close(NflyFSystem.java:395)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.writeString(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:685)
    at org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.testNflyRepair(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:622)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
{code}
 

 



--
This message was sent by Atlassian Jira

[jira] [Created] (HDFS-16280) Fix typo for ShortCircuitReplica#isStale

2021-10-20 Thread tomscut (Jira)
tomscut created HDFS-16280:
--

 Summary: Fix typo for ShortCircuitReplica#isStale
 Key: HDFS-16280
 URL: https://issues.apache.org/jira/browse/HDFS-16280
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Fix typo for ShortCircuitReplica#isStale.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16279) Print detailed datanode info when processing block reports

2021-10-19 Thread tomscut (Jira)
tomscut created HDFS-16279:
--

 Summary: Print detailed datanode info when processing block reports
 Key: HDFS-16279
 URL: https://issues.apache.org/jira/browse/HDFS-16279
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-10-19-20-37-55-850.png

Print detailed datanode info when processing block reports.

!image-2021-10-19-20-37-55-850.png|width=547,height=98!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16274) Improve log for FSNamesystem#startFileInt

2021-10-14 Thread tomscut (Jira)
tomscut created HDFS-16274:
--

 Summary: Improve log for FSNamesystem#startFileInt
 Key: HDFS-16274
 URL: https://issues.apache.org/jira/browse/HDFS-16274
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-10-14-23-52-53-100.png, 
image-2021-10-14-23-55-04-133.png

When the blocksize of a file is smaller than
dfs.namenode.fs-limits.min-block-size, an IOException is thrown. In the
current exception message, it is easy to confuse the value of the blocksize
with the value of dfs.namenode.fs-limits.min-block-size.

Before the change:

!image-2021-10-14-23-55-04-133.png|width=678,height=111!

After the change:

!image-2021-10-14-23-52-53-100.png|width=710,height=63!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16266) Add remote port information to HDFS audit log

2021-10-10 Thread tomscut (Jira)
tomscut created HDFS-16266:
--

 Summary: Add remote port information to HDFS audit log
 Key: HDFS-16266
 URL: https://issues.apache.org/jira/browse/HDFS-16266
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


In our production environment, we occasionally encounter a problem where a
user submits an abnormal computation task, causing a sudden flood of requests;
the queueTime and processingTime of the Namenode then rise very high, creating
a large backlog of tasks.

We usually locate and kill the specific Spark, Flink, or MapReduce tasks based
on metrics and audit logs. Currently, IP and UGI are recorded in audit logs,
but there is no port information, so it is sometimes difficult to locate the
specific process. Therefore, I propose that we add the port information to the
audit log, so that we can easily track the upstream process.
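
For example, an entry could go from recording only the IP to including the
ephemeral port (illustrative lines, fields abridged; the port is appended to
the ip field):
{code:java}
Before: allowed=true ugi=sparkuser (auth:SIMPLE) ip=/192.168.1.10       cmd=getfileinfo src=/warehouse/t1 dst=null perm=null proto=rpc
After:  allowed=true ugi=sparkuser (auth:SIMPLE) ip=/192.168.1.10:38242 cmd=getfileinfo src=/warehouse/t1 dst=null perm=null proto=rpc
{code}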

Some projects, such as HBase and Alluxio, already include port information in
their audit logs. I think it is also necessary to add port information to the
HDFS audit log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16232) Fix java doc for BlockReaderRemote#newBlockReader

2021-09-19 Thread tomscut (Jira)
tomscut created HDFS-16232:
--

 Summary: Fix java doc for BlockReaderRemote#newBlockReader
 Key: HDFS-16232
 URL: https://issues.apache.org/jira/browse/HDFS-16232
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Fix java doc for BlockReaderRemote#newBlockReader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16225) Fix typo for FederationTestUtils

2021-09-13 Thread tomscut (Jira)
tomscut created HDFS-16225:
--

 Summary: Fix typo for FederationTestUtils
 Key: HDFS-16225
 URL: https://issues.apache.org/jira/browse/HDFS-16225
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Fix typo for FederationTestUtils.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16209) Set dfs.namenode.caching.enabled to false as default

2021-09-02 Thread tomscut (Jira)
tomscut created HDFS-16209:
--

 Summary: Set dfs.namenode.caching.enabled to false as default
 Key: HDFS-16209
 URL: https://issues.apache.org/jira/browse/HDFS-16209
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.1.0
Reporter: tomscut
Assignee: tomscut


Namenode config:
dfs.namenode.write-lock-reporting-threshold-ms=50ms
dfs.namenode.caching.enabled=true (default)

 

In fact, the caching feature is not used in our cluster, but this switch is
turned on by default (dfs.namenode.caching.enabled=true), incurring additional
write-lock overhead. We counted the number of write-lock warnings in a log
file and found that rescan-cache warnings account for about 32% of them, which
greatly affects the performance of the Namenode.

 

We should set 'dfs.namenode.caching.enabled' to false by default and turn it
on when we want to use the feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16203) Discover datanodes with unbalanced block pool usage by the standard deviation

2021-09-01 Thread tomscut (Jira)
tomscut created HDFS-16203:
--

 Summary: Discover datanodes with unbalanced block pool usage by 
the standard deviation
 Key: HDFS-16203
 URL: https://issues.apache.org/jira/browse/HDFS-16203
 Project: Hadoop HDFS
  Issue Type: New Feature
 Environment: !image-2021-09-01-19-16-27-172.png|width=581,height=216!
Reporter: tomscut
Assignee: tomscut
 Attachments: image-2021-09-01-19-16-27-172.png

Discover datanodes with unbalanced volume usage by the standard deviation

In some scenarios, datanode disk usage may become unbalanced:
1. Repair the damaged disk and make it online again.
2. Add disks to some Datanodes.
3. Some disks are damaged, resulting in slow data writing.
4. Use some custom volume choosing policies.

In the case of unbalanced disk usage, a sudden increase in datanode write
traffic may cause busy disk I/O on the volumes with low usage, resulting in
decreased throughput across datanodes.

We need to find these nodes in time, so we can run the diskBalancer or take
other action. Based on the volume usage of each datanode, we can calculate the
standard deviation of its volume usage. The more unbalanced the volumes, the
higher the standard deviation.
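
For reference, the computation is just the population standard deviation over
the per-volume usage ratios; a minimal sketch (the input array is assumed to
come from the storage reports, e.g. dfsUsed / capacity per volume):
{code:java}
// Sketch: standard deviation of per-volume usage ratios for one datanode.
static double volumeUsageStdDev(double[] usages) {
  double sum = 0;
  for (double u : usages) {
    sum += u;
  }
  double mean = sum / usages.length;
  double var = 0;
  for (double u : usages) {
    var += (u - mean) * (u - mean);
  }
  return Math.sqrt(var / usages.length);
}
{code}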

We can display the result on the namenode web UI and then sort it directly to
find the nodes whose volume usage is unbalanced.

{color:#172b4d}This interface is only used to obtain metrics and does not 
adversely affect namenode performance.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16158) Discover datanodes with unbalanced volume usage by the standard deviation

2021-08-31 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut resolved HDFS-16158.

Resolution: Abandoned

> Discover datanodes with unbalanced volume usage by the standard deviation 
> --
>
> Key: HDFS-16158
> URL: https://issues.apache.org/jira/browse/HDFS-16158
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-08-11-10-14-58-430.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Discover datanodes with unbalanced volume usage by the standard deviation
> In some scenarios, we may cause unbalanced datanode disk usage:
> 1. Repair the damaged disk and make it online again.
> 2. Add disks to some Datanodes.
> 3. Some disks are damaged, resulting in slow data writing.
> 4. Use some custom volume choosing policies.
> In the case of unbalanced disk usage, a sudden increase in datanode write 
> traffic may result in busy disk I/O with low volume usage, resulting in 
> decreased throughput across datanodes.
> In this case, we need to find these nodes in time to do diskBalance, or other 
> processing. Based on the volume usage of each datanode, we can calculate the 
> standard deviation of the volume usage. The more unbalanced the volume, the 
> higher the standard deviation.
> To prevent the namenode from being too busy, we can calculate the standard
> deviation on the datanode side, transmit it to the namenode through the
> heartbeat, and display the result on the namenode web UI. We can then sort
> directly to find the nodes whose volume usage is unbalanced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16194) Add a public method DatanodeID#getDisplayName

2021-08-28 Thread tomscut (Jira)
tomscut created HDFS-16194:
--

 Summary: Add a public method DatanodeID#getDisplayName
 Key: HDFS-16194
 URL: https://issues.apache.org/jira/browse/HDFS-16194
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add a public method DatanodeID#getDisplayName to simplify the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16179) Update loglevel for BlockManager#chooseExcessRedundancyStriped avoid too much logs

2021-08-18 Thread tomscut (Jira)
tomscut created HDFS-16179:
--

 Summary: Update loglevel for 
BlockManager#chooseExcessRedundancyStriped avoid too much logs
 Key: HDFS-16179
 URL: https://issues.apache.org/jira/browse/HDFS-16179
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.1.0
Reporter: tomscut
Assignee: tomscut
 Attachments: log-count.jpg, logs.jpg

{code:java}
private void chooseExcessRedundancyStriped(BlockCollection bc,
    final Collection<DatanodeStorageInfo> nonExcess,
    BlockInfo storedBlock,
    DatanodeDescriptor delNodeHint) {
  ...
  // cardinality of found indicates the expected number of internal blocks
  final int numOfTarget = found.cardinality();
  final BlockStoragePolicy storagePolicy = storagePolicySuite.getPolicy(
  bc.getStoragePolicyID());
  final List<StorageType> excessTypes = storagePolicy.chooseExcess(
      (short) numOfTarget, DatanodeStorageInfo.toStorageTypes(nonExcess));
  if (excessTypes.isEmpty()) {
LOG.warn("excess types chosen for block {} among storages {} is empty",
storedBlock, nonExcess);
return;
  }
  ...
}
{code}
 
IMO, this code is just detecting excess storage types, and lowering the log
level to debug would have no adverse effect.
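
The proposed change is only the log level, e.g.:
{code:java}
// Proposed: only the level changes; the message itself stays the same.
LOG.debug("excess types chosen for block {} among storages {} is empty",
    storedBlock, nonExcess);
{code}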
 
We have a cluster that uses the EC policy to store data. The current log level
here is WARN, and in about 50 minutes 286,093 of these logs were printed,
which can drown out other important logs.
 
!logs.jpg|width=1167,height=62!
 
!log-count.jpg|width=760,height=30!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16177) Bug fix for Util#receiveFile

2021-08-18 Thread tomscut (Jira)
tomscut created HDFS-16177:
--

 Summary: Bug fix for Util#receiveFile
 Key: HDFS-16177
 URL: https://issues.apache.org/jira/browse/HDFS-16177
 Project: Hadoop HDFS
  Issue Type: Task
Affects Versions: 3.1.0
Reporter: tomscut
Assignee: tomscut
 Attachments: download-fsimage.jpg

The time taken to write the file is miscalculated in Util#receiveFile.

!download-fsimage.jpg|width=578,height=134!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16160) Improve the parameter annotation in DatanodeProtocol#sendHeartbeat

2021-08-10 Thread tomscut (Jira)
tomscut created HDFS-16160:
--

 Summary: Improve the parameter annotation in 
DatanodeProtocol#sendHeartbeat
 Key: HDFS-16160
 URL: https://issues.apache.org/jira/browse/HDFS-16160
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Improve the parameter annotation in DatanodeProtocol#sendHeartbeat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16158) Discover datanodes with unbalanced volume usage by the standard deviation

2021-08-10 Thread tomscut (Jira)
tomscut created HDFS-16158:
--

 Summary: Discover datanodes with unbalanced volume usage by the 
standard deviation 
 Key: HDFS-16158
 URL: https://issues.apache.org/jira/browse/HDFS-16158
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: tomscut
Assignee: tomscut


Discover datanodes with unbalanced volume usage by the standard deviation

In some scenarios, datanode disk usage may become unbalanced:
1. Repair the damaged disk and make it online again.
2. Add disks to some Datanodes.
3. Some disks are damaged, resulting in slow data writing.
4. Use some custom volume choosing policies.

In the case of unbalanced disk usage, a sudden increase in datanode write
traffic may cause busy disk I/O on the volumes with low usage, resulting in
decreased throughput across datanodes.

In this case, we need to find these nodes in time, so we can run the
diskBalancer or take other action. Based on the volume usage of each datanode,
we can calculate the standard deviation of its volume usage. The more
unbalanced the volumes, the higher the standard deviation.

To prevent the namenode from being too busy, we can calculate the standard
deviation on the datanode side, transmit it to the namenode through the
heartbeat, and display the result on the namenode web UI. We can then sort
directly to find the nodes whose volume usage is unbalanced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16131) Show storage type for failed volumes on namenode web

2021-07-16 Thread tomscut (Jira)
tomscut created HDFS-16131:
--

 Summary: Show storage type for failed volumes on namenode web
 Key: HDFS-16131
 URL: https://issues.apache.org/jira/browse/HDFS-16131
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


To make it easy to query the storage type of failed volumes, we can display it
on the namenode web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16122) Fix DistCpContext#toString()

2021-07-09 Thread tomscut (Jira)
tomscut created HDFS-16122:
--

 Summary: Fix DistCpContext#toString() 
 Key: HDFS-16122
 URL: https://issues.apache.org/jira/browse/HDFS-16122
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut
 Attachments: distcp.jpg

!distcp.jpg|width=880,height=71!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-07-04 Thread tomscut (Jira)
tomscut created HDFS-16112:
--

 Summary: Fix flaky unit test 
TestDecommissioningStatusWithBackoffMonitor 
 Key: HDFS-16112
 URL: https://issues.apache.org/jira/browse/HDFS-16112
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut


The unit tests
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and
TestDecommissioningStatus#testDecommissionStatus have recently seemed a little
flaky; we should fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread tomscut (Jira)
tomscut created HDFS-16110:
--

 Summary: Remove unused method reportChecksumFailure in DFSClient
 Key: HDFS-16110
 URL: https://issues.apache.org/jira/browse/HDFS-16110
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove the unused method reportChecksumFailure in DFSClient, and fix some code
styles along the way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-02 Thread tomscut (Jira)
tomscut created HDFS-16109:
--

 Summary: Fix some flaky unit tests since they often time out
 Key: HDFS-16109
 URL: https://issues.apache.org/jira/browse/HDFS-16109
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and
TestDecommissionWithBackoffMonitor since they often time out.

 

TestBootstrapStandby:
{code:java}
[ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
[ERROR] testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)  Time elapsed: 31.262 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
    at org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
{code}
TestFsVolumeList:
{code:java}
[ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
[ERROR] testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList)  Time elapsed: 60.028 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 60000 milliseconds
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList.testAddRplicaProcessorForAddingReplicaInMap(TestFsVolumeList.java:395)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at

[jira] [Created] (HDFS-16106) Fix flaky unit test TestDFSShell

2021-07-01 Thread tomscut (Jira)
tomscut created HDFS-16106:
--

 Summary: Fix flaky unit test TestDFSShell
 Key: HDFS-16106
 URL: https://issues.apache.org/jira/browse/HDFS-16106
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


This unit test occasionally fails.

The value set for dfs.namenode.accesstime.precision is too low; during the
execution of the method, the access time can be set several times, eventually
leading to a failed assertion.

IMO, dfs.namenode.accesstime.precision should be greater than or equal to the
timeout (120s) of TestDFSShell#testCopyCommandsWithPreserveOption(), or be
set directly to 0 to disable this feature.
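
A sketch of the safer test setup, assuming we disable the feature outright:
{code:java}
// Sketch: disable access-time updates in the test setup, so repeated
// touches cannot change the atime between the copy and the assertion.
Configuration conf = new HdfsConfiguration();
conf.setLong(DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0);
{code}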

 
{code:java}
[ERROR] Tests run: 52, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 106.778 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSShell
[ERROR] testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell)  Time elapsed: 2.353 s  <<< FAILURE!
java.lang.AssertionError: expected:<1625095098319> but was:<1625095099374>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2282)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)

[ERROR] testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell)  Time elapsed: 2.467 s  <<< FAILURE!
java.lang.AssertionError: expected:<1625095192527> but was:<1625095193950>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2323)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)

[ERROR] testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell)  Time elapsed: 2.173 s  <<< FAILURE!
java.lang.AssertionError: expected:<1625095196756> but was:<1625095197975>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2303)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at

[jira] [Created] (HDFS-16104) Remove unused parameter and fix java doc for DiskBalancerCLI

2021-06-30 Thread tomscut (Jira)
tomscut created HDFS-16104:
--

 Summary: Remove unused parameter and fix java doc for 
DiskBalancerCLI
 Key: HDFS-16104
 URL: https://issues.apache.org/jira/browse/HDFS-16104
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove unused parameter and fix java doc for DiskBalancerCLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16089) Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor

2021-06-26 Thread tomscut (Jira)
tomscut created HDFS-16089:
--

 Summary: Add metric EcReconstructionValidateTimeMillis for 
StripedBlockReconstructor
 Key: HDFS-16089
 URL: https://issues.apache.org/jira/browse/HDFS-16089
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add the metric EcReconstructionValidateTimeMillis for
StripedBlockReconstructor, so that we can track the time spent on validation
during striped block reconstruction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-06-24 Thread tomscut (Jira)
tomscut created HDFS-16088:
--

 Summary: Standby NameNode process getLiveDatanodeStorageReport 
request to reduce Active load
 Key: HDFS-16088
 URL: https://issues.apache.org/jira/browse/HDFS-16088
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


As with [HDFS-13183|https://issues.apache.org/jira/browse/HDFS-13183],
NameNodeConnector#getLiveDatanodeStorageReport() can also send its request to
the SNN to reduce the load on the ANN.

There are two points worth mentioning:
1. NameNodeConnector#getLiveDatanodeStorageReport() is
OperationCategory.UNCHECKED in FSNamesystem, so we can access the SNN directly.
2. We can share the same UT (testBalancerRequestSBNWithHA) with
NameNodeConnector#getBlocks().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16086) Add volume information to datanode log for tracing

2021-06-23 Thread tomscut (Jira)
tomscut created HDFS-16086:
--

 Summary: Add volume information to datanode log for tracing
 Key: HDFS-16086
 URL: https://issues.apache.org/jira/browse/HDFS-16086
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


To keep track of which volume a block is stored on, we can add the volume
information to the datanode log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16085) Move the getPermissionChecker out of the read lock

2021-06-22 Thread tomscut (Jira)
tomscut created HDFS-16085:
--

 Summary: Move the getPermissionChecker out of the read lock
 Key: HDFS-16085
 URL: https://issues.apache.org/jira/browse/HDFS-16085
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Move getPermissionChecker out of the read lock in
NamenodeFsck#getBlockLocations(), since that operation does not need to be
performed under the lock.
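
The shape of the change, sketched (lock helpers as on FSNamesystem):
{code:java}
// Sketch: build the checker before taking the read lock, since
// constructing it does not touch any state guarded by the lock.
FSPermissionChecker pc = fsn.getPermissionChecker();
fsn.readLock();
try {
  // resolve the path and collect block locations under the lock
} finally {
  fsn.readUnlock();
}
{code}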



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16079) Improve the block state change log

2021-06-18 Thread tomscut (Jira)
tomscut created HDFS-16079:
--

 Summary: Improve the block state change log
 Key: HDFS-16079
 URL: https://issues.apache.org/jira/browse/HDFS-16079
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Improve the block state change log by adding readOnlyReplicas and
replicasOnStaleNodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16078) Remove unused parameters for DatanodeManager.handleLifeline()

2021-06-18 Thread tomscut (Jira)
tomscut created HDFS-16078:
--

 Summary: Remove unused parameters for 
DatanodeManager.handleLifeline()
 Key: HDFS-16078
 URL: https://issues.apache.org/jira/browse/HDFS-16078
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove unused parameters (blockPoolId, maxTransfers) for 
DatanodeManager.handleLifeline().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16076) Avoid using slow DataNodes for reading by sorting locations

2021-06-17 Thread tomscut (Jira)
tomscut created HDFS-16076:
--

 Summary: Avoid using slow DataNodes for reading by sorting 
locations
 Key: HDFS-16076
 URL: https://issues.apache.org/jira/browse/HDFS-16076
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


After sorting, the expected location list will be: live -> slow -> stale ->
staleAndSlow -> entering_maintenance -> decommissioned. This reduces the
probability that slow nodes are used for reading.
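
One way to express the intended order is a rank comparator; a hedged sketch
(getStateRank() is a hypothetical helper, not an existing API):
{code:java}
// Sketch only: rank-based sort. getStateRank() is a hypothetical helper
// mapping a DatanodeInfo to 0=live, 1=slow, 2=stale, 3=stale+slow,
// 4=entering_maintenance, 5=decommissioned.
Comparator<DatanodeInfo> byState = Comparator.comparingInt(this::getStateRank);
Arrays.sort(locations, byState);
{code}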



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state

2021-06-08 Thread tomscut (Jira)
tomscut created HDFS-16057:
--

 Summary: Make sure the order for location in ENTERING_MAINTENANCE 
state
 Key: HDFS-16057
 URL: https://issues.apache.org/jira/browse/HDFS-16057
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: tomscut
Assignee: tomscut


We use a comparator to sort locations in getBlockLocations(), and the expected
result is: live -> stale -> entering_maintenance -> decommissioned.

But networktopology.sortByDistance() will disrupt that order. We should also
filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling
networktopology.sortByDistance().

 

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
{code:java}
DatanodeInfoWithStorage[] di = lb.getLocations();
// Move decommissioned/stale datanodes to the bottom
Arrays.sort(di, comparator);

// Sort nodes by network distance only for located blocks
int lastActiveIndex = di.length - 1;
while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
  --lastActiveIndex;
}
int activeLen = lastActiveIndex + 1;
if(nonDatanodeReader) {
  networktopology.sortByDistanceUsingNetworkLocation(client,
  lb.getLocations(), activeLen, createSecondaryNodeSorter());
} else {
  networktopology.sortByDistance(client, lb.getLocations(), activeLen,
  createSecondaryNodeSorter());
}
{code}
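
One possible fix, sketched below: also treat entering-maintenance nodes as
inactive when computing activeLen, so sortByDistance() never reorders them
(field names follow the existing DatanodeManager code and are assumptions
here):
{code:java}
// Sketch: count entering-maintenance nodes as inactive so they stay in
// the tail segment that sortByDistance() leaves untouched.
private boolean isInactive(DatanodeInfo datanode) {
  return datanode.isDecommissioned()
      || datanode.isEnteringMaintenance()   // proposed addition
      || (avoidStaleDataNodesForRead && datanode.isStale(staleInterval));
}
{code}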
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16048) Print network topology on the router web

2021-05-30 Thread tomscut (Jira)
tomscut created HDFS-16048:
--

 Summary: Print network topology on the router web
 Key: HDFS-16048
 URL: https://issues.apache.org/jira/browse/HDFS-16048
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


In order to query the network topology information conveniently, we can print 
it on the router web. It's related to 
[HDFS-15970|https://issues.apache.org/jira/browse/HDFS-15970]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-19 Thread tomscut (Jira)
tomscut created HDFS-15991:
--

 Summary: Add location into datanode info for NameNodeMXBean
 Key: HDFS-15991
 URL: https://issues.apache.org/jira/browse/HDFS-15991
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add location into datanode info for NameNodeMXBean.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15975) Use LongAdder instead of AtomicLong

2021-04-14 Thread tomscut (Jira)
tomscut created HDFS-15975:
--

 Summary: Use LongAdder instead of AtomicLong
 Key: HDFS-15975
 URL: https://issues.apache.org/jira/browse/HDFS-15975
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


When maintaining some counters, we can use LongAdder instead of AtomicLong to
improve performance. The value read from a LongAdder is not an atomic
snapshot, but I think we can tolerate that.
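
For example:
{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// LongAdder: updates are striped across internal cells, so concurrent
// increments from many handler threads do not contend on a single CAS.
LongAdder bytesWritten = new LongAdder();
bytesWritten.add(4096);
long approx = bytesWritten.sum(); // not an atomic snapshot of in-flight adds

// AtomicLong: one contended cell, but reads are exact point-in-time values.
AtomicLong counter = new AtomicLong();
counter.incrementAndGet();
{code}
LongAdder scales better under heavy concurrent updates; sum() may miss
in-flight additions, which is acceptable for monitoring counters.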



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15970) Print network topology on web

2021-04-12 Thread tomscut (Jira)
tomscut created HDFS-15970:
--

 Summary: Print network topology on web
 Key: HDFS-15970
 URL: https://issues.apache.org/jira/browse/HDFS-15970
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut
 Attachments: hdfs-topology.jpg, hdfs-web.jpg

In order to query the network topology information conveniently, we can print 
it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15951) Remove unused parameters in NameNodeProxiesClient

2021-04-03 Thread tomscut (Jira)
tomscut created HDFS-15951:
--

 Summary: Remove unused parameters in NameNodeProxiesClient
 Key: HDFS-15951
 URL: https://issues.apache.org/jira/browse/HDFS-15951
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove unused parameters in org.apache.hadoop.hdfs.NameNodeProxiesClient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15946) Fix java doc in FSPermissionChecker

2021-04-02 Thread tomscut (Jira)
tomscut created HDFS-15946:
--

 Summary: Fix java doc in FSPermissionChecker
 Key: HDFS-15946
 URL: https://issues.apache.org/jira/browse/HDFS-15946
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Fix java doc for 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker#hasAclPermission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15938) Fix java doc in FSEditLog

2021-03-30 Thread tomscut (Jira)
tomscut created HDFS-15938:
--

 Summary: Fix java doc in FSEditLog
 Key: HDFS-15938
 URL: https://issues.apache.org/jira/browse/HDFS-15938
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Fix java doc in 
org.apache.hadoop.hdfs.server.namenode.FSEditLog#logAddCacheDirectiveInfo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15906) Close FSImage and FSNamesystem after formatting is complete

2021-03-19 Thread tomscut (Jira)
tomscut created HDFS-15906:
--

 Summary: Close FSImage and FSNamesystem after formatting is 
complete
 Key: HDFS-15906
 URL: https://issues.apache.org/jira/browse/HDFS-15906
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Close FSImage and FSNamesystem after formatting is complete, in

org.apache.hadoop.hdfs.server.namenode.NameNode#format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15892) Add metric for editPendingQ in FSEditLogAsync

2021-03-12 Thread tomscut (Jira)
tomscut created HDFS-15892:
--

 Summary: Add metric for editPendingQ in FSEditLogAsync
 Key: HDFS-15892
 URL: https://issues.apache.org/jira/browse/HDFS-15892
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


To monitor editPendingQ in FSEditLogAsync, we can add a metric and print a
log when the queue is full.
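
A rough sketch of the enqueue-side hook (the metric hook name is hypothetical,
and the surrounding FSEditLogAsync wiring is assumed):
{code:java}
// Sketch only: instrument the enqueue path of FSEditLogAsync.
// incrPendingEditsFullCount() is a hypothetical metric hook.
private void enqueueEdit(Edit edit) throws InterruptedException {
  if (!editPendingQ.offer(edit)) {
    metrics.incrPendingEditsFullCount(); // count "queue full" occurrences
    LOG.info("Edit pending queue is full, current size: {}",
        editPendingQ.size());
    editPendingQ.put(edit); // fall back to a blocking insert
  }
}
{code}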



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15884) RBF: Remove unused method getCreateLocation in RouterRpcServer

2021-03-09 Thread tomscut (Jira)
tomscut created HDFS-15884:
--

 Summary: RBF: Remove unused method getCreateLocation in 
RouterRpcServer
 Key: HDFS-15884
 URL: https://issues.apache.org/jira/browse/HDFS-15884
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove unused method 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer#getCreateLocation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15883) Add a metric BlockReportQueueFullCount

2021-03-07 Thread tomscut (Jira)
tomscut created HDFS-15883:
--

 Summary: Add a metric BlockReportQueueFullCount
 Key: HDFS-15883
 URL: https://issues.apache.org/jira/browse/HDFS-15883
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add a metric that reflects the number of times the block report queue is full.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15879) Exclude slow nodes when choose targets for blocks

2021-03-05 Thread tomscut (Jira)
tomscut created HDFS-15879:
--

 Summary: Exclude slow nodes when choose targets for blocks
 Key: HDFS-15879
 URL: https://issues.apache.org/jira/browse/HDFS-15879
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Previously, we added monitoring for slow nodes, related to
https://issues.apache.org/jira/browse/HDFS-11194.

We can use a thread to periodically collect these slow nodes into a set, and
then use the set to filter out slow nodes when choosing targets for blocks, as
sketched below.

This feature can be configured so that it is only turned on when needed.
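
A hedged sketch of the idea (the slowPeerTracker wiring, the refresh thread,
and the config flag names are all assumptions for illustration):
{code:java}
// Sketch only: a periodically refreshed set of slow-node IDs consulted
// when choosing targets.
private volatile Set<String> slowNodeUuids = Collections.emptySet();

// Driven by a scheduled thread, e.g. every few minutes.
void refreshSlowNodes() {
  slowNodeUuids = slowPeerTracker.getSlowNodes(maxSlowNodesToExclude);
}

// Checked by the placement policy before accepting a candidate target.
boolean isSlowNode(DatanodeDescriptor node) {
  return excludeSlowNodesEnabled
      && slowNodeUuids.contains(node.getDatanodeUuid());
}
{code}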



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15873) Add namenode address in logs for block report

2021-03-04 Thread tomscut (Jira)
tomscut created HDFS-15873:
--

 Summary: Add namenode address in logs for block report
 Key: HDFS-15873
 URL: https://issues.apache.org/jira/browse/HDFS-15873
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: datanode, hdfs
Reporter: tomscut
Assignee: tomscut


Add the namenode address to block report logs. This makes it easier to track
whether a block report was sent to the ANN or the SNN.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15870) Remove unused configuration dfs.namenode.stripe.min

2021-03-02 Thread tomscut (Jira)
tomscut created HDFS-15870:
--

 Summary: Remove unused configuration dfs.namenode.stripe.min
 Key: HDFS-15870
 URL: https://issues.apache.org/jira/browse/HDFS-15870
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove unused configuration dfs.namenode.stripe.min.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15854) Make some parameters configurable for SlowDiskTracker and SlowPeerTracker

2021-02-23 Thread tomscut (Jira)
tomscut created HDFS-15854:
--

 Summary: Make some parameters configurable for SlowDiskTracker and 
SlowPeerTracker
 Key: HDFS-15854
 URL: https://issues.apache.org/jira/browse/HDFS-15854
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut


Make some parameters configurable for SlowDiskTracker and SlowPeerTracker. 
Related to https://issues.apache.org/jira/browse/HDFS-15814.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org


