[jira] [Created] (HDFS-17051) Fix wrong time util in TestFileAppend4#recoverFile

2023-06-14 Thread Zhaohui Wang (Jira)
Zhaohui Wang created HDFS-17051:
---

 Summary: Fix wrong time util in TestFileAppend4#recoverFile
 Key: HDFS-17051
 URL: https://issues.apache.org/jira/browse/HDFS-17051
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhaohui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.6 RC0

2023-06-14 Thread Wei-Chiu Chuang
The hbase-filesystem tests passed after reverting HADOOP-18596 and
HADOOP-18633 from my local tree.
So I think it's a matter of the default behavior being changed. It's not
the end of the world. I think we can address it by adding an incompatible
change flag and a release note.

On Wed, Jun 14, 2023 at 3:55 PM Wei-Chiu Chuang  wrote:

> Cross-referenced git history and Jira. The Changelog needs some updates.
>
> Not in the release
>
>    1. HDFS-16858
>    2. HADOOP-18532
>    3. HDFS-16861
>    4. HDFS-16866
>    5. HADOOP-18320
>
> Updated fixed version. Will generate a new Changelog in the next RC.
>
> Was able to build HBase and hbase-filesystem without any code change.
>
> hbase has one unit test failure. This one is reproducible even with Hadoop
> 3.3.5, so maybe a red herring. Local env or something.
>
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 9.007 s <<< FAILURE! - in
> org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> [ERROR]
> org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness
>  Time elapsed: 3.13 s  <<< ERROR!
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker$RandomTestData.<init>(TestSyncTimeRangeTracker.java:91)
> at
> org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness(TestSyncTimeRangeTracker.java:156)
>
> hbase-filesystem has three test failures in TestHBOSSContractDistCp, and
> is not reproducible with Hadoop 3.3.5.
> [ERROR] Failures: [ERROR]
> TestHBOSSContractDistCp>AbstractContractDistCpTest.testDistCpUpdateCheckFileSkip:976->Assert.fail:88
> 10 errors in file of length 10
> [ERROR]
> TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureNoChange:270->AbstractContractDistCpTest.assertCounterInRange:290->Assert.assertTrue:41->Assert.fail:88
> Files Skipped value 0 too below minimum 1
> [ERROR]
> TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote:259->AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure:334->AbstractContractDistCpTest.assertCounterInRange:294->Assert.assertTrue:41->Assert.fail:88
> Files Copied value 2 above maximum 1
> [INFO]
> [ERROR] Tests run: 240, Failures: 3, Errors: 0, Skipped: 58
>
>
> Ozone test in progress. Will report back.
>
>
> On Tue, Jun 13, 2023 at 11:27 PM Wei-Chiu Chuang 
> wrote:
>
>> I am inviting anyone to try and vote on this release candidate.
>>
>> Note:
>> This is built off branch-3.3.6 plus PR#5741 (aws sdk update) and PR#5740
>> (LICENSE file update)
>>
>> The RC is available at:
>> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/ (for amd64)
>> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-arm64/ (for arm64)
>>
>> Git tag: release-3.3.6-RC0
>> https://github.com/apache/hadoop/releases/tag/release-3.3.6-RC0
>>
>> Maven artifacts are built on an x86 machine and are staged at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1378/
>>
>> My public key:
>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>>
>> Changelog:
>> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/CHANGELOG.md
>>
>> Release notes:
>> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/RELEASENOTES.md
>>
>> This is a relatively small release (by Hadoop standards) containing about
>> 120 commits.
>> Please give it a try; this RC vote will run for 7 days.
>>
>>
>> Feature highlights:
>>
>> SBOM artifacts
>> 
>> Starting from this release, Hadoop publishes Software Bill of Materials
>> (SBOM) using
>> CycloneDX Maven plugin. For more information about SBOM, please go to
>> [SBOM](https://cwiki.apache.org/confluence/display/COMDEV/SBOM).
>>
>> HDFS RBF: RDBMS based token storage support
>> 
>> HDFS Router-Based Federation now supports storing delegation tokens on
>> MySQL,
>> [HADOOP-18535](https://issues.apache.org/jira/browse/HADOOP-18535),
>> which improves token operation throughput over the original
>> ZooKeeper-based implementation.
>>
>>
>> New File System APIs
>> 
>> [HADOOP-18671](https://issues.apache.org/jira/browse/HADOOP-18671) moved
>> a number of
>> HDFS-specific APIs to Hadoop Common to make it possible for certain
>> applications that
>> depend on HDFS semantics to run on other Hadoop compatible file systems.
>>
>> In particular, recove

Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-06-14 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1058/

No changes


ERROR: File 'out/email-report.txt' does not exist


[jira] [Resolved] (HDFS-17045) File renamed from a snapshottable dir to a non-snapshottable dir cannot be deleted.

2023-06-14 Thread Tsz-wo Sze (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze resolved HDFS-17045.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

The pull request is now merged.

> File renamed from a snapshottable dir to a non-snapshottable dir cannot be 
> deleted.
> ---
>
> Key: HDFS-17045
> URL: https://issues.apache.org/jira/browse/HDFS-17045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16972 added a 
> [shouldDestroy|https://github.com/szetszwo/hadoop/blob/331e075115b4a35574622318b26f6d4731658d57/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java#L834-L845]
>  method which caused the following bug.
> h3. Background:
>  - When {{FileSystem.rename(src, dst)}} from a snapshottable dir (src) to a 
> snapshottable dir (dst), dstSnapshotId is set to the latest snapshot at dst. 
> As a result, dst is NOT in dstSnapshotId because dstSnapshotId was already 
> taken before rename.
>  - snapshotToBeDeleted is the snapshot id of the current operation if the 
> operation is {{FileSystem.deleteSnapshot}}. Otherwise, 
> snapshotToBeDeleted is set to CURRENT_STATE_ID.
>  - If (snapshotToBeDeleted > dstSnapshotId), dst is in snapshotToBeDeleted. 
> The shouldDestroy method returns true to continue deletion.
>  - If (snapshotToBeDeleted <= dstSnapshotId), dst must not be in 
> snapshotToBeDeleted. The shouldDestroy method returns false to stop deletion.
> All the above are correct for renaming within snapshottable directories.
> h3. Bug:
>  - If rename(src, dst) from a snapshottable dir (src) to a non-snapshottable 
> dir (dst), dstSnapshotId becomes CURRENT_STATE_ID.
>  - When {{FileSystem.delete(dst)}} happens, snapshotToBeDeleted is also set 
> to CURRENT_STATE_ID.
>  - In this case, snapshotToBeDeleted == dstSnapshotId, the shouldDestroy 
> method will return false and it incorrectly stops the deletion.
> Note that this bug may cause fsimage corruption and quota miscalculation since 
> some files can be partially deleted.  Fortunately, this bug won't cause data 
> loss.
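
For readers following the comparison described above, here is a minimal
sketch of it. It is not the code in INodeReference; the class name, the
method signature and the value chosen for CURRENT_STATE_ID are assumptions
made for illustration, and only the comparison itself comes from the
description.

```java
public class ShouldDestroySketch {
    // Assumption: a large sentinel standing in for "current state, not a snapshot".
    static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;

    /**
     * Comparison as described in the Background section:
     * continue deletion only if snapshotToBeDeleted > dstSnapshotId.
     */
    static boolean shouldDestroy(int snapshotToBeDeleted, int dstSnapshotId) {
        return snapshotToBeDeleted > dstSnapshotId;
    }

    public static void main(String[] args) {
        // Plain delete after a rename into a snapshottable dir whose latest snapshot is 3.
        System.out.println(shouldDestroy(CURRENT_STATE_ID, 3)); // true

        // deleteSnapshot(2): dst is not in snapshot 2, so deletion stops.
        System.out.println(shouldDestroy(2, 3)); // false

        // Bug case: dst is non-snapshottable, so both ids are CURRENT_STATE_ID
        // and a plain delete is stopped although it should proceed.
        System.out.println(shouldDestroy(CURRENT_STATE_ID, CURRENT_STATE_ID)); // false
    }
}
```

The last call shows why the equality case needs separate handling when dst
is not snapshottable. How the merged pull request handles it is not shown
here.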






Re: [VOTE] Release Apache Hadoop 3.3.6 RC0

2023-06-14 Thread Wei-Chiu Chuang
Cross-referenced git history and Jira. The Changelog needs some updates.

Not in the release

   1. HDFS-16858
   2. HADOOP-18532
   3. HDFS-16861
   4. HDFS-16866
   5. HADOOP-18320

Updated fixed version. Will generate a new Changelog in the next RC.

Was able to build HBase and hbase-filesystem without any code change.

hbase has one unit test failure. This one is reproducible even with Hadoop
3.3.5, so maybe a red herring. Local env or something.

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
9.007 s <<< FAILURE! - in
org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
[ERROR]
org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness
 Time elapsed: 3.13 s  <<< ERROR!
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker$RandomTestData.<init>(TestSyncTimeRangeTracker.java:91)
at
org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness(TestSyncTimeRangeTracker.java:156)

hbase-filesystem has three test failures in TestHBOSSContractDistCp, and is
not reproducible with Hadoop 3.3.5.
[ERROR] Failures: [ERROR]
TestHBOSSContractDistCp>AbstractContractDistCpTest.testDistCpUpdateCheckFileSkip:976->Assert.fail:88
10 errors in file of length 10
[ERROR]
TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureNoChange:270->AbstractContractDistCpTest.assertCounterInRange:290->Assert.assertTrue:41->Assert.fail:88
Files Skipped value 0 too below minimum 1
[ERROR]
TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote:259->AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure:334->AbstractContractDistCpTest.assertCounterInRange:294->Assert.assertTrue:41->Assert.fail:88
Files Copied value 2 above maximum 1
[INFO]
[ERROR] Tests run: 240, Failures: 3, Errors: 0, Skipped: 58


Ozone test in progress. Will report back.


On Tue, Jun 13, 2023 at 11:27 PM Wei-Chiu Chuang  wrote:

> I am inviting anyone to try and vote on this release candidate.
>
> Note:
> This is built off branch-3.3.6 plus PR#5741 (aws sdk update) and PR#5740
> (LICENSE file update)
>
> The RC is available at:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/ (for amd64)
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-arm64/ (for arm64)
>
> Git tag: release-3.3.6-RC0
> https://github.com/apache/hadoop/releases/tag/release-3.3.6-RC0
>
> Maven artifacts are built on an x86 machine and are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1378/
>
> My public key:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Changelog:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/CHANGELOG.md
>
> Release notes:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/RELEASENOTES.md
>
> This is a relatively small release (by Hadoop standards) containing about
> 120 commits.
> Please give it a try; this RC vote will run for 7 days.
>
>
> Feature highlights:
>
> SBOM artifacts
> 
> Starting from this release, Hadoop publishes Software Bill of Materials
> (SBOM) using
> CycloneDX Maven plugin. For more information about SBOM, please go to
> [SBOM](https://cwiki.apache.org/confluence/display/COMDEV/SBOM).
>
> HDFS RBF: RDBMS based token storage support
> 
> HDFS Router-Based Federation now supports storing delegation tokens on
> MySQL,
> [HADOOP-18535](https://issues.apache.org/jira/browse/HADOOP-18535),
> which improves token operation throughput over the original
> ZooKeeper-based implementation.
>
>
> New File System APIs
> 
> [HADOOP-18671](https://issues.apache.org/jira/browse/HADOOP-18671) moved
> a number of
> HDFS-specific APIs to Hadoop Common to make it possible for certain
> applications that
> depend on HDFS semantics to run on other Hadoop compatible file systems.
>
> In particular, recoverLease() and isFileClosed() are exposed through the
> LeaseRecoverable interface, while setSafeMode() is exposed through the
> SafeMode interface.
>
>
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-06-14 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/

[Jun 13, 2023, 4:31:35 AM] (github) HDFS-16946. Fix getTopTokenRealOwners to 
return String (#5696). Contributed by Nishtha Shah.
[Jun 13, 2023, 4:55:04 PM] (github) HDFS-17041. RBF: Fix putAll impl for mysql 
and file based state stores (#5723)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):
   hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

Failed junit tests :

   hadoop.hdfs.server.datanode.TestDirectoryScanner
   hadoop.yarn.client.TestFederationRMFailoverProxyProvider
   hadoop.mapreduce.v2.TestUberAM
   hadoop.mapreduce.v2.TestMRJobsWithProfiler
   hadoop.mapreduce.v2.TestMRJobs
   hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver


   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-compile-cc-root.txt [96K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-compile-javac-root.txt [12K]

   blanks:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/blanks-eol.txt [14M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/blanks-tabs.txt [2.0M]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-checkstyle-root.txt [13M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-hadolint.txt [20K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-pathlen.txt [16K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-shellcheck.txt [24K]

   xml:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/xml.txt [24K]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/results-javadoc-javadoc-root.txt [244K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [684K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt [44K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [72K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1257/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [96K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org


[jira] [Resolved] (HDFS-17047) BlockManager#addStoredBlock should log storage id when AddBlockResult is REPLACED

2023-06-14 Thread Ayush Saxena (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HDFS-17047.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> BlockManager#addStoredBlock should log storage id when AddBlockResult is 
> REPLACED
> -
>
> Key: HDFS-17047
> URL: https://issues.apache.org/jira/browse/HDFS-17047
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Recently, we have frequently seen logs like the following in the active namenode:
>  
> {code:java}
> 2023-06-12 05:34:09,821 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 05:34:09,892 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 11:34:07,932 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 11:34:08,027 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 17:34:08,742 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 17:34:08,813 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 23:34:09,752 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 23:34:09,812 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 05:34:08,065 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 05:34:08,144 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 11:34:08,638 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 11:34:08,681 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010{code}
>  
>  
> All of these logs refer to the same EC block id, blk_-9223372036614126544_57136788, and 
> are printed every 6 hours (the FBR interval of our cluster).
> To figure out what happened, I think we should also log the storage id here.
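
The 6-hour spacing can be checked directly from the timestamps in the log
excerpt above. A small sketch follows; the class and method names are made
up for illustration, and the log lines are cut off after the logger name for
brevity.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class FbrLogGapSketch {
    // Timestamp layout at the start of the namenode log lines quoted above.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Hours between the leading timestamps of two log lines. */
    static double gapHours(String earlierLine, String laterLine) {
        // The timestamp occupies the first 23 characters of each line.
        LocalDateTime a = LocalDateTime.parse(earlierLine.substring(0, 23), LOG_TS);
        LocalDateTime b = LocalDateTime.parse(laterLine.substring(0, 23), LOG_TS);
        return Duration.between(a, b).toMillis() / 3_600_000.0;
    }

    public static void main(String[] args) {
        String first = "2023-06-12 05:34:09,821 WARN BlockStateChange";
        String next = "2023-06-12 11:34:07,932 WARN BlockStateChange";
        // Prints a value just under 6, i.e. one FBR interval on this cluster.
        System.out.println(gapHours(first, next));
    }
}
```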






[jira] [Resolved] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-06-14 Thread Jira


 [ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri resolved HDFS-17030.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Limit wait time for getHAServiceState in ObserverReaderProxy
> 
>
> Key: HDFS-17030
> URL: https://issues.apache.org/jira/browse/HDFS-17030
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When namenode HA is enabled and a standby NN is not responsive, we have 
> observed that it takes a long time to serve a request, even though we have a 
> healthy observer or active NN. 
> Basically, when a standby is down, the RPC client would (re)try to create a 
> socket connection to that standby for _ipc.client.connect.timeout_ * 
> _ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
> heap dump at a standby, the NN still accepts the socket connection but it 
> won't send responses to these RPC requests, and we would time out after 
> _ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters 
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds and thus a 
> request takes more than 2 mins to complete when we take a heap dump at a 
> standby. This has been causing user job failures. 
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
> getHAServiceState requests in ObserverReaderProxy (for user rpc requests, we 
> still use the original value from the config). However, that would double the 
> number of socket connections between clients and the NN (which is a deal-breaker). 
> The proposal is to add a timeout on getHAServiceState() calls in 
> ObserverReaderProxy so that we only wait for that timeout for an NN to 
> respond with its HA state. Once we pass that timeout, we will move on to probe the 
> next NN. 
>  
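
One way to picture the proposal is a bounded wait on each probe, moving on
to the next NN when the wait expires. The sketch below uses a plain
java.util.concurrent Future for that. NameNodeStub, stub(), probe() and
demo() are hypothetical names and not Hadoop types, and the merged patch may
implement the timeout differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HaStateProbeSketch {
    /** Hypothetical stand-in for a namenode proxy. */
    interface NameNodeStub {
        String name();
        String getHAServiceState() throws Exception;
    }

    /** Builds a stub that answers with the given state after delayMs. */
    static NameNodeStub stub(String name, String state, long delayMs) {
        return new NameNodeStub() {
            public String name() { return name; }
            public String getHAServiceState() throws Exception {
                Thread.sleep(delayMs); // simulates a slow or hung NN
                return state;
            }
        };
    }

    /** Returns the HA state, or null if the NN did not answer within timeoutMs. */
    static String probe(ExecutorService pool, NameNodeStub nn, long timeoutMs) {
        Callable<String> call = nn::getHAServiceState;
        Future<String> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting on this NN
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            return null; // the probe itself failed
        }
    }

    /** Probes NNs in order and reports the first one that answers in time. */
    static String demo() {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<NameNodeStub> nns = Arrays.asList(
                stub("nn1", "STANDBY", 5_000), // unresponsive standby
                stub("nn2", "OBSERVER", 0));
            for (NameNodeStub nn : nns) {
                String state = probe(pool, nn, 200);
                if (state != null) {
                    return nn.name() + ":" + state;
                }
            }
            return "none";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the 200 ms bound, the hung standby costs the caller roughly 200 ms
instead of the full RPC timeout, and no second connection with a shorter
rpc-timeout is needed.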






Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-06-14 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/

No changes




-1 overall


The following subsystems voted -1:
asflicense compile golang hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
   hadoop.hdfs.server.datanode.TestDirectoryScanner
   hadoop.hdfs.server.blockmanagement.TestBlockReportLease
   hadoop.hdfs.TestRollingUpgrade
   hadoop.hdfs.TestFileLengthOnClusterRestart
   hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
   hadoop.hdfs.TestLeaseRecovery2
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
   hadoop.hdfs.server.federation.router.TestRouterQuota
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
   hadoop.mapreduce.lib.input.TestLineRecordReader
   hadoop.mapred.TestLineRecordReader
   hadoop.resourceestimator.service.TestResourceEstimatorService
   hadoop.resourceestimator.solver.impl.TestLpSolver
   hadoop.yarn.sls.TestSLSRunner
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
   hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
   hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
   hadoop.yarn.server.resourcemanager.TestClientRMService


   compile:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-compile-root.txt [680K]

   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-compile-root.txt [680K]

   golang:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-compile-root.txt [680K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-compile-root.txt [680K]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-mvnsite-root.txt [592K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/pathlen.txt [12K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-javadoc-root.txt [40K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [244K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [460K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1057/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-l

[jira] [Resolved] (HDFS-17048) FSNamesystem.delete() maybe cause data residue when active namenode crash or shutdown

2023-06-14 Thread liuguanghua (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua resolved HDFS-17048.

Resolution: Not A Problem

> FSNamesystem.delete() maybe cause data residue when active namenode crash  or 
> shutdown 
> ---
>
> Key: HDFS-17048
> URL: https://issues.apache.org/jira/browse/HDFS-17048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: hdfs3.3
>Reporter: liuguanghua
>Priority: Major
>
> Consider the following scenario:
> (1) A user deletes an HDFS dir with many blocks.
> (2) The active Namenode then crashes, is shut down, or is failed over to the standby Namenode 
> by an administrator.
> (3) This may result in residual data.
>  
> FSNamesystem.delete() will
> (1) delete the dir first
> (2) add toRemovedBlocks into markedDeleteQueue. 
> (3) The MarkedDeleteBlockScrubber thread will consume the markedDeleteQueue and 
> delete blocks.
> If the active namenode crashes, the blocks in markedDeleteQueue will be lost 
> and never be deleted. These blocks cannot be found via the hdfs fsck command, but 
> they still exist on the datanode disks.
>  
> Thus, 
> SummaryA = hdfs dfs -du -s / 
> SummaryB = sum(datanode report dfsused)
> SummaryA < SummaryB
>  
> This may be unavoidable. But is there any way to find the blocks that 
> should be deleted and clean them up?
>  
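
The reporter's concern can be modelled with a toy in-memory queue. The
sketch below is only that model: the class and method names are invented,
it ignores everything else the namenode and datanodes do, and since the
issue was resolved as Not A Problem it should not be read as a description
of actual HDFS behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class MarkedDeleteQueueSketch {
    /**
     * Counts blocks left on the (modelled) datanode disk but unknown to the
     * namespace after a crash that drops the in-memory delete queue.
     */
    static int residualAfterCrash(int totalBlocks, int scrubbedBeforeCrash) {
        Set<Integer> namespace = new HashSet<>();    // blocks referenced by files
        Set<Integer> datanodeDisk = new HashSet<>(); // blocks physically stored
        for (int b = 0; b < totalBlocks; b++) {
            namespace.add(b);
            datanodeDisk.add(b);
        }

        // Steps (1) and (2): remove the dir, queue its blocks for deletion.
        Deque<Integer> markedDeleteQueue = new ArrayDeque<>(namespace);
        namespace.clear();

        // Step (3): the scrubber removes only some blocks before the crash.
        for (int i = 0; i < scrubbedBeforeCrash && !markedDeleteQueue.isEmpty(); i++) {
            datanodeDisk.remove(markedDeleteQueue.poll());
        }

        // Crash: the queue lives only in memory, so its contents are gone.
        markedDeleteQueue.clear();

        // Whatever is on disk but not in the namespace is the feared residue.
        Set<Integer> residual = new HashSet<>(datanodeDisk);
        residual.removeAll(namespace);
        return residual.size();
    }

    public static void main(String[] args) {
        System.out.println(residualAfterCrash(4, 1)); // 3
    }
}
```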






[jira] [Created] (HDFS-17050) Erasure coding: invalidate duplicated block when actual block numbers less than data blocks plus parity blocks.

2023-06-14 Thread farmmamba (Jira)
farmmamba created HDFS-17050:


 Summary: Erasure coding: invalidate duplicated block when actual 
block numbers less than data blocks plus parity blocks.
 Key: HDFS-17050
 URL: https://issues.apache.org/jira/browse/HDFS-17050
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.2, 3.4.0
Reporter: farmmamba


Recently, I found a strange phenomenon, mentioned in HDFS-17047.

When an FBR was triggered manually or automatically, we found WARN logs like the following:
{code:java}
2023-06-14 16:29:36,432 WARN BlockStateChange: BLOCK* addStoredBlock: block 
blk_-9223372036578646784_59354864 moved to storageType DISK on node 
datanode12:50010
2023-06-14 16:29:36,477 WARN BlockStateChange: BLOCK* addStoredBlock: block 
blk_-9223372036578646784_59354864 moved to storageType DISK on node 
datanode12:50010{code}
The above logs print the same storedBlock twice. After digging into the logs, I 
found that the datanode holds two blocks of the same block group for some unknown 
reason, and one of the two blocks also exists on another datanode. But fsck 
did not print the duplicated replica info.

Additional information: the file is 3MB+ and we use RS-6-3-1024K, so fsck 
only prints information for seven blocks. But in fact we have eight blocks, and one 
of them is a duplicate.
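
The count of seven can be reproduced with a little arithmetic. The sketch
below assumes a file that fits in a single block group, data striped across
data blocks one cell at a time, and all parity blocks always written. The
class and method names are made up for illustration.

```java
public class EcInternalBlockCount {
    /**
     * Expected number of internal blocks for a file that fits in one block group.
     * Assumption: only data blocks that receive at least one cell exist,
     * while all parity blocks are always present.
     */
    static int internalBlocks(long fileBytes, int dataUnits, int parityUnits, long cellBytes) {
        if (fileBytes <= 0) {
            throw new IllegalArgumentException("fileBytes must be positive");
        }
        long cells = (fileBytes + cellBytes - 1) / cellBytes; // ceiling division
        int dataBlocks = (int) Math.min(dataUnits, cells);
        return dataBlocks + parityUnits;
    }

    public static void main(String[] args) {
        long cell = 1024L * 1024L;        // 1024K cell size from RS-6-3-1024K
        long fileBytes = 3L * cell + 1L;  // stands in for "3MB+"
        // 4 cells -> 4 data blocks, plus 3 parity blocks.
        System.out.println(internalBlocks(fileBytes, 6, 3, cell)); // 7
    }
}
```

Under these assumptions a 3MB+ file yields 4 data blocks plus 3 parity
blocks, which matches the seven blocks fsck reports. The eighth block is the
duplicate.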

 

The reason the above logs are printed is as follows:

In the BlockManager#addStoredBlock method, because a datanode has two blocks of the 
same block group, the AddBlockResult is REPLACED.


