[jira] [Work logged] (HDFS-16509) Fix decommission UnsupportedOperationException: Remove unsupported

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16509?focusedWorklogId=756180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756180
 ]

ASF GitHub Bot logged work on HDFS-16509:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 05:35
Start Date: 13/Apr/22 05:35
Worklog Time Spent: 10m 
  Work Description: cndaimin commented on PR #4077:
URL: https://github.com/apache/hadoop/pull/4077#issuecomment-1097578253

   @jojochuang Yes, the exception is not well handled, and will leave 
decommission stuck.




Issue Time Tracking
---

Worklog Id: (was: 756180)
Time Spent: 1h 50m  (was: 1h 40m)

> Fix decommission UnsupportedOperationException: Remove unsupported
> --
>
> Key: HDFS-16509
> URL: https://issues.apache.org/jira/browse/HDFS-16509
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We encountered an "UnsupportedOperationException: Remove unsupported" error 
> when some datanodes were in decommission. The cause of the exception is that 
> datanode.getBlockIterator() returns an Iterator that does not support remove(); 
> however, DatanodeAdminDefaultMonitor#processBlocksInternal invokes it.remove() 
> when a block is not found, e.g. when the file containing the block has been deleted.
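A minimal, self-contained illustration (hypothetical data, not the HDFS code path) of why this fails: calling remove() on an Iterator that does not support it throws exactly this UnsupportedOperationException.
{code:java}
import java.util.Iterator;
import java.util.List;

public class RemoveUnsupportedDemo {
  public static void main(String[] args) {
    // List.of() returns an immutable list, so its iterator rejects remove().
    Iterator<String> it = List.of("blk_1", "blk_2").iterator();
    it.next();
    it.remove(); // throws java.lang.UnsupportedOperationException
  }
}
{code}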



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16509) Fix decommission UnsupportedOperationException: Remove unsupported

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16509?focusedWorklogId=756174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756174
 ]

ASF GitHub Bot logged work on HDFS-16509:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 04:54
Start Date: 13/Apr/22 04:54
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on PR #4077:
URL: https://github.com/apache/hadoop/pull/4077#issuecomment-1097556767

   great catch! Is this the reason that sometimes decomm never completes?
   




Issue Time Tracking
---

Worklog Id: (was: 756174)
Time Spent: 1h 40m  (was: 1.5h)

> Fix decommission UnsupportedOperationException: Remove unsupported
> --
>
> Key: HDFS-16509
> URL: https://issues.apache.org/jira/browse/HDFS-16509
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We encountered an "UnsupportedOperationException: Remove unsupported" error 
> when some datanodes were in decommission. The cause of the exception is that 
> datanode.getBlockIterator() returns an Iterator that does not support remove(); 
> however, DatanodeAdminDefaultMonitor#processBlocksInternal invokes it.remove() 
> when a block is not found, e.g. when the file containing the block has been deleted.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=756171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756171
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 04:30
Start Date: 13/Apr/22 04:30
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4166:
URL: https://github.com/apache/hadoop/pull/4166#issuecomment-1097544133

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  15m 45s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  67m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  cc  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  golang  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  31m 11s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  92m 27s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 213m 41s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4166 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux 10f36a3ff98f 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 
06:54:29 UTC 2021 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0d383e5698f449c0fa1fd6c4abeb3b2ad136eb38 |
   | Default Java | Debian-11.0.14+9-post-Debian-1deb10u1 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/testReport/ |
   | Max. process+thread count | 536 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/console |
   | versions | git=2.20.1 maven=3.6.0 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 756171)
Time Spent: 1h 40m  (was: 1.5h)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder 
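> // A hedged usage sketch (assumed Java-side usage, not part of the proposed
> // C API above): build() is asynchronous and returns a
> // CompletableFuture<FSDataInputStream>, so a blocking open amounts to:
> //
> //   FutureDataInputStreamBuilder builder = fs.openFile(new Path("/some/file"));
> //   FSDataInputStream in = builder.build().get();  // blocks on the future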

[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=756148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756148
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 03:22
Start Date: 13/Apr/22 03:22
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#issuecomment-1097512478

   @tasanuma Thanks for your review and merge




Issue Time Tracking
---

Worklog Id: (was: 756148)
Time Spent: 4h 10m  (was: 4h)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and can't work 
> normally. 
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never reset to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  
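A hedged, self-contained sketch (hypothetical helper names, not the merged fix) of one way to break such a loop: cap the retries for the same id, so an inodeId whose path no longer exists is eventually dropped instead of being retried forever.
{code:java}
public class RetryCapDemo {
  private static final int MAX_RETRIES = 3;

  public static void main(String[] args) {
    Long startINode = null;
    int retries = 0;
    for (int round = 0; round < 10; round++) {
      if (startINode == null) {
        startINode = nextPathId(); // may return an id whose path is gone
        retries = 0;
      }
      boolean scanned = scan(startINode); // fails while the path is missing
      if (scanned || ++retries >= MAX_RETRIES) {
        startINode = null; // done, or give up instead of spinning forever
      }
    }
  }

  private static Long nextPathId() { return 42L; }      // stand-in for ctxt.getNextSPSPath()
  private static boolean scan(Long id) { return false; } // stand-in for ctxt.scanAndCollectFiles()
}
{code}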



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16484.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
   Resolution: Fixed

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and can't work 
> normally. 
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never reset to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread liutongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521419#comment-17521419
 ] 

liutongwei commented on HDFS-16493:
---

[~Feng Yuan] Sorry for the mistake; I did not recheck the code when I created the 
patch. I have uploaded a new version.

> [SBN Read]When fast path tail enabled, standby or observer namenode may read 
> uncommitted data
> -
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: liutongwei
>Priority: Critical
> Attachments: exapmle.v1.patch
>
>
> Although fast path tail uses quorum reads to pull the edit log, it seems it can 
> read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs, whose initial state is:
>  
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that, in epoch 1, txid 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, a new writer that cannot contact JN3 due to a network 
> partition may finish the recovery stage and write a new txid 4 in epoch 2 
> whose value is not equal to JN3's.
>  
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](inprogress)
> JN2 [1-3](finalized) [4-4](inprogress)
> JN3 [1-4](inprogress)
> Note that JN3's txid 4 value is not equal to the other JNs'.
> {code}
>  
> Now a reading namenode pulls edits: it contacts JN3 and JN2 and gets a majority 
> response. But it gets logs of the same length with different 
> content, and there is no more information to choose which log is right. If we 
> choose JN3, we get metadata corruption.
> There is a test example patch [^example.patch] for running and debugging.
> To fix this, I think we should add the finalized state to 
> {{{}GetJournaledEditsResponseProto{}}}, so we can discard the faulty log.
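A hedged sketch of the selection rule this proposal implies (class and field names are hypothetical; the finalized flag is the proposed addition to {{GetJournaledEditsResponseProto}}, not an existing field): among same-length responses, prefer the finalized segment, since finalized edits are known to be committed.
{code:java}
final class EditsResponse {
  final long txCount;      // number of transactions returned by this JN
  final boolean finalized; // proposed: whether the segment is finalized

  EditsResponse(long txCount, boolean finalized) {
    this.txCount = txCount;
    this.finalized = finalized;
  }

  // Given two same-length responses, the finalized one wins; an in-progress
  // segment of the same length may contain uncommitted transactions (JN3 above).
  static EditsResponse choose(EditsResponse a, EditsResponse b) {
    if (a.txCount == b.txCount && a.finalized != b.finalized) {
      return a.finalized ? a : b;
    }
    return a.txCount >= b.txCount ? a : b;
  }
}
{code}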



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread liutongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liutongwei updated HDFS-16493:
--
Attachment: exapmle.v1.patch

> [SBN Read]When fast path tail enabled, standby or observer namenode may read 
> uncommitted data
> -
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: liutongwei
>Priority: Critical
> Attachments: exapmle.v1.patch
>
>
> Although fast path tail uses quorum reads to pull the edit log, it seems it can 
> read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs, whose initial state is:
>  
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that, in epoch 1, txid 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, a new writer that cannot contact JN3 due to a network 
> partition may finish the recovery stage and write a new txid 4 in epoch 2 
> whose value is not equal to JN3's.
>  
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](inprogress)
> JN2 [1-3](finalized) [4-4](inprogress)
> JN3 [1-4](inprogress)
> Note that JN3's txid 4 value is not equal to the other JNs'.
> {code}
>  
> Now a reading namenode pulls edits: it contacts JN3 and JN2 and gets a majority 
> response. But it gets logs of the same length with different 
> content, and there is no more information to choose which log is right. If we 
> choose JN3, we get metadata corruption.
> There is a test example patch [^example.patch] for running and debugging.
> To fix this, I think we should add the finalized state to 
> {{{}GetJournaledEditsResponseProto{}}}, so we can discard the faulty log.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread liutongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liutongwei updated HDFS-16493:
--
Attachment: (was: example.patch)

> [SBN Read]When fast path tail enabled, standby or observer namenode may read 
> uncommitted data
> -
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: liutongwei
>Priority: Critical
> Attachments: exapmle.v1.patch
>
>
> Although fast path tail uses quorum reads to pull the edit log, it seems it can 
> read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs, whose initial state is:
>  
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that, in epoch 1, txid 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, a new writer that cannot contact JN3 due to a network 
> partition may finish the recovery stage and write a new txid 4 in epoch 2 
> whose value is not equal to JN3's.
>  
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](inprogress)
> JN2 [1-3](finalized) [4-4](inprogress)
> JN3 [1-4](inprogress)
> Note that JN3's txid 4 value is not equal to the other JNs'.
> {code}
>  
> Now a reading namenode pulls edits: it contacts JN3 and JN2 and gets a majority 
> response. But it gets logs of the same length with different 
> content, and there is no more information to choose which log is right. If we 
> choose JN3, we get metadata corruption.
> There is a test example patch [^example.patch] for running and debugging.
> To fix this, I think we should add the finalized state to 
> {{{}GetJournaledEditsResponseProto{}}}, so we can discard the faulty log.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16507 ]


tomscut deleted comment on HDFS-16507:


was (Author: tomscut):
[~xkrogen] Your comment makes a lot of sense to me.

IMO, there are two ways to approach this problem:
1. Throw an IllegalArgumentException, wait for the in-progress edit log segment to 
be closed normally, and then let FSEditLog#purgeLogsOlderThan run automatically. 
However, if the SNN is down for a long time, edit logs may take up more disk space.

2. Update `minTxIdToKeep` here, like the PR I submitted at the beginning.
{code:java}
// Reset purgeLogsFrom to avoid purging an edit log segment which is in progress.
if (isSegmentOpen()) {
  minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;
} {code}
What do you think of this? cc [~sunchao] [~vjasani] .
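A hedged sketch (simplified; method and field names are taken from the stack trace and discussion, the surrounding FSEditLog class is elided) of where option 2 acts: the clamp runs before the precondition that would otherwise fail on a {{minTxIdToKeep}} beyond the open segment.
{code:java}
void purgeLogsOlderThan(long minTxIdToKeep) {
  if (isSegmentOpen()) {
    // Option 2: clamp to the open segment's first txid instead of letting
    // the check below fail on the ANN (equivalent to the ternary above).
    minTxIdToKeep = Math.min(minTxIdToKeep, curSegmentTxId);
  }
  Preconditions.checkArgument(minTxIdToKeep <= curSegmentTxId,
      "cannot purge the in-progress edit log segment");
  journalSet.purgeLogsOlderThan(minTxIdToKeep);
}
{code}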

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like it is purging an edit log segment which is in progress.
> According to the analysis, I suspect that the in-progress editlog to be 
> purged (after an SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> 

[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=756139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756139
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 02:28
Start Date: 13/Apr/22 02:28
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#issuecomment-1097486730

   @liubingxing Thanks for your contribution!




Issue Time Tracking
---

Worklog Id: (was: 756139)
Time Spent: 4h  (was: 3h 50m)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and can't work 
> normally. 
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never reset to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=756138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756138
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 02:27
Start Date: 13/Apr/22 02:27
Worklog Time Spent: 10m 
  Work Description: tasanuma merged PR #4032:
URL: https://github.com/apache/hadoop/pull/4032




Issue Time Tracking
---

Worklog Id: (was: 756138)
Time Spent: 3h 50m  (was: 3h 40m)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and can't work 
> normally. 
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never reset to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521409#comment-17521409
 ] 

tomscut edited comment on HDFS-16507 at 4/13/22 2:06 AM:
-

[~xkrogen] Your comment makes a lot of sense to me.

IMO, there are two ways to approach this problem:
1. Throw an IllegalArgumentException, wait for the in-progress edit log segment to 
be closed normally, and then let FSEditLog#purgeLogsOlderThan run automatically. 
However, if the SNN is down for a long time, edit logs may take up more disk space.

2. Update `minTxIdToKeep` here, like the PR I submitted at the beginning.
{code:java}
// Reset purgeLogsFrom to avoid purging an edit log segment which is in progress.
if (isSegmentOpen()) {
  minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;
} {code}
What do you think of this? cc [~sunchao] [~vjasani] .


was (Author: tomscut):
[~xkrogen] Your comment makes a lot of sense to me.

IMO,  there are two ways to approach this problem:
1. Throw an IllegalArgumentException, wait for Edit to be turned off normally, 
and then automatically FSEditLog#purgeLogsOlderThan. However, if SNN is down 
for a long time, edits may take up more disk space.

2. Update `minTxIdToKeep` here. Like the PR I submitted at the beginning.
{code:java}
// Reset purgeLogsFrom to avoid purging edit log which is in progress.
if (isSegmentOpen()) {
   minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : 
minTxIdToKeep;
} {code}
What do you think of this? cc [~sunchao] [~vjasani] .

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like it is purging an edit log segment which is in progress.
> According to the analysis, I suspect that the in-progress editlog to be 
> purged (after an SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     

[jira] [Work logged] (HDFS-16513) [SBN read] Observer Namenode should not trigger the edits rolling of active Namenode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16513?focusedWorklogId=756135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756135
 ]

ASF GitHub Bot logged work on HDFS-16513:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 02:05
Start Date: 13/Apr/22 02:05
Worklog Time Spent: 10m 
  Work Description: tomscut commented on code in PR #4087:
URL: https://github.com/apache/hadoop/pull/4087#discussion_r849006815


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1938,6 +1938,14 @@ public boolean isInStandbyState() {
 HAServiceState.OBSERVER == haContext.getState().getServiceState();
   }
 
+  public boolean isInObserverState() {
+if (haContext == null || haContext.getState() == null) {
+  return haEnabled;

Review Comment:
   > Seems like this was probably copied from `isInStandbyState()`? But I don't 
think it's right. If we can't find a state, we assume `STANDBY` state. If we 
assume `STANDBY` state because a valid state could not be found, then 
`isInObserverState()` should be false. So I think we should just `return false` 
here.
   
   I agree with you. Thanks.
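   A hedged sketch of the method with the reviewer's suggestion applied 
(assuming the FSNamesystem fields visible in the diff above):
   ```java
   public boolean isInObserverState() {
     // Without an HA context or a state we cannot confirm OBSERVER, so report
     // false rather than falling back to haEnabled (per the review discussion).
     if (haContext == null || haContext.getState() == null) {
       return false;
     }
     return HAServiceState.OBSERVER == haContext.getState().getServiceState();
   }
   ```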





Issue Time Tracking
---

Worklog Id: (was: 756135)
Time Spent: 2h 20m  (was: 2h 10m)

> [SBN read] Observer Namenode should not trigger the edits rolling of active 
> Namenode
> 
>
> Key: HDFS-16513
> URL: https://issues.apache.org/jira/browse/HDFS-16513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To avoid frequent edits rolling, we should prevent the OBN from triggering the 
> edits rolling of the active Namenode. 
> It is sufficient to retain only the triggering by the SNN and the auto rolling of 
> the ANN. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16513) [SBN read] Observer Namenode should not trigger the edits rolling of active Namenode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16513?focusedWorklogId=756134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756134
 ]

ASF GitHub Bot logged work on HDFS-16513:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 02:04
Start Date: 13/Apr/22 02:04
Worklog Time Spent: 10m 
  Work Description: tomscut commented on code in PR #4087:
URL: https://github.com/apache/hadoop/pull/4087#discussion_r849006683


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1938,6 +1938,14 @@ public boolean isInStandbyState() {
 HAServiceState.OBSERVER == haContext.getState().getServiceState();
   }
 
+  public boolean isInObserverState() {
+if (haContext == null || haContext.getState() == null) {
+  return haEnabled;

Review Comment:
   I agree with you.





Issue Time Tracking
---

Worklog Id: (was: 756134)
Time Spent: 2h 10m  (was: 2h)

> [SBN read] Observer Namenode should not trigger the edits rolling of active 
> Namenode
> 
>
> Key: HDFS-16513
> URL: https://issues.apache.org/jira/browse/HDFS-16513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> To avoid frequent edits rolling, we should prevent the OBN from triggering the 
> edits rolling of the active Namenode. 
> It is sufficient to retain only the triggering by the SNN and the auto rolling of 
> the ANN. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16513) [SBN read] Observer Namenode should not trigger the edits rolling of active Namenode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16513?focusedWorklogId=756133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756133
 ]

ASF GitHub Bot logged work on HDFS-16513:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 02:03
Start Date: 13/Apr/22 02:03
Worklog Time Spent: 10m 
  Work Description: tomscut commented on PR #4087:
URL: https://github.com/apache/hadoop/pull/4087#issuecomment-1097474203

   > Hi @tomscut, sorry for the delay in my response.
   > 
   > I am inclined to agree with @sunchao that the approach laid out in 
[HDFS-14378](https://issues.apache.org/jira/browse/HDFS-14378) is a better 
long-term solution.
   > 
   > > It might be risky (we can look at 
[HDFS-2737](https://issues.apache.org/jira/browse/HDFS-2737) here) to simply 
disable all SNNs from triggering the active NN's edits roll.
   > 
   > Can you clarify what from 
[HDFS-2737](https://issues.apache.org/jira/browse/HDFS-2737) makes you feel 
that it is risky? I skimmed the discussed and didn't notice anything alarming. 
You may also want to see [this comment on 
HDFS-14378](https://issues.apache.org/jira/browse/HDFS-14378?focusedCommentId=16907765&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16907765)
 where this same point was discussed.
   > 
   > That all being said, I think this PR may be a good step in the interim, 
since [HDFS-14378](https://issues.apache.org/jira/browse/HDFS-14378) is a more 
substantial change. I would appreciate some other opinions, though. cc 
@simbadzina @aajisaka @shvachko
   
   Thank you @xkrogen very much for your comments. 
   It is mentioned in the description of HDFS-2737:
   ```
   Currently, the edit log tailing process can only read finalized log 
segments. So, if the active NN is not rolling its logs periodically, the SBN 
will lag a lot. This also causes many datanode messages to be queued up in the 
PendingDatanodeMessage structure.
   
   To combat this, the active NN needs to roll its logs periodically – perhaps 
based on a time threshold, or perhaps based on a number of transactions. I'm 
not sure yet whether it's better to have the NN roll on its own or to have the 
SBN ask the active NN to roll its logs.
   ```
   The pendingDatanodeMessage issue mentioned here strikes me as a bit risky. 
However, after supporting `SBN READ`, `Journal` supports `read inProgress`. If 
we enable `read inProgress`, then even if we prevent all SNNs from triggering 
edits rolls, the pendingDatanodeMessage problem is not too serious. 
   
   I would also appreciate some other opinions.
   
   




Issue Time Tracking
---

Worklog Id: (was: 756133)
Time Spent: 2h  (was: 1h 50m)

> [SBN read] Observer Namenode should not trigger the edits rolling of active 
> Namenode
> 
>
> Key: HDFS-16513
> URL: https://issues.apache.org/jira/browse/HDFS-16513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> To avoid frequent edits rolling, we should prevent the OBN from triggering the 
> edits rolling of the active Namenode. 
> It is sufficient to retain only the triggering by the SNN and the auto rolling of 
> the ANN. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16452) msync RPC should be sent to the Active Namenode directly

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16452?focusedWorklogId=756128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756128
 ]

ASF GitHub Bot logged work on HDFS-16452:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 01:54
Start Date: 13/Apr/22 01:54
Worklog Time Spent: 10m 
  Work Description: hfutatzhanghb commented on PR #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1097470379

   > Sorry for the delay in my response @hfutatzhanghb.
   > 
   > I want to make sure I understand your proposal fully. I think what you are 
saying is this:
   > 
   > Currently when using `ConfiguredFailoverProxyProvider`, we have a single 
config, `dfs.ha.namenodes.<nameservice>`, where we list all NN addresses. Since we 
don't know the state of any of the addresses, we have to scan all of them to 
find the active. Instead, we could have two configs, e.g. 
`dfs.ha.namenodes.<nameservice>.hanodes` and `dfs.ha.namenodes.<nameservice>.observers`. The 
first lists NNs that could be in either ACTIVE or STANDBY state. The second lists 
only NNs that could be in OBSERVER state. (alternatively, 
`dfs.ha.namenodes.<nameservice>` could remain the same and still contain all NNs, and 
we could just add the additional config `dfs.ha.namenodes.<nameservice>.observers` to 
mark which of the list are observers, or even add a new per-NN config like 
`dfs.namenode.observer-only.<nameservice>.<nn-id>`) It's not possible to transition NNs 
in or out of OBSERVER state w/o changing the config on the client side to move 
NNs between the two lists. This is okay because we assume that we have enough 
NNs in the ACTIVE+STANDBY pool to meet our HA needs, and OBSERVER NNs are only 
used for scaling reads, not providing HA.
   > 
   > Do I have that right? If so, I think this is a reasonable proposal. I 
believe our installations at LinkedIn also assume that ONNs don't participate 
in SbNN/ANN failover transitions.
   
   @xkrogen, totally right, Eric! Thanks for your reply. I will submit the 
patch soon. Thanks!
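   A hedged sketch, in hdfs-site.xml form, of the proposed split (the property 
names come from the discussion above and the `mycluster` nameservice is an 
example; none of this is an existing Hadoop config):
   ```xml
   <!-- NNs that may be ACTIVE or STANDBY: these participate in failover. -->
   <property>
     <name>dfs.ha.namenodes.mycluster.hanodes</name>
     <value>nn1,nn2</value>
   </property>
   <!-- NNs that are only ever OBSERVER: used to scale reads, not for HA. -->
   <property>
     <name>dfs.ha.namenodes.mycluster.observers</name>
     <value>nn3</value>
   </property>
   ```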




Issue Time Tracking
---

Worklog Id: (was: 756128)
Time Spent: 3h 20m  (was: 3h 10m)

> msync RPC should be sent to the Active Namenode directly
> --
>
> Key: HDFS-16452
> URL: https://issues.apache.org/jira/browse/HDFS-16452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: zhanghaobo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In the current ObserverReadProxyProvider implementation, we use the following 
> code to invoke the msync RPC:
> {code:java}
> getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); {code}
> But this way the msync RPC may be sent to an Observer NameNode first, and then 
> fail over to the Active NameNode. This can be avoided by applying this patch. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521409#comment-17521409
 ] 

tomscut commented on HDFS-16507:


[~xkrogen] Your comment makes a lot of sense to me.

IMO, there are two ways to approach this problem:
1. Throw an IllegalArgumentException, wait for the in-progress edit log segment to 
be closed normally, and then let FSEditLog#purgeLogsOlderThan run automatically. 
However, if the SNN is down for a long time, edit logs may take up more disk space.

2. Update `minTxIdToKeep` here, like the PR I submitted at the beginning.
{code:java}
// Reset purgeLogsFrom to avoid purging an edit log segment which is in progress.
if (isSegmentOpen()) {
  minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;
} {code}
What do you think of this? cc [~sunchao] [~vjasani] .

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like it is purging an edit log segment which is in progress.
> According to the analysis, I suspect that the in-progress editlog to be 
> purged (after an SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> 

[jira] [Comment Edited] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521403#comment-17521403
 ] 

tomscut edited comment on HDFS-16507 at 4/13/22 1:41 AM:
-

Hi [~xkrogen] , thanks for your comments.

The process is as follows:

After checkpoint, SNN passes fsimage to ANN. When ANN receives fsimage, it 
trigger  in ImageServlet#doPut.

Do you mean, if the situation arises that ` minTxIdToKeep > curSegmentTxId `, 
ANN will crash because `Preconditions.CheckArgument` failure?


was (Author: tomscut):
Hi [~xkrogen] , thanks for your comments.

The process is as follows:

After checkpoint, SNN passes fsimage to ANN. When ANN receives fsimage, it 
fires fseditlogpurgelogsolderthan in ImageServlet#doPut.

Do you mean, if the situation arises that ` minTxIdToKeep > curSegmentTxId `, 
ANN will crash because `Preconditions.CheckArgument` failure?

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like it is purging an edit log segment which is in progress.
> According to the analysis, I suspect that the in-progress editlog to be 
> purged (after an SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> 

[jira] [Comment Edited] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521403#comment-17521403
 ] 

tomscut edited comment on HDFS-16507 at 4/13/22 1:41 AM:
-

Hi [~xkrogen], thanks for your comments.

The process is as follows:

After a checkpoint, the SNN uploads the fsimage to the ANN. When the ANN 
receives the fsimage, it triggers FSEditLog#purgeLogsOlderThan from 
ImageServlet#doPut.

Do you mean that if the situation `minTxIdToKeep > curSegmentTxId` arises, the 
ANN will crash because the `Preconditions.checkArgument` check fails?


was (Author: tomscut):
Hi [~xkrogen], thanks for your comments.

The process is as follows:

After a checkpoint, the SNN uploads the fsimage to the ANN. When the ANN 
receives the fsimage, it triggers FSEditLog#purgeLogsOlderThan from 
ImageServlet#doPut.

Do you mean that if the situation `minTxIdToKeep > curSegmentTxId` arises, the 
ANN will crash because the `Preconditions.checkArgument` check fails?

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0 but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     

[jira] [Commented] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521403#comment-17521403
 ] 

tomscut commented on HDFS-16507:


Hi [~xkrogen], thanks for your comments.

The process is as follows:

After a checkpoint, the SNN uploads the fsimage to the ANN. When the ANN 
receives the fsimage, it triggers FSEditLog#purgeLogsOlderThan from 
ImageServlet#doPut.

Do you mean that if the situation `minTxIdToKeep > curSegmentTxId` arises, the 
ANN will crash because the `Preconditions.checkArgument` check fails?

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0 but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> 

[jira] [Work logged] (HDFS-16526) Add metrics for slow DataNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16526?focusedWorklogId=756124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756124
 ]

ASF GitHub Bot logged work on HDFS-16526:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 01:14
Start Date: 13/Apr/22 01:14
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4162:
URL: https://github.com/apache/hadoop/pull/4162#issuecomment-1097450196

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  0s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 410m 10s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 517m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
   |   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4162 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 7788dc3e5bdc 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c77291fe161f9a7d0f29c7c8dc9200a79a14adff |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/2/testReport/ |
   | Max. process+thread count | 

[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=756119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756119
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:56
Start Date: 13/Apr/22 00:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4166:
URL: https://github.com/apache/hadoop/pull/4166#issuecomment-1097440193

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  25m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  25m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m 55s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  53m 20s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  cc  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  golang  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 36s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 104m 33s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 213m 35s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4166 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux f79fb0ed5fd5 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 
06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0d383e5698f449c0fa1fd6c4abeb3b2ad136eb38 |
   | Default Java | Red Hat, Inc.-1.8.0_312-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/testReport/ |
   | Max. process+thread count | 598 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/console |
   | versions | git=2.27.0 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 756119)
Time Spent: 1.5h  (was: 1h 20m)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder 

[jira] [Work logged] (HDFS-16526) Add metrics for slow DataNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16526?focusedWorklogId=756110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756110
 ]

ASF GitHub Bot logged work on HDFS-16526:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:37
Start Date: 13/Apr/22 00:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4162:
URL: https://github.com/apache/hadoop/pull/4162#issuecomment-1097429921

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 45s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m  7s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  20m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   3m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 52s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  22m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  20m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   3m 37s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 25s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m  0s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 43s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  | 233m 54s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 468m 59s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4162 |
   | Optional Tests | dupname asflicense mvnsite codespell markdownlint compile 
javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
   | uname | Linux 09c06697174a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0caef8ec5c716c79ea621bb67455a361c6f7ee59 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16513) [SBN read] Observer Namenode should not trigger the edits rolling of active Namenode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16513?focusedWorklogId=756108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756108
 ]

ASF GitHub Bot logged work on HDFS-16513:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:36
Start Date: 13/Apr/22 00:36
Worklog Time Spent: 10m 
  Work Description: xkrogen commented on code in PR #4087:
URL: https://github.com/apache/hadoop/pull/4087#discussion_r848971956


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1938,6 +1938,14 @@ public boolean isInStandbyState() {
 HAServiceState.OBSERVER == haContext.getState().getServiceState();
   }
 
+  public boolean isInObserverState() {
+if (haContext == null || haContext.getState() == null) {
+  return haEnabled;

Review Comment:
   Seems like this was probably copied from `isInStandbyState()`? But I don't 
think it's right. If we can't find a state, we assume `STANDBY` state. If we 
assume `STANDBY` state because a valid state could not be found, then 
`isInObserverState()` should be false. So I think we should just `return false` 
here.
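   For readers following along, a minimal sketch of the change this review 
suggests, reusing the fields and the OBSERVER comparison visible in the diff 
above (a sketch of the suggestion, not the merged code):
   ```java
   // Hedged sketch of the reviewer's suggestion, not the committed change:
   public boolean isInObserverState() {
     if (haContext == null || haContext.getState() == null) {
       // No valid HA state is available. isInStandbyState() treats this case
       // as STANDBY, and a node assumed to be STANDBY is by definition not an
       // OBSERVER, so the consistent answer here is false.
       return false;
     }
     return HAServiceState.OBSERVER == haContext.getState().getServiceState();
   }
   ```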





Issue Time Tracking
---

Worklog Id: (was: 756108)
Time Spent: 1h 50m  (was: 1h 40m)

> [SBN read] Observer Namenode should not trigger the edits rolling of active 
> Namenode
> 
>
> Key: HDFS-16513
> URL: https://issues.apache.org/jira/browse/HDFS-16513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To avoid frequent edits rolling, we should prevent the OBN from triggering 
> the edits rolling of the active NameNode. 
> It is sufficient to retain only the triggering by the SNN and the automatic 
> rolling on the ANN. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16452) msync RPC should send to Active Namenode directly

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16452?focusedWorklogId=756105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756105
 ]

ASF GitHub Bot logged work on HDFS-16452:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:23
Start Date: 13/Apr/22 00:23
Worklog Time Spent: 10m 
  Work Description: xkrogen commented on PR #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1097422287

   Sorry for the delay in my response @hfutatzhanghb.
   
   I want to make sure I understand your proposal fully. I think what you are 
saying is this:
   
   Currently when using `ConfiguredFailoverProxyProvider`, we have a single 
config, `dfs.ha.namenodes.<nameservice>`, where we list all NN addresses. Since 
we don't know the state of any of the addresses, we have to scan all of them to 
find the active. Instead, we could have two configs, e.g. 
`dfs.ha.namenodes.<nameservice>.hanodes` and 
`dfs.ha.namenodes.<nameservice>.observers`. The first lists NNs that could be 
in either ACTIVE or STANDBY state. The second lists only NNs that could be in 
OBSERVER state. (Alternatively, `dfs.ha.namenodes.<nameservice>` could remain 
the same and still contain all NNs, and we could just add the additional config 
`dfs.ha.namenodes.<nameservice>.observers` to mark which of the list are 
observers, or even add a new per-NN config like 
`dfs.namenode.observer-only.<nameservice>.<namenode>`.) It's not possible to 
transition NNs in or out of OBSERVER state w/o changing the config on the 
client side to move NNs between the two lists. This is okay because we assume 
that we have enough NNs in the ACTIVE+STANDBY pool to meet our HA needs, and 
OBSERVER NNs are only used for scaling reads, not providing HA.
   
   Do I have that right? If so, I think this is a reasonable proposal. I 
believe our installations at LinkedIn also assume that ONNs don't participate 
in SbNN/ANN failover transitions. 
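   To make the first variant concrete, a hypothetical hdfs-site.xml fragment. 
The key names are the ones proposed in this comment, not shipped configuration 
keys, and `mycluster`, `nn1`..`nn3` are made-up IDs:
   ```xml
   <!-- Hypothetical illustration of the proposed split; not real Hadoop keys. -->
   <property>
     <name>dfs.ha.namenodes.mycluster.hanodes</name>
     <value>nn1,nn2</value> <!-- may be ACTIVE or STANDBY; participate in failover -->
   </property>
   <property>
     <name>dfs.ha.namenodes.mycluster.observers</name>
     <value>nn3</value> <!-- OBSERVER only; scales reads, never fails over -->
   </property>
   ```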




Issue Time Tracking
---

Worklog Id: (was: 756105)
Time Spent: 3h 10m  (was: 3h)

> msync RPC should send to Active Namenode directly
> --
>
> Key: HDFS-16452
> URL: https://issues.apache.org/jira/browse/HDFS-16452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.1
>Reporter: zhanghaobo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In the current ObserverReadProxyProvider implementation, we use the following 
> code to invoke the msync RPC.
> {code:java}
> getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); {code}
> But the msync RPC may be sent to an Observer NameNode this way and then fail 
> over to the Active NameNode. This can be avoided by applying this patch. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521381#comment-17521381
 ] 

Erik Krogen commented on HDFS-16507:


[~tomscut] -- thanks for changing the assert to a 
{{Preconditions.checkArgument()}}. I agree that this makes sense to ensure the 
safety of the edit log.
However, if the situation {{minTxIdToKeep > curSegmentTxId}} arises, the check 
will fail, so the NN will still crash, right? Forgive me if I'm misremembering 
how the NN handles a failure in {{FSEditLog#purgeLogsOlderThan()}}. While I 
agree that PR#4082 was a good step, the underlying issue seems not to be 
resolved, right?
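For context, a minimal sketch of the guard under discussion (names follow the 
stack trace in the description; this is not the exact HDFS-16507 patch). Guava's 
{{Preconditions.checkArgument}} throws {{IllegalArgumentException}} when the 
condition is false, which would propagate up through 
{{NNStorageRetentionManager}} to {{ImageServlet#doPut}}:
{code:java}
// Hedged sketch, not the committed change:
synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
  // Refuse to purge into the segment that is still being written. If
  // minTxIdToKeep > curSegmentTxId, this throws IllegalArgumentException,
  // i.e. the purge fails fast instead of deleting an in-progress edit log.
  Preconditions.checkArgument(minTxIdToKeep <= curSegmentTxId,
      "cannot purge transactions up to %s; current segment starts at %s",
      minTxIdToKeep, curSegmentTxId);
  journalSet.purgeLogsOlderThan(minTxIdToKeep);
}
{code}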

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0 but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edits.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     

[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=756034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756034
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 21:23
Start Date: 12/Apr/22 21:23
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4166:
URL: https://github.com/apache/hadoop/pull/4166#issuecomment-1097233270

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  44m  2s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   3m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  70m  7s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  cc  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  golang  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  93m 15s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 236m 34s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4166 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux e288a158ab82 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 
06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0d383e5698f449c0fa1fd6c4abeb3b2ad136eb38 |
   | Default Java | Red Hat, Inc.-1.8.0_322-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/testReport/ |
   | Max. process+thread count | 522 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4166/1/console |
   | versions | git=2.9.5 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 756034)
Time Spent: 1h 20m  (was: 1h 10m)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder 

[jira] [Work logged] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16531?focusedWorklogId=756014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756014
 ]

ASF GitHub Bot logged work on HDFS-16531:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 20:28
Start Date: 12/Apr/22 20:28
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on code in PR #4148:
URL: https://github.com/apache/hadoop/pull/4148#discussion_r848841769


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -2466,11 +2466,12 @@ boolean setReplication(final String src, final short 
replication)
   logAuditEvent(false, operationName, src);
   throw e;
 }
-if (success) {
+if (status == FSDirAttrOp.SetRepStatus.SUCCESS) {
   getEditLog().logSync();
-  logAuditEvent(true, operationName, src);
 }
-return success;
+logAuditEvent(status != FSDirAttrOp.SetRepStatus.INVALID,

Review Comment:
   Yea, it should only log false for INVALID: `status != SetRepStatus.INVALID` 
evaluates to false in that case. SUCCESS or UNCHANGED would log true. The 
change here is that failures (INVALID) were previously not logged in the audit 
log, which was a bug IMO. Other operations generally do audit failures.
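   A condensed sketch of the tri-state flow being described, using only the 
names visible in the diff above (surrounding details elided, so this is an 
illustration rather than the full patch):
   ```java
   // Hedged sketch of the behavior discussed above.
   // SetRepStatus per the diff: INVALID (failure), UNCHANGED, SUCCESS.
   FSDirAttrOp.SetRepStatus status = /* outcome of the setReplication attempt */;
   if (status == FSDirAttrOp.SetRepStatus.SUCCESS) {
     getEditLog().logSync();            // only an actual change syncs an edit
   }
   // SUCCESS and UNCHANGED audit as true; INVALID now audits as false,
   // where previously failures were not audited at all.
   logAuditEvent(status != FSDirAttrOp.SetRepStatus.INVALID, operationName, src);
   ```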





Issue Time Tracking
---

Worklog Id: (was: 756014)
Time Spent: 50m  (was: 40m)

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3, i.e. leaving it unchanged.
> Even in a case like this, we log an edit record, write an audit log entry, 
> and perform some quota checks, etc.
> I believe it should be possible to avoid some of this work if we check for 
> oldRep == newRep and jump out of the method early.
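A minimal sketch of the early exit the description proposes (accessor names are 
assumptions for illustration; the enum comes from the PR discussion above):
{code:java}
// Hedged sketch of the proposed short-circuit inside the setReplication path:
final short oldRep = inode.asFile().getFileReplication(); // assumed accessor
if (oldRep == replication) {
  // Nothing changes: skip the edit-log record, quota checks, and the rest
  // of the replication-change work entirely.
  return FSDirAttrOp.SetRepStatus.UNCHANGED;
}
{code}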



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=756001=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756001
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 20:03
Start Date: 12/Apr/22 20:03
Worklog Time Spent: 10m 
  Work Description: simbadzina commented on code in PR #4127:
URL: https://github.com/apache/hadoop/pull/4127#discussion_r848823251


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java:
##
@@ -1380,8 +1437,9 @@ private static boolean isExpectedValue(Object 
expectedValue, Object value) {
 final CallerContext originContext = CallerContext.getCurrent();
 for (final T location : locations) {
   String nsId = location.getNameserviceId();
+  boolean isObserverRead = observerReadEnabled && isReadCall(m);
   final List namenodes =
-  getNamenodesForNameservice(nsId);
+  msync(nsId, ugi, isObserverRead);

Review Comment:
   I see. I've added a commit to allow nameservice-specific overrides. I left 
observer.auto-msync-period as a global configuration.





Issue Time Tracking
---

Worklog Id: (was: 756001)
Time Spent: 5h 40m  (was: 5.5h)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The router will need changes to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.
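As a sketch of what "understanding the observer state" might mean, the 
router-side service-state enum named above would gain an OBSERVER value. The 
exact value set shown here is illustrative, not the committed enum:
{code:java}
// Hedged sketch; see FederationNamenodeServiceState for the real definition.
public enum FederationNamenodeServiceState {
  ACTIVE,
  OBSERVER,     // new: NN serving consistent reads only
  STANDBY,
  UNAVAILABLE,
  EXPIRED
}
{code}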



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=755989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755989
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 19:41
Start Date: 12/Apr/22 19:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#issuecomment-1097142217

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 24s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 230m 59s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 337m  0s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4032 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 373347fb1236 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a4a283206d90e8d765fcd651e6182746a983d97f |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/5/testReport/ |
   | Max. process+thread count | 2829 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=755978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755978
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 19:20
Start Date: 12/Apr/22 19:20
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on PR #4166:
URL: https://github.com/apache/hadoop/pull/4166#issuecomment-1097122735

   test run of the branch in the docker image on an MBP m1 with tests in 
`/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client`
   ```
[exec] The following tests FAILED:
[exec]  14 - memcheck_rpc_engine (Failed)
[exec]  34 - memcheck_hdfs_config_connect_bugs (Failed)
[exec]  38 - 
memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static (Failed)
   ```
   
   will need to see if they happen on trunk as is




Issue Time Tracking
---

Worklog Id: (was: 755978)
Time Spent: 1h 10m  (was: 1h)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder);
> void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder);
> hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future);
> hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future,
> int64_t timeout, javaConcurrentTimeUnit timeUnit);
> int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future,
> int mayInterruptIfRunning);
> void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future);
> {code}
> Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
> would just expose the functionality of {{Future}}.
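For reference, a hedged sketch of the Java builder API (HADOOP-15229) that 
these C bindings would wrap; the option key and path shown are assumptions for 
illustration, not part of this proposal:
{code:java}
// Hedged sketch of the underlying Java API, not the libhdfs binding itself.
void readWithOpenFile(FileSystem fs) throws Exception {
  CompletableFuture<FSDataInputStream> future =
      fs.openFile(new Path("/data/example"))               // path illustrative
        .opt("fs.option.openfile.read.policy", "sequential") // key assumed
        .build();                                 // async: returns immediately
  try (FSDataInputStream in = future.get()) {     // Future-style blocking get
    in.read();                                    // then read as usual
  }
}
{code}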



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=755899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755899
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 17:17
Start Date: 12/Apr/22 17:17
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on PR #4166:
URL: https://github.com/apache/hadoop/pull/4166#issuecomment-1096989666

   this is the original PR; if it compiles and the tests pass, I will merge it 
as is.
   
   I do think it needs some more tests (what if a file isn't there, setting 
invalid options, etc.)
   
   It would be interesting to think about adding tests for this against 
s3/abfs too.




Issue Time Tracking
---

Worklog Id: (was: 755899)
Time Spent: 1h  (was: 50m)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder);
> void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder);
> hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future);
> hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future,
> int64_t timeout, javaConcurrentTimeUnit timeUnit);
> int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future,
> int mayInterruptIfRunning);
> void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future);
> {code}
> Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
> would just expose the functionality of {{Future}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-14478) Add libhdfs APIs for openFile

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?focusedWorklogId=755897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755897
 ]

ASF GitHub Bot logged work on HDFS-14478:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 17:16
Start Date: 12/Apr/22 17:16
Worklog Time Spent: 10m 
  Work Description: steveloughran opened a new pull request, #4166:
URL: https://github.com/apache/hadoop/pull/4166

   
   ### Description of PR
   
   pr #955 rebased to trunk
   
   ### How was this patch tested?
   
   there's a new test
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




Issue Time Tracking
---

Worklog Id: (was: 755897)
Time Spent: 50m  (was: 40m)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder);
> void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder);
> hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future);
> hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future,
> int64_t timeout, javaConcurrentTimeUnit timeUnit);
> int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future,
> int mayInterruptIfRunning);
> void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future);
> {code}
> Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
> would just expose the functionality of {{Future}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16526) Add metrics for slow DataNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16526?focusedWorklogId=755874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755874
 ]

ASF GitHub Bot logged work on HDFS-16526:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 16:38
Start Date: 12/Apr/22 16:38
Worklog Time Spent: 10m 
  Work Description: prasad-acit commented on PR #4162:
URL: https://github.com/apache/hadoop/pull/4162#issuecomment-1096952896

   Fixed the checkstyle issues.
   @hemanthboyina could you please review the PR?




Issue Time Tracking
---

Worklog Id: (was: 755874)
Time Spent: 0.5h  (was: 20m)

> Add metrics for slow DataNode
> -
>
> Key: HDFS-16526
> URL: https://issues.apache.org/jira/browse/HDFS-16526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add some more metrics for slow datanode operations - FlushOrSync, 
> PacketResponder send ACK.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16474) Make HDFS tail tool cross platform

2022-04-12 Thread Gautham Banasandra (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautham Banasandra resolved HDFS-16474.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged PR https://github.com/apache/hadoop/pull/4157 to trunk.

> Make HDFS tail tool cross platform
> --
>
> Key: HDFS-16474
> URL: https://issues.apache.org/jira/browse/HDFS-16474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs++, tools
>Affects Versions: 3.4.0
> Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: libhdfscpp, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The source files for *hdfs_tail* use *getopt* for parsing the command line 
> arguments. getopt is available only on Linux and thus isn't cross platform. 
> We need to replace getopt with *boost::program_options* to make these tools 
> cross platform.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16474) Make HDFS tail tool cross platform

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16474?focusedWorklogId=755816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755816
 ]

ASF GitHub Bot logged work on HDFS-16474:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 15:01
Start Date: 12/Apr/22 15:01
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra merged PR #4157:
URL: https://github.com/apache/hadoop/pull/4157




Issue Time Tracking
---

Worklog Id: (was: 755816)
Time Spent: 1h  (was: 50m)

> Make HDFS tail tool cross platform
> --
>
> Key: HDFS-16474
> URL: https://issues.apache.org/jira/browse/HDFS-16474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs++, tools
>Affects Versions: 3.4.0
> Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: libhdfscpp, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The source files for *hdfs_tail* use *getopt* for parsing the command line 
> arguments. getopt is available only on Linux and thus isn't cross platform. 
> We need to replace getopt with *boost::program_options* to make these tools 
> cross platform.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16509) Fix decommission UnsupportedOperationException: Remove unsupported

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16509?focusedWorklogId=755808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755808
 ]

ASF GitHub Bot logged work on HDFS-16509:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:56
Start Date: 12/Apr/22 14:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4077:
URL: https://github.com/apache/hadoop/pull/4077#issuecomment-1096839051

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 33s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 33s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  2s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 246m 55s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 356m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4077 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 86c425975ba9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a1b31764e6a00d7a1856c2ce80f1f6367e11 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/3/testReport/ |
   | Max. process+thread count | 3194 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDFS-16509) Fix decommission UnsupportedOperationException: Remove unsupported

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16509?focusedWorklogId=755806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755806
 ]

ASF GitHub Bot logged work on HDFS-16509:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:53
Start Date: 12/Apr/22 14:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4077:
URL: https://github.com/apache/hadoop/pull/4077#issuecomment-1096835061

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  0s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 243m 48s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 351m 40s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4077 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux d352ea646987 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a1b31764e6a00d7a1856c2ce80f1f6367e11 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/4/testReport/ |
   | Max. process+thread count | 3460 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4077/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message 

[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=755772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755772
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:06
Start Date: 12/Apr/22 14:06
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r848480069


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##
@@ -248,13 +251,22 @@ public void run() {
   pendingWorkForDirectory.get(startINode);
   if (dirPendingWorkInfo != null
   && dirPendingWorkInfo.isDirWorkDone()) {
-ctxt.removeSPSHint(startINode);
+try {
+  ctxt.removeSPSHint(startINode);
+} catch (FileNotFoundException e) {
+  // ignore if the file doesn't already exist
+  startINode = null;
+}
 pendingWorkForDirectory.remove(startINode);
   }
 }
 startINode = null; // Current inode successfully scanned.
   }
 } catch (Throwable t) {
+  retryCount++;
+  if (retryCount >= 3) {
+startINode = null;
+  }

Review Comment:
   @tasanuma Thanks for your advice. I updated the code.
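For readers following the thread, the fix under review is a bounded retry: after a few consecutive failures on the same inode id, the thread drops the id instead of retrying forever. A simplified sketch of that pattern (the retry limit of 3 comes from the diff; the rest is condensed from the run() method quoted below):

{code:java}
// Condensed sketch of the bounded-retry loop in SPSPathIdProcessor.
Long startINode = null;
int retryCount = 0;
while (ctxt.isRunning()) {
  try {
    if (startINode == null) {
      startINode = ctxt.getNextSPSPath();
      retryCount = 0;                  // new id, fresh retry budget
    }
    if (startINode != null) {
      ctxt.scanAndCollectFiles(startINode);
      startINode = null;               // scanned successfully
    }
  } catch (Throwable t) {
    if (++retryCount >= 3) {
      startINode = null;               // give up on this id: no infinite loop
      retryCount = 0;
    }
  }
}
{code}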





Issue Time Tracking
---

Worklog Id: (was: 755772)
Time Spent: 3.5h  (was: 3h 20m)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found this log. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log all 
> the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, 
> it enters an infinite loop and can't work 
> normally. 
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. The inodeId is never reset to null, so the thread holds this 
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
> break;
>   }
> }
>   }
> } {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16526) Add metrics for slow DataNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16526?focusedWorklogId=755741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755741
 ]

ASF GitHub Bot logged work on HDFS-16526:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:24
Start Date: 12/Apr/22 13:24
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4162:
URL: https://github.com/apache/hadoop/pull/4162#issuecomment-1096724703

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 10s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  15m 37s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  27m 41s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 53s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 119 unchanged 
- 0 fixed = 122 total (was 119)  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 378m 43s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 470m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4162/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4162 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux bc535f39d733 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 
11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 20008543bb453fe979afe0583353a4c64e501596 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 

[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521131#comment-17521131
 ] 

袁枫 commented on HDFS-16493:
---

[~liutongwei]
 Is there a mistake here?
1. 
{code:java}
failLoggerAtTxn(spies.get(1), 4);
failLoggerAtTxn(spies.get(2), 4);
{code}
Do these indicate JN2 and JN3?

> [SBN Read]When fast path tail enabled, standby or observer namenode may read 
> uncommitted data
> -
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: liutongwei
>Priority: Critical
> Attachments: example.patch
>
>
> Although fast path tail uses a quorum read to pull the edit log, it seems it 
> can read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs whose initial state is:
>  
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that in epoch 1, txids 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, a new writer that cannot contact JN3 due to a network 
> partition may finish the recovery stage and write a new txid 4 in epoch 2, 
> whose value is not equal to JN3's.
>  
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](in-progress)
> JN2 [1-3](finalized) [4-4](in-progress)
> JN3 [1-4](in-progress)
> Note that JN3's txid 4 value is not equal to the other JNs'.
> {code}
>  
> Now a reading namenode pulls edits, contacts JN3 and JN2, and gets a 
> majority response. But it gets logs of the same length with different 
> content, and no more information to choose which log is right. If we choose 
> JN3, we get metadata corruption.
> There is a test example patch [^example.patch] for running and debugging.
> To fix it, I think we should add a finalized state to 
> {{{}GetJournaledEditsResponseProto{}}}, so we can discard the faulty log.
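The reader-side implication of that fix can be sketched as a tie-break: among quorum responses with the same transaction count, prefer a finalized segment over an in-progress one. A hypothetical illustration (the response type and finalized flag are assumptions about the proposed protobuf change, not existing fields):

{code:java}
// Hypothetical response shape; the real flag would be added to
// GetJournaledEditsResponseProto by the proposed change.
class EditsResponse {
  long txnCount;
  boolean finalized; // proposed new field
}

// Among responses with the maximum txnCount, prefer a finalized segment:
// an in-progress segment of equal length may carry uncommitted transactions.
static EditsResponse choose(java.util.List<EditsResponse> responses) {
  EditsResponse best = null;
  for (EditsResponse r : responses) {
    if (best == null
        || r.txnCount > best.txnCount
        || (r.txnCount == best.txnCount && r.finalized && !best.finalized)) {
      best = r;
    }
  }
  return best;
}
{code}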



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16428:
--
Target Version/s: 3.3.3

> Source path with storagePolicy cause wrong typeConsumed while rename
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
> Attachments: example.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When computing quota in a rename operation, we use the storage policy of the 
> target directory to compute the source's quota usage. This causes a wrong 
> typeConsumed value when the source path has its own storage policy set. I 
> provided a unit test to demonstrate this situation.
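To make the scenario concrete, a hedged sketch of the setup such a unit test would exercise (paths and the policy name are illustrative, not from the patch):

{code:java}
// /src carries its own ALL_SSD policy; /dst uses the default HOT policy.
dfs.setStoragePolicy(new Path("/src"), "ALL_SSD");

// On rename, quota for the moved file should be computed with the source's
// effective policy (ALL_SSD), not the destination's (HOT); otherwise the
// per-storage-type typeConsumed counters (SSD vs. DISK) come out wrong.
dfs.rename(new Path("/src/file"), new Path("/dst/file"));
{code}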



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16355) Improve the description of dfs.block.scanner.volume.bytes.per.second

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16355:
--
Target Version/s: 3.3.3

> Improve the description of dfs.block.scanner.volume.bytes.per.second
> 
>
> Key: HDFS-16355
> URL: https://issues.apache.org/jira/browse/HDFS-16355
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, hdfs
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The datanode block scanner will be disabled if 
> `dfs.block.scanner.volume.bytes.per.second` is configured to be less than or 
> equal to zero; we can improve the description.
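For reference, disabling the scanner that way looks like the following in hdfs-site.xml (a sketch; the default is a positive rate of 1048576 bytes per second):

{code:xml}
<!-- A non-positive value disables the datanode block scanner entirely. -->
<property>
  <name>dfs.block.scanner.volume.bytes.per.second</name>
  <value>0</value>
</property>
{code}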



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16355) Improve the description of dfs.block.scanner.volume.bytes.per.second

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16355:
---

> Improve the description of dfs.block.scanner.volume.bytes.per.second
> 
>
> Key: HDFS-16355
> URL: https://issues.apache.org/jira/browse/HDFS-16355
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, hdfs
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> datanode block scanner will be disabled if 
> `dfs.block.scanner.volume.bytes.per.second` is configured less then or equal 
> to zero, we can improve the desciption



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16501) Print the exception when reporting a bad block

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16501:
---

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the namenode 
> without printing the reason why the block is bad. We should 
> print the exception in the log file.
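The improvement amounts to attaching the caught exception to the existing warning so the stack trace reaches the log, e.g. (a sketch of the pattern, not the exact patch):

{code:java}
// Before: the cause of the bad block is lost.
LOG.warn("Reporting bad {} on {}", block, volume);

// After: pass the throwable as the last argument so SLF4J logs its stack trace.
LOG.warn("Reporting bad {} on {}", block, volume, e);
{code}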



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16501) Print the exception when reporting a bad block

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16501:
--
Target Version/s: 3.3.3

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the namenode 
> without printing the reason why the block is bad. We should 
> print the exception in the log file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-04-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521095#comment-17521095
 ] 

Steve Loughran commented on HDFS-11041:
---

Including this in the 3.3.3 release for consistency with 3.2.3.

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.4
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw error messages like the following in some tests:
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down a datanode and then shuts down the cluster, which shuts 
> down that datanode twice. Resetting the FsDatasetSpi reference in DataNode to 
> null resolves the issue.
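The fix described is a null-out guard that makes a second shutdown a no-op. A minimal sketch of the idea (field and method bodies simplified from the description, not copied from the patch):

{code:java}
// Inside DataNode: make shutdown idempotent with respect to the dataset.
private volatile FsDatasetSpi<?> data;

public void shutdown() {
  FsDatasetSpi<?> dataset = data;
  if (dataset != null) {
    dataset.shutdown(); // unregisters the FSDatasetState MBean exactly once
    data = null;        // a second shutdown() call skips this branch
  }
}
{code}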



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-11041:
---

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.4
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw error messages like the following in some tests:
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down a datanode and then shuts down the cluster, which shuts 
> down that datanode twice. Resetting the FsDatasetSpi reference in DataNode to 
> null resolves the issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-11041:
--
Target Version/s: 3.3.3

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.4
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw error messages like the following in some tests:
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down a datanode and then shuts down the cluster, which shuts 
> down that datanode twice. Resetting the FsDatasetSpi reference in DataNode to 
> null resolves the issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16507:
---

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> According to the analysis, I suspect that the in-progress edit log to be 
> purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before the 
> ANN rolls its own edit log. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> 

[jira] [Reopened] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16428:
---

> Source path with storagePolicy cause wrong typeConsumed while rename
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
> Attachments: example.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When compute quota in rename operation , we use storage policy of the target 
> directory to compute src  quota usage. This will cause wrong value of 
> typeConsumed when source path was setted storage policy. I provided a unit 
> test to present this situation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16507:
--
Target Version/s: 3.3.3

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> According to the analysis, I suspect that the in-progress edit log to be 
> purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before the 
> ANN rolls its own edit log. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> 

[jira] [Updated] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16437:
--
Target Version/s: 3.3.3

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster environment without snapshots, converting the generated XML back 
> to an fsimage reports an error.
> {code:java}
> // code placeholder
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16437:
---

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster environment without snapshots, converting the generated XML back 
> to an fsimage reports an error.
> {code:java}
> // code placeholder
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16422:
--
Target Version/s: 3.3.3

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data from an erasure-coded file with missing replicas (internal blocks 
> of a block group) triggers online reconstruction: the data units are read 
> and decoded into the missing target data. Each DFSStripedInputStream 
> object has a single RawErasureDecoder object, so concurrent preads invoke 
> RawErasureDecoder.decode concurrently too. 
> RawErasureDecoder.decode is not thread safe; as a result, we occasionally get 
> wrong data from pread.
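Two standard remedies fit the bug as described: serialize access to the shared decoder, or give each concurrent pread its own decoder. A hedged sketch of both options (decoder creation details are illustrative; buffer setup is omitted):

{code:java}
// Option 1: serialize calls to the shared, non-thread-safe decoder.
synchronized (decoder) {
  decoder.decode(inputs, erasedIndexes, outputs);
}

// Option 2: avoid sharing - build a decoder per read, trading allocation
// cost for lock-free concurrency.
RawErasureDecoder local =
    CodecUtil.createRawDecoder(conf, codecName, coderOptions);
local.decode(inputs, erasedIndexes, outputs);
{code}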



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HDFS-16422:
---

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.4
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data from an erasure-coded file with missing replicas (internal 
> blocks of a block group) triggers online reconstruction: the dataUnits part 
> of the data is read and decoded into the missing target data. Each 
> DFSStripedInputStream object has a RawErasureDecoder object, and when we do 
> preads concurrently, RawErasureDecoder.decode is invoked concurrently too. 
> RawErasureDecoder.decode is not thread safe; as a result, we occasionally 
> get wrong data from pread.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16479) EC: NameNode should not send a reconstruction work when the source datanodes are insufficient

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16479?focusedWorklogId=755680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755680
 ]

ASF GitHub Bot logged work on HDFS-16479:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 09:53
Start Date: 12/Apr/22 09:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4138:
URL: https://github.com/apache/hadoop/pull/4138#issuecomment-1096474092

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  17m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 29s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 343m 15s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 476m 25s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4138/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4138 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 50fed07d0f97 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 
06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a0d57569803182cf83330753661d6a5d1d7ba660 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4138/3/testReport/ |
   | Max. process+thread count | 1962 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4138/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.

[jira] [Work logged] (HDFS-16509) Fix decommission UnsupportedOperationException: Remove unsupported

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16509?focusedWorklogId=755655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755655
 ]

ASF GitHub Bot logged work on HDFS-16509:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 09:02
Start Date: 12/Apr/22 09:02
Worklog Time Spent: 10m 
  Work Description: cndaimin commented on PR #4077:
URL: https://github.com/apache/hadoop/pull/4077#issuecomment-1096403594

   Hi @Hexiaoqiao, thanks for your review! I have added a test to this PR. 
Please take a look again, thanks.




Issue Time Tracking
---

Worklog Id: (was: 755655)
Time Spent: 1h 10m  (was: 1h)

> Fix decommission UnsupportedOperationException: Remove unsupported
> --
>
> Key: HDFS-16509
> URL: https://issues.apache.org/jira/browse/HDFS-16509
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We encountered an "UnsupportedOperationException: Remove unsupported" error 
> while some datanodes were decommissioning. The reason for the exception is 
> that datanode.getBlockIterator() returns an Iterator that does not support 
> remove(); however, DatanodeAdminDefaultMonitor#processBlocksInternal invokes 
> it.remove() when a block is not found, e.g., when the file containing the 
> block has been deleted.
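>
> A minimal sketch of the failure and one way around it (illustrative code, 
> not the actual monitor): calling remove() on an iterator that does not 
> support it throws UnsupportedOperationException, so stale blocks can be 
> collected during iteration and removed through the owning structure 
> afterwards. The removeBlock() helper below is hypothetical.
> {code:java}
> // Hypothetical sketch; names are illustrative.
> List<BlockInfo> stale = new ArrayList<>();
> Iterator<BlockInfo> it = datanode.getBlockIterator(); // remove() unsupported
> while (it.hasNext()) {
>   BlockInfo block = it.next();
>   if (blockManager.getStoredBlock(block) == null) {
>     stale.add(block); // defer removal instead of calling it.remove()
>   }
> }
> stale.forEach(b -> removeBlock(datanode, b)); // hypothetical helper
> {code}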



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=755644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755644
 ]

ASF GitHub Bot logged work on HDFS-16484:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 08:43
Start Date: 12/Apr/22 08:43
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#issuecomment-1096375247

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m  7s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  4s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 22s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 230m 38s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 337m  0s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4032 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 3498137c0050 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 11efafe90fce86bcd137b42d891db6d30d499688 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/4/testReport/ |
   | Max. process+thread count | 3326 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4032/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.