[jira] [Commented] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-08 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317683#comment-17317683
 ] 

Fengnan Li commented on HDFS-15756:
---

This was discussed in 
[HDFS-14405|https://issues.apache.org/jira/browse/HDFS-14405]. And yes, a 
different storage backend with strong consistency from the clients' point of 
view could solve the issue.

> RBF: Cannot get updated delegation token from zookeeper
> ---
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected version: all versions with RBF
> When RBF works with the Spark 2.4 client mode, there is a chance that the token 
> is missing across different nodes in the RBF cluster. The root cause is that 
> Spark renews the token (via the resource manager) immediately after getting 
> one; since ZooKeeper does not give a strong consistency guarantee after an 
> update, a ZooKeeper client may read a stale value from followers not yet 
> synced with the other nodes.
>  
> We applied a patch in Spark, but it is still a problem for RBF. Is it possible 
> for RBF to replace the delegation token store with some other 
> datasource (Redis, for example)?
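
The stale-read behavior described above can be illustrated with a tiny, self-contained Java sketch. The leader/follower maps below are purely illustrative stand-ins for ZooKeeper replicas, not RBF or ZooKeeper code:

```java
import java.util.HashMap;
import java.util.Map;

public class StaleReadDemo {
    // The "leader" applies writes immediately; the "follower" lags behind,
    // mimicking a ZooKeeper follower that has not yet synced an update.
    public static final Map<String, String> leader = new HashMap<>();
    public static final Map<String, String> follower = new HashMap<>();

    public static void storeToken(String id, String token) {
        leader.put(id, token);   // write is acknowledged by the leader...
        // ...but replication to the follower has not happened yet
    }

    public static String readFromFollower(String id) {
        return follower.get(id); // may miss a recently renewed token
    }

    public static void replicate() {
        follower.putAll(leader); // the follower eventually catches up
    }

    public static void main(String[] args) {
        storeToken("token-1", "renewed-token");
        System.out.println(readFromFollower("token-1")); // null: stale read
        replicate();
        System.out.println(readFromFollower("token-1")); // renewed-token
    }
}
```

A real fix would either force the client to catch up before reading (ZooKeeper's sync() call) or, as discussed above, move the token store to a backend with strong consistency.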



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=579770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579770
 ]

ASF GitHub Bot logged work on HDFS-15423:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 05:38
Start Date: 09/Apr/21 05:38
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on pull request #2605:
URL: https://github.com/apache/hadoop/pull/2605#issuecomment-816422020


   @goiri  Can we land this one? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579770)
Time Spent: 5h 50m  (was: 5h 40m)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.
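
For illustration only, the intended behavior could look roughly like the following sketch. The mount-table resolution and method names here are hypothetical simplifications, not the actual RouterWebHdfsMethods code:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SubClusterDnPicker {
    // Resolve the path to its sub-cluster first, then pick a random
    // datanode from that sub-cluster only, instead of from a report
    // covering every sub-cluster.
    public static String chooseDatanode(Map<String, List<String>> dnsBySubCluster,
                                        Map<String, String> mountTable,
                                        String path, Random rnd) {
        // longest-prefix mount resolution (greatly simplified)
        String subCluster = null;
        int best = -1;
        for (Map.Entry<String, String> e : mountTable.entrySet()) {
            String mount = e.getKey();
            if (path.startsWith(mount) && mount.length() > best) {
                best = mount.length();
                subCluster = e.getValue();
            }
        }
        List<String> dns = dnsBySubCluster.get(subCluster);
        if (dns == null || dns.isEmpty()) {
            throw new IllegalStateException("no datanodes for " + subCluster);
        }
        return dns.get(rnd.nextInt(dns.size()));
    }

    public static void main(String[] args) {
        Map<String, String> mounts = Map.of("/user", "ns0", "/data", "ns1");
        Map<String, List<String>> dns = Map.of(
            "ns0", List.of("dn-a", "dn-b"),
            "ns1", List.of("dn-c"));
        // /data resolves to ns1, so only dn-c is eligible
        System.out.println(chooseDatanode(dns, mounts, "/data/file", new Random())); // dn-c
    }
}
```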



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15887) Make LogRoll and TailEdits execute in parallel

2021-04-08 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317645#comment-17317645
 ] 

Wei-Chiu Chuang commented on HDFS-15887:


Not an expert here, but makes sense to me.

> Make LogRoll and TailEdits execute in parallel
> --
>
> Key: HDFS-15887
> URL: https://issues.apache.org/jira/browse/HDFS-15887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: edit_files.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the EditLogTailer class, LogRoll and TailEdits are executed in a single 
> thread, and when a checkpoint occurs, it competes with TailEdits for a lock 
> (FSNamesystem#cpLock).
> Usually, executing a checkpoint takes a long time, which causes the size of 
> the generated edit log files to be relatively large.
> For example, here is an actual effect (see edit_files.jpg):
> The StandbyCheckpointer log is triggered as follows:
> 2021-03-11 09:18:42,513 [769071096]-INFO [Standby State 
> Checkpointer:StandbyCheckpointer$CheckpointerThread@335]-Triggering 
> checkpoint because there have been 5142154 txns since the last checkpoint, 
> which exceeds the configured threshold 100
> When loading an edit log with a large amount of data, the processing time 
> will be longer. We should keep the edit log sizes as even as possible, which 
> is good for the operation of the system.
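
The idea of decoupling the two activities can be sketched with plain executors. This is an illustrative toy, not the EditLogTailer implementation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelTailerSketch {
    // Runs a slow "log roll" and a "tail edits" task on separate threads;
    // returns true when both complete, showing neither blocks the other.
    public static boolean runInParallel() {
        ExecutorService rollExec = Executors.newSingleThreadExecutor();
        ExecutorService tailExec = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(2);

        rollExec.submit(() -> {           // stands in for triggering a log roll
            try {
                Thread.sleep(200);        // a deliberately slow roll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.countDown();
        });
        tailExec.submit(done::countDown); // stands in for tailing edits

        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            rollExec.shutdown();
            tailExec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInParallel()); // prints true
    }
}
```

In the real code the checkpoint path would still need to coordinate via FSNamesystem#cpLock; the sketch only shows the threading split.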



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15243) Add an option to prevent sub-directories of protected directories from deletion

2021-04-08 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15243:
---
Fix Version/s: 3.3.1

> Add an option to prevent sub-directories of protected directories from 
> deletion
> ---
>
> Key: HDFS-15243
> URL: https://issues.apache.org/jira/browse/HDFS-15243
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.1.1
>Reporter: liuyanyu
>Assignee: liuyanyu
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15243.001.patch, HDFS-15243.002.patch, 
> HDFS-15243.003.patch, HDFS-15243.004.patch, HDFS-15243.005.patch, 
> HDFS-15243.006.patch, image-2020-03-28-09-23-31-335.png
>
>
> HDFS-8983 added fs.protected.directories to support protected directories on 
> the NameNode. But as I tested, when a parent directory (e.g. /testA) is set as 
> a protected directory, a child directory (e.g. /testA/testB) can still be 
> deleted or renamed. We protect a directory mainly to protect the data under 
> it, so I think a child directory should not be deleted or renamed if its 
> parent directory is protected.
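
A minimal sketch of the proposed check, assuming a simple path-prefix rule (the names are illustrative, not the patch itself):

```java
import java.util.Set;

public class ProtectedSubdirCheck {
    // A delete/rename target is rejected not only when it IS a protected
    // directory, but also when it lies anywhere under one.
    public static boolean isUnderProtectedDir(Set<String> protectedDirs, String path) {
        for (String dir : protectedDirs) {
            // match the directory itself, or any descendant ("/testA/...")
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            if (path.equals(dir) || path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> prot = Set.of("/testA");
        System.out.println(isUnderProtectedDir(prot, "/testA/testB")); // true
        System.out.println(isUnderProtectedDir(prot, "/testAB"));      // false
    }
}
```

Note the trailing-slash handling: a naive startsWith("/testA") would wrongly block the sibling /testAB.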



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15887) Make LogRoll and TailEdits execute in parallel

2021-04-08 Thread JiangHua Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317614#comment-17317614
 ] 

JiangHua Zhu commented on HDFS-15887:
-

[~weichiu] [~hexiaoqiao], I submitted some code. Can you give me a review?
Thank you very much.

> Make LogRoll and TailEdits execute in parallel
> --
>
> Key: HDFS-15887
> URL: https://issues.apache.org/jira/browse/HDFS-15887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: edit_files.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the EditLogTailer class, LogRoll and TailEdits are executed in a single 
> thread, and when a checkpoint occurs, it competes with TailEdits for a lock 
> (FSNamesystem#cpLock).
> Usually, executing a checkpoint takes a long time, which causes the size of 
> the generated edit log files to be relatively large.
> For example, here is an actual effect (see edit_files.jpg):
> The StandbyCheckpointer log is triggered as follows:
> 2021-03-11 09:18:42,513 [769071096]-INFO [Standby State 
> Checkpointer:StandbyCheckpointer$CheckpointerThread@335]-Triggering 
> checkpoint because there have been 5142154 txns since the last checkpoint, 
> which exceeds the configured threshold 100
> When loading an edit log with a large amount of data, the processing time 
> will be longer. We should keep the edit log sizes as even as possible, which 
> is good for the operation of the system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15960) Router NamenodeHeartbeatService fails to authenticate with namenode in a kerberized environment

2021-04-08 Thread Borislav Iordanov (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Borislav Iordanov updated HDFS-15960:
-
Description: We use hadoop.http.authentication.type = "kerberos", and when 
the NamenodeHeartbeatService calls the namenode via JMX, it does not provide a 
user security context, so the authentication token is not transmitted and the 
call fails.

> Router NamenodeHeartbeatService fails to authenticate with namenode in a 
> kerberized environment
> 
>
> Key: HDFS-15960
> URL: https://issues.apache.org/jira/browse/HDFS-15960
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Borislav Iordanov
>Priority: Major
>
> We use hadoop.http.authentication.type = "kerberos", and when the 
> NamenodeHeartbeatService calls the namenode via JMX, it does not provide a 
> user security context, so the authentication token is not transmitted and the 
> call fails.
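
In Hadoop, remote calls are usually given a Kerberos identity by running them inside UserGroupInformation.doAs(...). The stub sketch below only illustrates that wrapping pattern with made-up types; it is not the actual router code:

```java
public class HeartbeatAuthSketch {
    // Hypothetical stand-in for a logged-in security context.
    public interface SecurityContext {
        <T> T doAs(java.util.function.Supplier<T> action);
    }

    public static SecurityContext loginContext() {
        // stub: a real implementation would carry Kerberos credentials
        return new SecurityContext() {
            @Override
            public <T> T doAs(java.util.function.Supplier<T> action) {
                // credentials would be attached to the calling thread here
                return action.get();
            }
        };
    }

    public static String fetchJmxBeans(SecurityContext ctx, String url) {
        // the JMX query now runs inside the caller's security context,
        // so the authentication token can be transmitted
        return ctx.doAs(() -> "beans-from:" + url);
    }

    public static void main(String[] args) {
        System.out.println(fetchJmxBeans(loginContext(), "http://namenode:9870/jmx"));
    }
}
```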



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15960) Router NamenodeHeartbeatService fails to authenticate with namenode in a kerberized environment

2021-04-08 Thread Borislav Iordanov (Jira)
Borislav Iordanov created HDFS-15960:


 Summary: Router NamenodeHeartbeatService fails to authenticate 
with namenode in a kerberized environment
 Key: HDFS-15960
 URL: https://issues.apache.org/jira/browse/HDFS-15960
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Borislav Iordanov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15940?focusedWorklogId=579694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579694
 ]

ASF GitHub Bot logged work on HDFS-15940:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 02:21
Start Date: 09/Apr/21 02:21
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #2874:
URL: https://github.com/apache/hadoop/pull/2874#issuecomment-816354020


   Merged to trunk and cherry-picked to branch-3.3. Thanks for your PR, 
@virajjasani.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579694)
Time Spent: 5h 20m  (was: 5h 10m)

> Some tests in TestBlockRecovery are consistently failing
> 
>
> Key: HDFS-15940
> URL: https://issues.apache.org/jira/browse/HDFS-15940
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Some long-running tests in TestBlockRecovery are consistently failing. Also, 
> TestBlockRecovery is huge with so many tests; we should refactor some of the 
> long-running and race-condition-specific tests into a separate class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15940?focusedWorklogId=579692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579692
 ]

ASF GitHub Bot logged work on HDFS-15940:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 02:10
Start Date: 09/Apr/21 02:10
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #2874:
URL: https://github.com/apache/hadoop/pull/2874


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579692)
Time Spent: 5h 10m  (was: 5h)

> Some tests in TestBlockRecovery are consistently failing
> 
>
> Key: HDFS-15940
> URL: https://issues.apache.org/jira/browse/HDFS-15940
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Some long-running tests in TestBlockRecovery are consistently failing. Also, 
> TestBlockRecovery is huge with so many tests; we should refactor some of the 
> long-running and race-condition-specific tests into a separate class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15958) TestBPOfferService.testMissBlocksWhenReregister is flaky

2021-04-08 Thread Borislav Iordanov (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Borislav Iordanov updated HDFS-15958:
-
Priority: Minor  (was: Major)

> TestBPOfferService.testMissBlocksWhenReregister is flaky
> 
>
> Key: HDFS-15958
> URL: https://issues.apache.org/jira/browse/HDFS-15958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Borislav Iordanov
>Priority: Minor
>
> This test fails relatively frequently due to a race condition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15959) Add support to digest based authentication in ZKDelegationTokenSecretManager

2021-04-08 Thread Borislav Iordanov (Jira)
Borislav Iordanov created HDFS-15959:


 Summary: Add support to digest based authentication in 
ZKDelegationTokenSecretManager
 Key: HDFS-15959
 URL: https://issues.apache.org/jira/browse/HDFS-15959
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Borislav Iordanov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15621) Datanode DirectoryScanner uses excessive memory

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15621?focusedWorklogId=579626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579626
 ]

ASF GitHub Bot logged work on HDFS-15621:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 22:52
Start Date: 08/Apr/21 22:52
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2849:
URL: https://github.com/apache/hadoop/pull/2849#issuecomment-816282010


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m  4s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 349m  5s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 442m 11s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.server.balancer.TestBalancer |
   |   | hadoop.hdfs.server.datanode.TestBlockScanner |
   |   | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.server.namenode.TestFileTruncate |
   |   | hadoop.hdfs.TestViewDistributedFileSystemWithMountLinks |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
   |   | hadoop.hdfs.TestPersistBlocks |
   |   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.TestDFSShell |
   |   | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
   |   | hadoop.hdfs.server.datanode.TestIncrementalBrVariations |
   |   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList |
   |   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
   |   | 
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
   |   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/2/artifact/out/Dockerfile
 |
   | GITHUB PR | 

[jira] [Created] (HDFS-15958) TestBPOfferService.testMissBlocksWhenReregister is flaky

2021-04-08 Thread Borislav Iordanov (Jira)
Borislav Iordanov created HDFS-15958:


 Summary: TestBPOfferService.testMissBlocksWhenReregister is flaky
 Key: HDFS-15958
 URL: https://issues.apache.org/jira/browse/HDFS-15958
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Borislav Iordanov


This test fails relatively frequently due to a race condition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15957:
--
Labels: pull-request-available  (was: )

> The ignored IOException in the RPC response sent by FSEditLogAsync can cause 
> the HDFS client to hang
> 
>
> Key: HDFS-15957
> URL: https://issues.apache.org/jira/browse/HDFS-15957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
> Attachments: fsshell.txt, namenode.txt, reproduce.patch, 
> secondnamenode.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>     In FSEditLogAsync, the RpcEdit notification in line 248 could be skipped, 
> because the possible exception (e.g., IOException) thrown in line 365 is 
> always ignored.
>  
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
> class FSEditLogAsync extends FSEditLog implements Runnable {
>   // ...
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit();
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit();
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId());
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);  // line 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
>   // the calling rpc thread will return immediately from logSync but the
>   // rpc response will not be sent until the edit is durable.
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();  // line 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
>   }
> }
> {code}
>     The `call.sendResponse()` may throw an IOException. According to the 
> comment (“don’t care if not sent”) there, this exception is neither handled 
> nor printed in the log. However, we suspect that some RPC responses sent there 
> may be critical, and there should be some retry mechanism.
>     We try to introduce a single IOException in line 365, and find that the 
> HDFS client (e.g., `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`) may get 
> stuck forever (hang for >30min without any log). We can reproduce this 
> symptom in multiple ways. One of the simplest ways of reproduction is shown 
> as follows:
>  # Start a new empty HDFS cluster (1 namenode, 2 datanodes) with the default 
> configuration.
>  # Generate a file of 15MB for testing, by `fallocate -l 1500 foo.txt`.
>  # Run the HDFS client `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`.
>  # When line 365 is invoked the third time (it is invoked 6 times in total in 
> this experiment), inject an IOException there. (A patch for injecting the 
> exception this way is attached to reproduce the issue)
>     Then the client hangs forever, without any log. If we run `bin/hdfs dfs 
> -ls /` to check the file status, we cannot see the expected 15MB `/1.txt` 
> file.
>     The jstack of the HDFS client shows that there is an RPC call infinitely 
> waiting.
> {code:java}
> "Thread-6" #18 daemon prio=5 os_prio=0 tid=0x7f9cd5295800 nid=0x26b9 in 
> Object.wait() [0x7f9ca354f000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00071e709610> (a org.apache.hadoop.ipc.Client$Call)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
> - locked <0x00071e709610> (a org.apache.hadoop.ipc.Client$Call)
> at 

[jira] [Work logged] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15957?focusedWorklogId=579611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579611
 ]

ASF GitHub Bot logged work on HDFS-15957:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 22:20
Start Date: 08/Apr/21 22:20
Worklog Time Spent: 10m 
  Work Description: functioner opened a new pull request #2878:
URL: https://github.com/apache/hadoop/pull/2878


   I propose a fix for 
[HDFS-15957](https://issues.apache.org/jira/browse/HDFS-15957). And probably we 
should make `RESPONSE_SEND_RETRIES` configurable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579611)
Remaining Estimate: 0h
Time Spent: 10m

> The ignored IOException in the RPC response sent by FSEditLogAsync can cause 
> the HDFS client to hang
> 
>
> Key: HDFS-15957
> URL: https://issues.apache.org/jira/browse/HDFS-15957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
> Attachments: fsshell.txt, namenode.txt, reproduce.patch, 
> secondnamenode.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>     In FSEditLogAsync, the RpcEdit notification in line 248 could be skipped, 
> because the possible exception (e.g., IOException) thrown in line 365 is 
> always ignored.
>  
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
> class FSEditLogAsync extends FSEditLog implements Runnable {
>   // ...
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit();
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit();
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId());
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);  // line 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
>   // the calling rpc thread will return immediately from logSync but the
>   // rpc response will not be sent until the edit is durable.
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();  // line 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
>   }
> }
> {code}
>     The `call.sendResponse()` may throw an IOException. According to the 
> comment (“don’t care if not sent”) there, this exception is neither handled 
> nor printed in the log. However, we suspect that some RPC responses sent there 
> may be critical, and there should be some retry mechanism.
>     We try to introduce a single IOException in line 365, and find that the 
> HDFS client (e.g., `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`) may get 
> stuck forever (hang for >30min without any log). We can reproduce this 
> symptom in multiple ways. One of the simplest ways of reproduction is shown 
> as follows:
>  # Start a new empty HDFS cluster (1 namenode, 2 datanodes) with the default 
> configuration.
>  # Generate a file of 15MB for testing, by `fallocate -l 1500 foo.txt`.
>  # Run the HDFS client `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`.
>  # When line 365 is invoked the third time (it is invoked 6 times in total in 
> this experiment), inject an IOException there. (A patch for injecting the 
> exception this way is attached to reproduce the issue)
>     Then the client hangs forever, without any log. If we run `bin/hdfs 

[jira] [Created] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-08 Thread Haoze Wu (Jira)
Haoze Wu created HDFS-15957:
---

 Summary: The ignored IOException in the RPC response sent by 
FSEditLogAsync can cause the HDFS client to hang
 Key: HDFS-15957
 URL: https://issues.apache.org/jira/browse/HDFS-15957
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs async, namenode
Affects Versions: 3.2.2
Reporter: Haoze Wu
 Attachments: fsshell.txt, namenode.txt, reproduce.patch, 
secondnamenode.txt

    In FSEditLogAsync, the RpcEdit notification in line 248 could be skipped, 
because the possible exception (e.g., IOException) thrown in line 365 is always 
ignored.

 
{code:java}
//hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
class FSEditLogAsync extends FSEditLog implements Runnable {
  // ...

  @Override
  public void run() {
try {
  while (true) {
boolean doSync;
Edit edit = dequeueEdit();
if (edit != null) {
  // sync if requested by edit log.
  doSync = edit.logEdit();
  syncWaitQ.add(edit);
} else {
  // sync when editq runs dry, but have edits pending a sync.
  doSync = !syncWaitQ.isEmpty();
}
if (doSync) {
  // normally edit log exceptions cause the NN to terminate, but tests
  // relying on ExitUtil.terminate need to see the exception.
  RuntimeException syncEx = null;
  try {
logSync(getLastWrittenTxId());
  } catch (RuntimeException ex) {
syncEx = ex;
  }
  while ((edit = syncWaitQ.poll()) != null) {
edit.logSyncNotify(syncEx);  // line 248
  }
}
  }
} catch (InterruptedException ie) {
  LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
} catch (Throwable t) {
  terminate(t);
}
  }

  // the calling rpc thread will return immediately from logSync but the
  // rpc response will not be sent until the edit is durable.
  private static class RpcEdit extends Edit {
// ...

@Override
public void logSyncNotify(RuntimeException syncEx) {
  try {
if (syncEx == null) {
  call.sendResponse();// line 
365
} else {
  call.abortResponse(syncEx);
}
  } catch (Exception e) {} // don't care if not sent.
}

  }

}
{code}
    The `call.sendResponse()` call may throw an IOException. According to the 
comment there (“don’t care if not sent”), this exception is neither handled nor 
logged. However, we suspect that some of the RPC responses sent there may be 
critical, and there should be some retry mechanism.

    We try to introduce a single IOException in line 365, and find that the 
HDFS client (e.g., `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`) may get 
stuck forever (hang for >30min without any log). We can reproduce this symptom 
in multiple ways. One of the simplest ways of reproduction is shown as follows:
 # Start a new empty HDFS cluster (1 namenode, 2 datanodes) with the default 
configuration.
 # Generate a file of 15MB for testing, by `fallocate -l 1500 foo.txt`.
 # Run the HDFS client `bin/hdfs dfs -copyFromLocal ./foo.txt /1.txt`.
 # When line 365 is invoked the third time (it is invoked 6 times in total in 
this experiment), inject an IOException there. (A patch for injecting the 
exception this way is attached to reproduce the issue)

    Then the client hangs forever, without any log. If we run `bin/hdfs dfs -ls 
/` to check the file status, we can not see the expected 15MB `/1.txt` file.

    The jstack of the HDFS client shows that there is an RPC call infinitely 
waiting.
{code:java}
"Thread-6" #18 daemon prio=5 os_prio=0 tid=0x7f9cd5295800 nid=0x26b9 in Object.wait() [0x7f9ca354f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00071e709610> (a org.apache.hadoop.ipc.Client$Call)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
    - locked <0x00071e709610> (a org.apache.hadoop.ipc.Client$Call)
    at org.apache.hadoop.ipc.Client.call(Client.java:1513)
    at org.apache.hadoop.ipc.Client.call(Client.java:1410)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
{code}
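
The hang mechanism described above can be modeled in a few lines of plain Java. `Call`, `notifySwallow`, and `notifyWithAbort` below are simplified stand-ins invented for illustration, not the real Hadoop IPC classes, and aborting the call is only one conceivable remedy, not the actual fix:

```java
import java.io.IOException;

// Minimal model of the hang: the client blocks until the call is marked done.
// If the notification swallows the send failure, the call is never marked done.
class Call {
    volatile boolean failSend;   // simulates the injected IOException at "line 365"
    volatile boolean done;       // true once the waiting client would be woken up
    volatile Exception error;

    void sendResponse() throws IOException {
        if (failSend) throw new IOException("injected send failure");
        done = true;
    }

    void abortResponse(Exception e) {
        error = e;
        done = true;             // client wakes up with an error instead of hanging
    }
}

public class LogSyncNotifyDemo {
    // Behavior described in the report: the failure is swallowed, the client never wakes.
    static void notifySwallow(Call call) {
        try {
            call.sendResponse();
        } catch (Exception e) { /* don't care if not sent */ }
    }

    // One conceivable remedy (illustration only): abort the call on a send
    // failure so the waiting client unblocks with an exception.
    static void notifyWithAbort(Call call) {
        try {
            call.sendResponse();
        } catch (Exception e) {
            call.abortResponse(e);
        }
    }

    public static void main(String[] args) {
        Call hung = new Call();
        hung.failSend = true;
        notifySwallow(hung);
        System.out.println("swallow: client woken = " + hung.done);

        Call aborted = new Call();
        aborted.failSend = true;
        notifyWithAbort(aborted);
        System.out.println("abort:   client woken = " + aborted.done);
    }
}
```

Under this model the swallowed failure leaves `done` false forever, which matches the observed client hang; aborting instead wakes the client with an error.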

[jira] [Commented] (HDFS-15955) Make explicit_bzero cross platform

2021-04-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317336#comment-17317336
 ] 

Íñigo Goiri commented on HDFS-15955:


Thanks [~gautham] for the patch.
Merged PR 2875.

> Make explicit_bzero cross platform
> --
>
> Key: HDFS-15955
> URL: https://issues.apache.org/jira/browse/HDFS-15955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The function explicit_bzero isn't available in Visual C++. Need to make this 
> cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15955) Make explicit_bzero cross platform

2021-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-15955.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Make explicit_bzero cross platform
> --
>
> Key: HDFS-15955
> URL: https://issues.apache.org/jira/browse/HDFS-15955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The function explicit_bzero isn't available in Visual C++. Need to make this 
> cross platform.






[jira] [Work logged] (HDFS-15955) Make explicit_bzero cross platform

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15955?focusedWorklogId=579354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579354
 ]

ASF GitHub Bot logged work on HDFS-15955:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 16:44
Start Date: 08/Apr/21 16:44
Worklog Time Spent: 10m 
  Work Description: goiri merged pull request #2875:
URL: https://github.com/apache/hadoop/pull/2875


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579354)
Time Spent: 50m  (was: 40m)

> Make explicit_bzero cross platform
> --
>
> Key: HDFS-15955
> URL: https://issues.apache.org/jira/browse/HDFS-15955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The function explicit_bzero isn't available in Visual C++. Need to make this 
> cross platform.






[jira] [Work logged] (HDFS-15956) Provide utility class for FSNamesystem

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15956?focusedWorklogId=579345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579345
 ]

ASF GitHub Bot logged work on HDFS-15956:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 16:19
Start Date: 08/Apr/21 16:19
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #2876:
URL: https://github.com/apache/hadoop/pull/2876#issuecomment-815959265


   I understand: some of the most critical and fundamental operations are 
executed by the Namesystem, so refactoring might make it difficult to retain a 
clean git history; at the same time, however, the class might reach 10k lines of 
code pretty soon.
   Perhaps the pros of keeping a clean git blame history and smooth backports 
outweigh the cons of having ~9-10k lines of code. Let's wait at least 1 
day for any further opinions? If nothing else is added, I can close the PR.
   
   Thanks




Issue Time Tracking
---

Worklog Id: (was: 579345)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide utility class for FSNamesystem
> --
>
> Key: HDFS-15956
> URL: https://issues.apache.org/jira/browse/HDFS-15956
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With ever-growing functionality, FSNamesystem has become very large (with 
> ~9k lines of code) over time. We should provide a utility class 
> and refactor as many basic utility functions into the new class as we can.
> With any further suggestions, we can create sub-tasks of this Jira and work 
> on them.






[jira] [Resolved] (HDFS-15916) DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-04-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-15916.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk.
Thanx [~weichiu] and [~vjasani] for the reviews, [~smajeti] for the report.

Cherry-picking has issues due to HADOOP-17482; will wait to see if that can 
be backported, or raise a backport PR.

> DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> ---
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Looks like when using the distcp diff option between two snapshots from a Hadoop 
> 3 cluster to a Hadoop 2 cluster, we get the exception below; this seems to break 
> backward compatibility due to the introduction of the new API 
> getSnapshotDiffReportListing.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  
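
A common remedy for this kind of incompatibility is to catch the RpcNoSuchMethodException from the newer API and fall back to the older RPC. The sketch below uses hypothetical stand-in types to show the shape of that fallback; it is not the actual HDFS-15916 patch:

```java
import java.util.Arrays;
import java.util.List;

public class SnapshotDiffFallback {
    // Hypothetical minimal stand-ins for the RPC layer, for illustration only.
    static class RpcNoSuchMethodException extends RuntimeException {}

    interface NamenodeProxy {
        List<String> getSnapshotDiffReportListing(); // newer API (Hadoop 3)
        List<String> getSnapshotDiffReport();        // older API (Hadoop 2)
    }

    // If the remote server predates the listing API, retry with the old call.
    static List<String> diffWithFallback(NamenodeProxy proxy) {
        try {
            return proxy.getSnapshotDiffReportListing();
        } catch (RpcNoSuchMethodException e) {
            return proxy.getSnapshotDiffReport();
        }
    }

    public static void main(String[] args) {
        // Simulate a Hadoop 2 server that lacks getSnapshotDiffReportListing.
        NamenodeProxy oldServer = new NamenodeProxy() {
            public List<String> getSnapshotDiffReportListing() {
                throw new RpcNoSuchMethodException();
            }
            public List<String> getSnapshotDiffReport() {
                return Arrays.asList("M ./file1");
            }
        };
        System.out.println(diffWithFallback(oldServer)); // falls back: [M ./file1]
    }
}
```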






[jira] [Assigned] (HDFS-15916) DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-04-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15916:
---

Assignee: Ayush Saxena

> DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> ---
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Looks like when using the distcp diff option between two snapshots from a Hadoop 
> 3 cluster to a Hadoop 2 cluster, we get the exception below; this seems to break 
> backward compatibility due to the introduction of the new API 
> getSnapshotDiffReportListing.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  






[jira] [Work logged] (HDFS-15916) DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?focusedWorklogId=579294&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579294
 ]

ASF GitHub Bot logged work on HDFS-15916:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 15:19
Start Date: 08/Apr/21 15:19
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged pull request #2863:
URL: https://github.com/apache/hadoop/pull/2863


   




Issue Time Tracking
---

Worklog Id: (was: 579294)
Time Spent: 0.5h  (was: 20m)

> DistCp: Backward compatibility: Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> ---
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Looks like when using the distcp diff option between two snapshots from a Hadoop 
> 3 cluster to a Hadoop 2 cluster, we get the exception below; this seems to break 
> backward compatibility due to the introduction of the new API 
> getSnapshotDiffReportListing.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  






[jira] [Work logged] (HDFS-15956) Provide utility class for FSNamesystem

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15956?focusedWorklogId=579285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579285
 ]

ASF GitHub Bot logged work on HDFS-15956:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 15:04
Start Date: 08/Apr/21 15:04
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2876:
URL: https://github.com/apache/hadoop/pull/2876#issuecomment-815898530


   > we do not refactored this code very much unless necessary for easier 
history, git blame && backport future changes
   
   I agree with this. Mostly, backports would be a pain after this change. So, in my 
opinion, if it isn't going to gain us anything, let it stay as is; but if 
folks feel we should go ahead with this, no objections from my side, provided 
we review it carefully, since it touches some critical parts of the code. 




Issue Time Tracking
---

Worklog Id: (was: 579285)
Time Spent: 1h 10m  (was: 1h)

> Provide utility class for FSNamesystem
> --
>
> Key: HDFS-15956
> URL: https://issues.apache.org/jira/browse/HDFS-15956
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With ever-growing functionality, FSNamesystem has become very large (with 
> ~9k lines of code) over time. We should provide a utility class 
> and refactor as many basic utility functions into the new class as we can.
> With any further suggestions, we can create sub-tasks of this Jira and work 
> on them.






[jira] [Updated] (HDFS-15942) Increase Quota initialization threads

2021-04-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15942:
-
Fix Version/s: 3.3.1

> Increase Quota initialization threads
> -
>
> Key: HDFS-15942
> URL: https://issues.apache.org/jira/browse/HDFS-15942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15942.001.patch
>
>
> On large namespaces, the quota initialization at startup can take a long time 
> with the default 4 threads. Also, on NN failover, the quota often needs to be 
> calculated before the failover can complete, delaying the failover.
> I performed some benchmarks some time back on a large image (316M inodes, 35GB 
> on disk); the quota load takes:
> {code}
> quota - 4  threads 39 seconds
> quota - 8  threads 23 seconds
> quota - 12 threads 20 seconds
> quota - 16 threads 15 seconds
> {code}
> As the quota is calculated when the NN is starting up (and hence doing no 
> other work) or at failover time before the new standby becomes active, I 
> think the quota should use as many threads as possible.
> I propose we change the default to 8 or 12 on at least trunk and branch-3.3 
> so we have a better default going forward.
> Has anyone got any other thoughts?
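
For anyone tuning this, the thread count is governed by the `dfs.namenode.quota.init-threads` property; an hdfs-site.xml override matching the 12-thread run above might look like the following (the value comes from the benchmark in this message, not from an official recommendation):

```xml
<property>
  <name>dfs.namenode.quota.init-threads</name>
  <value>12</value>
</property>
```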






[jira] [Resolved] (HDFS-15937) Reduce memory used during datanode layout upgrade

2021-04-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-15937.
--
Resolution: Fixed

> Reduce memory used during datanode layout upgrade
> -
>
> Key: HDFS-15937
> URL: https://issues.apache.org/jira/browse/HDFS-15937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: heap-dump-after.png, heap-dump-before.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), 
> we have found the datanode uses a lot more memory than usual.
> For each volume, the blocks are scanned and a list is created holding a 
> series of LinkArgs objects. This object contains a File object for the block 
> source and destination. The file object stores the path as a string, eg:
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825
> This string is repeated for every block and meta file on the DN, and much 
> of the string is the same each time, leading to a large amount of memory.
> If we change the linkArgs to store:
> * Src Path without the block, eg 
> /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
> * Dest Path without the block eg 
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
> * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta
> Then, ensuring we reuse the same File object for repeated src and dest paths, 
> we can save most of the memory without reworking the logic of the code.
> The current logic works along the source paths recursively, so you can easily 
> re-use the src path object.
> For the destination path, there are only 32x32 (1024) distinct paths, so we 
> can simply cache them in a HashMap and look up the reusable object each time.
> I tested locally by generating 100k block files and attempting the layout 
> upgrade. A heap dump showed the 100k blocks using about 140MB of heap. That 
> is close to 1.5GB per 1M blocks.
> After the change outlined above the same 100K blocks used about 20MB of heap, 
> so 200MB per million blocks.
> A general DN sizing recommendation is 1GB of heap per 1M blocks, so the 
> upgrade should be able to happen within the pre-upgrade heap.
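
The caching idea above can be sketched in a few lines of Java (the class and method names here are illustrative, not the ones in the actual patch): reuse one File object per distinct destination directory, of which a 32x32 layout has at most 1024.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Sketch of the de-duplication idea: one shared File object per distinct
// destination directory, instead of a fresh full-path File per block.
public class DestDirCache {
    private final Map<String, File> cache = new HashMap<>();

    // Returns the same File instance for repeated directory paths, so millions
    // of blocks share at most ~1024 directory objects under a 32x32 layout.
    public File get(String dirPath) {
        return cache.computeIfAbsent(dirPath, File::new);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        DestDirCache cache = new DestDirCache();
        File a = cache.get("/data01/dfs/dn/current/finalized/subdir0/subdir10");
        File b = cache.get("/data01/dfs/dn/current/finalized/subdir0/subdir10");
        System.out.println(a == b);          // true: the path string is stored once
        System.out.println(cache.size());    // 1
    }
}
```

The block or meta file name is then appended per block, so only the short `blk_…` strings are unique per entry.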






[jira] [Commented] (HDFS-15937) Reduce memory used during datanode layout upgrade

2021-04-08 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317117#comment-17317117
 ] 

Stephen O'Donnell commented on HDFS-15937:
--

Committed this from 3.1 up to trunk.

> Reduce memory used during datanode layout upgrade
> -
>
> Key: HDFS-15937
> URL: https://issues.apache.org/jira/browse/HDFS-15937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: heap-dump-after.png, heap-dump-before.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), 
> we have found the datanode uses a lot more memory than usual.
> For each volume, the blocks are scanned and a list is created holding a 
> series of LinkArgs objects. This object contains a File object for the block 
> source and destination. The file object stores the path as a string, eg:
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825
> This string is repeated for every block and meta file on the DN, and much 
> of the string is the same each time, leading to a large amount of memory.
> If we change the linkArgs to store:
> * Src Path without the block, eg 
> /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
> * Dest Path without the block eg 
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
> * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta
> Then, ensuring we reuse the same File object for repeated src and dest paths, 
> we can save most of the memory without reworking the logic of the code.
> The current logic works along the source paths recursively, so you can easily 
> re-use the src path object.
> For the destination path, there are only 32x32 (1024) distinct paths, so we 
> can simply cache them in a HashMap and look up the reusable object each time.
> I tested locally by generating 100k block files and attempting the layout 
> upgrade. A heap dump showed the 100k blocks using about 140MB of heap. That 
> is close to 1.5GB per 1M blocks.
> After the change outlined above the same 100K blocks used about 20MB of heap, 
> so 200MB per million blocks.
> A general DN sizing recommendation is 1GB of heap per 1M blocks, so the 
> upgrade should be able to happen within the pre-upgrade heap.






[jira] [Updated] (HDFS-15937) Reduce memory used during datanode layout upgrade

2021-04-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15937:
-
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1

> Reduce memory used during datanode layout upgrade
> -
>
> Key: HDFS-15937
> URL: https://issues.apache.org/jira/browse/HDFS-15937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: heap-dump-after.png, heap-dump-before.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), 
> we have found the datanode uses a lot more memory than usual.
> For each volume, the blocks are scanned and a list is created holding a 
> series of LinkArgs objects. This object contains a File object for the block 
> source and destination. The file object stores the path as a string, eg:
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825
> This string is repeated for every block and meta file on the DN, and much 
> of the string is the same each time, leading to a large amount of memory.
> If we change the linkArgs to store:
> * Src Path without the block, eg 
> /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
> * Dest Path without the block eg 
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
> * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta
> Then, ensuring we reuse the same File object for repeated src and dest paths, 
> we can save most of the memory without reworking the logic of the code.
> The current logic works along the source paths recursively, so you can easily 
> re-use the src path object.
> For the destination path, there are only 32x32 (1024) distinct paths, so we 
> can simply cache them in a HashMap and look up the reusable object each time.
> I tested locally by generating 100k block files and attempting the layout 
> upgrade. A heap dump showed the 100k blocks using about 140MB of heap. That 
> is close to 1.5GB per 1M blocks.
> After the change outlined above the same 100K blocks used about 20MB of heap, 
> so 200MB per million blocks.
> A general DN sizing recommendation is 1GB of heap per 1M blocks, so the 
> upgrade should be able to happen within the pre-upgrade heap.






[jira] [Work logged] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15940?focusedWorklogId=579110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579110
 ]

ASF GitHub Bot logged work on HDFS-15940:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 11:39
Start Date: 08/Apr/21 11:39
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2874:
URL: https://github.com/apache/hadoop/pull/2874#issuecomment-815693097


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  1s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 354m 17s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2874/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 453m 13s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
   |   | hadoop.hdfs.TestBlocksScheduledCounter |
   |   | hadoop.hdfs.TestSnapshotCommands |
   |   | hadoop.hdfs.server.datanode.TestBlockScanner |
   |   | hadoop.hdfs.server.mover.TestMover |
   |   | hadoop.hdfs.TestDFSShell |
   |   | 
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
   |   | hadoop.hdfs.TestStateAlignmentContextWithHA |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
   |   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
   |   | hadoop.hdfs.TestHDFSFileSystemContract |
   |   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
   |   | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |
   |   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.TestViewDistributedFileSystemContract |
   |   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.TestPersistBlocks |
   
   
   | Subsystem | Report/Notes 

[jira] [Work logged] (HDFS-15937) Reduce memory used during datanode layout upgrade

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15937?focusedWorklogId=579081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579081
 ]

ASF GitHub Bot logged work on HDFS-15937:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 10:59
Start Date: 08/Apr/21 10:59
Worklog Time Spent: 10m 
  Work Description: sodonnel merged pull request #2838:
URL: https://github.com/apache/hadoop/pull/2838


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579081)
Time Spent: 2h  (was: 1h 50m)

> Reduce memory used during datanode layout upgrade
> -
>
> Key: HDFS-15937
> URL: https://issues.apache.org/jira/browse/HDFS-15937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Attachments: heap-dump-after.png, heap-dump-before.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), 
> we have found that the datanode uses a lot more memory than usual.
> For each volume, the blocks are scanned and a list is created holding a 
> series of LinkArgs objects. This object contains a File object for the block 
> source and destination. The file object stores the path as a string, eg:
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825
> This string is repeated for every block and meta file on the DN, and much 
> of the string is the same each time, leading to a large amount of memory use.
> If we change the linkArgs to store:
> * Src Path without the block, eg 
> /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
> * Dest Path without the block eg 
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
> * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta
> Then, by ensuring we reuse the same file object for repeated src and dest paths, 
> we can save most of the memory without reworking the logic of the code.
> The current logic works along the source paths recursively, so you can easily 
> re-use the src path object.
> For the destination path, there are only 32x32 (1024) distinct paths, so we 
> can simply cache them in a HashMap and look up the reusable object each time.
> I tested locally by generating 100k block files and attempting the layout 
> upgrade. A heap dump showed the 100k blocks using about 140MB of heap. That 
> is close to 1.5GB per 1M blocks.
> After the change outlined above the same 100K blocks used about 20MB of heap, 
> so 200MB per million blocks.
> A general DN sizing recommendation is 1GB of heap per 1M blocks, so the 
> upgrade should be able to happen within the pre-upgrade heap.
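The caching idea in the description above can be sketched as follows. This is a minimal illustration, not the actual Hadoop patch: `DirCache` and the example paths are made up, and the real change lives in the datanode layout-upgrade code.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Reuse one File object per distinct destination directory, so each block
// only needs its short file name instead of a full repeated path string.
class DirCache {
    private final Map<String, File> cache = new HashMap<>();

    // Return a shared File for this directory path, creating it only once.
    File get(String dirPath) {
        return cache.computeIfAbsent(dirPath, File::new);
    }

    public static void main(String[] args) {
        DirCache dirs = new DirCache();
        // In a -56 -> -57 upgrade there are only 32x32 = 1024 distinct
        // destination subdirs, so this cache stays tiny.
        File d1 = dirs.get("/data01/dfs/dn/current/finalized/subdir0/subdir10");
        File d2 = dirs.get("/data01/dfs/dn/current/finalized/subdir0/subdir10");
        System.out.println(d1 == d2);  // prints "true": the object is reused
    }
}
```

With the directory object shared, each block contributes only the `blk_..._...` name, which is where most of the per-block saving comes from.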



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15940?focusedWorklogId=579046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579046
 ]

ASF GitHub Bot logged work on HDFS-15940:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 09:57
Start Date: 08/Apr/21 09:57
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2874:
URL: https://github.com/apache/hadoop/pull/2874#issuecomment-815627370


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 251m 53s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2874/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   1m 19s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 351m 24s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
   |   | hadoop.hdfs.server.namenode.TestFileTruncate |
   |   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
   |   | hadoop.hdfs.TestGetBlocks |
   |   | hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
   |   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
   |   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithOrderedSnapshotDeletion |
   |   | hadoop.hdfs.TestClientReportBadBlock |
   |   | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot |
   |   | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport |
   |   | hadoop.hdfs.server.blockmanagement.TestErasureCodingCorruption |
   |   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
   |   | hadoop.hdfs.server.blockmanagement.TestSlowDiskTracker |
   |   | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
   |   | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
   |   | 

[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-08 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.3.1

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decode an input from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.
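The cross-check pattern described above can be illustrated with a toy example that uses single XOR parity in place of the real Reed-Solomon RS-6-3 codec. `XorVerify` and all the values are invented for this sketch; the actual DataNode code goes through the erasure coding codec API, but the verification idea (re-decode an input from the outputs and compare) is the same.

```java
// Toy stand-in for RS decoding: with single XOR parity, any one missing
// unit equals the XOR of all the others.
class XorVerify {
    static byte xorAll(byte... units) {
        byte acc = 0;
        for (byte u : units) {
            acc ^= u;
        }
        return acc;
    }

    public static void main(String[] args) {
        byte d0 = 3, d1 = 5, d2 = 9;
        byte p0 = xorAll(d0, d1, d2);      // parity of the stripe

        // Suppose d1 was lost; reconstruct it from the surviving units.
        byte d1rec = xorAll(d0, d2, p0);

        // Verification step: decode a *different* input (d0) using the
        // reconstructed output, then compare with the original d0.
        byte d0check = xorAll(d1rec, d2, p0);
        System.out.println(d0check == d0); // prints "true" when sane
    }
}
```

If the reconstruction were corrupted, `d0check` would almost certainly differ from `d0`, the task would fail, and the NameNode would retry it, mirroring the behavior described above.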



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-04-08 Thread Max Xie (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316997#comment-17316997
 ] 

Max  Xie edited comment on HDFS-15175 at 4/8/21, 9:01 AM:
--

We encountered this bug on hdfs 3.2.1. 

Is there any progress now?

ping [~hexiaoqiao] [~wanchang] . 


was (Author: max2049):
ping [~hexiaoqiao] [~wanchang] . 

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp ops are 
> synchronized to the JournalNode in the same batch. The block used by the two 
> CloseOps is the same instance, which causes the first CloseOp to record the 
> wrong block size. When the SNN rolls the editlog, TruncateOp does not put the 
> file into the UnderConstruction state. Then, when the second CloseOp is 
> executed, the file is not in the UnderConstruction state, and the SNN crashes.
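The aliasing problem described above reduces to a few lines. This is a hedged sketch, not the actual edit-log code: `Block` here is a minimal stand-in for the real HDFS Block class, with only the size field.

```java
// Two edit-log ops holding the *same* mutable block instance: a later
// truncate retroactively changes the size the earlier CloseOp will record.
class SharedBlockBug {
    static class Block {
        long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
    }

    public static void main(String[] args) {
        Block shared = new Block(185818648L);  // size at the first close

        Block firstCloseOp = shared;           // CloseOp keeps a reference
        shared.numBytes = 185818644L;          // TruncateOp mutates in place

        // When the batch is serialized to the JournalNode, the first
        // CloseOp now carries the truncated size instead of 185818648.
        System.out.println(firstCloseOp.numBytes);  // prints 185818644
    }
}
```

Giving each op its own copy of the block (or serializing before the mutation) would break the aliasing, which is the essence of the fix.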



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-04-08 Thread Max Xie (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316997#comment-17316997
 ] 

Max  Xie commented on HDFS-15175:
-

ping [~hexiaoqiao] [~wanchang] . 

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp ops are 
> synchronized to the JournalNode in the same batch. The block used by the two 
> CloseOps is the same instance, which causes the first CloseOp to record the 
> wrong block size. When the SNN rolls the editlog, TruncateOp does not put the 
> file into the UnderConstruction state. Then, when the second CloseOp is 
> executed, the file is not in the UnderConstruction state, and the SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2021-04-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316912#comment-17316912
 ] 

Hadoop QA commented on HDFS-15788:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
39s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
0s{color} | {color:blue}{color} | {color:blue} markdownlint was not available. 
{color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
40m 15s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  4s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green}{color} | {color:green} The patch does not generate 
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 42s{color} | 
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/565/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15788 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13023533/HDFS-15788-02.patch |
| Optional Tests | dupname asflicense mvnsite markdownlint |
| uname | Linux 956aac858d5d 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / ae88174c29a |
| Max. process+thread count | 554 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/565/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>
> Correct the statement for pmem cache to reflect cache persistence support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2021-04-08 Thread Feilong He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316901#comment-17316901
 ] 

Feilong He commented on HDFS-15788:
---

Hi [~ayushtkn], sorry for the late reply. This issue is related to HDFS-14740, 
which has already been resolved in 3.3.0. We proposed the current Jira to 
update the documentation to align with the code changes we made. The target of 
this Jira is 3.3.1 & 3.4.0.

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>
> Correct the statement for pmem cache to reflect cache persistence support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2021-04-08 Thread Feilong He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feilong He updated HDFS-15788:
--
Target Version/s: 3.3.1, 3.4.0  (was: 3.3.1, 3.4.0, 3.1.5, 3.2.3)

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>
> Correct the statement for pmem cache to reflect cache persistence support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15956) Provide utility class for FSNamesystem

2021-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15956?focusedWorklogId=578931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-578931
 ]

ASF GitHub Bot logged work on HDFS-15956:
-

Author: ASF GitHub Bot
Created on: 08/Apr/21 06:39
Start Date: 08/Apr/21 06:39
Worklog Time Spent: 10m 
  Work Description: virajjasani edited a comment on pull request #2876:
URL: https://github.com/apache/hadoop/pull/2876#issuecomment-815454130


   I see, open to opinions. Since I saw ~9k lines of code, I thought of at 
least refactoring the util functions that can be **static** and do not require 
**Namespace tree specific** logic internally (e.g. BlockManager, 
SnapshotManager, or Namesystem read-write locking related logic)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 578931)
Time Spent: 1h  (was: 50m)

> Provide utility class for FSNamesystem
> --
>
> Key: HDFS-15956
> URL: https://issues.apache.org/jira/browse/HDFS-15956
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With ever-growing functionality, FSNamesystem has become very large (with 
> ~9k lines of code) over time. We should provide a utility class 
> and refactor as many basic utility functions into the new class as we can.
> For any further suggestions, we can create sub-tasks of this Jira and work 
> on them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2021-04-08 Thread Feilong He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feilong He updated HDFS-15788:
--
Attachment: HDFS-15788-02.patch

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>
> Correct the statement for pmem cache to reflect cache persistence support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org