[jira] [Assigned] (HDFS-14558) RBF: Isolation/Fairness documentation

2020-12-09 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-14558:
-

Assignee: Fengnan Li  (was: CR Hota)

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch
>
>
> Documentation is needed to make users aware of the feature added in HDFS-14090.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14558) RBF: Isolation/Fairness documentation

2020-12-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247059#comment-17247059
 ] 

Fengnan Li commented on HDFS-14558:
---

[~ferhui] Thanks for the ping. I will provide an updated patch soon.







[jira] [Work logged] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?focusedWorklogId=522584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522584
 ]

ASF GitHub Bot logged work on HDFS-15720:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 07:25
Start Date: 10/Dec/20 07:25
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #2532:
URL: https://github.com/apache/hadoop/pull/2532#issuecomment-742300725


   > The checks from Jenkins have failed, but I can't find any error related 
to the patch. Is it OK to merge?
   
   @jojochuang Even though these checks have failed, is it OK to merge? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522584)
Time Spent: 1h 40m  (was: 1.5h)

> namenode audit async logger should add some log4j config
> 
>
> Key: HDFS-15720
> URL: https://issues.apache.org/jira/browse/HDFS-15720
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
> Environment: hadoop 3.3.0
>Reporter: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Hadoop project uses log4j 1.2.x, so some logger properties cannot be 
> configured in the log4j.properties file, for example the AsyncAppender 
> BufferSize and Blocking options (see 
> https://logging.apache.org/log4j/1.2/apidocs/index.html).
> The NameNode should expose log4j configuration for the async audit logger, 
> so that log4j usage and audit-log output performance can be tuned.
> The proposed new configuration keys are:
> dfs.namenode.audit.log.async.blocking false
> dfs.namenode.audit.log.async.buffer.size 128
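The proposed keys could look as follows in hdfs-site.xml form. This is a sketch only: the key names and defaults are the ones proposed above, and since the issue was still under review at this point, the final keys and semantics may differ.

```xml
<!-- Sketch only: keys and defaults as proposed in HDFS-15720. -->
<property>
  <name>dfs.namenode.audit.log.async.blocking</name>
  <value>false</value>
  <!-- If false, the AsyncAppender discards audit events instead of
       blocking the calling thread when its buffer fills up. -->
</property>
<property>
  <name>dfs.namenode.audit.log.async.buffer.size</name>
  <value>128</value>
  <!-- Number of events the AsyncAppender buffers before it must
       block or discard. -->
</property>
```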






[jira] [Commented] (HDFS-14558) RBF: Isolation/Fairness documentation

2020-12-09 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247051#comment-17247051
 ] 

Hui Fei commented on HDFS-14558:


[~fengnanli] Hi, this jira adds documentation for HDFS-14090. As HDFS-14090 
has been merged to trunk, would you please complete this? Thanks!







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247035#comment-17247035
 ] 

Fengnan Li commented on HDFS-15383:
---

[~John Smith] It is a good question.
First of all, when a token is stale it will be deleted by the cleanup thread; 
so when a client accesses a Router with a renewed token, that Router will not 
recognize it and will load it from ZK. The default scan interval is 1h, which 
is long.
On the other hand, clients normally renew a token well before it expires. For 
example, YARN renews a token when it reaches roughly 92% (configurable; I 
forget the exact value) of its renew date, meaning that when the client renews 
the token there is still over an hour left before it expires. Internally we 
set our sync interval to 10 min, so all Routers will pick up the new renew 
date within about 10 min. In the meantime the token is still valid, though 
different Routers may temporarily hold different renew dates.
In our environment, 10 minutes is enough time to load 1M tokens from ZK into 
Router memory.
So in theory your client would fail if you set the sync interval to a very 
large value like 2 hours, but we do not use such a large value in this poll 
model. We could also shorten the deletion period, e.g. to every 15 minutes, to 
further prevent auth failures.
Hope it makes sense.
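The timing argument above can be sketched numerically. This is an illustrative class, not Hadoop code: the 24h token lifetime is an assumed value, while the ~92% renew fraction and the 10-minute sync interval are the figures quoted in the comment.

```java
// Hedged numeric sketch: a renewed token's new renew date must reach all
// Routers (one poll interval) well before the old state expires.
public class TokenSyncMargin {

    /** Minutes left on a token when the client renews it at the given fraction of its lifetime. */
    static long remainingMinutes(long lifetimeMinutes, double renewFraction) {
        return Math.round(lifetimeMinutes * (1.0 - renewFraction));
    }

    public static void main(String[] args) {
        long lifetime = 24 * 60;   // assumed 24h token lifetime, in minutes
        double renewAt = 0.92;     // YARN renews at ~92% of the lifetime
        long syncInterval = 10;    // Router poll interval, in minutes

        long margin = remainingMinutes(lifetime, renewAt);  // 115 minutes left
        System.out.println("margin=" + margin + "min sync=" + syncInterval + "min");
        // With a 10-minute poll, every Router sees the new renew date long
        // before the remaining validity runs out; a 2-hour poll would not.
        if (syncInterval >= margin) {
            throw new IllegalStateException("sync interval too large; auth failures possible");
        }
    }
}
```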

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in a secure Router, the 
> total number of token watches is the product of the number of Routers and 
> the number of tokens. This is because ZKDelegationTokenManager uses 
> PathChildrenCache from Curator, which automatically sets watches so that ZK 
> pushes sync information to each Router. Several evaluations show that a 
> large number of watches has a negative performance impact on the Zookeeper 
> server.
> In our practice, when the number of watches exceeds 1.2 million on a single 
> ZK server there is significant ZK performance degradation. This ticket 
> therefore rewrites ZKDelegationTokenManagerImpl.java to explicitly disable 
> the PathChildrenCache and have Routers sync periodically from Zookeeper. 
> This has been working fine at the scale of 10 Routers with 2 million tokens.
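The watch arithmetic in the description can be checked with a back-of-the-envelope sketch. The class is illustrative only, not Hadoop code; the 10-Router / 2-million-token scale and the ~1.2M degradation threshold are the figures stated above.

```java
// With PathChildrenCache every Router watches every token znode, so the
// watch count grows as routers * tokens; periodic polling sets no watches.
public class WatchCount {

    /** One watch per (router, token) pair when each Router caches all tokens. */
    static long watchesWithCache(long routers, long tokens) {
        return routers * tokens;
    }

    public static void main(String[] args) {
        // Scale reported in the issue: 10 Routers, 2 million tokens.
        long watches = watchesWithCache(10, 2_000_000L);
        System.out.println(watches);              // 20000000
        // Far beyond the ~1.2 million watches at which ZK degradation
        // was observed on a single server.
        System.out.println(watches > 1_200_000L); // true
    }
}
```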






[jira] [Updated] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15722:

Attachment: HDFS-15722.001.patch
Status: Patch Available  (was: Open)

> Gather storage report  for each volume in a separate thread
> ---
>
> Key: HDFS-15722
> URL: https://issues.apache.org/jira/browse/HDFS-15722
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15722.001.patch
>
>
> Getting stuck while gathering information from one volume may make the 
> entire datanode hang. This can happen when the volume is mounted by some 
> process (e.g. FUSE).
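The per-volume-thread idea this issue proposes could be sketched as follows. `PerVolumeReport` and the `String` reports are hypothetical stand-ins for illustration; the actual DataNode types and patch differ.

```java
// Hedged sketch: collect each volume's storage report on its own thread
// with a timeout, so one stuck volume (e.g. a hung FUSE mount) cannot
// hang the whole report.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PerVolumeReport {

    /** Runs every volume's report concurrently; a stuck volume yields null instead of blocking. */
    static List<String> gatherReports(List<Callable<String>> volumes, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        List<Future<String>> futures = new ArrayList<>();
        for (Callable<String> v : volumes) {
            futures.add(pool.submit(v));
        }
        List<String> reports = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                reports.add(f.get(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | ExecutionException e) {
                reports.add(null);  // skip the stuck or failing volume
            }
        }
        pool.shutdownNow();  // interrupt any still-running gather threads
        return reports;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> volumes = Arrays.asList(
            () -> "vol1-report",
            () -> { Thread.sleep(10_000); return "vol2-report"; });  // simulated hang
        System.out.println(gatherReports(volumes, 200));  // [vol1-report, null]
    }
}
```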






[jira] [Updated] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15722:

Attachment: (was: HDFS-15722.001.patch)







[jira] [Updated] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15722:

Status: Open  (was: Patch Available)







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246982#comment-17246982
 ] 

Yuxuan Wang commented on HDFS-15383:


Hi [~fengnanli] [~elgoiri] [~hexiaoqiao],

After disabling the watcher, tokens in Router memory can be stale, and a 
client may fail authentication if its token has been renewed but the Router 
has not rebuilt its cache yet.

Or is there some misunderstanding on my part? Please point it out, thanks!







[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246939#comment-17246939
 ] 

Wei-Chiu Chuang commented on HDFS-15170:


Ok. Thanks for the clarification!

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to reproduce:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. After pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode stopped in step 2 back on.
> The BR from datanode 2 will mark the block as corrupt, and invalidation 
> won't remove it, since after failover the blocks are on stale storage.






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246938#comment-17246938
 ] 

Ayush Saxena commented on HDFS-15170:
-

{quote}this line is not needed if 
dfs.namenode.corrupt.block.delete.immediately.enabled is true?
{quote}
Yes, I should have updated this; it would be redundant to remove it here. This 
block should execute only if 
{{dfs.namenode.corrupt.block.delete.immediately.enabled}} is false.
{quote}b.getStored() is the internal block whereas corrupt is the EC block 
group id
{quote}
Should I update it to:
{code:java}
 // If the block is an EC block, the whole block group is marked
+// corrupted, so if this block is getting deleted, remove the block
+// from corrupt replica map explicitly, since removal of the
+// block from corrupt replicas may be delayed if the blocks are on
+// stale storage due to failover or any other reason.
{code}







[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246935#comment-17246935
 ] 

Wei-Chiu Chuang commented on HDFS-15170:


 

  
{code:java}
+// If the block is an EC block, the whole block group is marked
+// corrupted, so if this block is getting deleted, remove the block
+// group from corrupt replica map explicitly, since removal of the
+// block from corrupt replicas may be delayed if the blocks are on
+// stale storage due to failover or any other reason.
+corruptReplicas.removeFromCorruptReplicasMap(b.getStored(), node);
{code}
Is this line unnecessary if 
dfs.namenode.corrupt.block.delete.immediately.enabled is true? It will be 
removed later by invalidateBlock().
Or do we want to call {{corruptReplicas.removeFromCorruptReplicasMap(corrupt, 
node)}} instead? b.getStored() is the internal block, whereas corrupt is the 
EC block group id. The code doesn't seem to match the comment.







[jira] [Comment Edited] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246874#comment-17246874
 ] 

Ayush Saxena edited comment on HDFS-15170 at 12/9/20, 11:08 PM:


IIRC, in the case of EC blocks, if one block is corrupt, the whole block group 
was getting marked as corrupt. In BlockManager:
{code:java}
// Add this replica to corruptReplicas Map. For striped blocks, we always
// use the id of whole striped block group when adding to corruptReplicas
Block corrupted = new Block(b.getCorrupted());
if (b.getStored().isStriped()) {
  corrupted.setBlockId(b.getStored().getBlockId());
}
{code}

To be precise, I don't remember exactly what happened because of that (I 
don't have access to the internal discussion now). I believe Surendra was 
involved in this with me, and he won't have access either. This came from 
some testing, and I don't remember how we landed on HDFS-15200. I will try to 
catch up with folks, in case anyone remembers what exactly happened.

bq.  and with HDFS-15200's change, BlockManager calls 
removeStoredBlock(b.getStored(), node) and that seems to do the same thing (and 
more, potentially more complete).

Exactly. With that change in, this won't occur unless we explicitly turn the 
conf off; by default it is enabled. So, with HDFS-15200 in, the issue was 
solved for us.









[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246874#comment-17246874
 ] 

Ayush Saxena commented on HDFS-15170:
-

IIRC, in the case of EC blocks, if one block is corrupt, the whole block group 
was getting marked as corrupt. In BlockManager:
{code:java}
// Add this replica to corruptReplicas Map. For striped blocks, we always
// use the id of whole striped block group when adding to corruptReplicas
Block corrupted = new Block(b.getCorrupted());
if (b.getStored().isStriped()) {
  corrupted.setBlockId(b.getStored().getBlockId());
}
{code}

To be precise, I don't remember exactly what happened because of that (I 
don't have access to the internal discussion now). I believe Surendra was 
involved in this with me, and he won't have access either. This came from 
some testing, and I don't remember how we landed on HDFS-15200. I will try to 
catch up with folks, in case anyone remembers what exactly happened.







[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246841#comment-17246841
 ] 

Wei-Chiu Chuang commented on HDFS-15170:


Dumb question. I am not sure what makes EC blocks different. Seems like this 
would be the same for replicated blocks, and with HDFS-15200's change, 
BlockManager calls removeStoredBlock(b.getStored(), node) and that seems to do 
the same thing (and more, potentially more complete).







[jira] [Updated] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15170:
---
Component/s: erasure-coding







[jira] [Work logged] (HDFS-15711) Add Metrics to HttpFS Server

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15711?focusedWorklogId=522409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522409
 ]

ASF GitHub Bot logged work on HDFS-15711:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 19:54
Start Date: 09/Dec/20 19:54
Worklog Time Spent: 10m 
  Work Description: jbrennan333 commented on a change in pull request #2521:
URL: https://github.com/apache/hadoop/pull/2521#discussion_r539601616



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/server/TestHttpFSServer.java
##
@@ -120,6 +122,25 @@
  */
 public class TestHttpFSServer extends HFSTestCase {
 
+  /**
+   * define metric getters for unit tests.
+   */
+  private static Callable<Long> defaultEntryMetricGetter = () -> 0L;
+  private static Callable<Long> defaultExitMetricGetter = () -> 1L;
+  private static HashMap<String, Callable<Long>> metricsGetter =
+  new HashMap<String, Callable<Long>>() {
+{
+  put("LISTSTATUS",
+  () -> HttpFSServerWebApp.get().getMetrics().getOpsListing());
+  put("MKDIRS",
+  () -> HttpFSServerWebApp.get().getMetrics().getOpsMkdir());
+  put("GETFILESTATUS",
+  () -> HttpFSServerWebApp.get().getMetrics().getOpsStat());
+}
+  };
+

Review comment:
   Oh I see, that makes sense for `getStatus()`. I was looking at 
`testMkdirs()` when I was trying to understand why you were using it, and it 
didn't make sense to me. I don't think you should use the metricsGetter in 
`testMkdirs()`, or at least not use `getOrDefault()` - the default case here 
does not make sense - it renders that part of the test meaningless. I'd prefer 
to just get the ops directly in this case.
   
   







Issue Time Tracking
---

Worklog Id: (was: 522409)
Time Spent: 1h 20m  (was: 1h 10m)

> Add Metrics to HttpFS Server
> 
>
> Key: HDFS-15711
> URL: https://issues.apache.org/jira/browse/HDFS-15711
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently HttpFS Server does not have any metrics.
> [~kihwal] has implemented serverMetrics for HttpFs on our internal grid.






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246530#comment-17246530
 ] 

Xiaoqiao He commented on HDFS-15170:


Sorry, I did not read the above comment carefully; it makes sense to me. For 
the patch, I would like to review it if no one else is involved here, but I 
need time since I do not have deep experience with EC. Thanks.







[jira] [Comment Edited] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246486#comment-17246486
 ] 

Ayush Saxena edited comment on HDFS-15170 at 12/9/20, 12:33 PM:


Thanx [~hexiaoqiao].
Yes, that is expected: initially the count will be 0, and when the IBR gets 
processed it will be marked as 1.
I added a comment above about how to reproduce it:
https://issues.apache.org/jira/browse/HDFS-15170?focusedCommentId=17037104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17037104

Just remove the prod change and change 0 to 1 in the test; it will show that 
the count increases to 1.

Maybe we can add a Thread.sleep(1) after cluster.restartDataNode(dn); that 
should also make the test fail.
Let me know if you have any suggestions for improving the test, any 
difficulty reproducing it, or any issues with the code change.



was (Author: ayushtkn):
Thanx [~hexiaoqiao]
Yeps, that is expected, since initially the count will be 0 only, when the IBR 
gets processed it will get marked as 1,
I added a comment above, regarding how to repro.
https://issues.apache.org/jira/browse/HDFS-15170?focusedCommentId=17037104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17037104

just remove the prod change and change 0 to 1 in test, It will show that the 
count increases to 1.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep on writing the data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode shut down in step 2 back on.
> The BR from the datanode stopped in step 2 will mark the block as corrupt, 
> and invalidation won't remove it, since post-failover the blocks would be 
> on stale storage.






[jira] [Resolved] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2020-12-09 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-14831.

Resolution: Fixed

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>
> Mentioned on HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Commented] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2020-12-09 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246492#comment-17246492
 ] 

Hui Fei commented on HDFS-14831:


Because 2.8 is EOL and the latest release 2.8.5 doesn't include the fix, I 
just set the fix version to 2.8.6. Users can cherry-pick this commit from 
branch-2.8.

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>
> Mentioned on HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Commented] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2020-12-09 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246488#comment-17246488
 ] 

Hui Fei commented on HDFS-14831:


Recently I tested an upgrade from 2.8.5 to 3.2.1 and a downgrade from 3.2.1 to 
2.8.5; the same problem occurred. It can be fixed by the following commit from 
branch-2.8:

{quote}
commit f697f3c4fc0067bb82494e445900d86942685b09
Author: Vinayakumar B 
Date:   Mon Oct 15 16:04:34 2018 +0530

Fix potential FSImage corruption. Contributed by Daryn Sharp.
{quote}


> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>
> Mentioned on HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246486#comment-17246486
 ] 

Ayush Saxena commented on HDFS-15170:
-

Thanx [~hexiaoqiao]
Yep, that is expected: initially the count will be 0, and once the IBR gets 
processed it will be marked as 1.
I added a comment above regarding how to repro:
https://issues.apache.org/jira/browse/HDFS-15170?focusedCommentId=17037104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17037104

Just remove the prod change and change 0 to 1 in the test; it will show that 
the count increases to 1.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep on writing the data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode shut down in step 2 back on.
> The BR from the datanode stopped in step 2 will mark the block as corrupt, 
> and invalidation won't remove it, since post-failover the blocks would be 
> on stale storage.






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246483#comment-17246483
 ] 

Xiaoqiao He commented on HDFS-15170:


OK, thanks for the information, got it. Let's backport to the corresponding 
branches once it is ready.
BTW, the new unit test passes using [^HDFS-15170-03.patch] without any other 
changes; is that expected?

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep on writing the data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode shut down in step 2 back on.
> The BR from the datanode stopped in step 2 will mark the block as corrupt, 
> and invalidation won't remove it, since post-failover the blocks would be 
> on stale storage.






[jira] [Work logged] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?focusedWorklogId=522200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522200
 ]

ASF GitHub Bot logged work on HDFS-15720:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:50
Start Date: 09/Dec/20 11:50
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #2532:
URL: https://github.com/apache/hadoop/pull/2532#issuecomment-741722091


   The Jenkins checks have failed, but I can't find any error related to this 
patch. Is it OK to merge?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522200)
Time Spent: 1.5h  (was: 1h 20m)

> namenode audit async logger should add some log4j config
> 
>
> Key: HDFS-15720
> URL: https://issues.apache.org/jira/browse/HDFS-15720
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
> Environment: hadoop 3.3.0
>Reporter: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Hadoop project uses log4j 1.2.x, so we can't configure some logger 
> properties in the log4j.properties file, for example the AsyncAppender 
> bufferSize and blocking options; see 
> https://logging.apache.org/log4j/1.2/apidocs/index.html.
> The namenode should add some log4j configuration for the async audit logger, 
> in order to make it easier to tune log4j usage and audit log output 
> performance.
> The new configuration is as follows:
> dfs.namenode.audit.log.async.blocking false
> dfs.namenode.audit.log.async.buffer.size 128
> 
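The two keys quoted above are HDFS configuration properties. A minimal sketch of
how they might appear in hdfs-site.xml, assuming the key names and default
values quoted in the issue description (whether these exact keys land in trunk
depends on the final patch):

```xml
<!-- Hedged sketch: names and values are taken verbatim from the quoted
     issue description, not from a released hdfs-default.xml. -->
<property>
  <name>dfs.namenode.audit.log.async.blocking</name>
  <value>false</value>
  <!-- If false, the async appender drops events instead of blocking the
       audit-logging caller when its buffer is full. -->
</property>
<property>
  <name>dfs.namenode.audit.log.async.buffer.size</name>
  <value>128</value>
  <!-- Event buffer size passed to the log4j 1.2 AsyncAppender. -->
</property>
```

These would sit alongside the existing dfs.namenode.audit.log.async switch that
turns on the async audit logger in the first place.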






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246455#comment-17246455
 ] 

Ayush Saxena commented on HDFS-15170:
-

Hey [~hexiaoqiao],
With HDFS-15200 in, this won't surface directly unless we set the config 
introduced there to false; I hit this issue before that jira, so with the 
default configuration it won't happen.
Yes, if we turn off that conf, this can surface. You can take a call on 
whether to hold the release for it or not; if there are any review comments, 
I will try to address them at the earliest.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep on writing the data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode shut down in step 2 back on.
> The BR from the datanode stopped in step 2 will mark the block as corrupt, 
> and invalidation won't remove it, since post-failover the blocks would be 
> on stale storage.






[jira] [Work logged] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?focusedWorklogId=522184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522184
 ]

ASF GitHub Bot logged work on HDFS-15720:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:23
Start Date: 09/Dec/20 11:23
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2532:
URL: https://github.com/apache/hadoop/pull/2532#issuecomment-741709431


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  17m 43s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   3m  1s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 59s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 10s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 55s | 
[/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/3/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 622 unchanged 
- 0 fixed = 626 total (was 622)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  14m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m  0s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  99m 57s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 193m 58s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestFileCreation |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
   |   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2532 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle xml |
   | uname | Linu

[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246453#comment-17246453
 ] 

Hadoop QA commented on HDFS-15170:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m  
8s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
16s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 30s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  4m  
4s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
1s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
37s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
26s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  5s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green}{color} | {col

[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246450#comment-17246450
 ] 

Xiaoqiao He commented on HDFS-15170:


[~ayushtkn] Thanks for your work here. IMO, this issue could exist on every 
branch-3.* line. Do you have the bandwidth to push it forward? If so, I will 
wait for it to be resolved and then prepare the RC for release-3.2.2. Thanks.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep on writing the data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Turn the datanode shut down in step 2 back on.
> The BR from the datanode stopped in step 2 will mark the block as corrupt, 
> and invalidation won't remove it, since post-failover the blocks would be 
> on stale storage.






[jira] [Commented] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246446#comment-17246446
 ] 

Hadoop QA commented on HDFS-15722:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  4m 
18s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/branch-mvninstall-root.txt{color}
 | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
27s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt{color}
 | {color:red} hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
24s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt{color}
 | {color:red} hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 21s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt{color}
 | {color:orange} The patch fails to run checkstyle in hadoop-hdfs {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
18s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt{color}
 | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
1m  8s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
10s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
8s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 10s{color} 
| 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/348/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt{color}
 | {color:red} 
hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 with 
JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 generated 602 new + 0 unchanged - 0 
fixed = 602 total (was 0) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m  4s{color} 
| 
{co

[jira] [Work logged] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?focusedWorklogId=522121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522121
 ]

ASF GitHub Bot logged work on HDFS-15720:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:42
Start Date: 09/Dec/20 08:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2532:
URL: https://github.com/apache/hadoop/pull/2532#issuecomment-741623898


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 32s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   3m 36s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 34s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 55s | 
[/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/2/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 622 unchanged 
- 0 fixed = 626 total (was 622)  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  18m 31s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 49s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 101m 22s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 204m 21s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestFileCreation |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
   |   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
   |   | hadoop.hdfs.TestDecommissionWithStriped |
   |   | hadoop.hdfs.TestErasureCodingPolicies |
   |   | hadoop.hdfs.TestDFSOutputStream |
   |   | hadoop.tools.TestHdfsConfigFields |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2532/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2532 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |

[jira] [Work logged] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15719?focusedWorklogId=522120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522120
 ]

ASF GitHub Bot logged work on HDFS-15719:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:38
Start Date: 09/Dec/20 08:38
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2533:
URL: https://github.com/apache/hadoop/pull/2533#issuecomment-741622026


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  26m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  20m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |  17m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   2m 35s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +0 :ok: |  findbugs  |   0m 33s |  |  branch/hadoop-project no findbugs 
output file (findbugsXml.xml)  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 59s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |  22m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |  20m  6s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | -1 :x: |  shadedclient  |  20m 17s |  |  patch has errors when building 
and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  findbugs  |   0m 31s |  |  hadoop-project has no data from 
findbugs  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 28s |  |  hadoop-project in the patch 
passed.  |
   | -1 :x: |  unit  |  10m 43s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2533/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 224m  2s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.metrics2.source.TestJvmMetrics |
   |   | hadoop.ha.TestZKFailoverController |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2533/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2533 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle xml |
   | uname | Linux 3e550b4a690c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |

[jira] [Updated] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15722:

Attachment: HDFS-15722.001.patch
Status: Patch Available  (was: Open)

> Gather storage report for each volume in a separate thread
> ---
>
> Key: HDFS-15722
> URL: https://issues.apache.org/jira/browse/HDFS-15722
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15722.001.patch
>
>
> Getting stuck while gathering information from one volume may hang the 
> entire datanode. This can happen when the volume is mounted by some external 
> process (e.g., FUSE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15722) Gather storage report for each volume in a separate thread

2020-12-09 Thread Yang Yun (Jira)
Yang Yun created HDFS-15722:
---

 Summary: Gather storage report for each volume in a separate 
thread
 Key: HDFS-15722
 URL: https://issues.apache.org/jira/browse/HDFS-15722
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yang Yun
Assignee: Yang Yun


Getting stuck while gathering information from one volume may hang the entire 
datanode. This can happen when the volume is mounted by some external process 
(e.g., FUSE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?focusedWorklogId=522107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522107
 ]

ASF GitHub Bot logged work on HDFS-15720:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:09
Start Date: 09/Dec/20 08:09
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #2532:
URL: https://github.com/apache/hadoop/pull/2532#issuecomment-741607192


   > patch looks good. Please also add the configuration keys and values to 
hdfs-default.xml. +1 after that.
   
   done @jojochuang 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522107)
Time Spent: 1h  (was: 50m)

> namenode audit async logger should add some log4j config
> 
>
> Key: HDFS-15720
> URL: https://issues.apache.org/jira/browse/HDFS-15720
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
> Environment: hadoop 3.3.0
>Reporter: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Hadoop project uses log4j 1.2.x, so some logger properties cannot be 
> configured in the log4j.properties file; for example, the AsyncAppender 
> bufferSize and blocking options (see 
> https://logging.apache.org/log4j/1.2/apidocs/index.html).
> The NameNode should expose log4j configuration for the async audit logger so 
> that log4j usage and audit log output performance can be tuned.
> The new configuration keys are as follows:
> dfs.namenode.audit.log.async.blocking false
> dfs.namenode.audit.log.async.buffer.size 128
>  
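Based on the keys and defaults quoted in the description, the proposed settings 
would presumably appear in hdfs-site.xml along these lines (a sketch, not the 
exact text of the patch):

```xml
<property>
  <name>dfs.namenode.audit.log.async.blocking</name>
  <value>false</value>
  <description>Whether the async audit appender blocks the caller when its
  event buffer is full. In log4j 1.2, a non-blocking AsyncAppender discards
  overflowing events and logs a summary instead.</description>
</property>
<property>
  <name>dfs.namenode.audit.log.async.buffer.size</name>
  <value>128</value>
  <description>Event buffer size of the async audit appender.</description>
</property>
```

These would only take effect when async audit logging is enabled 
(dfs.namenode.audit.log.async).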



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org