[jira] [Comment Edited] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-31 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027262#comment-17027262
 ] 

Ayush Saxena edited comment on HDFS-15115 at 1/31/20 8:04 AM:
--

Thanks, everyone, for the work here.

There are two possible approaches here: the old one, which adds a null 
check, and the one in the v2 patch, which initializes the builder irrespective 
of the log level. Personally I prefer the earlier null-check approach, but I am 
OK with doing it this way too, if everyone prefers this.

[~weichiu] [~hexiaoqiao] any preferences?

Anyway, [~belugabehr] [~wzx513], would it be possible to extend a UT? I am not 
sure; it may be tricky to change the log level midway, so just check once.


> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically changed the logger of 
> BlockPlacementPolicyDefault to debug while the namenode was running. However, 
> the Namenode crashed. From the log, we found an NPE in 
> BlockPlacementPolicyDefault.chooseRandom. The *StringBuilder builder* is 
> used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, but it 
> is only initialized at the first of those places. If we change the logger of 
> BlockPlacementPolicyDefault to debug after that point, the *builder* in the 
> remaining part is *NULL* and causes an *NPE*.
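
The failure mode described above can be sketched in isolation. This is a hypothetical reproduction, not the actual BlockPlacementPolicyDefault code: the class name, the `debugEnabled` flag (standing in for the logger's debug check) and all three method names are assumptions made for illustration. It contrasts the unguarded pattern with the two approaches discussed in the comment: a null check before each use, and initializing the builder regardless of log level.

```java
/** Hypothetical sketch of the race; names are illustrative, not Hadoop's. */
final class BuilderNullCheckSketch {
  // Stands in for the logger's debug check; it can flip while a method runs.
  public static volatile boolean debugEnabled = false;

  /** Buggy pattern: builder is created only if debug was on at the start. */
  public static String unguarded(Runnable midway) {
    StringBuilder builder = debugEnabled ? new StringBuilder() : null;
    midway.run(); // the log level may change here
    if (debugEnabled) {
      builder.append("excluded node"); // NPE if debug was enabled midway
    }
    return String.valueOf(builder);
  }

  /** Approach 1: null check before every later use of the builder. */
  public static String withNullCheck(Runnable midway) {
    StringBuilder builder = debugEnabled ? new StringBuilder() : null;
    midway.run();
    if (debugEnabled && builder != null) {
      builder.append("excluded node");
    }
    return builder == null ? "" : builder.toString();
  }

  /** Approach 2: always allocate the builder, fill it only when debugging. */
  public static String alwaysInitialized(Runnable midway) {
    StringBuilder builder = new StringBuilder();
    midway.run();
    if (debugEnabled) {
      builder.append("excluded node");
    }
    return builder.toString();
  }
}
```

In this sketch the null check drops the debug text for the call that was in flight when the level changed, while always initializing costs one small allocation per call even when debug is off.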



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15135) EC : ArrayIndexOutOfBoundsException in BlockRecoveryWorker#RecoveryTaskStriped.

2020-01-31 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-15135:
--
Status: Patch Available  (was: Open)

> EC : ArrayIndexOutOfBoundsException in 
> BlockRecoveryWorker#RecoveryTaskStriped.
> ---
>
> Key: HDFS-15135
> URL: https://issues.apache.org/jira/browse/HDFS-15135
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Surendra Singh Lilhore
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-15135.001.patch
>
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 8
>at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskStriped.recover(BlockRecoveryWorker.java:464)
>at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:602)
>at java.lang.Thread.run(Thread.java:745) {noformat}






[jira] [Commented] (HDFS-15135) EC : ArrayIndexOutOfBoundsException in BlockRecoveryWorker#RecoveryTaskStriped.

2020-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027458#comment-17027458
 ] 

Hadoop QA commented on HDFS-15135:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 34s | trunk passed |
| +1 | compile | 1m 18s | trunk passed |
| +1 | checkstyle | 0m 52s | trunk passed |
| +1 | mvnsite | 1m 21s | trunk passed |
| +1 | shadedclient | 16m 6s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 18s | trunk passed |
| +1 | javadoc | 1m 15s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 1m 4s | the patch passed |
| +1 | javac | 1m 4s | the patch passed |
| +1 | checkstyle | 0m 37s | the patch passed |
| +1 | mvnsite | 1m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 4s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 18s | the patch passed |
| +1 | javadoc | 1m 8s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 89m 33s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 159m 10s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15135 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992145/HDFS-15135.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 10eaac5a9898 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8686f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28728/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28728/testReport/ |
| Max. process+thread count | 4166 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Co

[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.004.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> the namenode fails to start because of an `InstantiationException` thrown 
> from `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that, while initializing the namenode, `initAuditLoggers` 
> tries to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which doesn't 
> have one. Thus the `InstantiationException` is thrown.
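
The root cause above can be reproduced outside Hadoop. The following is a minimal sketch under stated assumptions: `NoArgLogger` and `OneArgLogger` are hypothetical stand-ins (not Hadoop classes), with `OneArgLogger` mirroring only the constructor shape of `TopAuditLogger`, and `tryInstantiate` is an illustrative helper. It uses the same `Class.newInstance()` call that appears in the report's stack trace (deprecated since Java 9).

```java
/** Hypothetical stand-ins for audit logger classes; not Hadoop code. */
final class ReflectiveInitSketch {
  /** Has an implicit public no-arg constructor. */
  public static class NoArgLogger { }

  /** Only has a one-arg constructor, like the shape of TopAuditLogger. */
  public static class OneArgLogger {
    private final String metrics;

    public OneArgLogger(String metrics) {
      this.metrics = metrics;
    }
  }

  /** Returns null on success, or the simple name of the exception thrown. */
  @SuppressWarnings("deprecation")
  public static String tryInstantiate(Class<?> clazz) {
    try {
      clazz.newInstance(); // requires an accessible no-arg constructor
      return null;
    } catch (ReflectiveOperationException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

Calling `tryInstantiate(ReflectiveInitSketch.OneArgLogger.class)` reports a failure, while the no-arg class is created without error.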
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.
> java.lang.RuntimeException: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at java.lang.Class.newInstance(Class.java:427)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)
>   ... 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>()
>   at java.lang.Class.getConstructor0(Class.java:3082)
>   at java.lang.Class.newInstance(Class.java:412)
>   ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /**
>  * An {@link AuditLogger} that sends logged data directly to the metrics
>  * systems. It is used when the top service is used directly by the name node
>  */
> @InterfaceAudience.Private
> public class TopAuditLogger implements AuditLogger {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(TopAuditLogger.class);
>   private final TopMetrics topMetrics;
>   public TopAuditLogger(TopMetrics topMetrics) {
>     Preconditions.checkNotNull(topMetrics, "Cannot init with a null " +
>         "TopMetrics");
>     this.topMetrics = topMetrics;
>   }
>   @Override
>   public void initialize(Configuration conf) {
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List<AuditLogger> initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection<String> alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List<AuditLogger> auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
> 

[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode

2020-01-31 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027690#comment-17027690
 ] 

Wei-Chiu Chuang commented on HDFS-15150:


I think it makes perfect sense. The implementation should be quite 
straightforward.

That said, the benchmarks from HDFS-9668 aren't that impressive. If we can redo 
a benchmark to show it does help, especially in a high-contention scenario, 
that would be great.

> Introduce read write lock to Datanode
> -
>
> Key: HDFS-15150
> URL: https://issues.apache.org/jira/browse/HDFS-15150
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> HDFS-9668 pointed out the issues around the DN lock being a point of 
> contention some time ago, but that Jira went in a direction of creating a new 
> FSDataset implementation which is very risky, and activity on the Jira has 
> stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a 
> similar direction to what I was thinking, so I will review that Jira in more 
> detail to see if this one is necessary.
> I feel there could be significant gains by moving to a 
> ReentrantReadWriteLock within the DN. The current implementation is simply a 
> ReentrantLock, so any locker blocks all others.
> One place I think a read lock would benefit us significantly is when the DN 
> is serving a lot of small blocks and there are jobs which perform a lot of 
> reads. The start of reading any block right now takes the lock, but if we 
> moved this to a read lock, many reads could do this at the same time.
> The first conservative step would be to change the current lock and then 
> make all accesses to it obtain the write lock. That way, we should keep the 
> current behaviour, and then we can selectively move some lock accesses to the 
> read lock in separate Jiras.
> I would appreciate any thoughts on this, and also if anyone has attempted it 
> before and found any blockers.
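
The locking change proposed above can be illustrated with a small standalone sketch. This is not the DataNode's dataset code: the class `ReadWriteLockSketch`, its `blockLengths` map and its method names are hypothetical. It shows the intended end state, where mutations take the write lock and lookups take the read lock; the conservative first step described in the issue would instead use the write lock in both methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical dataset guarded by a read/write lock; not the DN's code. */
final class ReadWriteLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, Long> blockLengths = new HashMap<>();

  /** Mutations take the write lock, which excludes all other lockers. */
  public void addBlock(long blockId, long length) {
    lock.writeLock().lock();
    try {
      blockLengths.put(blockId, length);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Lookups take the read lock, so many readers can proceed together. */
  public Long getLength(long blockId) {
    lock.readLock().lock();
    try {
      return blockLengths.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Number of read locks currently held; exposed only for observation. */
  public int readersHoldingLock() {
    return lock.getReadLockCount();
  }
}
```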






[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.005.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch, 
> HDFS-15124.005.patch
>
>

[jira] [Updated] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15147:
-
Attachment: HDFS-15147.002.patch

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch, 
> HDFS-15147.002.patch
>
>
> {{LazyPersistTestCase}} has some issues that lead to inconsistent results in 
> the test cases:
> * the wait periods for a change of status are too long. They reach 10 secs in 
> some cases.
> * triggerBlockReport() only triggers an FBR from the DN with index 0. This is 
> counterintuitive because the JUnit tests restart the DN assuming that the 
> restarted DN will send an FBR. However, this never happens because the DN 
> will get a new index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>       throws IOException, InterruptedException {
>     // Trigger block report to NN
>     DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
>     Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.
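
One common way to avoid a fixed `Thread.sleep(10 * 1000)` is to poll for the expected condition with a short interval and an overall deadline. The following is a minimal standalone sketch of that idea, not the patch attached to this issue: the class `PollingWaitSketch` and its `waitFor` signature are assumptions for illustration and are not claimed to match any Hadoop test utility.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; an illustration, not Hadoop's test utility. */
final class PollingWaitSketch {
  /**
   * Re-checks the condition every checkEveryMillis until it holds, and
   * throws TimeoutException once waitForMillis has elapsed without success.
   */
  public static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Timed out waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

With this shape, a test returns as soon as the condition holds instead of always paying the full sleep, and a condition that never holds fails with a clear timeout.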






[jira] [Updated] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently

2020-01-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13179:
---
Fix Version/s: 3.2.2
   3.1.4
   3.0.4

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails 
> intermittently
> --
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Ahmed Hussein
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, 
> HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
>
>
> The error is caused by a TimeoutException: the test waits to ensure that the 
> file is replicated to DISK storage, but the replication to DISK cannot finish 
> within the 30s timeout in ensureFileReplicasOnStorageType(). The file is 
> still on RAM_DISK, so there is no data loss.
> Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially 
> fixes the flakiness. 
> {code:java}
> try {
>   ensureFileReplicasOnStorageType(path1, DEFAULT);
> } catch (TimeoutException t) {
>   LOG.warn("We got \"" + t.getMessage()
>       + "\" so trying to find data on RAM_DISK");
>   ensureFileReplicasOnStorageType(path1, RAM_DISK);
> }
>   }
> {code}
> Some thoughts:
> * Successful and failed test runs look similar up to the point when the 
> datanode restarts. The restart line in the log is the following: 
> LazyPersistTestCase - Restarting the DataNode
> * There is a line which only occurs in the failed test: *addStoredBlock: 
> Redundant addStoredBlock request received for blk_1073741825_1001 on node 
> 127.0.0.1:49455 size 5242880*
> * This redundant request at BlockManager#addStoredBlock could be the main 
> reason for the test failure. Something wrong with the gen stamp? Corrupt 
> replicas? 
> =
> Current fail ratio based on my test of TestLazyPersistReplicaRecovery: 
> 1000 runs, 34 failures (3.4% fail)
> Failure rate analysis:
> TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4%
> 33 failures caused by: {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 
> on 39589" 
> {noformat}
> 1 failure caused by: {noformat}
> java.net.BindException: Problem binding to [localhost:56729] 
> java.net.BindException: Address already in use; For more details see: 
> http://wiki.apache.org/hadoop/BindException
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> Caused by: java.net.BindException: Address already in use
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> {noformat}
> =
> Example stacktrace:
> {noformat}
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-11-01 10:36:49,499
> "Thread-1" prio=5 tid=13 runnable
> java.lang.Thread.State: RUNNABLE
>   at java.lang.Thread.dumpThreads(Native Method)
>   at java.lang.Thread.getAllStackTraces(Thread.java:1610)
>   at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
>   at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
>   at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> ...
> {noformat}






[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027773#comment-17027773
 ] 

Íñigo Goiri commented on HDFS-15147:


Thanks for [^HDFS-15147.002.patch].
Minor comments:
* We probably want the waitFor overload that takes longs to hold the 
implementation, and have the current overload that takes ints just do a cast 
and call the long method. Not sure about the Supplier replacement... I guess it 
goes with the philosophy of removing guava. Your call.
* In joinUninterruptibly(), we never set interrupted; I think you need to do 
that before the return.
* The javadoc in BlockManager#lastRedundancyCycleTS has an extra asterisk at 
the end. Same for lazyPersistFileScrubberTS.
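
As one reading of the second comment, an uninterruptible join usually records that an interrupt arrived and restores the thread's interrupt status before returning. The sketch below is an assumption about the intended shape, not the code in the patch: the class name `UninterruptibleJoinSketch` is hypothetical, and only the method name `joinUninterruptibly` is taken from the comment.

```java
/** Hypothetical sketch; not the joinUninterruptibly() from the patch. */
final class UninterruptibleJoinSketch {
  /** Joins the thread, deferring any interrupt until the join completes. */
  public static void joinUninterruptibly(Thread toJoin) {
    boolean interrupted = false;
    try {
      while (true) {
        try {
          toJoin.join();
          return;
        } catch (InterruptedException e) {
          // Remember the interrupt; otherwise it would be silently lost.
          interrupted = true;
        }
      }
    } finally {
      if (interrupted) {
        // Restore the interrupt status for the caller before returning.
        Thread.currentThread().interrupt();
      }
    }
  }
}
```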

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch, 
> HDFS-15147.002.patch
>
>






[jira] [Updated] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently

2020-01-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13179:
---
Fix Version/s: 2.10.1

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails 
> intermittently
> --
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Ahmed Hussein
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, 
> HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
>
>
> The error caused by TimeoutException because the test is waiting to ensure 
> that the file is replicated to DISK storage but the replication can't be 
> finished to DISK during the 30s timeout in ensureFileReplicasOnStorageType(), 
> but the file is still on RAM_DISK - so there is no data loss.
> Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially 
> fixes the flakiness. 
> {code:java}
> try {
>   ensureFileReplicasOnStorageType(path1, DEFAULT);
> }catch (TimeoutException t){
>   LOG.warn("We got \"" + t.getMessage() + "\" so trying to find data on 
> RAM_DISK");
>   ensureFileReplicasOnStorageType(path1, RAM_DISK);
> }
>   }
> {code}
> Some thoughts:
> * Successful and failed tests run similar to the point when datanode 
> restarts. Restart line is the following in the log: LazyPersistTestCase - 
> Restarting the DataNode
> * There is a line which only occurs in the failed test: *addStoredBlock: 
> Redundant addStoredBlock request received for blk_1073741825_1001 on node 
> 127.0.0.1:49455 size 5242880*
> * This redundant request at BlockManager#addStoredBlock could be the main 
> reason for the test fail. Something wrong with the gen stamp? Corrupt 
> replicas? 
> =
> Current fail ratio based on my test of TestLazyPersistReplicaRecovery: 
> 1000 runs, 34 failures (3.4% fail)
> Failure rate analysis:
> TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4%
> 33 failures caused by: {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 
> on 39589" 
> {noformat}
> 1 failure caused by: {noformat}
> java.net.BindException: Problem binding to [localhost:56729] 
> java.net.BindException: Address already in use; For more details see: 
> http://wiki.apache.org/hadoop/BindException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
>  Caused by: java.net.BindException: Address already in use at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> {noformat}
> =
> Example stacktrace:
> {noformat}
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-11-01 10:36:49,499
> "Thread-1" prio=5 tid=13 runnable
> java.lang.Thread.State: RUNNABLE
> at java.lang.Thread.dumpThreads(Native Method)
> at java.lang.Thread.getAllStackTraces(Thread.java:1610)
> at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> ...
> {noformat}
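
The workaround quoted above (wait for the primary storage type, and on a TimeoutException fall back to checking the other one) can be sketched in isolation. This is only an illustration, not the committed fix: the class WaitWithFallback and the helpers waitFor and waitPrimaryThenFallback are hypothetical names made up for this sketch, and the two conditions are plain lambdas standing in for the DISK and RAM_DISK replica checks.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitWithFallback {

  /** Hypothetical helper: polls the condition until it holds or the timeout elapses. */
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Timed out waiting for condition.");
      }
      Thread.sleep(intervalMs);
    }
  }

  /** Waits for the primary condition; on timeout, waits for the fallback instead. */
  static String waitPrimaryThenFallback(BooleanSupplier primary,
      BooleanSupplier fallback, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    try {
      waitFor(primary, intervalMs, timeoutMs);
      return "primary";
    } catch (TimeoutException t) {
      // Same shape as the quoted snippet: report the timeout, then try the fallback.
      System.err.println("We got \"" + t.getMessage() + "\" so trying the fallback");
      waitFor(fallback, intervalMs, timeoutMs);
      return "fallback";
    }
  }

  public static void main(String[] args) throws Exception {
    // The primary condition never holds; the fallback holds immediately.
    System.out.println(waitPrimaryThenFallback(() -> false, () -> true, 10, 100));
  }
}
```

Note that a fallback of this kind makes the test pass when the data is still on RAM_DISK, so it trades strictness for stability; whether that is acceptable is a judgment for the test's authors.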



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently

2020-01-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027774#comment-17027774
 ] 

Íñigo Goiri commented on HDFS-13179:


Committed all the way to branch-2.10.




[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027778#comment-17027778
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 49s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 34s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | shadedclient | 14m 35s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 14s | trunk passed |
| +1 | javadoc | 1m 12s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 19s | the patch passed |
| +1 | javadoc | 1m 9s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 110m 1s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 172m 19s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992348/HDFS-15124.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1e49660b6b58 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8686f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28729/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28729/testReport/ |
| Max. process+thread count | 2974 (vs. ulimit of 5500) |
| modu

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027790#comment-17027790
 ] 

Ctest commented on HDFS-15124:
--

[~elgoiri] Thank you very much for pointing this out! The 004 patch has 
already fixed the checkstyle issue.

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch, 
> HDFS-15124.005.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` accepts `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, the 
> NameNode fails to start because of an `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that `initAuditLoggers` is called while the NameNode is 
> initializing and tries to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which does not 
> have one. Thus the `InstantiationException` is thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.lang.RuntimeException: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)
> ... 8 more
> Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more
> {code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /**
>  * An {@link AuditLogger} that sends logged data directly to the metrics
>  * systems. It is used when the top service is used directly by the name node
>  */
> @InterfaceAudience.Private
> public class TopAuditLogger implements AuditLogger {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(TopAuditLogger.class);
>   private final TopMetrics topMetrics;
>
>   // The only constructor takes a TopMetrics argument; there is no no-arg one.
>   public TopAuditLogger(TopMetrics topMetrics) {
>     Preconditions.checkNotNull(topMetrics, "Cannot init with a null " +
>         "TopMetrics");
>     this.topMetrics = topMetrics;
>   }
>
>   @Override
>   public void initialize(Configuration conf) {
>   }
> {code}
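
The failure mode described in this report can be reproduced outside Hadoop with plain JDK reflection. The sketch below is illustrative only: NoDefaultConstructorDemo, NeedsArgument and HasDefault are hypothetical stand-ins, not Hadoop classes. It shows that Class.newInstance() throws InstantiationException for a class whose only constructor takes an argument, and that the missing no-arg constructor can be detected up front with getDeclaredConstructor().

```java
import java.lang.reflect.Constructor;

public class NoDefaultConstructorDemo {

  /** Hypothetical stand-in for a logger that can only be built from a dependency. */
  public static class NeedsArgument {
    private final String dependency;

    public NeedsArgument(String dependency) {
      this.dependency = dependency;
    }
  }

  /** Hypothetical stand-in for a logger with an implicit no-arg constructor. */
  public static class HasDefault {
  }

  /** Returns true if the class declares a no-argument constructor. */
  static boolean hasNoArgConstructor(Class<?> clazz) {
    try {
      Constructor<?> ctor = clazz.getDeclaredConstructor();
      return ctor != null;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Attempts reflective no-arg instantiation; returns "ok" or the exception's simple name. */
  @SuppressWarnings("deprecation")
  static String tryNewInstance(Class<?> clazz) {
    try {
      clazz.newInstance();
      return "ok";
    } catch (InstantiationException | IllegalAccessException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(hasNoArgConstructor(NeedsArgument.class)); // false
    System.out.println(tryNewInstance(NeedsArgument.class));      // InstantiationException
    System.out.println(tryNewInstance(HasDefault.class));         // ok
  }
}
```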
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List<AuditLogger> initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection<String> alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List<AuditLogger> auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInsta

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027853#comment-17027853
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 30s | trunk passed |
| +1 | compile | 1m 17s | trunk passed |
| +1 | checkstyle | 0m 57s | trunk passed |
| +1 | mvnsite | 1m 19s | trunk passed |
| +1 | shadedclient | 16m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 50s | trunk passed |
| +1 | javadoc | 1m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 20s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 49s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 21s | the patch passed |
| +1 | javadoc | 1m 8s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 113m 41s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 185m 41s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992358/HDFS-15124.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 9ec3af45c2b7 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8686f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28730/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28730/testReport/ |
| Max. process+thread count | 3013

[jira] [Assigned] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HDFS-15124:
--

Assignee: Ctest

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Assignee: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch, 
> HDFS-15124.005.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` accepts `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, the 
> NameNode fails to start because of an `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that `initAuditLoggers` is called while the NameNode is 
> initializing and tries to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which does not 
> have one. Thus the `InstantiationException` is thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.lang.RuntimeException: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)
> ... 8 more
> Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more
> {code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /**
>  * An {@link AuditLogger} that sends logged data directly to the metrics
>  * systems. It is used when the top service is used directly by the name node
>  */
> @InterfaceAudience.Private
> public class TopAuditLogger implements AuditLogger {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(TopAuditLogger.class);
>   private final TopMetrics topMetrics;
>
>   // The only constructor takes a TopMetrics argument; there is no no-arg one.
>   public TopAuditLogger(TopMetrics topMetrics) {
>     Preconditions.checkNotNull(topMetrics, "Cannot init with a null " +
>         "TopMetrics");
>     this.topMetrics = topMetrics;
>   }
>
>   @Override
>   public void initialize(Configuration conf) {
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List<AuditLogger> initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection<String> alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List<AuditLogger> auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>     

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027868#comment-17027868
 ] 

Íñigo Goiri commented on HDFS-15124:


Before closing this... just making sure there is no quick test we can add for 
this.
What about initializing one with and without the TopAuditLogger and checking 
that each case does the right thing?
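
A rough sketch of the kind of check suggested here, written against stand-in classes rather than Hadoop's. Everything in it is hypothetical: SketchLogger, DefaultSketchLogger, TopSketchLogger and initLoggers are invented for illustration, and the assumption that "the right thing" means building the argument-taking logger explicitly instead of reflectively is mine, not something stated in this thread or taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class AuditLoggerInitSketch {

  /** Hypothetical stand-in for the AuditLogger interface. */
  public interface SketchLogger {
    String describe();
  }

  /** Stand-in for a logger that has a no-arg constructor. */
  public static class DefaultSketchLogger implements SketchLogger {
    public DefaultSketchLogger() {
    }

    @Override
    public String describe() {
      return "default";
    }
  }

  /** Stand-in for a logger whose only constructor needs an argument. */
  public static class TopSketchLogger implements SketchLogger {
    private final Object metrics;

    public TopSketchLogger(Object metrics) {
      this.metrics = Objects.requireNonNull(metrics, "Cannot init with null metrics");
    }

    @Override
    public String describe() {
      return "top";
    }
  }

  /** Builds one logger per configured class name. */
  static List<SketchLogger> initLoggers(List<String> classNames)
      throws ReflectiveOperationException {
    List<SketchLogger> loggers = new ArrayList<>();
    for (String className : classNames) {
      if (TopSketchLogger.class.getName().equals(className)) {
        // Assumption: a logger without a no-arg constructor is built explicitly.
        loggers.add(new TopSketchLogger(new Object()));
      } else {
        loggers.add((SketchLogger)
            Class.forName(className).getDeclaredConstructor().newInstance());
      }
    }
    return loggers;
  }

  public static void main(String[] args) throws Exception {
    String def = DefaultSketchLogger.class.getName();
    String top = TopSketchLogger.class.getName();
    // Case 1: without the argument-taking logger. Case 2: with it.
    List<SketchLogger> without = initLoggers(Collections.singletonList(def));
    List<SketchLogger> withTop = initLoggers(Arrays.asList(def, top));
    System.out.println(without.size() + " " + withTop.size());
  }
}
```

A real test would exercise FSNamesystem with both configurations; the point of the sketch is only the shape of the two cases.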


[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027875#comment-17027875
 ] 

Hadoop QA commented on HDFS-15147:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
57s{color} | {color:green} root generated 0 new + 1868 unchanged - 2 fixed = 
1868 total (was 1870) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
53s{color} | {color:green} root: The patch generated 0 new + 448 unchanged - 12 
fixed = 448 total (was 460) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
27s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}214m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
|   | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15147 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992360/HDFS-15147.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d68abdf14006 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027881#comment-17027881
 ] 

Ctest commented on HDFS-15124:
--

I can try to write a test for this, but it may take some time. Should I 
incorporate the test into this patch?
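
The failure mode discussed in this issue can be reproduced without Hadoop. The following is a minimal sketch, not the project's test: `MetricsBackedLogger` and `PlainLogger` are hypothetical stand-ins (the first mirrors a class whose only constructor takes an argument), and `ReflectiveInitDemo` is an invented name. It shows `Class.newInstance()` failing with `InstantiationException` when no no-argument constructor exists, together with a pre-check that a test could assert on.

```java
/** Hypothetical stand-in for a logger whose only constructor takes an argument. */
class MetricsBackedLogger {
  private final String metrics;

  public MetricsBackedLogger(String metrics) {
    this.metrics = metrics;
  }

  public String metrics() {
    return metrics;
  }
}

/** Hypothetical stand-in for a logger that can be created reflectively. */
class PlainLogger {
  public PlainLogger() {
  }
}

final class ReflectiveInitDemo {
  private ReflectiveInitDemo() {
  }

  /** Returns true if the class exposes a public no-argument constructor. */
  static boolean hasPublicNoArgConstructor(Class<?> clazz) {
    try {
      clazz.getConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /**
   * Instantiates the class the way a by-name loader might, using the
   * (deprecated since Java 9) Class.newInstance(). Returns "ok" on success,
   * otherwise the simple name of the exception that was thrown.
   */
  @SuppressWarnings("deprecation")
  static String tryNewInstance(Class<?> clazz) {
    try {
      clazz.newInstance();
      return "ok";
    } catch (InstantiationException | IllegalAccessException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println("PlainLogger: " + tryNewInstance(PlainLogger.class));
    System.out.println("MetricsBackedLogger: "
        + tryNewInstance(MetricsBackedLogger.class));
    System.out.println("MetricsBackedLogger has no-arg constructor: "
        + hasPublicNoArgConstructor(MetricsBackedLogger.class));
  }
}
```

A test attached to the patch could, under the same assumption, check each configured audit logger class with something like `hasPublicNoArgConstructor` before reflective instantiation is attempted.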

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Ctest
> Assignee: Ctest
> Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch, 
> HDFS-15124.005.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` accepts `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> the NameNode fails to start because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that `initAuditLoggers` is called while the NameNode 
> initializes and tries to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which has 
> none. Thus the `InstantiationException` is thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.lang.RuntimeException: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at java.lang.Class.newInstance(Class.java:427)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)
>   ... 8 more
> Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>()
>   at java.lang.Class.getConstructor0(Class.java:3082)
>   at java.lang.Class.newInstance(Class.java:412)
>   ... 9 more
> {code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static final Logger LOG =
>       LoggerFactory.getLogger(TopAuditLogger.class);
>   private final TopMetrics topMetrics;
>   public TopAuditLogger(TopMetrics topMetrics) {
>     Preconditions.checkNotNull(topMetrics, "Cannot init with a null " +
>         "TopMetrics");
>     this.topMetrics = topMetrics;
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List<AuditLogger> initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection<String> alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List<AuditLogger> auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) C

[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027881#comment-17027881
 ] 

Ctest edited comment on HDFS-15124 at 1/31/20 11:28 PM:


I can try to write a test for this, but it may take some time (I am not 
sure whether it will be easy). Should I incorporate the test into this 
patch?


was (Author: ctest.team):
I can try to write a test for this, but it may take some time. Should I 
incorporate the test into this patch?


[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027881#comment-17027881
 ] 

Ctest edited comment on HDFS-15124 at 1/31/20 11:29 PM:


I can try to write a test for this, but it may take some time (I am not 
sure whether it will be a quick test).

Should I incorporate the test into this patch, or open a new issue for 
the test?


was (Author: ctest.team):
I can try to write a test for this, but it may take some time (I am not 
sure whether it will be easy). Should I incorporate the test into this 
patch?


[jira] [Commented] (HDFS-15148) dfs.namenode.send.qop.enabled should not apply to primary NN port

2020-01-31 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027899#comment-17027899
 ] 

Konstantin Shvachko commented on HDFS-15148:


This looks reasonable. It would make sense to:
# Fix the checkstyle warning about unused imports.
# Remove {{Thread.sleep(*);}} before the loop. I see 3 occurrences of those. You 
shouldn't need them anymore.
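
As an illustration of why a fixed sleep before a polling loop is redundant, here is a minimal sketch, assuming the loop in question already re-checks its condition until a timeout. `PollUntil` and `waitFor` are hypothetical names, not the code in the patch; Hadoop's test utilities offer a comparable helper (`GenericTestUtils.waitFor`), which the actual tests may rely on instead.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of Hadoop. */
final class PollUntil {
  private PollUntil() {
  }

  /**
   * Polls the condition every intervalMillis until it holds or timeoutMillis
   * has elapsed. Returns true if the condition held, false on timeout or if
   * the waiting thread was interrupted.
   */
  static boolean waitFor(BooleanSupplier condition, long intervalMillis,
      long timeoutMillis) {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      // Check first, so an already-satisfied condition returns without sleeping.
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status for the caller and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    final long start = System.nanoTime();
    boolean ready =
        waitFor(() -> System.nanoTime() - start > 50_000_000L, 10, 2_000);
    System.out.println("condition met: " + ready);
  }
}
```

Because the loop itself waits for as long as needed (up to the timeout), an unconditional `Thread.sleep` in front of it only adds latency to the test.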

> dfs.namenode.send.qop.enabled should not apply to primary NN port
> -
>
> Key: HDFS-15148
> URL: https://issues.apache.org/jira/browse/HDFS-15148
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.10.1, 3.3.1
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-15148.001.patch, HDFS-15148.002.patch, 
> HDFS-15148.003.patch
>
>
> In HDFS-13617, NameNode can be configured to wrap its established QOP into 
> the block access token as an encrypted message. Later on, the DataNode uses 
> this message to create the SASL connection. But this new behavior should only 
> apply to the new auxiliary NameNode ports, not the primary port (the one 
> configured in fs.defaultFS), as it may conflict with other existing 
> SASL-related configuration (e.g. dfs.data.transfer.protection). Since this 
> configuration was introduced for auxiliary ports only, we should restrict this 
> new behavior so that it does not apply to the primary port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15148) dfs.namenode.send.qop.enabled should not apply to primary NN port

2020-01-31 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027899#comment-17027899
 ] 

Konstantin Shvachko edited comment on HDFS-15148 at 2/1/20 12:08 AM:
-

This looks reasonable. It would make sense to:
# Fix the checkstyle warning about unused imports.
# Remove {{Thread.sleep( * );}} before the loop. I see 3 occurrences of those. 
You shouldn't need them anymore.


was (Author: shv):
This looks reasonable. It would make sense to:
# Fix the checkstyle warning about unused imports.
# Remove {{Thread.sleep(*);}} before the loop. I see 3 occurrences of those. You 
shouldn't need them anymore.







[jira] [Updated] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-7175:
--
Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~aajisaka] for the patch and [~sodonnell] for pushing it to the finish 
line!

> Client-side SocketTimeoutException during Fsck
> --
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Carl Steinbach
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-7157.004.patch, HDFS-7175.2.patch, 
> HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally 
> be enabled with the -showprogress option). We have observed that without 
> status reporting the client will abort with a read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there is nothing for the client to read, it will abort if the time 
> required to complete the fsck operation is longer than the client's read 
> timeout setting.
> I can think of a few ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the 
> client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger 
> an HTTP response with a zero length payload. This may be enough to keep the 
> client from hanging up.
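
The second idea in the list above (have the server write and flush something at intervals so the client's read does not time out) can be sketched in isolation as follows. This is an illustration under stated assumptions, not the change that was committed for this issue: `KeepAliveWriter`, `process`, and the choice of a `.` marker instead of zeros are all hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical sketch of a long-running server loop that emits keep-alive output. */
final class KeepAliveWriter {
  private KeepAliveWriter() {
  }

  /**
   * Processes itemCount items and, after every flushEvery items, writes a
   * single '.' and flushes so that a waiting reader sees traffic.
   * Returns the number of keep-alive flushes performed.
   */
  static int process(int itemCount, int flushEvery, PrintWriter out) {
    if (flushEvery <= 0) {
      throw new IllegalArgumentException("flushEvery must be positive");
    }
    int flushes = 0;
    for (int i = 1; i <= itemCount; i++) {
      // The real per-item work (for example, checking one file) would go here.
      if (i % flushEvery == 0) {
        out.print('.');
        out.flush();
        flushes++;
      }
    }
    // Final report, written after the keep-alive markers.
    out.println();
    out.println("done: " + itemCount + " items");
    out.flush();
    return flushes;
  }

  public static void main(String[] args) {
    StringWriter sink = new StringWriter();
    int flushes = process(10, 3, new PrintWriter(sink));
    System.out.println("keep-alive flushes: " + flushes);
    System.out.print(sink);
  }
}
```

With this approach the client has to tolerate or filter the marker characters, which is the trade-off the second bullet points out.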






[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027901#comment-17027901
 ] 

Íñigo Goiri commented on HDFS-15124:


Let's try to do it in this JIRA if it doesn't take too long.
It is usually better to have complete patches.


[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15046:
---
Component/s: datanode

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002(2).patch, 
> HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15046:
---
Fix Version/s: 2.10.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002(2).patch, 
> HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.






[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-31 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027905#comment-17027905
 ] 

Hudson commented on HDFS-7175:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17923 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17923/])
HDFS-7175. Client-side SocketTimeoutException during Fsck. Contributed 
(weichiu: rev 1e3a0b0d931676b191cb4813ed1a283ebb24d4eb)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md


> Client-side SocketTimeoutException during Fsck
> --
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Carl Steinbach
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-7157.004.patch, HDFS-7175.2.patch, 
> HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally 
> be enabled with the -showprogress option). We have observed that without 
> status reporting the client will abort with read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) 
> cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read it will abort if the time 
> required to complete the fsck operation is longer than the client's read 
> timeout setting.
> I can think of a couple ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the 
> client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger 
> an HTTP response with a zero length payload. This may be enough to keep the 
> client from hanging up.
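
The second option in the list above (the server keeps the connection busy so the client's read timeout never fires) could look roughly like the sketch below. This is an illustration under assumptions, not the committed fix: {{KeepAliveWriter}}, its {{tick()}} method and the injected clock are all hypothetical names, and the filler character is a dot rather than a zero byte.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.LongSupplier;

public class KeepAliveWriter {
  private final PrintWriter out;
  private final LongSupplier clockMs;   // injected so the sketch is testable
  private final long maxSilenceMs;
  private long lastWriteMs;

  KeepAliveWriter(PrintWriter out, LongSupplier clockMs, long maxSilenceMs) {
    this.out = out;
    this.clockMs = clockMs;
    this.maxSilenceMs = maxSilenceMs;
    this.lastWriteMs = clockMs.getAsLong();
  }

  /**
   * Call once per unit of work. Writes and flushes one filler character
   * if nothing has been sent for at least maxSilenceMs.
   * Returns true if a character was written.
   */
  boolean tick() {
    long now = clockMs.getAsLong();
    if (now - lastWriteMs < maxSilenceMs) {
      return false;
    }
    out.print('.');
    out.flush();
    lastWriteMs = now;
    return true;
  }

  public static void main(String[] args) {
    StringWriter sink = new StringWriter();
    long[] fakeNow = {0L};
    KeepAliveWriter writer =
        new KeepAliveWriter(new PrintWriter(sink), () -> fakeNow[0], 1000L);
    for (int i = 0; i < 5; i++) {
      fakeNow[0] += 400L;   // pretend each unit of work takes 400 ms
      writer.tick();
    }
    System.out.println(sink);
  }
}
```

The client would then need to skip the filler characters when printing the report, as the description notes.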






[jira] [Updated] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15147:
-
Attachment: HDFS-15147.003.patch

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch, 
> HDFS-15147.002.patch, HDFS-15147.003.patch
>
>
> {{LazyPersistTestCase}} has some issues that lead to inconsistent results of 
> the test cases:
> * the wait periods for a change of status are too long. They reach 10 secs in 
> some cases.
> * triggerBlockReport() only triggers FBR of DN with index 0. This is 
> counterintuitive because the JUnit tests restart the DN assuming that the 
> restarted DN will send an FBR. However, this never happens because the DN 
> will get a new index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>       throws IOException, InterruptedException {
>     // Trigger block report to NN
>     DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
>     Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.
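
To illustrate the first point (a fixed 10 second sleep versus waiting only as long as needed), here is a minimal polling sketch in plain Java. {{PollingWait.waitFor}} is a hypothetical helper written for this illustration; it is not the Hadoop test utility and makes no claim about what the attached patches do.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class PollingWait {

  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns the number of polls performed, so callers can see how early the
   * condition was met.
   */
  static long waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    long polls = 0;
    while (true) {
      polls++;
      if (condition.getAsBoolean()) {
        return polls;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // The condition becomes true on the third check.
    long polls = waitFor(() -> ++calls[0] >= 3, 10L, 5000L);
    System.out.println("condition met after " + polls + " polls");
  }
}
```

With a helper of this shape the test returns as soon as the state changes, and the timeout only bounds the worst case.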






[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027920#comment-17027920
 ] 

Ahmed Hussein commented on HDFS-15147:
--

Thanks [~inigoiri] for the review.

{{joinUninterruptibly}} is a good catch!

I addressed your comments, and fixed the implementation of 
"{{ensureLazyPersistBlocksAreSaved()}}" to follow the same methodology as 
"{{ensureFileReplicasOnStorageType()}}". 







[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-01-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027939#comment-17027939
 ] 

Ahmed Hussein commented on HDFS-15149:
--

The changes in HDFS-14651 frequently cause timeouts in 
{{TestDeadNodeDetection}}. Many configurations and threads are involved in the 
JUnit test, but the wait time is very long (100 sec), so the test eventually 
times out after successive calls to {{WaitFor}}.

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>
> The TestDeadNodeDetection JUnit test times out with the following stack 
> traces:
> *1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
>   

[jira] [Commented] (HDFS-14651) DeadNodeDetector checks dead node periodically

2020-01-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027943#comment-17027943
 ] 

Ahmed Hussein commented on HDFS-14651:
--

Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15147

I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, 
every call to {{checkDeadNodes()}} will change the state to {{IDLE}}, forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So, why do we need 
{{deadNodeDetectInterval}} if the actual time gap between every check is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread, but it looks like the implementation of the 
runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 
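
For the second question, this is the general pattern I would expect from a loop that can be stopped, shown as a minimal standalone sketch. {{StoppableWorker}} and {{stopAndJoin}} are hypothetical names for illustration only; this is not the {{DeadNodeDetector}} code. The loop checks the interrupt status on every iteration and restores it when a sleep is interrupted, so an interrupt followed by a join returns promptly.

```java
public class StoppableWorker implements Runnable {
  private final long idleSleepMs;

  StoppableWorker(long idleSleepMs) {
    this.idleSleepMs = idleSleepMs;
  }

  @Override
  public void run() {
    // The interrupt status doubles as the termination flag.
    while (!Thread.currentThread().isInterrupted()) {
      // ... one round of periodic work would go here ...
      try {
        Thread.sleep(idleSleepMs);
      } catch (InterruptedException e) {
        // Restore the flag instead of swallowing it, so the loop exits.
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Interrupts the worker and waits up to joinTimeoutMs for it to finish. */
  static boolean stopAndJoin(Thread worker, long joinTimeoutMs) {
    worker.interrupt();
    try {
      worker.join(joinTimeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    Thread worker = new Thread(new StoppableWorker(60_000L), "worker");
    worker.start();
    System.out.println("stopped: " + stopAndJoin(worker, 5_000L));
  }
}
```

If the catch block only logged the exception, the loop would go back to sleep and the join would always run into its timeout.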

> DeadNodeDetector checks dead node periodically
> --
>
> Key: HDFS-14651
> URL: https://issues.apache.org/jira/browse/HDFS-14651
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch, 
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode. 
> If the access is successful, the node is removed from 
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is 
> necessary: a DataNode needs to rejoin the cluster after a service 
> restart/machine repair, and it may be permanently excluded if there is 
> no added probe mechanism.






[jira] [Comment Edited] (HDFS-14651) DeadNodeDetector checks dead node periodically

2020-01-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027943#comment-17027943
 ] 

Ahmed Hussein edited comment on HDFS-14651 at 2/1/20 2:31 AM:
--

Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15149

I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, 
every call to {{checkDeadNodes()}} will change the state to {{IDLE}}, forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So, why do we need 
{{deadNodeDetectInterval}} if the actual time gap between every check is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread, but it looks like the implementation of the 
runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 


was (Author: ahussein):
Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15147

I have couple of questions:
# what is the usage of {{deadNodeDetectInterval}} ? As far as I understand, 
every call to {{checkDeadNodes()}} will change the state to {{IDLE}} forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So, why do we need 
{{deadNodeDetectInterval}} if the actual time gap between every check is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread; but it looks like the implementation never of the 
runnable  never terminates. {{DeadNodeDetector}} surpresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 







[jira] [Comment Edited] (HDFS-14651) DeadNodeDetector checks dead node periodically

2020-01-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027943#comment-17027943
 ] 

Ahmed Hussein edited comment on HDFS-14651 at 2/1/20 2:32 AM:
--

Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15149

I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, 
every call to {{checkDeadNodes()}} will change the state to {{IDLE}}, forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So, why do we need 
{{deadNodeDetectInterval}} if the actual time gap between every check is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread, but it looks like the implementation of the 
runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 


was (Author: ahussein):
Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15149

I have couple of questions:
# what is the usage of {{deadNodeDetectInterval}} ? As far as I understand, 
every call to {{checkDeadNodes()}} will change the state to {{IDLE}} forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So, why do we need 
{{deadNodeDetectInterval}} if the actual time gap between every check is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread; but it looks like the implementation never of the 
runnable  never terminates. {{DeadNodeDetector}} surpresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 







[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027964#comment-17027964
 ] 

Hadoop QA commented on HDFS-15147:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m  
2s{color} | {color:green} root generated 0 new + 1868 unchanged - 2 fixed = 
1868 total (was 1870) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} root: The patch generated 0 new + 445 unchanged - 14 
fixed = 445 total (was 459) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 31s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 45s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}225m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15147 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992382/HDFS-15147.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 49c186c6883d 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk /

[jira] [Commented] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.

2020-01-31 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028003#comment-17028003
 ] 

Xieming Li commented on HDFS-15111:
---

Hi, [~shv]

I have started working on this issue.

Before modifying any code, I would like to make the issue clear.

Currently, transitioning from Standby to Observer produces the following logs, 
which seem correct.
{code:java}
2020-02-01 16:38:55,947 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started 
for standby state
2020-02-01 16:38:55,951 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for observer state
{code}
 

On the other hand, transitioning from Observer to Standby seems a little 
problematic: the logs read as Standby --> Standby. Is this the problem this 
issue is trying to solve?
{code:java}
2020-02-01 16:39:16,100 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started 
for standby state
2020-02-01 16:39:16,102 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for standby state{code}

> start / stopStandbyServices() should log which service it is transitioning 
> to/from.
> ---
>
> Key: HDFS-15111
> URL: https://issues.apache.org/jira/browse/HDFS-15111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, logging
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie++
>
> Trying to transition Observer to Standby state. Both 
> {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are 
> stopping/starting Standby services.
> # {{startStandbyServices()}} should log which state it is transitioning TO.
> # {{stopStandbyServices()}} should log which state it is transitioning FROM.
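
One possible shape for the two requested messages, as a minimal sketch. The {{HAState}} enum, the method names and the exact wording are assumptions made for this illustration and are not taken from {{FSNamesystem}}. The idea is simply to pass the state being left or entered into the message instead of hard-coding "standby".

```java
import java.util.Locale;

public class StandbyServiceLogDemo {

  /** Hypothetical state names, used only for this sketch. */
  enum HAState { ACTIVE, STANDBY, OBSERVER }

  /** Message for stopping services, naming the state being left. */
  static String stopMessage(HAState from) {
    return "Stopping services started for "
        + from.name().toLowerCase(Locale.ROOT) + " state";
  }

  /** Message for starting services, naming the state being entered. */
  static String startMessage(HAState to) {
    return "Starting services required for "
        + to.name().toLowerCase(Locale.ROOT) + " state";
  }

  public static void main(String[] args) {
    // Observer -> Standby: the stop message now names the observer state.
    System.out.println(stopMessage(HAState.OBSERVER));
    System.out.println(startMessage(HAState.STANDBY));
  }
}
```

With this, the Observer to Standby transition would no longer log "standby" on both lines.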


