[jira] [Commented] (HDFS-15097) Purge log in KMS and HttpFS

2020-01-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014098#comment-17014098
 ] 

Hadoop QA commented on HDFS-15097:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 31s{color} | {color:orange} root: The patch generated 2 new + 0 unchanged - 
0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
30s{color} | {color:green} hadoop-kms in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
37s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15097 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12990689/HDFS-15097.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5e9d88c9571c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 52b360a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs 

[jira] [Commented] (HDFS-15106) Remove unused code from FSDirConcatOp.

2020-01-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014092#comment-17014092
 ] 

Ayush Saxena commented on HDFS-15106:
-

Thanx [~LiJinglun] for the confirmation, will commit this today

> Remove unused code from FSDirConcatOp.
> --
>
> Key: HDFS-15106
> URL: https://issues.apache.org/jira/browse/HDFS-15106
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Trivial
> Attachments: HDFS-15106.001.patch
>
>
> While reading the code for the concat() RPC, I found an unused variable named 
> count. It was originally used to compute the namespace delta, but now that we 
> use QuotaCounts deltas it is dead code. We should remove it to keep the code clean.
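For illustration, the pattern being removed can be sketched as follows. This is a hypothetical, simplified example, not the actual FSDirConcatOp code; the method names and the delta computation are stand-ins.

```java
// Hypothetical sketch of removing a dead local counter (not FSDirConcatOp itself).
class ConcatSketch {

    // Before: 'count' was accumulated to compute the namespace delta, but the
    // delta now comes from QuotaCounts, so the variable is never consumed.
    static int namespaceDeltaBefore(int[] srcFileBlockCounts) {
        int count = 0;                        // dead code: computed but never used
        for (int blocks : srcFileBlockCounts) {
            count += blocks;
        }
        return srcFileBlockCounts.length - 1; // the actual delta ignores 'count'
    }

    // After the patch: the dead counter is simply deleted; behavior is unchanged.
    static int namespaceDeltaAfter(int[] srcFileBlockCounts) {
        return srcFileBlockCounts.length - 1;
    }
}
```

Because the counter never feeds the result, deleting it cannot change behavior, which is why no new tests are needed.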



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15106) Remove unused code from FSDirConcatOp.

2020-01-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15106:

Summary: Remove unused code from FSDirConcatOp.  (was: Remove unused code.)

> Remove unused code from FSDirConcatOp.
> --
>
> Key: HDFS-15106
> URL: https://issues.apache.org/jira/browse/HDFS-15106
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Trivial
> Attachments: HDFS-15106.001.patch
>
>
> While reading the code for the concat() RPC, I found an unused variable named 
> count. It was originally used to compute the namespace delta, but now that we 
> use QuotaCounts deltas it is dead code. We should remove it to keep the code clean.






[jira] [Updated] (HDFS-15097) Purge log in KMS and HttpFS

2020-01-12 Thread Doris Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-15097:

Status: Patch Available  (was: Open)

> Purge log in KMS and HttpFS
> ---
>
> Key: HDFS-15097
> URL: https://issues.apache.org/jira/browse/HDFS-15097
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs, kms
>Affects Versions: 3.1.3, 3.2.1, 3.0.3, 3.3.0
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
> Attachments: HDFS-15097.001.patch
>
>
> KMS and HttpFS use ConfigurationWithLogging instead of Configuration, which 
> logs every configuration access. That behavior is better suited to development use.
> {code:java}
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 
> 'false') 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 'false')
> {code}
>  
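As an editorial sketch of why these lines appear (simplified stand-ins, not the real Hadoop classes): ConfigurationWithLogging is a decorator that logs on every lookup, so each get() call produces an INFO line like the ones above, while a plain Configuration stays silent.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Configuration.
class Conf {
    final Map<String, String> props = new HashMap<>();

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// Simplified stand-in for ConfigurationWithLogging: same lookups, plus a
// log line per access (counted here instead of written to a logger).
class LoggingConf extends Conf {
    int logLines = 0;  // stands in for the INFO lines flooding the log above

    @Override
    String get(String key, String defaultValue) {
        logLines++;    // the real class logs: Got <key> = '<value>'
        return super.get(key, defaultValue);
    }
}
```

On a hot path every lookup becomes a log line, which matches the repeated INFO entries in the description; swapping back to the plain class removes them.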






[jira] [Updated] (HDFS-15097) Purge log in KMS and HttpFS

2020-01-12 Thread Doris Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-15097:

Attachment: HDFS-15097.001.patch

> Purge log in KMS and HttpFS
> ---
>
> Key: HDFS-15097
> URL: https://issues.apache.org/jira/browse/HDFS-15097
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs, kms
>Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
> Attachments: HDFS-15097.001.patch
>
>
> KMS and HttpFS use ConfigurationWithLogging instead of Configuration, which 
> logs every configuration access. That behavior is better suited to development use.
> {code:java}
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 
> 'false') 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 'false')
> {code}
>  






[jira] [Updated] (HDFS-15097) Purge log in KMS and HttpFS

2020-01-12 Thread Doris Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-15097:

Attachment: (was: HDFS-15097.001.patch)

> Purge log in KMS and HttpFS
> ---
>
> Key: HDFS-15097
> URL: https://issues.apache.org/jira/browse/HDFS-15097
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs, kms
>Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
> Attachments: HDFS-15097.001.patch
>
>
> KMS and HttpFS use ConfigurationWithLogging instead of Configuration, which 
> logs every configuration access. That behavior is better suited to development use.
> {code:java}
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 
> 'false') 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 'false')
> {code}
>  






[jira] [Updated] (HDFS-15097) Purge log in KMS and HttpFS

2020-01-12 Thread Doris Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-15097:

Status: Open  (was: Patch Available)

> Purge log in KMS and HttpFS
> ---
>
> Key: HDFS-15097
> URL: https://issues.apache.org/jira/browse/HDFS-15097
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs, kms
>Affects Versions: 3.1.3, 3.2.1, 3.0.3, 3.3.0
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
> Attachments: HDFS-15097.001.patch
>
>
> KMS and HttpFS use ConfigurationWithLogging instead of Configuration, which 
> logs every configuration access. That behavior is better suited to development use.
> {code:java}
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 
> 'false') 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' 
> 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: 
> Got hadoop.security.instrumentation.requires.admin = 'false' (default 'false')
> {code}
>  






[jira] [Commented] (HDFS-15106) Remove unused code.

2020-01-12 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014008#comment-17014008
 ] 

Jinglun commented on HDFS-15106:


The failed tests are unrelated to this patch. I don't think we need to add new 
test cases, since the patch only removes an unused local variable.

> Remove unused code.
> ---
>
> Key: HDFS-15106
> URL: https://issues.apache.org/jira/browse/HDFS-15106
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Trivial
> Attachments: HDFS-15106.001.patch
>
>
> While reading the code for the concat() RPC, I found an unused variable named 
> count. It was originally used to compute the namespace delta, but now that we 
> use QuotaCounts deltas it is dead code. We should remove it to keep the code clean.






[jira] [Created] (HDFS-15114) JournalNodes' committed-txid file includes aborted transaction, breaks NameNode startup

2020-01-12 Thread Steven Rand (Jira)
Steven Rand created HDFS-15114:
--

 Summary: JournalNodes' committed-txid file includes aborted 
transaction, breaks NameNode startup
 Key: HDFS-15114
 URL: https://issues.apache.org/jira/browse/HDFS-15114
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, namenode
Affects Versions: 3.2.1
Reporter: Steven Rand


A couple of days ago, our active NameNode in an HA setup aborted a 
{{QuorumOutputStream}} starting at tx 3389424 because tx 3389425 failed to be 
written. This was likely related to a rolling restart of the three JournalNodes 
that was happening at this time. The NameNode logged:
{code:java}
2020-01-11 02:00:50,229 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
10.6.1.181
2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Rolling edit logs
2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Ending log segment 3389424, 3389424
2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Number of transactions: 2 Total time for transactions(ms): 1 Number of 
transactions batched in Syncs:
 0 Number of syncs: 1 SyncTimes(ms): 1 7
2020-01-11 02:00:50,245 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 
10.6.2.187:8485 failed to write txns 3389425-3389425. Will try to write to this 
JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException):
 Can't write, no segment open ; journal id: 
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:545)
... rest of stacktrace ...

// the same warning for the second JournalNode
// the same warning for the third JournalNode

2020-01-11 02:00:50,246 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Error: flush failed for required journal (JournalAndStream(mgr=QJM to 
[10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=QuorumOutputStream 
starting at txid 3389424))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions 
to achieve quorum size 2/3. 3 exceptions thrown:

// the same "Can't write, no segment open ; journal id: " error 
for all 3 JournalNodes

2020-01-11 02:00:50,246 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
QuorumOutputStream starting at txid 3389424
2020-01-11 02:00:50,255 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to 
[10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=QuorumOutputStream 
starting at txid 3389424))
{code}
Even though the stream was aborted, the {{committed-txid}} file on each of the 
three JournalNodes was updated to be {{3389424}}.

This caused both NameNodes to fail to start with this error:
  
{code:java}
2020-01-11 02:54:35,483 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
2020-01-11 02:54:35,491 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Starting recovery 
process for unclosed journal segments...
2020-01-11 02:54:35,537 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Successfully 
started new epoch 80
2020-01-11 02:54:35,537 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Beginning recovery 
of unclosed segment starting at txid 3389422
2020-01-11 02:54:35,574 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Recovery prepare 
phase complete. Responses:
10.6.1.4:8485: segmentState { startTxId: 3389422 endTxId: 3389423 isInProgress: 
false } lastWriterEpoch: 57 lastCommittedTxId: 3389424
10.6.2.187:8485: segmentState { startTxId: 3389422 endTxId: 3389423 
isInProgress: false } lastWriterEpoch: 57 lastCommittedTxId: 3389424
2020-01-11 02:54:35,575 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 
10.6.1.4:8485=segmentState {
  startTxId: 3389422
  endTxId: 3389423
  isInProgress: false
}
lastWriterEpoch: 57
lastCommittedTxId: 3389424

2020-01-11 02:54:35,575 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Error: recoverUnfinalizedSegments failed for required journal 
(JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], 
stream=null))
java.lang.AssertionError: Decided to synchronize log to startTxId: 3389422
endTxId: 3389423
isInProgress: false
 but logger 10.6.1.4:8485 had seen txid 3389424 committed
... rest of stacktrace ...
2020-01-11 02:54:35,577 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: Error: recoverUnfinalizedSegments failed for required journal 
(JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], 
stream=null))
{code}
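The AssertionError above encodes a simple invariant; here is a simplified sketch of it (not the actual QuorumJournalManager recovery code):

```java
// During segment recovery, no logger may have seen a committed txid beyond
// the end of the segment chosen for synchronization; an aborted stream that
// still advanced committed-txid violates this and aborts NameNode startup.
class RecoverySketch {
    static boolean safeToSync(long segmentEndTxId, long lastCommittedTxId) {
        return lastCommittedTxId <= segmentEndTxId;
    }
}
```

With the values from this incident (segment ends at txid 3389423, committed-txid 3389424) the invariant fails, which is exactly what the assertion reports.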
A potentially relevant detail is that each of the three JournalNodes logged 
this at around the time of the 

[jira] [Commented] (HDFS-14963) Add HDFS Client machine caching active namenode index mechanism.

2020-01-12 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013957#comment-17013957
 ] 

Xudong Cao commented on HDFS-14963:
---

The above-mentioned issue HDFS-15024 seems stuck, so can we process this patch 
first?

> Add HDFS Client machine caching active namenode index mechanism.
> 
>
> Key: HDFS-14963
> URL: https://issues.apache.org/jira/browse/HDFS-14963
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>  Labels: multi-sbnn
>
> In a multi-NameNode setup, a new HDFS client always begins its RPC calls at 
> the first NameNode, polls through the list, and eventually determines the 
> current active NameNode. 
> This brings at least two problems:
>  # Extra failover cost, especially when clients are created frequently.
>  # Unnecessary log printing. Suppose there are 3 NNs and the 3rd is the 
> active one: a client that starts its RPC at the 1st NN is silent when failing 
> over from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to 
> the 3rd NN it prints unnecessary logs, and in some scenarios these logs are 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}
> We can introduce a solution for this problem: on the client machine, cache 
> each HDFS cluster's current active NameNode index in a separate cache file 
> named after its URI. *Note these cache files are shared by all HDFS client 
> processes on the machine*.
> For example, suppose there are hdfs://ns1 and hdfs://ns2, and the client 
> machine's cache file directory is /tmp; then:
>  # the cache file for the ns1 cluster is /tmp/ns1
>  # the cache file for the ns2 cluster is /tmp/ns2
> And then:
>  # When a client starts, it reads the current active NameNode index from the 
> corresponding cache file (based on the target HDFS URI) and makes its RPC 
> call directly to the right active NameNode.
>  # After each failover, the client writes the latest active NameNode index 
> to the corresponding cache file (based on the target HDFS URI).
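The two steps in the description might be sketched like this. This is a hedged illustration: the file layout follows the description, but the class and method names are assumptions, not the actual patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Per-cluster cache of the last-known active NameNode index: one small file
// per nameservice, under a directory shared by all clients on the machine.
class ActiveNnIndexCache {
    private final Path dir;

    ActiveNnIndexCache(String cacheDir) {
        this.dir = Paths.get(cacheDir);
    }

    // Read the cached index for a nameservice (e.g. "ns1"). Default to 0
    // (start from the first NN) when no cache exists or it is unreadable.
    int read(String nameservice) {
        try {
            return Integer.parseInt(
                new String(Files.readAllBytes(dir.resolve(nameservice))).trim());
        } catch (IOException | NumberFormatException e) {
            return 0;
        }
    }

    // After a failover, persist the newly discovered active NN index so the
    // next client process on this machine starts at the right NameNode.
    void write(String nameservice, int activeIndex) {
        try {
            Files.createDirectories(dir);
            Files.write(dir.resolve(nameservice),
                    Integer.toString(activeIndex).getBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real implementation would also need to handle concurrent writers and stale entries, which the description leaves open.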






[jira] [Commented] (HDFS-15027) Correct target DN's log while balancing.

2020-01-12 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013954#comment-17013954
 ] 

Xudong Cao commented on HDFS-15027:
---

[~weichiu] can this patch be merged now?

> Correct target DN's log while balancing.
> 
>
> Key: HDFS-15027
> URL: https://issues.apache.org/jira/browse/HDFS-15027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.2.1
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
> Attachments: HDFS-15027.000.patch, HDFS-15027.001.patch
>
>
> During HDFS balancing, after the target DN copies a block from the proxy DN, 
> it prints a log line following the pattern below:
> *Moved BLOCK from BALANCER*
> This is wrong and misleading; we could improve the pattern to:
> *Moved BLOCK complete, copied from PROXY DN, initiated by* *BALANCER*
>  
> Example logs from the target DN during balancing:
> 1. Wrong log printed before this patch:
> {code:java}
> 2019-12-04 09:33:19,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Moved BP-1426342230-192.168.202.11-1575277482603:blk_1073741889_1065 from 
> /192.168.202.13:56322, delHint=54a14a41-0d7c-4487-b4f0-ce2848f86b48{code}
> 2. Correct log printed after this patch:
> {code:java}
> 2019-12-12 10:06:34,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Moved BP-1360308441-192.168.202.11-1576116241828:blk_1073741872_1048 
> complete, copied from /192.168.202.11:9866, initiated by 
> /192.168.202.13:53536, delHint=c70406f8-a815-4f6f-bdf0-fd3661bd6920{code}






[jira] [Commented] (HDFS-15112) RBF: do not return FileNotFoundException when a subcluster is unavailable

2020-01-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013824#comment-17013824
 ] 

Ayush Saxena commented on HDFS-15112:
-

In {{invokeConcurrent}} there is logic that requires a response from all 
nameservices when {{requireResponse}} is true.

{code:java}
for (final RemoteResult result : results) {
  // Response from all servers required, use this error.
  if (requireResponse && result.hasException()) {
    throw result.getException();
  }
}
{code}

It rethrows the same exception it got from the namespace: if a nameservice is 
down and an {{invokeConcurrent}} call is made with {{requireResponse}} set to 
true, the caller receives the same exception the NameNode returned.

Maybe we can do the same here too: if the exception is one of 
{{isUnavailableException()}}, give it priority over the first one received. 
That way the client sees the same exception it would have seen connecting to 
the NameNode directly, so it can retry or fail over just as it would there, 
and we avoid wrongly concluding that the file doesn't exist. With a retry we 
may even get a response, if the problem was temporary or affected only one 
router.

Another solution could be a new exception for the purpose, or maybe the same 
NoNamenodeException, but these won't be unwrapped at the client side; they 
would all arrive as RemoteException only.

Whatever fits your use case is fine with me; if none do, let me know and I 
will try to come up with some other idea. :)
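The exception-priority idea could be sketched like this (simplified types; isUnavailable() is a stand-in for the Router's isUnavailableException() check, and the real change would live in the invokeConcurrent result loop):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

class RequireResponseSketch {

    // Stand-in for the Router's unavailability check; the real method covers
    // more exception types than ConnectException.
    static boolean isUnavailable(IOException e) {
        return e instanceof ConnectException;
    }

    // Pick the exception to surface to the client: an unavailable-subcluster
    // error wins over ordinary errors such as FileNotFoundException, so the
    // client sees "subcluster down" rather than a false "file not found".
    static IOException chooseException(List<IOException> exceptions) {
        IOException first = null;
        for (IOException e : exceptions) {
            if (isUnavailable(e)) {
                return e;          // priority: report the outage
            }
            if (first == null) {
                first = e;         // fallback: original first-seen behavior
            }
        }
        return first;
    }
}
```

When no unavailable-subcluster error is present, the behavior reduces to throwing the first exception, as the existing loop does.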

> RBF: do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15112.000.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.






[jira] [Commented] (HDFS-15112) RBF: do not return FileNotFoundException when a subcluster is unavailable

2020-01-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013787#comment-17013787
 ] 

Íñigo Goiri commented on HDFS-15112:


[~ayushtkn], yes, that's the idea: not to give a false file-not-found.
That's the most important thing. 
Beyond that, I'm not sure what the best approach is.
For now, I just started returning the easiest exception. 
Retrying after some time might be an option.
Any proposal for what to return in this case? 

> RBF: do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15112.000.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.






[jira] [Commented] (HDFS-15112) RBF: do not return FileNotFoundException when a subcluster is unavailable

2020-01-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013762#comment-17013762
 ] 

Ayush Saxena commented on HDFS-15112:
-

Thanx [~elgoiri] for the report.
Just to confirm the intent: is our goal simply to tell the client that the 
file isn't lost, but that we are facing some cluster issue?
NoNamenodeException is OK for that, but I think it won't be retried. If the 
cluster issue is temporary, or affects only one router, we may want the call 
to be retried on the same router or maybe on another router.

> RBF: do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15112.000.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.






[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature

2020-01-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013691#comment-17013691
 ] 

Hadoop QA commented on HDFS-15113:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
55s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 56s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestAddBlockTailing |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15113 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12990655/HDFS-15113.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 81bc18c033e3 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cebce0a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28649/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28649/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results |