[ 
https://issues.apache.org/jira/browse/HDFS-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779348#comment-17779348
 ] 

ASF GitHub Bot commented on HDFS-17232:
---------------------------------------

hadoop-yetus commented on PR #6208:
URL: https://github.com/apache/hadoop/pull/6208#issuecomment-1778590147

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  2s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  2s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 21s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 17s |  |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m  5s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  23m 26s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 166m 57s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6208 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 38772091eea6 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f29fcf9a6015be8f08df6bdcda3f12391d3ee890 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/3/testReport/ |
   | Max. process+thread count | 2435 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> RBF: Fix NoNamenodesAvailableException for a long time, when use observer
> -------------------------------------------------------------------------
>
>                 Key: HDFS-17232
>                 URL: https://issues.apache.org/jira/browse/HDFS-17232
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Jian Zhang
>            Assignee: Jian Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-17232.001.patch
>
>
> *Describe*
> I previously fixed the case where NoNamenodesAvailableException persists for 
> a long time after a failover when observers are not in use, but several 
> problems remain when observers are in use.
>  # When an observer fails and there is no active namenode at that moment, 
> rotating the cache does not help: because observer reads are in use, the 
> next request shuffles the observer namenode back to the front of the cache, 
> so the retry is still sent to the failed observer.
>  # If there are multiple observers, an exception occurs when accessing one of 
> them, and there is no active namenode at that moment, a 
> NoNamenodesAvailableException is thrown and the request is retried. 
> However, since observer reads put the observer nodes at the front of the 
> cache, the retry may still fail.
>  # When there are multiple observers, one of which is unavailable, and there 
> is no active namenode at that moment, we should continue to try the next 
> observer, so that the unavailable observer can be marked as unavailable and 
> subsequent requests can avoid it.
>  # If the failure is caused by an illegal operation, that is, an operation 
> that raises an exception even when it is sent to the active namenode, the 
> result is also a NoNamenodesAvailableException. If the cache is rotated at 
> this point, the next normal request is sent to a namenode that really is a 
> standby, so a legal request fails. Illegal operations should therefore not 
> rotate the cache.
>  
> Detailed bug description: HDFS-17166
>  
> - *case 1:*
> * router's cache: [ observer-1(problematic), standby-2, standby-3(actually 
> active) ]
> * client read -> observer-1 throws NoNamenodesAvailableException -> 
> rotate the cache -> [ standby-2, standby-3(actually active), 
> observer-1(problematic) ]
> * client retry read -> shuffleObserverNN -> [ observer-1(problematic), 
> standby-2, standby-3(actually active) ] -> observer-1 throws 
> NoNamenodesAvailableException -> rotate the cache -> [ standby-2, 
> standby-3(actually active), observer-1(problematic) ]
> * *.....*
> * client retries exceed max.attempts -> read fails
>  
> - *case 2:*
> * router's cache: [ observer-1(problematic), observer-2, standby-3, 
> standby-4(actually active) ]
> * client read -> observer-1 throws NoNamenodesAvailableException -> 
> rotate the cache -> [ observer-2, standby-3, standby-4(actually active), 
> observer-1(problematic) ]
> * client retry read -> shuffleObserverNN -> [ observer-1(problematic), 
> observer-2, standby-3, standby-4(actually active) ] (may happen) -> 
> observer-1 throws NoNamenodesAvailableException -> rotate the cache -> [ 
> observer-2, standby-3, standby-4(actually active), observer-1(problematic) ]
> * *.....*
> * client retries may exceed max.attempts -> read fails
>  
> - *case 3:*
> * router's cache: [ standby-1, standby-2(actually active) ]
> * client request -> standby-1 throws NoNamenodesAvailableException -> 
> rotate the cache -> [ standby-2(actually active), standby-1 ]
> * client retry request -> standby-2(actually active) succeeds
> * client illegal request -> standby-2(actually active) throws 
> NoNamenodesAvailableException -> rotate the cache -> [ standby-1, 
> standby-2(actually active) ]
> * client legal request -> standby-1 throws NoNamenodesAvailableException -> 
> request fails
>  
> *How to reproduce*
> I have provided unit tests in TestNoNamenodesAvailableLongTime.
> You can run these new unit tests against the original code to reproduce the 
> above problems.
>  
>  
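
The retry loop in case 1 can be sketched with a toy model. This is a
simplified illustration, not the Hadoop RBF code: the class, its fields, and
its methods (RouterCacheSketch, Node, canServe, observersFirst, rotate, read)
are hypothetical names, each attempt only contacts the head of the cache, and
the shuffle is modelled as a stable "observers first" reorder. Only the
rotate-then-shuffle sequence is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the router's namenode cache (assumption: not the real RBF classes). */
class RouterCacheSketch {

  /** Hypothetical cache entry: a name, whether it is an observer, whether it can serve reads. */
  static final class Node {
    final String name;
    final boolean observer;
    final boolean canServe;

    Node(String name, boolean observer, boolean canServe) {
      this.name = name;
      this.observer = observer;
      this.canServe = canServe;
    }
  }

  /** Cache from case 1: a failed observer, a real standby, and a "standby" that is actually active. */
  static List<Node> case1Cache() {
    return new ArrayList<>(Arrays.asList(
        new Node("observer-1", true, false),
        new Node("standby-2", false, false),
        new Node("standby-3", false, true)));
  }

  /** Moves observers to the front while keeping relative order (stand-in for the shuffle step). */
  static void observersFirst(List<Node> cache) {
    List<Node> ordered = new ArrayList<>();
    for (Node n : cache) {
      if (n.observer) {
        ordered.add(n);
      }
    }
    for (Node n : cache) {
      if (!n.observer) {
        ordered.add(n);
      }
    }
    cache.clear();
    cache.addAll(ordered);
  }

  /** Moves the head of the cache to the tail ("rotate the cache"). */
  static void rotate(List<Node> cache) {
    if (cache.size() > 1) {
      cache.add(cache.remove(0));
    }
  }

  /** Returns the attempt number that succeeded, or -1 if every attempt failed. */
  static int read(List<Node> cache, int maxAttempts, boolean useObserver) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (useObserver) {
        observersFirst(cache); // undoes the previous rotation
      }
      if (cache.get(0).canServe) {
        return attempt;
      }
      rotate(cache); // the failed head goes to the back
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(read(case1Cache(), 10, true));  // observer reads: never succeeds
    System.out.println(read(case1Cache(), 10, false)); // no observer reads: rotation helps
  }
}
```

In this model the observer-first reorder cancels every rotation, so the
failed observer is retried until the attempts run out, while the same cache
without the reorder reaches standby-3 after two rotations.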



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
