[ 
https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-13831:
---------------------------------------
    Description: 
Running TestHBaseFsck#testParallelHbck is flaky against a HADOOP-2.6+
environment.  The idea of the test is that when two HBCK operations run
simultaneously, the second HBCK should fail without retrying, because it
cannot create the lock file that the first HBCK has already created.  However,
with HADOOP-2.6+, the FileSystem#createFile call internally retries on
AlreadyBeingCreatedException (see HBASE-13574 for more details: "It seems that
test is broken due of the new create retry policy in hadoop 2.6. Namenode proxy
now created with custom RetryPolicy for AlreadyBeingCreatedException which is
implies timeout on this operations up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD
(60seconds).")

When I run TestHBaseFsck#testParallelHbck multiple times against HADOOP-2.7 in
a Windows environment (HBase branch-1.1), the result is unpredictable:
sometimes it succeeds and sometimes it fails, with more failures than
successes.

The fix is trivial: leverage the change in HBASE-13732 and reduce the maximum
wait time to a smaller value.
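A minimal sketch of the "smaller max wait time" idea, assuming lock acquisition runs on a worker thread whose result is awaited with `Future.get(timeout, unit)`. `acquireWithTimeout` and `maxWaitMillis` are hypothetical names; this is not the HBaseFsck code from HBASE-13732 or from this patch. With a small maximum wait, the second HBCK gives up quickly even if the underlying create call would keep retrying for up to 60 seconds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: caps how long a possibly-retrying lock acquisition may block.
public class BoundedLockWait {

  /** Runs the acquisition on a worker thread and gives up after maxWaitMillis. */
  static boolean acquireWithTimeout(Callable<Boolean> acquire, long maxWaitMillis)
      throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<Boolean> future = executor.submit(acquire);
    try {
      return future.get(maxWaitMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the still-blocked attempt
      return false;
    } catch (ExecutionException e) {
      return false; // the acquisition itself threw: treat as "lock not obtained"
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stands in for a create call that keeps retrying for ~60 s (lease soft limit).
    Callable<Boolean> slowCreate = () -> {
      Thread.sleep(60_000);
      return true;
    };
    long start = System.nanoTime();
    boolean acquired = acquireWithTimeout(slowCreate, 200);
    long waitedMs = (System.nanoTime() - start) / 1_000_000L;
    System.out.println("acquired=" + acquired + " waitedMs=" + waitedMs);
  }
}
```

Cancelling the future interrupts the worker so the sketch does not leave a thread sleeping after the caller gives up; whether a real HDFS create call reacts promptly to interruption is not something this sketch establishes.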

  was:
Running TestHBaseFsck#testParallelHbck is flaky against a HADOOP-2.6+
environment.  The idea of the test is that when two HBCK operations run
simultaneously, the second HBCK should fail without retrying, because it
cannot create the lock file that the first HBCK has already created.  However,
with HADOOP-2.6+, the FileSystem#createFile call internally retries on
AlreadyBeingCreatedException (see HBASE-13574 for more details: "It seems that
test is broken due of the new create retry policy in hadoop 2.6. 
Namenode proxy now created with custom RetryPolicy for 
AlreadyBeingCreatedException which is implies timeout on this operations up to 
HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60seconds).")

When I run TestHBaseFsck#testParallelHbck multiple times against HADOOP-2.7 in
a Windows environment (HBase branch-1.1), the result is unpredictable:
sometimes it succeeds and sometimes it fails, with more failures than
successes.

The fix is trivial, to leverage the change in HBASE-13732 and reduce the max 
wait time to a smaller number.   


> TestHBaseFsck#testParallelHbck is flaky
> ---------------------------------------
>
>                 Key: HBASE-13831
>                 URL: https://issues.apache.org/jira/browse/HBASE-13831
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, test
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>
>         Attachments: HBASE-13831.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
