[jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94

Himanshu Vashishtha (JIRA) Sun, 03 Feb 2013 07:20:14 -0800

    [ https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569798#comment-13569798 ]


Himanshu Vashishtha commented on HBASE-7607:
--------------------------------------------

Interestingly, with this patch, the aborted regionserver is processed 
normally, and the test passes its normal phase. The problem is in the cluster 
shutdown process: sometimes the master is not able to process the death of the 
other regionserver, yet JVMClusterUtil considers the cluster shut down.
{code}
2013-01-30 19:40:14,048 INFO  [RegionServer:0;localhost,49074,1359600001555] regionserver.HRegionServer(851): stopping server localhost,49074,1359600001555; zookeeper connection closed.

2013-01-30 19:40:14,048 INFO  [RegionServer:0;localhost,49074,1359600001555] regionserver.HRegionServer(854): RegionServer:0;localhost,49074,1359600001555 exiting

2013-01-30 19:40:14,048 INFO  [localhost,35387,1359600001393.timerUpdater] hbase.Chore(80): localhost,35387,1359600001393.timerUpdater exiting

2013-01-30 19:40:14,048 INFO  [Shutdown of org.apache.hadoop.hbase.fs.HFileSystem@32d35f5f] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(182): Hook closing fs=org.apache.hadoop.hbase.fs.HFileSystem@32d35f5f

2013-01-30 19:40:14,049 INFO  [main] util.JVMClusterUtil(262): Shutdown of 1 master(s) and 2 regionserver(s) complete

{code}

{code}
2013-01-30 19:40:14,168 INFO  [RegionServer:0;localhost,49074,1359600001555.leaseChecker] regionserver.Leases(132): RegionServer:0;localhost,49074,1359600001555.leaseChecker closed leases

2013-01-30 19:40:14,227 INFO  [Master:0;localhost,35387,1359600001393] master.ServerManager(357): Waiting on regionserver(s) to go down localhost,49074,1359600001555
{code}

But the master thread is still looping in its 
ServerManager#letRegionServersShutdown method, waiting to process the dead 
regionserver, which it never gets. I am looking into why this happens only 
with this patch (the failure frequency is around 1 in 5 runs).
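
To make the hang concrete, here is a minimal sketch of a "wait until all regionservers are gone" loop. It is not the actual ServerManager code: the class ShutdownWaitSketch and its methods register, expire and waitForServersDown are hypothetical names chosen for illustration, and the timeout is an assumption added to show one way such a wait could be bounded. If the expiry for a server is never processed, the set never empties, and an unbounded version of this loop would spin forever.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; this is not HBase's ServerManager. */
public class ShutdownWaitSketch {
    // Servers the "master" still believes are online (hypothetical stand-in).
    private final Set<String> onlineServers = ConcurrentHashMap.newKeySet();

    /** Records a server as online. */
    public void register(String serverName) {
        onlineServers.add(serverName);
    }

    /** Called when the master processes a server's death. */
    public void expire(String serverName) {
        onlineServers.remove(serverName);
    }

    /**
     * Polls until no servers remain or the timeout elapses.
     * Returns true if every server was processed, false on timeout or interrupt.
     */
    public boolean waitForServersDown(long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!onlineServers.isEmpty()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // a death was never processed; give up instead of looping
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, a server that is registered but never expired makes waitForServersDown return false after the timeout, which corresponds to the "Waiting on regionserver(s) to go down" state in the log above.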
                
> Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
> -------------------------------------------------------------------
>
>                 Key: HBASE-7607
>                 URL: https://issues.apache.org/jira/browse/HBASE-7607
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, test
>    Affects Versions: 0.94.4
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.6
>
>         Attachments: HBASE-7607-v2.patch
>
>
> TestRegionServerCoprocessorExceptionWithAbort sometimes fails on both trunk 
> and 0.94.x. The codebase differs between the two. 
> In 0.94.x, the client retries to look up the root region while the cluster is 
> down and the /hbase znode is no longer present:
> "Check the value configured in 'zookeeper.znode.parent'. There could be a 
> mismatch with the one configured in the master."
> I will file a separate jira for the trunk as the code is different there.

