[ 
https://issues.apache.org/jira/browse/HBASE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu resolved HBASE-9636.
-------------------------------

    Resolution: Not A Problem

[~shankarlingayya]
 This is expected behavior.
 According to the logs you shared, the region server holding the row17-row18 range 
went down at 18:20:58:
 {code}
 Fri Sep 20 18:20:58 IST 2013, 
org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
java.net.ConnectException: Connection refused
 Fri Sep 20 18:20:59 IST 2013, 
org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is 
in the failed servers list: HOST-10-18-40-172/10.18.40.172:61020
 {code}
 
 After that, the regions in that range took longer to assign because 
HOST-10-18-40-172 held many regions, and they have to be assigned one by one after 
the shutdown.
 The logs show this. The META table still held the old region 
server address. On each exception the client clears its cache and reads from 
META again, but META still pointed to HOST-10-18-40-172, so the scan failed after 7 retries.
 
 {code}
2013-09-20 18:21:33,539 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
region: t1,row170593,1379679042365.1ad0997453c665bb9707907be08980fa.
 
2013-09-20 18:21:33,551 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:61020-0x1413b3594140079-0x1413b3594140079-0x1413b3594140079-0x1413b3594140079
 Attempting to transition node 1ad0997453c665bb9707907be08980fa from 
M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
 
2013-09-20 18:21:33,557 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
loaded 
hdfs://10.18.40.153:8020/hbase/t1/c18a2bbd6ef4b53f480b53207a68c44e/cf1/04b0a1c45c9f498ebbfd4f8909e693a4,
 isReference=false, isBulkL
 {code}
 Some configurations can be tuned to avoid this kind of issue:
 1) Increase the retry count (hbase.client.retries.number; default 7 from the shell and 
10 from the client).
 2) Increase the pause time between retries (hbase.client.pause; default 1 sec).
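
 To illustrate why a larger retry count or pause would help here, the following is a minimal sketch that only does arithmetic on the timestamps quoted in this issue. It assumes the region named in the "Received request to open region" log line is the one the scan was waiting for. The class name RetryWindow and its helper are hypothetical and are not part of HBase.

```java
import java.time.Duration;
import java.time.LocalTime;

public class RetryWindow {
    // Timestamps copied from the logs quoted in this issue (milliseconds dropped).
    public static final LocalTime FIRST_FAILURE = LocalTime.parse("18:20:58");
    public static final LocalTime LAST_ATTEMPT = LocalTime.parse("18:21:17");
    // Assumption: this open request is for the region the scan needed.
    public static final LocalTime REGION_OPEN_REQUEST = LocalTime.parse("18:21:33");

    // Whole seconds elapsed between two times on the same day.
    public static long secondsBetween(LocalTime from, LocalTime to) {
        return Duration.between(from, to).getSeconds();
    }

    public static void main(String[] args) {
        long retryWindow = secondsBetween(FIRST_FAILURE, LAST_ATTEMPT);
        long reassignDelay = secondsBetween(FIRST_FAILURE, REGION_OPEN_REQUEST);

        System.out.println("client retry window (s):      " + retryWindow);
        System.out.println("time until region reopen (s): " + reassignDelay);
        System.out.println("shortfall (s):                " + (reassignDelay - retryWindow));
    }
}
```

 Under that assumption, the client gave up before the region was even asked to open on the new server, so the combined effect of hbase.client.retries.number and hbase.client.pause has to cover at least the reassignment time seen in the cluster.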

>  HBase shell/client 'scan table' operation is getting failed inbetween the 
> when the regions are shifted from one Region Server to another Region Server 
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9636
>                 URL: https://issues.apache.org/jira/browse/HBASE-9636
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.94.11
>         Environment: SuSE11
>            Reporter: shankarlingayya
>            Assignee: rajeshbabu
>
> {noformat}
> Problem:
> The HBase shell/client 'scan table' operation fails midway 
> when regions are moved from one Region Server to another Region Server.
> When the table's region data is moved from one Region Server to another Region 
> Server, the client/shell should be able to read the data from the 
> new Region Server automatically (because with huge data in terms of 
> GB/TB, one of the Region Servers going down in the cluster is 
> frequent).
> Procedure:
> 1. Setup Non HA Hadoop Cluster with two nodes (Node1-XX.XX.XX.XX,  
> Node2-YY.YY.YY.YY)
> 2. Install Zookeeper, HMaster & HRegionServer in Node-1
> 3. Install HRegionServer in Node-2
> 4. From Node2 create HBase Table ( table name 't1' with one column family 
> 'cf1' )
> 5. Add around 367120 rows to the table
> 6. Scan the table 't1' using the hbase shell & at the same time switch region 
> servers 1 & 2 (so that the table 't1' region data is moved from Region 
> Server 1 to 2 & vice versa)
> 7. During this time the hbase shell fails in the middle of the scan 
> operation, as below
> ...................................................................           
>                      
>  row172266                        column=cf1:a, timestamp=1379680737307, 
> value=100                                              
>  row172267                        column=cf1:a, timestamp=1379680737311, 
> value=100                                              
>  row172268                        column=cf1:a, timestamp=1379680737314, 
> value=100                                              
>  row172269                        column=cf1:a, timestamp=1379680737317, 
> value=100                                              
>  row17227                         column=cf1:a, timestamp=1379679668631, 
> value=100                                              
>  row17227                         column=cf1:b, timestamp=1379681090560, 
> value=200                                             
> ERROR: java.lang.RuntimeException: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=7, exceptions:
> Fri Sep 20 18:20:58 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> java.net.ConnectException: Connection refused
> Fri Sep 20 18:20:59 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is 
> in the failed servers list: HOST-YY.YY.YY.YY/YY.YY.YY.YY:61020
> Fri Sep 20 18:21:00 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:01 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is 
> in the failed servers list: HOST-YY.YY.YY.YY/YY.YY.YY.YY:61020
> Fri Sep 20 18:21:07 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:09 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:17 IST 2013, 
> org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, 
> java.net.ConnectException: Connection refused
> hbase(main):014:0> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)