[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510798#comment-13510798 ]

Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------

OK, I have a repro... will attach the patch after cleaning up the debug logging etc.
I'd prefer to store the timestamp (TS) in meta, but this is a simpler fix for now.
With the patch, the logging looks like this:
{code}
2012-12-05 12:06:08,285 DEBUG [Thread-521] util.ChaosMonkey$Action(203): 
Removing 13 regions from 10.10.11.17,53406,1354737903944
...
2012-12-05 12:06:08,765 INFO  [am-zkevent-worker-pool-2-thread-2] 
master.RegionStates(249): Region {NAME => 
'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.',
 STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', 
ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from 
{IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
 state=PENDING_OPEN, ts=1354737968742, server=10.10.11.17,53407,1354737903960} 
to 
{IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
 state=OPENING, ts=1354737968765, server=10.10.11.17,53407,1354737903960}
...
2012-12-05 12:06:10,549 INFO  [Thread-521] util.ChaosMonkey$Action(179): 
Killing region server:10.10.11.17,53407,1354737903960
...
2012-12-05 12:06:39,233 INFO  [am-zkevent-worker-pool-2-thread-2] 
master.RegionStates(249): Region {NAME => 
'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.',
 STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', 
ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from 
{IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
 state=OPENING, ts=1354737999228, server=10.10.11.17,53404,1354737903902} to 
{IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
 state=OPEN, ts=1354737999232, server=10.10.11.17,53404,1354737903902}
...
2012-12-05 12:06:40,276 INFO  [HBaseWriterThread_4] 
client.HConnectionManager$HConnectionImplementation(1776): Received an error 
from 10.10.11.17:53407 for region 
IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.;
 not removing 10.10.11.17:53404 from cache.
...
2012-12-05 12:06:40,381 INFO  [HBaseWriterThread_15] 
client.HConnectionManager$HConnectionImplementation(1809): Region 
IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
 moved to 10.10.11.17:53407 according to 10.10.11.17:53406
2012-12-05 12:06:40,381 DEBUG [HBaseWriterThread_15] 
client.HConnectionManager$HConnectionImplementation(1342): Ignoring stale 
location update for 
IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.:
 10.10.11.17:53407 at 1354737968725; local 10.10.11.17:53404 at 1354738000265
{code}
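The last log line shows the patch's guard in action: a location update whose timestamp (1354737968725) is older than that of the cached entry (1354738000265) is ignored. Below is a minimal sketch of that rule, assuming each cached location carries a timestamp and each incoming update carries the time its information dates from. `RegionLocationCache`, `CachedLocation` and `updateIfNewer` are hypothetical names for illustration, not HBase's actual `HConnectionManager` code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not HBase code): a client-side region location cache
 * that refuses to overwrite an entry with information older than what it holds.
 */
public class RegionLocationCache {

    /** Server address plus the time (ms) this location information dates from. */
    public static final class CachedLocation {
        final String server;
        final long timestamp;

        CachedLocation(String server, long timestamp) {
            this.server = server;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, CachedLocation> cache = new HashMap<>();

    /** Unconditionally records a location, e.g. one just read from meta. */
    public synchronized void put(String region, String server, long timestamp) {
        cache.put(region, new CachedLocation(server, timestamp));
    }

    /**
     * Applies a location update (e.g. from a "region moved" error) only if it is
     * not older than the cached entry. Returns true if the cache was changed.
     */
    public synchronized boolean updateIfNewer(String region, String server, long timestamp) {
        CachedLocation current = cache.get(region);
        if (current != null && current.timestamp > timestamp) {
            // Stale: the reporting server's knowledge predates what we already have.
            return false;
        }
        cache.put(region, new CachedLocation(server, timestamp));
        return true;
    }

    /** Returns the cached server for a region, or null if none is cached. */
    public synchronized String serverFor(String region) {
        CachedLocation loc = cache.get(region);
        return loc == null ? null : loc.server;
    }

    public static void main(String[] args) {
        RegionLocationCache cache = new RegionLocationCache();
        String region = "89483778064d05b1f2e1c0d20bcabc16";
        // Correct local entry; server and timestamp taken from the log above.
        cache.put(region, "10.10.11.17:53404", 1354738000265L);
        // Stale "moved to 53407" report with an older timestamp is ignored.
        boolean applied = cache.updateIfNewer(region, "10.10.11.17:53407", 1354737968725L);
        System.out.println(applied + " " + cache.serverFor(region)); // false 10.10.11.17:53404
    }
}
```

In this sketch an update with an equal timestamp is applied rather than dropped; that tie-breaking choice is an assumption, not something the log confirms.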
                
> correct local region location cache information can be overwritten w/stale 
> information from an old server
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7268
>                 URL: https://issues.apache.org/jira/browse/HBASE-7268
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Minor
>
> Discovered via HBASE-7250; related to HBASE-5877.
> The test writes from multiple threads.
> Server A has region R; the client knows that.
> R gets moved from A to server B.
> B gets killed.
> The master moves R to server C.
> About 15 seconds later, the client tries to write to it (on A?).
> Multiple client threads report, from the RegionMovedException processing 
> logic, "R moved from C to B", even though no such transition ever happened 
> (neither in nor before the sequence described above). I'm not quite sure how 
> the client learned of the transition to C; I assume it came from meta via 
> some other thread...
> Then the put fails (it may fail due to accumulated errors that are not logged, 
> which I am investigating... but the bogus cache update is there 
> notwithstanding).
> I have a patch, but I'm not sure it works; the test still fails locally for a 
> yet-unknown reason.
