[ 
https://issues.apache.org/jira/browse/HBASE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-1045:
--------------------------------------

    Attachment: hbase-1045-1.patch

The patch retries only the single failing row on an IOException (IOE) or
NotServingRegionException (NSRE), and makes getRegionServerForWithoutRetries
swallow the IOE instead of throwing it, as it did in Andrew's case.

Here is what now happens when a region splits during a PerformanceEvaluation
(PE) run, with some custom debug logging:

{code}
...
08/12/20 13:42:39 DEBUG client.HConnectionManager$TableServers: Commit 10478 rows starting at 0000513422
08/12/20 13:42:47 DEBUG client.HConnectionManager$TableServers: Index returned is 1946
08/12/20 13:42:47 DEBUG client.HConnectionManager$TableServers: Reloading table servers because region server didn't accept updates; tries=0 of max=10, waiting=2000ms
08/12/20 13:42:49 DEBUG client.HConnectionManager$TableServers: Removed TestTable,,1229798498529 from cache because of 0000515368
08/12/20 13:42:49 DEBUG client.HConnectionManager$TableServers: Commit 1 rows starting at 0000515368
08/12/20 13:42:49 DEBUG client.HConnectionManager$TableServers: Index returned is 0
08/12/20 13:42:49 DEBUG client.HConnectionManager$TableServers: Reloading table servers because region server didn't accept updates; tries=1 of max=10, waiting=2000ms
08/12/20 13:42:51 DEBUG client.HConnectionManager$TableServers: Removed TestTable,,1229798498529 from cache because of 0000515368
08/12/20 13:42:51 DEBUG client.HConnectionManager$TableServers: Attempt 0 of 10 failed with <org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region TestTable,0000085888,1229798566468>. Retrying after sleep of 2000
08/12/20 13:42:51 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 from cache because of TestTable,0000515368,99999999999999
08/12/20 13:42:51 DEBUG client.HConnectionManager$TableServers: Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '10', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'true'}], INDEXES => []}}
08/12/20 13:42:53 DEBUG client.HConnectionManager$TableServers: Commit 1 rows starting at 0000515368
08/12/20 13:42:53 DEBUG client.HConnectionManager$TableServers: Index returned is -1
08/12/20 13:42:53 DEBUG client.HConnectionManager$TableServers: Commit 8531 rows starting at 0000515369
08/12/20 13:42:53 DEBUG client.HConnectionManager$TableServers: Index returned is -1
...
{code}
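
Read against the log above, the retry flow appears to be: commit the whole
batch, get back the index of the first row the region server did not accept
(-1 when every row went through), retry that one row until it is accepted,
then commit the remainder. The following is a minimal Java sketch of that
flow under those assumptions; it is not the code in hbase-1045-1.patch.
BatchSender, processBatch, and the returned list of commit sizes are
hypothetical names introduced for illustration, and the sleep, cache
invalidation, and region lookup seen in the log are reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the call to a region server; not an HBase API. */
interface BatchSender {
    /**
     * Sends rows[start, end) and returns the index, relative to start, of the
     * first row that was not accepted, or -1 if every row was accepted.
     */
    int send(List<String> rows, int start, int end);
}

final class SingleRowRetrySketch {

    /**
     * Commits all rows, retrying only the single failing row after a failure.
     * Returns the size of each commit attempted, in order, for inspection.
     */
    static List<Integer> processBatch(List<String> rows, BatchSender sender, int maxRetries) {
        List<Integer> commitSizes = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            int end = rows.size();
            commitSizes.add(end - start);
            int failed = sender.send(rows, start, end);
            if (failed == -1) {
                break; // the whole remainder was accepted
            }
            int failedRow = start + failed; // rows before this one were accepted

            // Retry only the failing row until it is accepted or retries run out.
            int tries = 0;
            while (true) {
                if (tries >= maxRetries) {
                    throw new IllegalStateException("row " + failedRow + " not accepted after "
                            + maxRetries + " retries");
                }
                // Assumption: a real client would sleep and reload the region location here.
                commitSizes.add(1);
                if (sender.send(rows, failedRow, failedRow + 1) == -1) {
                    break;
                }
                tries++;
            }
            start = failedRow + 1; // continue with the rows after the retried one
        }
        return commitSizes;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10478; i++) {
            rows.add("row" + i);
        }
        // Simulated server: rejects the first batch at index 1946, rejects the
        // single-row retry once, then accepts everything.
        int[] calls = {0};
        BatchSender flaky = (r, s, e) -> {
            calls[0]++;
            if (calls[0] == 1) return 1946;
            if (calls[0] == 2) return 0;
            return -1;
        };
        System.out.println(processBatch(rows, flaky, 10)); // prints [10478, 1, 1, 8531]
    }
}
```

With that simulated sender the commit sizes come out as 10478, 1, 1, 8531,
the same sequence as in the log: after the failing row at index 1946 is
retried on its own, 10478 - 1946 - 1 = 8531 rows remain.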

> Hangup by regionserver causes write to fail
> -------------------------------------------
>
>                 Key: HBASE-1045
>                 URL: https://issues.apache.org/jira/browse/HBASE-1045
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>            Reporter: Andrew Purtell
>             Fix For: 0.19.0
>
>         Attachments: hbase-1045-1.patch
>
>
> Root cause is OOME on the region server. Nonetheless a hangup during IPC 
> causes the client to fail the write, currently causing data loss. Should the 
> application catch and retry? Or should the client libraries try harder?
> Dec 4, 2008 5:25:30 PM com.powerset.heritrix.writer.HBaseWriterProcessor innerProcessResult
> SEVERE: Failed write of Record: http://www.publicrecordslocal.com/georgia.htm (in thread 'ToeThread #9: http://www.publicrecordslocal.com/georgia.htm'; in processor 'Archiver')
> java.io.IOException: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>       at com.powerset.heritrix.writer.HBaseWriter.write(Unknown Source)
>       at com.powerset.heritrix.writer.HBaseWriterProcessor.write(Unknown Source)
>       at com.powerset.heritrix.writer.HBaseWriterProcessor.innerProcessResult(Unknown Source)
>       at org.archive.modules.Processor.process(Processor.java:123)
>       at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:310)
>       at org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
> Caused by: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>       at org.apache.hadoop.ipc.Client.call(Client.java:699)
>       at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:323)
>       at $Proxy12.batchUpdates(Unknown Source)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:919)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:917)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerForWithoutRetries(HConnectionManager.java:875)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:916)
>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1267)
>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1238)
>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1218)
>       at net.iridiant.content.Content.storeURLInfo(Unknown Source)
>       ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>       at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
>       at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
>       at org.apache.hadoop.ipc.Client.call(Client.java:685)
>       ... 16 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
