[ 
https://issues.apache.org/jira/browse/HADOOP-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553409
 ] 

stack commented on HADOOP-2203:
-------------------------------

It was good to meetup last night Vuk. 

Your understanding above of whats going on is about right except the bit where 
you say the IOE causes the reduce to fail.  The reduce doesn't fail.  It just 
keeps on going but job never seems to complete.  It looks like the reduce keeps 
trickling in updates, just enough to get in the way of the TaskTracker killing 
the job.  

hbase has moved on since I pasted the above logs but last time I tried this 
job, a few weeks ago, so log output is different now but it still fails to 
complete; 2 of the 8 reduces finish up fairly promptly but the other 6 hang out 
for ever seemingly.

The problematic class is using TIF and TOF (Last night I was confused thinking 
it was doing its own updating and not using TIF/TOF at all).   The class came 
from a gentleman who was having other hbase issues.  Let me first get 
permission to post it and data.

> [hbase] Hung MR job writing hbase
> ---------------------------------
>
>                 Key: HADOOP-2203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2203
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>
> Running a simple, fairly large job on the cluster, a few of the reduce steps 
> failed; all that did are effectively hung.  Looking at failures, I see this 
> strange output:
> {code}
> 2007-11-13 02:09:09,606 DEBUG org.apache.hadoop.hbase.HTable: reloading table 
> servers because: java.io.IOException: Region 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610 closed
>       at org.apache.hadoop.hbase.HRegion.startUpdate(HRegion.java:1142)
>       at 
> org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1239)
>       at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1113)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-11-13 02:09:19,778 DEBUG org.apache.hadoop.hbase.HTable: reloading table 
> servers because: org.apache.hadoop.hbase.NotServingRegionException: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1325)
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1297)
>       at 
> org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1238)
>       at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1113)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-11-13 02:09:30,039 DEBUG org.apache.hadoop.hbase.HTable: reloading table 
> servers because: org.apache.hadoop.hbase.NotServingRegionException: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1325)
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1297)
>       at 
> org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1238)
>       at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1113)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-11-13 02:09:40,275 DEBUG org.apache.hadoop.hbase.HTable: reloading table 
> servers because: org.apache.hadoop.hbase.NotServingRegionException: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1325)
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1297)
>       at 
> org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1238)
>       at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1113)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-11-13 02:09:55,379 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> running child
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1325)
>       at 
> org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1297)
>       at 
> org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1238)
>       at 
> org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1113)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>       at org.apache.hadoop.hbase.HTable.commit(HTable.java:734)
>       at org.apache.hadoop.hbase.HTable.commit(HTable.java:706)
>       at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:89)
>       at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:63)
>       at org.apache.hadoop.mapred.ReduceTask$2.collect(ReduceTask.java:308)
>       at TriplesTest$TestReducer.reduce(TriplesTest.java:66)
>       at TriplesTest$TestReducer.reduce(TriplesTest.java:51)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
>       at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1935)
> {code}
> Why is the reducer client being pig-headed trying to go against a region that 
> has been closed?  Why not recalibrate?
> Here is master log from around same time on the problem region:
> {code}
> ...
> 2007-11-13 02:10:17,867 INFO org.apache.hadoop.hbase.HMaster: region 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610 split. 
> New regions are: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919707661, 
> triples,http://purl.org/dc/terms/references,1194919707661
> ...
> 2007-11-13 02:11:47,313 DEBUG org.apache.hadoop.hbase.HMaster: 
> HMaster.metaScanner scanner: -3075797136896920808 regioninfo: {regionname: 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610, 
> startKey: <http://purl.org/dc/terms/bibliographicCitation>, offline: true, 
> split: true, tableDesc: {name: triples, families: {triples:={name: triples, 
> max versions: 3, compression: NONE, in memory: false, max length: 2147483647, 
> bloom filter: none}}}}, server: 208.76.44.140:60010, startCode: 
> -647590197693256616
> ...
> 2007-11-13 02:11:47,317 INFO org.apache.hadoop.hbase.HMaster: Deleting region 
> triples,http://purl.org/dc/terms/bibliographicCitation,1194919500610 because 
> daughter splits no longer hold references
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to