Oops, sorry. I missed the fact that Andrew had already responded. Pardon me
(experimenting with a new email client).
St.Ack

On Thu, Jan 29, 2009 at 11:08 AM, stack <[email protected]> wrote:

> Check your datanode logs.  You might get a clue.  Any xceiver issues
> therein?  Have you upped your file descriptor limit?  Tell us more about
> how many instances and how many regions you have loaded.  Make mention of
> your schema too.  Thanks, Larry.
> St.Ack
>
>
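[The checks suggested above can be sketched as a few shell commands. This is a rough sketch only; the log path, the `/hbase/-ROOT-` path, and the environment variable are assumptions to adjust for your cluster.]

```shell
# 1. Current open-file limit for this shell/user (HBase typically wants
#    this raised well above the common default of 1024).
ulimit -n

# 2. Look for xceiver exhaustion in the datanode log (path is an
#    assumption; adjust to your $HADOOP_LOG_DIR. Zero hits is good).
grep -c "xceiverCount" "${HADOOP_LOG_DIR:-/tmp}"/hadoop-*-datanode-*.log 2>/dev/null || true

# 3. Ask HDFS about the file that owns the unreadable block;
#    -files -blocks -locations shows whether any replica survives.
command -v hadoop >/dev/null 2>&1 && \
  hadoop fsck /hbase/-ROOT- -files -blocks -locations || true
```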
> On Thu, Jan 29, 2009 at 10:18 AM, Larry Compton <
> [email protected]> wrote:
>
>> After a lengthy but successful data ingestion run, I was running some
>> queries against my HBase table when one of my region servers ran out of
>> memory and became unresponsive. I shut down the HBase servers via
>> "stop-hbase.sh", and the one region server didn't terminate, so I killed
>> it via "kill" and then restarted the servers. Ever since I did that, when
>> I try to access my table, the request stalls, eventually fails, and a
>> number of exceptions like the following appear in the log of one of the
>> region servers (oddly enough, not the same one every time)...
>>
>> 2009-01-29 13:07:50,439 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
>> java.io.IOException: Could not obtain block: blk_2439003473799601954_58348
>> file=/hbase/-ROOT-/70236052/info/mapfiles/2587717070724571438/data
>>    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>>    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>>    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>>    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>>    at java.io.DataInputStream.readInt(DataInputStream.java:370)
>>    at org.apache.hadoop.hbase.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1909)
>>    at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1939)
>>    at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>>    at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>>    at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>>    at org.apache.hadoop.hbase.regionserver.HStore.rowAtOrBeforeFromMapFile(HStore.java:1714)
>>    at org.apache.hadoop.hbase.regionserver.HStore.getRowKeyAtOrBefore(HStore.java:1686)
>>    at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1088)
>>    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1548)
>>    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>>    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>>
>>
>> I ran fsck on HDFS and it's healthy. I'm guessing that something needed
>> to be flushed from the region server that I killed, and now my table is
>> in a corrupt state. I have a couple of questions:
>>
>> - Is there a way to recover from this problem, or do I need to rerun my
>> ingestion job?
>>
>> - When a region server runs out of memory, is there a better way to kill
>> it than the "kill" command? I've been reading the postings related to
>> out-of-memory errors and plan to try some of the suggestions. However,
>> if it does happen again, should I use one of the other scripts in the
>> "bin" directory to do a graceful shutdown?
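[On the second question: the scripts in "bin" can indeed stop a single daemon. A minimal sketch, assuming an HBase 0.19 layout with bin/hbase-daemon.sh and that it is run from $HBASE_HOME on the affected host:]

```shell
# Stop only the regionserver on this host so it can flush and deregister
# cleanly, instead of a blind kill -9.
if [ -x bin/hbase-daemon.sh ]; then
  bin/hbase-daemon.sh stop regionserver
else
  # Not running from $HBASE_HOME (or a different layout); for illustration.
  echo "usage: bin/hbase-daemon.sh stop regionserver"
fi
```

[Caveat: a JVM already wedged by an OutOfMemoryError may not respond to a graceful stop; a plain kill (SIGTERM) before resorting to kill -9 at least gives the JVM's shutdown hooks a chance to run.]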
>>
>> Hadoop 0.19.0
>> HBase 0.19.0
>>
>> Thanks
>> Larry Compton
>>
>
>
