oops... sorry. I missed the fact that Andrew had already responded. Pardon me (experimenting with a new email client). St.Ack
On Thu, Jan 29, 2009 at 11:08 AM, stack <[email protected]> wrote:
> Check your datanode logs. You might get a clue. Any xceiver issues
> therein? Have you upped your file descriptor limit? Tell us more about
> how many instances and how many regions you have loaded. Make mention
> of your schema too. Thanks Larry.
> St.Ack
>
> On Thu, Jan 29, 2009 at 10:18 AM, Larry Compton <[email protected]> wrote:
>
>> After a lengthy but successful data ingestion run, I was running some
>> queries against my HBase table when one of my region servers ran out of
>> memory and became unresponsive. I shut down the HBase servers via
>> "stop-hbase.sh" and the one region server didn't terminate, so I killed
>> it via "kill" and then restarted the servers. Ever since I did that,
>> when I try to access my table, the request stalls, eventually fails,
>> and a number of exceptions like the following appear in the log of one
>> of the region servers (oddly enough, not the same one every time)...
>>
>> 2009-01-29 13:07:50,439 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
>> java.io.IOException: Could not obtain block: blk_2439003473799601954_58348
>> file=/hbase/-ROOT-/70236052/info/mapfiles/2587717070724571438/data
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>>     at java.io.DataInputStream.readInt(DataInputStream.java:370)
>>     at org.apache.hadoop.hbase.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1909)
>>     at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1939)
>>     at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>>     at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>>     at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>>     at org.apache.hadoop.hbase.regionserver.HStore.rowAtOrBeforeFromMapFile(HStore.java:1714)
>>     at org.apache.hadoop.hbase.regionserver.HStore.getRowKeyAtOrBefore(HStore.java:1686)
>>     at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1088)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1548)
>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>>
>> I ran fsck on HDFS and it's healthy. I'm guessing that something needed
>> to be flushed from the region server that I killed and now my table is
>> in a corrupt state. I have a couple of questions:
>>
>> - Is there a way to recover from this problem or do I need to rerun my
>> ingestion job?
>>
>> - When a region server runs out of memory, is there a better way to
>> kill it other than the "kill" command? I've been reading the postings
>> related to out-of-memory errors and plan to try some of the
>> suggestions. However, if it does happen, should I use one of the other
>> scripts in the "bin" directory to do a graceful shutdown?
>>
>> Hadoop 0.19.0
>> HBase 0.19.0
>>
>> Thanks
>> Larry Compton
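[List-archive note: a minimal sketch of the two operational points raised above. HBase ships a per-node bin/hbase-daemon.sh script that stops a single region server more gracefully than a bare "kill", and the shell's ulimit builtin reports the open-file-descriptor limit Stack asks about. The $HBASE_HOME layout is an assumption about a standard install; adjust paths to your deployment.]

```shell
#!/bin/sh
# Check the per-process open-file-descriptor limit for this shell.
# HDFS datanodes and HBase region servers commonly need this raised
# well above the old 1024 default (see the xceiver/ulimit threads).
ulimit -n

# Graceful stop of one region server, run on that node (assumed
# $HBASE_HOME pointing at the install root). Unlike "kill -9", this
# lets the daemon shut down through its normal stop path.
# "$HBASE_HOME"/bin/hbase-daemon.sh stop regionserver
```

The daemon-stop line is left commented since it only applies on a node with HBase installed; the ulimit check runs anywhere.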
