[ https://issues.apache.org/jira/browse/HADOOP-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557687#action_12557687 ]
stack commented on HADOOP-2558:
-------------------------------
TestHLog did a vintage hang-at-end-of-successful-test for ten hours last night:
{code}
2008-01-10 07:59:51,981 INFO [main] hbase.HRegionServer$ShutdownThread(151): Starting shutdown thread.
[junit] 2008-01-10 07:59:51,981 INFO [main] hbase.HRegionServer$ShutdownThread(156): Shutdown thread complete
[junit] Running org.apache.hadoop.hbase.TestHLog
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 07:59:55,430 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.000
[junit] 2008-01-10 07:59:55,608 WARN [IPC Server handler 1 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,658 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.000 to get a new one
[junit] 2008-01-10 07:59:55,662 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.001
[junit] 2008-01-10 07:59:55,665 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,670 WARN [IPC Server handler 7 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,678 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.001 to get a new one
[junit] 2008-01-10 07:59:55,683 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.002
[junit] 2008-01-10 07:59:55,684 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,689 WARN [IPC Server handler 2 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,694 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.002 to get a new one
[junit] 2008-01-10 07:59:55,699 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.003
[junit] 2008-01-10 07:59:55,700 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,707 INFO [main] hbase.HLog(148): splitting 4 log(s) in /hbase
[junit] 2008-01-10 07:59:55,708 DEBUG [main] hbase.HLog(155): Splitting 0 of 4: hdfs://localhost:37583/hbase/hlog.dat.000
[junit] 2008-01-10 07:59:55,750 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_14095470/oldlogfile.log; map content {}
[junit] 2008-01-10 07:59:55,757 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_1701666436/oldlogfile.log; map content [EMAIL PROTECTED]
[junit] 2008-01-10 07:59:55,762 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_1249881816/oldlogfile.log; map content [EMAIL PROTECTED], [EMAIL PROTECTED]
[junit] 2008-01-10 07:59:55,767 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,768 DEBUG [main] hbase.HLog(155): Splitting 1 of 4: hdfs://localhost:37583/hbase/hlog.dat.001
[junit] 2008-01-10 07:59:55,771 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,772 DEBUG [main] hbase.HLog(155): Splitting 2 of 4: hdfs://localhost:37583/hbase/hlog.dat.002
[junit] 2008-01-10 07:59:55,791 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,792 DEBUG [main] hbase.HLog(155): Splitting 3 of 4: hdfs://localhost:37583/hbase/hlog.dat.003
[junit] 2008-01-10 07:59:55,793 INFO [main] hbase.HLog(160): Skipping hdfs://localhost:37583/hbase/hlog.dat.003 because zero length
[junit] 2008-01-10 07:59:55,794 WARN [IPC Server handler 3 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,802 WARN [IPC Server handler 5 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,811 WARN [IPC Server handler 8 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,819 INFO [main] hbase.HLog(212): log file splitting completed for /hbase
[junit] 2008-01-10 07:59:55,821 INFO [main] hbase.StaticTestEnvironment(135): Shutting down FileSystem
[junit] 2008-01-10 07:59:55,822 WARN [IPC Server handler 4 on 37583] dfs.FSNamesystem(1689): DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /hbase/hlog.dat.003 file does not exist.
[junit] 2008-01-10 07:59:56,106 INFO [main] hbase.StaticTestEnvironment(142): Shutting down Mini DFS
[junit] Shutting down the Mini HDFS Cluster
[junit] Shutting down DataNode 1
[junit] 2008-01-10 07:59:56,419 WARN [DataNode: [/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4]] dfs.DataNode(658): java.io.InterruptedIOException
[junit] at java.io.FileInputStream.readBytes(Native Method)
[junit] at java.io.FileInputStream.read(FileInputStream.java:194)
[junit] at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
[junit] at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
[junit] at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
[junit] at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
[junit] at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
[junit] at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
[junit] at java.io.InputStreamReader.read(InputStreamReader.java:167)
[junit] at java.io.BufferedReader.fill(BufferedReader.java:136)
[junit] at java.io.BufferedReader.readLine(BufferedReader.java:299)
[junit] at java.io.BufferedReader.readLine(BufferedReader.java:362)
[junit] at org.apache.hadoop.fs.DU.parseExecResult(DU.java:73)
[junit] at org.apache.hadoop.util.Shell.runCommand(Shell.java:145)
[junit] Shutting down DataNode 0
[junit] at org.apache.hadoop.util.Shell.run(Shell.java:100)
[junit] at org.apache.hadoop.fs.DU.getUsed(DU.java:53)
[junit] at org.apache.hadoop.dfs.FSDataset$FSVolume.getDfsUsed(FSDataset.java:299)
[junit] at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getDfsUsed(FSDataset.java:396)
[junit] at org.apache.hadoop.dfs.FSDataset.getDfsUsed(FSDataset.java:516)
[junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:562)
[junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1736)
[junit] at java.lang.Thread.run(Thread.java:595)
[junit] 2008-01-10 07:59:56,419 WARN [EMAIL PROTECTED] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:56,422 WARN [Thread-44] util.Shell$1(137): Error reading the error stream
[junit] java.io.InterruptedIOException
[junit] at java.io.FileInputStream.readBytes(Native Method)
[junit] at java.io.FileInputStream.read(FileInputStream.java:194)
[junit] at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
[junit] at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
[junit] at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
[junit] at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
[junit] at java.io.InputStreamReader.read(InputStreamReader.java:167)
[junit] at java.io.BufferedReader.fill(BufferedReader.java:136)
[junit] at java.io.BufferedReader.readLine(BufferedReader.java:299)
[junit] at java.io.BufferedReader.readLine(BufferedReader.java:362)
[junit] at org.apache.hadoop.util.Shell$1.run(Shell.java:130)
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 07:59:58,650 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.000
[junit] 2008-01-10 07:59:58,652 DEBUG [main] hbase.HLog(399): closing log writer in /hbase
[junit] 2008-01-10 07:59:58,654 WARN [IPC Server handler 3 on 37603] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] tablename/regionname/row/0 (0/1199951998651/0)
[junit] tablename/regionname/row/1 (1/1199951998651/1)
[junit] tablename/regionname/row/2 (2/1199951998651/2)
[junit] tablename/regionname/row/3 (3/1199951998651/3)
[junit] tablename/regionname/row/4 (4/1199951998651/4)
[junit] tablename/regionname/row/5 (5/1199951998651/5)
[junit] tablename/regionname/row/6 (6/1199951998651/6)
[junit] tablename/regionname/row/7 (7/1199951998651/7)
[junit] tablename/regionname/row/8 (8/1199951998651/8)
[junit] tablename/regionname/row/9 (9/1199951998651/9)
[junit] tablename/regionname/METAROW/10 (METACOLUMN:/1199951998652/HBASE::CACHEFLUSH)
[junit] 2008-01-10 07:59:58,667 INFO [main] hbase.StaticTestEnvironment(135): Shutting down FileSystem
[junit] 2008-01-10 07:59:59,646 INFO [main] hbase.StaticTestEnvironment(142): Shutting down Mini DFS
[junit] Shutting down the Mini HDFS Cluster
[junit] Shutting down DataNode 1
[junit] Shutting down DataNode 0
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.84 sec
[junit] Running org.apache.hadoop.hbase.TestHMemcache
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.018 sec
[junit] Running org.apache.hadoop.hbase.TestHRegion
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 16:21:52,123 INFO [main] hbase.HLog(313): new log writer created at /hbase/log/hlog.dat.000
{code}
I just did a kill -9 (plain kill and kill -QUIT had no effect).
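My guess from the stack trace above: shutdown interrupted a DataNode thread that was blocked reading the output of a forked {{du}}, and a non-daemon thread stuck like that is enough to keep the JVM alive forever after the tests pass. A standalone sketch of that failure mode (hypothetical demo class, not our code):
{code}
// Hypothetical demo of the suspected hang mode: a non-daemon thread
// blocked reading a child process's stdout keeps the JVM alive after
// main() returns.
public class HangDemo {
  public static void main(String[] args) throws Exception {
    Thread t = new Thread(new Runnable() {
      public void run() {
        try {
          // "sleep 100000" stands in for a du invocation that never returns.
          Process p = Runtime.getRuntime().exec(new String[] {"sleep", "100000"});
          // Blocks until the child writes or exits; neither ever happens.
          p.getInputStream().read();
        } catch (java.io.IOException e) {
          e.printStackTrace();
        }
      }
    });
    // t.setDaemon(true); // with this uncommented, the JVM exits cleanly
    t.start();
    System.out.println("main() done; JVM now hangs until kill -9");
  }
}
{code}
If it hangs again up on hudson, a {{jstack <pid>}} of the stuck junit process should tell us which non-daemon thread is holding things up.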
> [hbase] fixes for build up on hudson
> ------------------------------------
>
> Key: HADOOP-2558
> URL: https://issues.apache.org/jira/browse/HADOOP-2558
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: stack
> Attachments: 2558-v2.patch, 2558-v3.patch, 2558.patch
>
>
> Fixes for hbase breakage up on hudson. There seem to be many reasons for the
> failures.
> One is that the .META. region all of a sudden decides it's 'no good' and gets
> deployed elsewhere. Tests don't have the tolerance for this kinda churn. A
> previous commit added logging of why .META. is 'no good'. Hopefully that
> will help.
> Also found a case where TestTableMapReduce would fail because there was no
> sleep between retries when getting new scanners (sketched below).
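For reference, the scanner-retry fix described above amounts to something like this; names are illustrative, not the actual patch:
{code}
import java.io.IOException;

// Illustrative only -- the shape of the fix is to back off between
// attempts instead of retrying in a tight loop. Assumes maxTries >= 1.
public class ScannerRetry {
  /** Hypothetical helper: an operation to retry, e.g. opening a scanner. */
  interface Call<T> { T call() throws IOException; }

  static <T> T withRetries(Call<T> op, int maxTries, long pauseMillis)
      throws IOException {
    IOException last = null;
    for (int tries = 0; tries < maxTries; tries++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;
        try {
          Thread.sleep(pauseMillis);  // the missing sleep between retries
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();  // restore interrupt and give up
          break;
        }
      }
    }
    throw last;  // all tries exhausted
  }
}
{code}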