Re: can't start region server after crash
Also, in the hdfs ui I found Number of Under-Replicated Blocks: 497741. It seems there are many bad blocks. Is there any method to rescue the good data?

On Thu, Nov 20, 2014 at 10:52 AM, Li Li wrote:
> I am running a single-node pseudo-distributed hbase cluster on top of a
> pseudo-distributed hadoop. hadoop is 1.2.1 and the hdfs replication
> factor is 1. The hbase version is 0.98.5.
> Last night I found the region server had crashed (the process was gone).
> I found many logs saying:
>
> [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
> machine (eg GC): pause of approximately 2176ms
> GC pool 'ParNew' had collection(s): count=1 time=0ms
>
> Then I used ./bin/stop-hbase.sh to stop it and start-hbase.sh to restart
> it. After that I saw many logs in the region server like:
>
> wal.HLogSplitter: Creating writer
> path=hdfs://192.168.10.121:9000/hbase/data/default/baiducrawler.webpage/5e7f8f9c63c12a70892f3a774e3186f4/recovered.edits/0121515.temp
> region=5e7f8f9c63c12a70892f3a774e3186f4
>
> CPU usage was high and the disk read/write speed was 20MB/s, so I let it
> run and went home.
> This morning I found the region server had crashed again, with logs:
>
> hdfs.DFSClient: Failed to close file
> /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
> could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1113)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>     at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
>     at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023
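One way to find out which files those under-replicated blocks belong to is HDFS's own consistency checker. A sketch for Hadoop 1.x (where the checker is invoked through the `hadoop` launcher; the grep pattern is an assumption about the report wording, so inspect the raw output too):

```shell
# Walk the namespace and print each file (-files), its blocks (-blocks),
# and the datanodes holding each replica (-locations); filter the report
# down to the problem entries.
hadoop fsck / -files -blocks -locations | grep -iE 'under replicated|corrupt|missing'

# Cross-check with the cluster summary: configured capacity, remaining
# space, and the live/dead state of each datanode.
hadoop dfsadmin -report
```

On a pseudo-distributed cluster with dfs.replication=1, "under-replicated" in the UI can also mean the *target* replication recorded for the files is higher than 1, which the namenode can never satisfy with a single datanode.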
Re: can't start region server after crash
Have you tried using fsck?

Cheers

On Wed, Nov 19, 2014 at 6:56 PM, Li Li wrote:
> also in hdfs ui, I found Number of Under-Replicated Blocks : 497741
> it seems there are many bad blocks. is there any method to rescue good
> data?
> [...]
Re: can't start region server after crash
I have tried, and found that many files' replication factor is 3 (dfs.replication is 1 in hdfs-site.xml). So I am setting it to 1 now. There are so many files that it has been running for more than 30 minutes and still hasn't finished. I will try fsck after that.

On Thu, Nov 20, 2014 at 11:25 AM, Ted Yu wrote:
> Have you tried using fsck ?
>
> Cheers
> [...]
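For reference, lowering the recorded replication factor for an entire tree can be done with one FS shell command (a sketch, not necessarily the exact invocation used in this thread):

```shell
# Recursively (-R) set the target replication factor of every file
# under / to 1. The command updates the namenode's metadata and
# returns; replica adjustment happens in the background. Adding -w
# would block until every block reaches the target, which on ~490k
# files can take a very long time.
hadoop fs -setrep -R 1 /
```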
Re: can't start region server after crash
hadoop fsck /

Status: HEALTHY
 Total size:                    1382743735840 B
 Total dirs:                    1127
 Total files:                   476753
 Total blocks (validated):      490085 (avg. block size 2821436 B)
 Minimally replicated blocks:   490085 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Thu Nov 20 13:57:44 CST 2014 in 9065 milliseconds

On Thu, Nov 20, 2014 at 11:25 AM, Ted Yu wrote:
> Have you tried using fsck ?
>
> Cheers
> [...]
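With fsck now reporting HEALTHY, the remaining question is the original crash: "could only be replicated to 0 nodes, instead of 1" generally means no datanode could accept a new block at that moment, on a single-node setup usually a dead DataNode process or a full disk. A quick sketch for ruling those out (plain OS and Hadoop commands; nothing here is specific to this cluster):

```shell
# Is the DataNode JVM still running?
jps | grep -i datanode

# Per-datanode configured capacity, used, and remaining space as the
# namenode sees it.
hadoop dfsadmin -report

# Local disk usage on the datanode host; dfs.datanode.du.reserved can
# make a volume "full" to HDFS before df shows 100%.
df -h
```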