Re: can't start region server after crash

2014-11-19 Thread Li Li
Also, in the HDFS UI I found "Number of Under-Replicated Blocks: 497741".
It seems there are many bad blocks. Is there any way to rescue the good data?

On Thu, Nov 20, 2014 at 10:52 AM, Li Li  wrote:
> I am running a single-node pseudo-distributed HBase cluster on top of
> pseudo-distributed Hadoop. Hadoop is 1.2.1 with an HDFS replication
> factor of 1, and the HBase version is 0.98.5.
> Last night I found that the region server had crashed (the process was gone).
> Many log lines said:
> [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
> machine (eg GC): pause of approximately 2176ms
>
> GC pool 'ParNew' had collection(s): count=1 time=0ms
>
> Then I used ./bin/stop-hbase.sh to stop it and start-hbase.sh to restart it.
> After that I saw many region server log lines like:
>
> wal.HLogSplitter: Creating writer
> path=hdfs://192.168.10.121:9000/hbase/data/default/baiducrawler.webpage/5e7f8f9c63c12a70892f3a774e3186f4/recovered.edits/0121515.temp
> region=5e7f8f9c63c12a70892f3a774e3186f4
>
> CPU usage was high and disk read/write speed was 20MB/s, so I let it
> run and went home.
> This morning I found that the region server had crashed again, with these logs:
>
> hdfs.DFSClient: Failed to close file
> /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
> could only be replicated to 0 nodes, instead of 1
>
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1113)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
> at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023


Re: can't start region server after crash

2014-11-19 Thread Ted Yu
Have you tried using fsck?

Cheers



Re: can't start region server after crash

2014-11-19 Thread Li Li
I checked and found that many files have a replication factor of 3
(even though dfs.replication is 1 in hdfs-site.xml), so I am setting it to 1 now.
There are so many files that it has taken more than 30 minutes so far and
still has not finished.
I will try fsck later.
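
For illustration, the mismatch described above (files written with a replication factor of 3 on a cluster with one datanode) can account for a large under-replicated count: a block is under-replicated when it has fewer live replicas than its file's target replication factor. The following is a minimal sketch of that rule only, not HDFS code; the function name and the block counts are hypothetical, and it assumes each block has one replica per available datanode, up to its target.

```python
def count_under_replicated(block_targets, live_datanodes):
    """Count blocks whose live replicas fall short of their target.

    block_targets: one target replication factor per block.
    Simplifying assumption (not HDFS behaviour in general): every block
    has one replica on each available datanode, up to its target.
    """
    under = 0
    for target in block_targets:
        live_replicas = min(target, live_datanodes)
        if live_replicas < target:
            under += 1
    return under


# Hypothetical example: 5 blocks with target 3 and 2 blocks with target 1,
# stored on a single datanode.
blocks = [3, 3, 3, 3, 3, 1, 1]
print(count_under_replicated(blocks, live_datanodes=1))  # 5

# Once every file's target is lowered to 1, no block is under-replicated.
print(count_under_replicated([1] * len(blocks), live_datanodes=1))  # 0
```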



Re: can't start region server after crash

2014-11-19 Thread Li Li
hadoop fsck /
 Status: HEALTHY
 Total size:                    1382743735840 B
 Total dirs:                    1127
 Total files:                   476753
 Total blocks (validated):      490085 (avg. block size 2821436 B)
 Minimally replicated blocks:   490085 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1

FSCK ended at Thu Nov 20 13:57:44 CST 2014 in 9065 milliseconds
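
A summary like the one above can also be checked mechanically. The sketch below is a hedged illustration, not a Hadoop tool: it embeds a few lines copied from this report, parses the first integer after each label, and confirms that no blocks are corrupt, under-replicated, or missing replicas. The helper name `parse_fsck_summary` is hypothetical.

```python
import re

# A few lines copied from the fsck report in this message.
FSCK_SUMMARY = """\
 Status: HEALTHY
 Total size:1382743735840 B
 Total blocks (validated):  490085 (avg. block size 2821436 B)
 Under-replicated blocks:   0 (0.0 %)
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
"""


def parse_fsck_summary(text):
    """Map each label to the first integer after its colon.

    Values with no leading integer (such as the status) stay as strings.
    """
    result = {}
    for line in text.splitlines():
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "label: value" line
        value = value.strip()
        match = re.match(r"\d+", value)
        result[label] = int(match.group()) if match else value
    return result


summary = parse_fsck_summary(FSCK_SUMMARY)
healthy = (
    summary["Status"] == "HEALTHY"
    and summary["Corrupt blocks"] == 0
    and summary["Under-replicated blocks"] == 0
    and summary["Missing replicas"] == 0
)
print(healthy)  # True

# Cross-check the reported average block size with integer division.
print(summary["Total size"] // summary["Total blocks (validated)"])  # 2821436
```

The integer division reproduces the "avg. block size 2821436 B" figure that fsck prints, which suggests the totals in the report are internally consistent.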
