Re: What can corrupt HBase table and what is Cannot find row in .META. for table?
Hi, I'm using the Java API, and I see the mentioned exception in the Java log. I'll provide a full stack trace next time.

2014-11-19 1:01 GMT+03:00 Ted Yu yuzhih...@gmail.com: The thread you mentioned was more about the Thrift API than TableNotFoundException. Can you show us the stack trace of the TableNotFoundException (the vicinity of the app log around the exception)? Please also check the master log and the meta region server log. I assume you can access the table using the hbase shell. Cheers

On Tue, Nov 18, 2014 at 12:57 PM, Serega Sheypak serega.shey...@gmail.com wrote: Hi, sometimes I get the following in my web application log:

org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: my_favorite_table

I found this: http://grokbase.com/t/hbase/user/143bn79wf2/cannot-find-row-in-meta-for-table

I ran hbase hbck; the result: 0 inconsistencies detected. Status: OK. What can I try next? I'm using Cloudera CDH 5.2, HBase 0.98. Thanks!
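For what it's worth, when this TableNotFoundException turns out to be transient (for example, the client racing a region move or a stale meta cache), a thin client-side retry wrapper can keep a web app alive while the root cause is investigated. The sketch below is generic, not the original application's code; the class name, attempt count, and backoff policy are all illustrative assumptions:

```java
import java.util.concurrent.Callable;

// Illustrative retry helper: wraps an operation that may fail transiently
// (e.g. an HBase meta lookup). Names and policy are assumptions, not from
// the original thread.
class RetryingLookup {
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                               // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);  // linear backoff
                }
            }
        }
        throw last;  // all attempts exhausted: rethrow the last failure
    }
}
```

With such a wrapper, a sporadic lookup failure is retried a few times before the exception is surfaced to the web layer.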
hbase: secure login and connection management
Hi, I am trying to log in to a secure cluster with keytabs using the methods below. It works fine while the token has not expired, but my process runs for a long time (a web app in Tomcat), and I keep getting the exceptions below after the token expiry time; the connection then fails when a user tries to view data from the web page. What is the better way of handling connections? How can I refresh the keys automatically? Is there a Spring implementation for managing connections? If yes, can you share sample code?

UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(hbase.myclient.principal, hbase.myclient.keytab);

2014-11-13 08:25:49,899 ERROR [org.apache.hadoop.security.UserGroupInformation] PriviledgedActionException as u...@mycompany.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2014-11-13 08:25:49,900 WARN [org.apache.hadoop.ipc.RpcClient] Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
javax.security.sasl.SaslException: GSS initiate failed
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

Thanks, Chandra
Re: hbase: secure login and connection management
Take a look at the patch added to https://issues.apache.org/jira/browse/HBASE-12366. There will be a new AuthUtil.launchAuthChore() which should help in your case. (The doc patch is here: https://issues.apache.org/jira/browse/HBASE-12528)

Matteo

On Wed, Nov 19, 2014 at 11:19 AM, Bogala, Chandra Reddy chandra.bog...@gs.com wrote: Hi, I am trying to log in to a secure cluster with keytabs using the methods below. ...
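Until HBASE-12366 is available in a release, the pattern behind AuthUtil.launchAuthChore() can be approximated with a scheduled background task that periodically re-logs in from the keytab so the TGT never expires under a long-lived process. The sketch below shows only the scheduling skeleton with a pluggable relogin body; in a real client that body would call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(). The interval and class name are illustrative, not taken from the patch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a background "auth chore": runs a relogin action on a fixed
// period. The relogin Runnable is injected so this skeleton stays free of
// Hadoop dependencies; in production it would wrap
// UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab().
class AuthChore {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Runnable relogin, long periodMs) {
        // First run after one period, then repeatedly on the same period.
        scheduler.scheduleAtFixedRate(relogin, periodMs, periodMs,
                TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();  // stop the chore on application shutdown
    }
}
```

In a Tomcat app, start() would typically be called once from a ServletContextListener after the initial loginUserFromKeytab, with a period well below the ticket lifetime.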
RPC Timeout - DoNotRetryIOException
Hello: I have also encountered this exception. Do you have a solution? Please let me know. Thanks. xuge...@longshine.com
[ANNOUNCE] HBase 0.98.8 is now available for download
Apache HBase 0.98.8 is now available for download. Get it from an Apache mirror [1] or Maven repository. The list of changes in this release can be found in the release notes [2] or following this announcement. This release contains a fix for a security issue, please see HBASE-12536 [3] for more detail. Thanks to all who contributed to this release. Best, The HBase Dev Team 1. http://www.apache.org/dyn/closer.cgi/hbase/ 2. http://s.apache.org/44a 3. https://issues.apache.org/jira/browse/HBASE-12536 -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: HBase concurrent.RejectedExecutionException
Hi Arul,

It's a pure client-side exception: it means the client never even tried to send the query to the server; it failed before that. Why the client failed is another question. I see that the pool size is 7; have you changed the default configuration?

Cheers, Nicolas

On Tue, Nov 18, 2014 at 7:29 AM, Arul Ramachandran arkup...@gmail.com wrote: Hi Ted, I don't have the load metrics at the moment... are you suggesting this could be load-related? Thanks

On Mon, Nov 17, 2014 at 5:34 PM, Ted Yu yuzhih...@gmail.com wrote: What was the load on rs3.world.com,60020,1414690096750 around 09:49:30? Cheers

On Mon, Nov 17, 2014 at 4:59 PM, Arul Ramachandran arkup...@gmail.com wrote: Hi, our HBase application gets the following exception (HBase 0.96.1.2.0.6.1-101-hadoop2). I looked at the region server log and nothing unusual is happening. Any pointers on what else I can check? Thanks!

2014-11-16 09:49:37,686 WARN [hbase-connection-shared-executor-pool644-t158] AsyncProcess.sendMultiAction(AsyncProcess.java:511) - The task was rejected by the pool. This is unexpected.
Server is rs3.world.com,60020,1414690096750
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@f48544d rejected from java.util.concurrent.ThreadPoolExecutor@10916cfd[Shutting down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 932]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
    at org.apache.hadoop.hbase.client.AsyncProcess.sendMultiAction(AsyncProcess.java:506)
    at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:461)
    at org.apache.hadoop.hbase.client.AsyncProcess.receiveMultiAction(AsyncProcess.java:700)
    at org.apache.hadoop.hbase.client.AsyncProcess.access$300(AsyncProcess.java:89)
    at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:498)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2014-11-16 09:49:37,686 INFO [qtp1232775351-168005] AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:743) - : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=3, tasksDone=2, currentTasksDone=2, tableName=group_sku_mapping
2014-11-16 09:49:37,686 INFO [qtp1232775351-168006] AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:743) - : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=3, tasksDone=2, currentTasksDone=2, tableName=group_sku_mapping
2014-11-16 09:49:37,686 WARN [hbase-connection-shared-executor-pool644-t159] AsyncProcess.sendMultiAction(AsyncProcess.java:511) - The task was rejected by the pool. This is unexpected. Server is rs3.world.com,60020,1414690096750
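The pool state in the log ("[Shutting down, pool size = 7, ...]") is the key detail: a java.util.concurrent.ThreadPoolExecutor rejects any task submitted after shutdown() has been called. The minimal, HBase-free reproduction below shows the same exception that the client's AsyncProcess hit:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Minimal reproduction: submitting to a pool after shutdown() throws
// RejectedExecutionException. This is plain java.util.concurrent
// behaviour, not HBase-specific code.
class RejectedDemo {
    static boolean submitAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown();            // pool enters "Shutting down" state
        try {
            pool.submit(() -> { }); // analogous to AsyncProcess submitting work
            return false;           // not reached: submission is rejected
        } catch (RejectedExecutionException e) {
            return true;            // the exception seen in the app log
        }
    }
}
```

So the question to chase is what shut the shared connection pool down on the client side (for example, the HTable/HConnection being closed while writes were still in flight).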
scan column qualifiers in column family
Hi, I need to find whether a particular column qualifier is present in a column family, so I wrote the code below. As per the documentation:

public boolean containsColumn(byte[] family, byte[] qualifier)
Checks for existence of a value for the specified column (empty or not).
Parameters: family - family name; qualifier - column qualifier
Returns: true if at least one value exists in the result, false if not

// my code
public static boolean search_column(String mail) throws IOException {
    HTable testTable = new HTable(frinds_util.get_config(), "people"); // configuration
    byte[] email_b = Bytes.toBytes(mail);          // column qualifier
    byte[] colmnfamily = Bytes.toBytes(colmn_fam); // column family
    Scan scan_col = new Scan(Bytes.toBytes(colmn_fam), email_b);
    ResultScanner results = testTable.getScanner(scan_col);
    Result result = results.next();
    if (result.containsColumn(colmnfamily, email_b)) { // check whether column present
        System.out.println("column is present");
        ret = true;
    }
    return ret;
}

My build fails with the output below:

java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.NullPointerException
    at org.freinds_rep.java.Insert_friend.search_column(Insert_friend.java:106)
    at org.freinds_rep.java.Insert_friend.main(Insert_friend.java:156)
    ... 6 more
[WARNING] thread Thread[org.freinds_rep.java.Insert_friend.main(127.0.0.1:2181),5,org.freinds_rep.java.Insert_friend] was interrupted but is still alive after waiting at least 15000msecs
[WARNING] thread Thread[org.freinds_rep.java.Insert_friend.main(127.0.0.1:2181),5,org.freinds_rep.java.Insert_friend] will linger despite being asked to die via interruption
[WARNING] NOTE: 1 thread(s) did not finish despite being asked to via interruption. This is not a problem with exec:java, it is a problem with the running code. Although not serious, it should be remedied.
[WARNING] Couldn't destroy threadgroup org.codehaus.mojo.exec.ExecJavaMojo$IsolatedThreadGroup[name=org.freinds_rep.java.Insert_friend,maxpri=10]
java.lang.IllegalThreadStateException
    at java.lang.ThreadGroup.destroy(ThreadGroup.java:775)
    at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:328)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 1:23.294s
[INFO] Finished at: Wed Nov 19 09:08:48 PST 2014
[INFO] Final Memory: 10M/137M

Any idea how to solve this?
Re: scan column qualifiers in column family
bq. org.freinds_rep.java.Insert_friend.search_column(Insert_friend.java:106)

Does line 106 correspond to the result.containsColumn() call? If so, result was null.

On Wed, Nov 19, 2014 at 9:47 AM, beeshma r beeshm...@gmail.com wrote: Hi, I need to find whether a particular column qualifier is present in a column family, so I wrote the code below. ...
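Two details in the quoted code are worth noting alongside Ted's point. First, results.next() returns null when the scanner has no matching row, which fits the NullPointerException at line 106. Second, the two-argument Scan constructor in HBase 0.98 takes (startRow, stopRow), so passing a family and a qualifier there turns them into row bounds, and the scan most likely matches nothing. A hedged rework using a point Get instead (the rowKey parameter is an assumption added here, since checking a column requires knowing which row to look in; the family/qualifier names are carried over from the question):

```java
// Hedged rework for HBase 0.98: fetch just the one column of the one row
// and use the (never-null) Result from HTable.get(). Not the poster's
// original code; rowKey is a hypothetical added parameter.
public static boolean searchColumn(HTable table, byte[] rowKey, String mail)
        throws IOException {
    byte[] family = Bytes.toBytes(colmn_fam);
    byte[] qualifier = Bytes.toBytes(mail);
    Get get = new Get(rowKey);
    get.addColumn(family, qualifier);   // restrict to the column of interest
    Result result = table.get(get);     // empty Result if nothing found, never null
    return !result.isEmpty() && result.containsColumn(family, qualifier);
}
```

If the row key is not known in advance, the scan approach still works, but the Scan must be built with addColumn(family, qualifier) (or addFamily) rather than the (startRow, stopRow) constructor, and each Result from the scanner must be null-checked before use.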
can't start region server after crash
I am running a single-node pseudo-distributed HBase cluster on top of a pseudo-distributed Hadoop. Hadoop is 1.2.1, the HDFS replication factor is 1, and the HBase version is 0.98.5.

Last night I found that the region server had crashed (the process was gone). I found many logs saying:

[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2176ms GC pool 'ParNew' had collection(s): count=1 time=0ms

Then I used ./bin/stop-hbase.sh to stop it and start-hbase.sh to restart it. After that I saw many logs in the region server like:

wal.HLogSplitter: Creating writer path=hdfs://192.168.10.121:9000/hbase/data/default/baiducrawler.webpage/5e7f8f9c63c12a70892f3a774e3186f4/recovered.edits/0121515.temp region=5e7f8f9c63c12a70892f3a774e3186f4

The CPU usage was high and the disk read/write speed was 20MB/s, so I let it run and went home. This morning I found the region server had crashed again, with these logs:

hdfs.DFSClient: Failed to close file /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
    at org.apache.hadoop.ipc.Client.call(Client.java:1113)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023
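The message "could only be replicated to 0 nodes, instead of 1" generally means the NameNode could not find any live DataNode to place the block on; in a single-node setup that usually points at the lone DataNode being down, out of disk space, or excluded. A few standard Hadoop 1.x / HBase commands that may help triage (run on the cluster node; flags are standard CLI options, the paths are assumptions based on the defaults):

```
# Is the DataNode process actually running?
jps | grep DataNode

# Is the disk holding dfs.data.dir out of space?
df -h

# Block-level health report from the NameNode's perspective
hadoop fsck / -blocks -locations

# HBase-level consistency check, once HDFS is healthy again
hbase hbck
```

The DataNode log itself (under the Hadoop logs directory) is the most direct place to see why block writes are being refused.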
Re: can't start region server after crash
Also, in the HDFS UI I found: Number of Under-Replicated Blocks : 497741. It seems there are many bad blocks. Is there any method to rescue the good data?

On Thu, Nov 20, 2014 at 10:52 AM, Li Li fancye...@gmail.com wrote: I am running a single-node pseudo-distributed HBase cluster on top of a pseudo-distributed Hadoop. ...
Re: can't start region server after crash
Have you tried using fsck?

Cheers

On Wed, Nov 19, 2014 at 6:56 PM, Li Li fancye...@gmail.com wrote: Also, in the HDFS UI I found: Number of Under-Replicated Blocks : 497741. It seems there are many bad blocks. Is there any method to rescue the good data? ...
Re: can't start region server after crash
I have tried it and found that many files' replication factor is 3 (dfs.replication is 1 in hdfs.xml), so I am now trying to set it to 1. There are so many files that it has already taken more than 30 minutes and is still not finished. I will try fsck later.

On Thu, Nov 20, 2014 at 11:25 AM, Ted Yu yuzhih...@gmail.com wrote: Have you tried using fsck? ...
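For reference, forcing every existing file under the HBase root back to a replication factor of 1 can be done recursively with the standard setrep command (the /hbase path below assumes the default HBase root directory):

```
# Set replication factor to 1 for everything under the HBase root.
# -R recurses into subdirectories; adding -w would also wait until
# replication actually reaches the target before returning.
hadoop fs -setrep -R 1 /hbase
```

Note that dfs.replication only applies to newly created files; files written while the setting was 3 keep that factor until changed explicitly, which is why the under-replicated count was so large.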
Re: can't start region server after crash
hadoop fsck /
Status: HEALTHY
 Total size: 1382743735840 B
 Total dirs: 1127
 Total files: 476753
 Total blocks (validated): 490085 (avg. block size 2821436 B)
 Minimally replicated blocks: 490085 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Thu Nov 20 13:57:44 CST 2014 in 9065 milliseconds

On Thu, Nov 20, 2014 at 11:25 AM, Ted Yu yuzhih...@gmail.com wrote:
Have you tried using fsck ?
Cheers

On Wed, Nov 19, 2014 at 6:56 PM, Li Li fancye...@gmail.com wrote:
Also, in the HDFS UI I found "Number of Under-Replicated Blocks: 497741". It seems there are many bad blocks. Is there any method to rescue the good data?

On Thu, Nov 20, 2014 at 10:52 AM, Li Li fancye...@gmail.com wrote:
I am running a single-node pseudo-distributed HBase cluster on top of a pseudo-distributed Hadoop. Hadoop is 1.2.1 and the HDFS replication factor is 1. The HBase version is 0.98.5.
Last night I found that the region server had crashed (the process was gone). I found many logs saying:
[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2176ms GC pool 'ParNew' had collection(s): count=1 time=0ms
Then I used ./bin/stop-hbase.sh to stop it and start-hbase.sh to restart it. After that I saw many logs in the region server like:
wal.HLogSplitter: Creating writer path=hdfs://192.168.10.121:9000/hbase/data/default/baiducrawler.webpage/5e7f8f9c63c12a70892f3a774e3186f4/recovered.edits/0121515.temp region=5e7f8f9c63c12a70892f3a774e3186f4
The CPU usage was high and the disk read/write speed was about 20MB/s, so I let it run and went home.
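Note that `hadoop fsck /` and the HDFS web UI can disagree like this when the fsck run happened after re-replication (or after the block count changed); re-running fsck and comparing the two counters is a reasonable first check. A minimal sketch of pulling the two numbers that matter for "can the data be rescued?" out of fsck output is below. The sample output here is made up for illustration; on a live cluster you would pipe the real `hadoop fsck /` into the same awk filters instead of the here-string:

```shell
# Sample (fabricated) fsck output; replace with:  hadoop fsck / | awk ...
fsck_output=" Total blocks (validated): 490085 (avg. block size 2821436 B)
 Under-replicated blocks: 0 (0.0 %)
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
Status: HEALTHY"

# Split each line on the colon; "+0" coerces "0 (0.0 %)" to the leading number.
under=$(printf '%s\n' "$fsck_output" | awk -F': *' '/Under-replicated blocks/ {print $2+0}')
corrupt=$(printf '%s\n' "$fsck_output" | awk -F': *' '/Corrupt blocks/ {print $2+0}')

echo "under-replicated=$under corrupt=$corrupt"
# prints: under-replicated=0 corrupt=0
```

With replication factor 1 there is no second replica to recover from, so any block fsck reports as corrupt or missing is genuinely lost; only "Corrupt blocks: 0" and "Missing replicas: 0" mean the data itself is intact.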
This morning I found that the region server had crashed again, with these logs:
hdfs.DFSClient: Failed to close file /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/data/default/baiducrawler.webpage/1a4628670035e53d38f87b534b3302bf/recovered.edits/0116237.temp could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)

    at org.apache.hadoop.ipc.Client.call(Client.java:1113)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
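"could only be replicated to 0 nodes, instead of 1" generally means the NameNode could not find any usable DataNode for a new block: the single DataNode is dead, not heartbeating, or out of disk space (WAL splitting writes many recovered.edits files, which can exhaust a nearly full disk). A minimal sketch of the check follows; the report text is fabricated sample output, and on the real cluster you would pipe `hadoop dfsadmin -report` into the same filters:

```shell
# Sample (fabricated) dfsadmin report; replace with:  hadoop dfsadmin -report | awk ...
report="Configured Capacity: 1979120929792 (1.8 TB)
DFS Remaining: 102400 (100 KB)
Datanodes available: 1 (1 total, 0 dead)"

# "+0" keeps only the leading number from each matched line.
live=$(printf '%s\n' "$report" | awk -F': *' '/Datanodes available/ {print $2+0}')
remaining=$(printf '%s\n' "$report" | awk -F': *' '/DFS Remaining/ {print $2+0}')

echo "live-datanodes=$live dfs-remaining-bytes=$remaining"
# prints: live-datanodes=1 dfs-remaining-bytes=102400
```

If the live-DataNode count is 0, restart the DataNode and check its log; if it is alive but DFS Remaining is near zero (as in this sample), free disk space before retrying, since the replication error will recur until the DataNode can accept a block.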