Re: HMaster restart with error
The exception originated from the Web UI, corresponding to HBaseAdmin.listTables(). At that moment the master was unable to process the request - it needed some more time.

Cheers

On Sun, May 17, 2015 at 11:03 PM, Louis Hust louis.h...@gmail.com wrote:

Yes, ted, can you tell me what the following exception means in l-namenode1.log?

2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection

Does this mean the cluster was not operational?

2015-05-18 11:45 GMT+08:00 Ted Yu yuzhih...@gmail.com:

After l-namenode1 became active master, it assigned regions:

2015-05-15 12:16:40,432 INFO [master:l-namenode1:6] master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff state=OFFLINE, ts=1431663400432, server=null} to {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=l-hbase31.data.cn8.qunar.com,60020,1431462584879}

However, l-hbase31 went down:

2015-05-15 12:16:40,508 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Splitting logs for l-hbase31.data.cn8.qunar.com,60020,1427789773001 before assignment.

l-namenode1 was restarted:

2015-05-15 12:20:25,322 INFO [main] util.VersionInfo: HBase 0.96.0-hadoop2
2015-05-15 12:20:25,323 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434

However, it went down due to zookeeper session expiration:

2015-05-15 12:20:25,580 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master

It started again after that and AssignmentManager did a lot of assignments. Looks like the cluster was operational this time.
Cheers

On Sun, May 17, 2015 at 8:24 AM, Ted Yu yuzhih...@gmail.com wrote:

bq. the backup master took over at 2015-05-15 12:16:40,024 ?

The switch of active master should be earlier than 12:16:40,024 - shortly after 12:15:58. l-namenode1 would do some initialization (such as waiting for region servers count to settle) after it became active master.

I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but the download progress was very slow. Will try downloading later in the day. Do you have access to pastebin ?

Cheers

On Sun, May 17, 2015 at 2:07 AM, Louis Hust louis.h...@gmail.com wrote:

Hi, ted,

Thanks for your reply!! I found this log in l-namenode2.dba.cn8 during the restart:

2015-05-15 12:11:36,540 INFO [master:l-namenode2:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running.

So this means the HMaster was ready to handle requests at 12:11:36?

The backup master is l-namenode1.dba.cn8 and you can get the log at: http://pan.baidu.com/s/1eQlKXj0

After l-namenode2.dba.cn8 was stopped by me at 12:15:58, the backup master l-namenode1 took over, and I found this log:

2015-05-15 12:16:40,024 INFO [master:l-namenode1:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of 2147483647, master is running.

So the backup master took over at 2015-05-15 12:16:40,024 ?
But it seems l-namenode2 was not working normally, given this exception in the log:

2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
    at
Re: Where can I find the document for specific version of hbase api ?
Xiaobo:

Can you download the source tarball for the release you're using ? You can find all the API information in the source code.

Cheers

On Mon, May 18, 2015 at 1:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

Hi,

http://hbase.apache.org/apidocs/ shows the latest version, but where can I find the document for a specific version such as 0.98.5?

Thanks,
Re: HBase failing to restart in single-user mode
Same for me, I had faced similar issues, especially on my virtual machines, since I would restart them more often than my host machine. Moving ZK out of /tmp, which could get cleared on reboots, fixed the issue for me.

Thanks, Viral

On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:

I noticed similar ZK related issues but those went away after changing the ZK directory to a permanent directory, along with the HBase root directory. Both now point to a location in my home folder and restarts work fine. Not much help, but wanted to at least state that.

Lars

Sent from my iPhone

On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:

Hi all,

For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the latest version from branch-1.0 and use the following config:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
</configuration>

Since at least a month ago (perhaps longer, I don’t remember exactly) I can’t restart HBase.
The very first time it starts up fine, but subsequent startup attempts all fail with:

2015-05-17 20:39:19,024 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: starting
2015-05-17 20:39:19,024 INFO [RpcServer.listener,port=49809] ipc.RpcServer: RpcServer.listener,port=49809: starting
2015-05-17 20:39:19,029 INFO [main] http.HttpRequestLog: Http request log for http.requests.regionserver is not defined
2015-05-17 20:39:19,030 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context regionserver
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-05-17 20:39:19,033 INFO [main] http.HttpServer: Jetty bound to port 49811
2015-05-17 20:39:19,033 INFO [main] mortbay.log: jetty-6.1.26
2015-05-17 20:39:19,157 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:49811
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4f708099 connecting to ZooKeeper ensemble=localhost:2181
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=1 watcher=hconnection-0x4f7080990x0, quorum=localhost:2181, baseZNode=/hbase
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49812
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [SyncThread:0] server.ZooKeeperServer: Established session 0x14d651aaec2 with negotiated timeout 400 for client /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d651aaec2, negotiated timeout = 400
2015-05-17 20:39:19,249 INFO [M:0;localhost:49807] regionserver.HRegionServer: ClusterId : 6ad7eddd-2886-4ff0-b377-a2ff42c8632f
2015-05-17 20:39:49,208 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not active after 30 seconds
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:194)
    at
Where can I find the document for specific version of hbase api ?
Hi,

http://hbase.apache.org/apidocs/ shows the latest version, but where can I find the document for a specific version such as 0.98.5?

Thanks,
Re: How to set Timeout for get/scan operations without impacting others
hbase.client.operation.timeout is used by HBaseAdmin operations, by RegionReplicaFlushHandler, and by various HTable operations (including Get).

hbase.rpc.timeout is for the RPC layer, and defines how long an HBase client application waits before a single remote call times out. The client uses pings to check connections but will eventually throw a TimeoutException.

FYI

On Sun, May 17, 2015 at 11:11 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:

Hi,

I need to set a tight timeout for get/scan operations and I think the HBase client already supports it. I found three related keys:

- hbase.client.operation.timeout
- hbase.rpc.timeout
- hbase.client.retries.number

What's the difference between hbase.client.operation.timeout and hbase.rpc.timeout? My understanding is that hbase.rpc.timeout has a larger scope than hbase.client.operation.timeout, so setting hbase.client.operation.timeout is safer. Am I correct? And are there any other property keys I can use?

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
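To make the distinction concrete, here is a minimal hbase-site.xml sketch tightening all three keys. The values are illustrative assumptions (milliseconds for the timeouts), not recommendations - pick numbers to suit your own SLAs:

```xml
<!-- Hypothetical client-side tuning; values are assumptions, not recommendations. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>10000</value>  <!-- each individual remote call may take up to 10s -->
</property>
<property>
  <name>hbase.client.operation.timeout</name>
  <value>30000</value>  <!-- the whole client operation, spanning retries -->
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>
</property>
```

The same keys can also be set programmatically on the client's Configuration object before creating the connection, which avoids affecting other applications sharing the same hbase-site.xml.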
Re: Re: How to know the root reason to cause RegionServer OOM?
On Mon, May 18, 2015 at 11:47 AM, Andrew Purtell apurt...@apache.org wrote:

You need to not overcommit memory on servers running JVMs for HDFS and HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx parameter, the maximum heap size, for all JVMs that will be concurrently executing on the server. The total should be less than the total amount of RAM available on the server. Additionally, you will want to reserve ~1GB for the OS. Finally, set vm.swappiness=0 in /etc/sysctl.conf to prevent unnecessary paging.

On 3.5+ kernels, set vm.swappiness=1 instead if you still want to allow some paging, to avoid the OOM killer.

-- Sean
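Andrew's sizing rule is easy to sanity-check with a trivial script; a sketch with made-up numbers (the heap sizes and RAM below are assumptions for illustration, not recommendations):

```shell
# Example: 32 GB box running RegionServer (16G), DataNode (4G), and a 2G client JVM.
RAM_GB=32
OS_RESERVE_GB=1
HEAPS_GB="16 4 2"    # the -Xmx value of each colocated JVM, in GB

# Sum the planned heap sizes.
TOTAL=0
for h in $HEAPS_GB; do TOTAL=$((TOTAL + h)); done

# The heaps plus the OS reserve must fit in physical RAM.
if [ $((TOTAL + OS_RESERVE_GB)) -le "$RAM_GB" ]; then
  echo "OK: ${TOTAL}G of heap + ${OS_RESERVE_GB}G OS reserve fits in ${RAM_GB}G"
else
  echo "Overcommitted: trim -Xmx values"
fi
```

Note this only covers heap; JVMs also use off-heap memory (thread stacks, direct buffers, metaspace), so leaving extra headroom beyond the 1G OS reserve is prudent.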
Re: Where can I find the document for specific version of hbase api ?
Thanks for pinging us on this. There's currently an open jira for properly providing access to 0.98-, 1.0-, and 1.1-specific javadocs [1]. Unfortunately, no one has had the time to take care of things yet. You can follow that ticket if you'd like to know when there's movement.

For now, your only option is to use the source tarball and build the site for that version, as Ted mentioned. The command for that is:

$ mvn -DskipTests clean package site

[1]: https://issues.apache.org/jira/browse/HBASE-13140

On Mon, May 18, 2015 at 3:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

Hi,

http://hbase.apache.org/apidocs/ shows the latest version, but where can I find the document for a specific version such as 0.98.5?

Thanks,

-- Sean
Re: Re: How to know the root reason to cause RegionServer OOM?
You need to not overcommit memory on servers running JVMs for HDFS and HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx parameter, the maximum heap size, for all JVMs that will be concurrently executing on the server. The total should be less than the total amount of RAM available on the server. Additionally, you will want to reserve ~1GB for the OS. Finally, set vm.swappiness=0 in /etc/sysctl.conf to prevent unnecessary paging.

On Sun, May 17, 2015 at 8:08 PM, David chen c77...@163.com wrote:

The snippet in /var/log/messages is as follows. I am sure that the killed process (22827) is the RegionServer.

..
May 14 12:00:38 localhost kernel: Mem-Info:
May 14 12:00:38 localhost kernel: Node 0 DMA per-cpu:
May 14 12:00:38 localhost kernel: CPU0: hi: 0, btch: 1 usd: 0
..
May 14 12:00:38 localhost kernel: CPU 39: hi: 0, btch: 1 usd: 0
May 14 12:00:38 localhost kernel: Node 0 DMA32 per-cpu:
May 14 12:00:38 localhost kernel: CPU0: hi: 186, btch: 31 usd: 30
..
May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 8
May 14 12:00:38 localhost kernel: Node 0 Normal per-cpu:
May 14 12:00:38 localhost kernel: CPU0: hi: 186, btch: 31 usd: 5
..
May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 20
May 14 12:00:38 localhost kernel: Node 1 Normal per-cpu:
May 14 12:00:38 localhost kernel: CPU0: hi: 186, btch: 31 usd: 7
..
May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 10
May 14 12:00:38 localhost kernel: active_anon:7993118 inactive_anon:48001 isolated_anon:0
May 14 12:00:38 localhost kernel: active_file:855 inactive_file:960 isolated_file:0
May 14 12:00:38 localhost kernel: unevictable:0 dirty:0 writeback:0 unstable:0
May 14 12:00:38 localhost kernel: free:39239 slab_reclaimable:14043 slab_unreclaimable:27993
May 14 12:00:38 localhost kernel: mapped:48750 shmem:75053 pagetables:20540 bounce:0
May 14 12:00:38 localhost kernel: Node 0 DMA free:15732kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 3211 16088 16088
May 14 12:00:38 localhost kernel: Node 0 DMA32 free:60388kB min:8968kB low:11208kB high:13452kB active_anon:2811676kB inactive_anon:72kB active_file:0kB inactive_file:788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3288224kB mlocked:0kB dirty:0kB writeback:44kB mapped:156kB shmem:8232kB slab_reclaimable:10652kB slab_unreclaimable:5144kB kernel_stack:56kB pagetables:4252kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1312 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 12877 12877
May 14 12:00:38 localhost kernel: Node 0 Normal free:35772kB min:35964kB low:44952kB high:53944kB active_anon:13062472kB inactive_anon:4864kB active_file:1268kB inactive_file:1504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13186560kB mlocked:0kB dirty:0kB writeback:92kB mapped:6172kB shmem:51928kB slab_reclaimable:22732kB slab_unreclaimable:73204kB kernel_stack:16240kB pagetables:38040kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10268 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
May 14 12:00:38 localhost kernel: Node 1 Normal free:45064kB min:45132kB low:56412kB high:67696kB active_anon:16098324kB inactive_anon:187068kB active_file:2192kB inactive_file:1548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:0kB dirty:116kB writeback:0kB mapped:188672kB shmem:240052kB slab_reclaimable:22788kB slab_unreclaimable:33624kB kernel_stack:7352kB pagetables:39868kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12064 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
May 14 12:00:38 localhost kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
May 14 12:00:38 localhost kernel: Node 0 DMA32: 659*4kB 576*8kB 485*16kB 338*32kB 208*64kB 106*128kB 27*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 60636kB
May 14 12:00:38 localhost kernel: Node 0 Normal: 1166*4kB 579*8kB 337*16kB 203*32kB 106*64kB 61*128kB 3*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 37568kB
May 14 12:00:38 localhost kernel: Node 1 Normal: 668*4kB 405*8kB 422*16kB 259*32kB 176*64kB 67*128kB 7*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 43608kB
May 14 12:00:38 localhost kernel: 78257 total pagecache pages
May 14 12:00:38 localhost kernel: 0 pages in swap cache
May 14 12:00:38 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Re: Where can I find the document for specific version of hbase api ?
You don't need to build from the src tgz; the bin tgz contains a docs directory, wherein you'll find both public-facing (@Public annotated classes) and full javadocs, in apidocs and devapidocs respectively. The whole site and book are there too, but our release policy is to copy the site and book from master. The javadocs, though, are generated with this build of this release.

-n

On Mon, May 18, 2015 at 10:47 AM, Sean Busbey bus...@cloudera.com wrote:

Thanks for pinging us on this. There's currently an open jira for properly providing access to 0.98-, 1.0-, and 1.1-specific javadocs [1]. Unfortunately, no one has had the time to take care of things yet. You can follow that ticket if you'd like to know when there's movement.

For now, your only option is to use the source tarball and build the site for that version, as Ted mentioned. The command for that is:

$ mvn -DskipTests clean package site

[1]: https://issues.apache.org/jira/browse/HBASE-13140

On Mon, May 18, 2015 at 3:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

Hi,

http://hbase.apache.org/apidocs/ shows the latest version, but where can I find the document for a specific version such as 0.98.5?

Thanks,

-- Sean
Load data into hbase
How should I go about creating and loading a bunch of lookup tables into HBase? These are the typical RDBMS kind of data, where the data is row-oriented. All the data is coming from a flat file that's again row-oriented. How best can I load this data into HBase?

I first created the table in Hive, mapped to the HBase table:

CREATE TABLE CITY_CTR_SLS (
  id string,
  CUST_CITY_ID INT,
  CALL_CTR_ID INT,
  TOT_DOLLAR_SALES FLOAT,
  TOT_UNIT_SALES FLOAT,
  TOT_COST FLOAT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ints:CUST_CITY_ID,ints:CALL_CTR_ID,floats:TOT_DOLLAR_SALES,floats:TOT_UNIT_SALES,floats:TOT_COST")
TBLPROPERTIES ("hbase.table.name" = "hbase_CITY_CTR_SLS1");

When I run the following command to load data into the Hive table, I get an error about mismatched columns (because of the additional id column that's needed for HBase):

[ash-r101-14l.mstrprime.com:21000] INSERT INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS; ...(wh2.city_ctr_sls already exists)
Query: insert INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS
ERROR: AnalysisException: Target table 'hbase_temp.city_ctr_sls' has more columns (6) than the SELECT / VALUES clause returns (5)
[ash-r101-14l.mstrprime.com:21000]

Any pointers? Thanks.

Farah
Re: Load data into hbase
Lots of options, depending upon the specifics of your use case. In addition to Hive...

You can use Sqoop: http://www.dummies.com/how-to/content/importing-data-into-hbase-with-sqoop.html

You can use Pig: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.7/bk_user-guide/content/user-guide-hbase-import-2.html

If the data is delimiter-separated, then ImportTsv: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/

Regards, Shahab

On Mon, May 18, 2015 at 3:33 PM, Omer, Farah fo...@microstrategy.com wrote:

How should I go about creating and loading a bunch of lookup tables into HBase? These are the typical RDBMS kind of data, where the data is row-oriented. All the data is coming from a flat file that's again row-oriented. How best can I load this data into HBase?

I first created the table in Hive, mapped to the HBase table:

CREATE TABLE CITY_CTR_SLS (
  id string,
  CUST_CITY_ID INT,
  CALL_CTR_ID INT,
  TOT_DOLLAR_SALES FLOAT,
  TOT_UNIT_SALES FLOAT,
  TOT_COST FLOAT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ints:CUST_CITY_ID,ints:CALL_CTR_ID,floats:TOT_DOLLAR_SALES,floats:TOT_UNIT_SALES,floats:TOT_COST")
TBLPROPERTIES ("hbase.table.name" = "hbase_CITY_CTR_SLS1");

When I run the following command to load data into the Hive table, I get an error about mismatched columns (because of the additional id column that's needed for HBase):

[ash-r101-14l.mstrprime.com:21000] INSERT INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS; ...(wh2.city_ctr_sls already exists)
Query: insert INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS
ERROR: AnalysisException: Target table 'hbase_temp.city_ctr_sls' has more columns (6) than the SELECT / VALUES clause returns (5)
[ash-r101-14l.mstrprime.com:21000]

Any pointers? Thanks.

Farah
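To illustrate the ImportTsv route mentioned above, here is a sketch of the two-step bulk-load flow. The input path, HFile staging directory, and the assumption that the flat file is tab-separated with the row key in the first column are all illustrative, not taken from the thread (the script only builds and prints the commands, so you can review them before running anything):

```shell
# Column spec mirroring the Hive definition above: row key first,
# then the ints:/floats: column families. HBASE_ROW_KEY is ImportTsv's
# marker for the row-key column.
COLUMNS="HBASE_ROW_KEY,ints:CUST_CITY_ID,ints:CALL_CTR_ID,floats:TOT_DOLLAR_SALES,floats:TOT_UNIT_SALES,floats:TOT_COST"
TABLE="hbase_CITY_CTR_SLS1"
HFILE_DIR="/tmp/city_ctr_sls_hfiles"   # hypothetical staging dir on HDFS
INPUT="/data/city_ctr_sls.tsv"         # hypothetical input path on HDFS

# Step 1: parse the flat file into HFiles instead of writing through the RPC path.
IMPORT_CMD="hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=$COLUMNS -Dimporttsv.bulk.output=$HFILE_DIR $TABLE $INPUT"

# Step 2: hand the generated HFiles to the region servers (completebulkload).
BULKLOAD_CMD="hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles $HFILE_DIR $TABLE"

echo "$IMPORT_CMD"
echo "$BULKLOAD_CMD"
```

This sidesteps the Hive/Impala column-count mismatch entirely, since the row key is just the first field of the TSV rather than an extra column in a SELECT list.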
Re: How to set Timeout for get/scan operations without impacting others
bq. Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00

Looks like you have some corrupted HFile(s) in your cluster - which should be fixed first.

Which hbase release are you using ? Do you use data block encoding ?

You can use http://hbase.apache.org/book.html#_hfile_tool to do some investigation.

Cheers

On Mon, May 18, 2015 at 6:19 PM, Fang, Mike chuf...@paypal.com wrote:

Hi Ted,

Thanks for your information. My application queries HBase, and for some of the queries it just hangs there and throws an exception after several minutes (5-8 minutes). As a workaround, I tried to set the timeout to a shorter time, so my app would hang not for minutes but only for several seconds. I set both timeouts to 1000 (1s), but it still hangs for several minutes. Is this expected?

I'd appreciate it if you know how I could fix the exception:

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d, compression=snappy, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=addrv#34005240#US,_28409,_822|addre/F|rval#null|cust#1158923121468951849|addre#1095283883|1/EOUT_T_SRD:~/143098200/Put, lastKey=addrv#38035AC7#US,_60449,_4684|addre/F|rval#null|cust#1335211720509289817|addre#697997140|1/EOUT_T_SRD:~/143098200/Put, avgKeyLen=122, avgValueLen=187, entries=105492830, length=6880313695, cur=null] to key addrv#34B97AEC#FR,_06110,_41 route des breguieres|addre/F|rval#/EOUT_T_SRD:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:165)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1847)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3716)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1890)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1876)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1853)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3090)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28861)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Failed to read compressed block at 1253175503, onDiskSizeWithoutHeader=66428, preReadHeaderSize=33, header.length=33, header bytes: DATABLKE\x00\x003\x00\x00\xC3\xC9\x00\x00\x00\x01r\xC4-\xDF\x01\x00\x00@ \x00\x00\x00P
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1451)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:238)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:153)
    ... 15 more
Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
    at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:165)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:239)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1448

Thanks, Mike
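Ted's pointer to the HFile tool can be made concrete. A sketch of checking the suspect file (the path is copied from the stack trace above, with its redacted `xxx` segments left as-is; the script only prints the command so you can inspect it before running it against a real cluster):

```shell
# The store file named in the "Could not seek" message above.
HFILE="hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d"

# The HFile tool: -m prints the file's metadata, -k verifies row ordering
# while scanning keys, -v prints progress, -f names the file to inspect.
CMD="hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -k -f $HFILE"
echo "$CMD"
```

A block-magic mismatch like the one above typically surfaces here as a read failure partway through the scan, which at least tells you which file to sideline or restore from a snapshot/replica.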
(info) How can i load data from CDH4.3.0 to CDH5.4.0 in Hbase
hello list,

Is there a way to load existing data (HFiles) from CDH4.3.0 into CDH5.4.0? We use the completebulkload utility, per this link: http://hbase.apache.org/0.94/book/ops_mgt.html#completebulkload

The command:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/ IM_ItemPrice

The region 096518a1aa5823c4aec9477d7b1b63cf contains several family names: BaseInfo / of / ol / q4s etc. But it does not seem to work; see the output after the command is typed:

15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/BaseInfo/88fc1240fa8f4e31aa27469b7bd66750 first=e04116151155a5fae8dfa37281df89304c5e62219c31a761024cfac80f8e204c last=baa9f23ae8fd718aa888f814665e19d04fcdee09de9926c8690db00af905
15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/ol/2a886066311343f98737ad2e4e804260 first=e045a94bb684ce8bbb5cf34b9e0dd939c03946bd445711204b30c17d72b55874 last=7f5766b39d218e772dc357d9588b7a7363c03b3b7a07be7bcbb41dc267b3
15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/of/c01e895d483b4c86beb4eeae503e8fa9 first=e046037883152a81168507bf9faefc8ba716f15a6a028b81cbf83e2894896ec3 last=fff93051701920c2848f091369e484b685f42ed7a1378ffcb4e9dddf8bcd7ef7
15/05/19 10:09:17 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=677650 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:10:32 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=752931 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:11:48 INFO client.RpcRetryingCaller: Call exception, tries=12, retries=35, started=828151 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:13:03 INFO client.RpcRetryingCaller: Call exception, tries=13, retries=35, started=903409 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:14:18 INFO client.RpcRetryingCaller: Call exception, tries=14, retries=35, started=978634 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:15:33 INFO client.RpcRetryingCaller: Call exception, tries=15, retries=35, started=1054003 ms ago, cancelled=false, msg=row '' on table 'IM_ItemPrice' at region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade., hostname=ssspark03,60020,1431768671791, seqNum=2
..

any suggestion?

-- Ric Dong
Re: HBase failing to restart in single-user mode
Hi Benoit,

I think you need to move the directory out of /tmp and give it a shot. /tmp/hbase-${user.name}/zk will get cleaned up during restart.

~Anil

On Mon, May 18, 2015 at 9:45 PM, tsuna tsuna...@gmail.com wrote:

I added this to hbase-site.xml:

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/tmp/hbase-${user.name}/zk</value>
</property>

Didn’t change anything. Once I kill/shutdown HBase, it won’t come back up.

On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com wrote:

Same for me, I had faced similar issues, especially on my virtual machines, since I would restart them more often than my host machine. Moving ZK out of /tmp, which could get cleared on reboots, fixed the issue for me.

Thanks, Viral

On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:

I noticed similar ZK related issues but those went away after changing the ZK directory to a permanent directory, along with the HBase root directory. Both now point to a location in my home folder and restarts work fine. Not much help, but wanted to at least state that.

Lars

Sent from my iPhone

On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:

Hi all,

For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the latest version from branch-1.0 and use the following config:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
</configuration>

Since at least a month ago (perhaps longer, I don’t remember exactly) I can’t restart HBase.
The very first time it starts up fine, but subsequent startup attempts all fail with:

2015-05-17 20:39:19,024 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: starting
2015-05-17 20:39:19,024 INFO [RpcServer.listener,port=49809] ipc.RpcServer: RpcServer.listener,port=49809: starting
2015-05-17 20:39:19,029 INFO [main] http.HttpRequestLog: Http request log for http.requests.regionserver is not defined
2015-05-17 20:39:19,030 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context regionserver
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-05-17 20:39:19,033 INFO [main] http.HttpServer: Jetty bound to port 49811
2015-05-17 20:39:19,033 INFO [main] mortbay.log: jetty-6.1.26
2015-05-17 20:39:19,157 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:49811
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4f708099 connecting to ZooKeeper ensemble=localhost:2181
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=1 watcher=hconnection-0x4f7080990x0, quorum=localhost:2181, baseZNode=/hbase
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49812
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [SyncThread:0] server.ZooKeeperServer: Established session 0x14d651aaec2 with negotiated timeout 400 for client /127.0.0.1:49812
2015-05-17
Re: HBase failing to restart in single-user mode
I added this to hbase-site.xml:

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hbase-${user.name}/zk</value>
  </property>

Didn’t change anything. Once I kill/shutdown HBase, it won’t come back up.

On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com wrote:
Same for me, I had faced similar issues, especially on my virtual machines, since I would restart them more often than my host machine. Moving ZK from /tmp, which could get cleared on reboots, fixed the issue for me.
Thanks,
Viral

On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:
I noticed similar ZK related issues but those went away after changing the ZK directory to a permanent directory along with the HBase root directory. Both point now to a location in my home folder and restarts work fine now. Not much help but wanted to at least state that.
Lars
Sent from my iPhone

On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:
Hi all,
For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the latest version from branch-1.0 and use the following config:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
</configuration>

Since at least a month ago (perhaps longer, I don’t remember exactly) I can’t restart HBase.
The very first time it starts up fine, but subsequent startup attempts all fail with:

2015-05-17 20:39:19,024 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: starting
2015-05-17 20:39:19,024 INFO [RpcServer.listener,port=49809] ipc.RpcServer: RpcServer.listener,port=49809: starting
2015-05-17 20:39:19,029 INFO [main] http.HttpRequestLog: Http request log for http.requests.regionserver is not defined
2015-05-17 20:39:19,030 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context regionserver
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-05-17 20:39:19,033 INFO [main] http.HttpServer: Jetty bound to port 49811
2015-05-17 20:39:19,033 INFO [main] mortbay.log: jetty-6.1.26
2015-05-17 20:39:19,157 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:49811
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4f708099 connecting to ZooKeeper ensemble=localhost:2181
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=1 watcher=hconnection-0x4f7080990x0, quorum=localhost:2181, baseZNode=/hbase
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49812
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [SyncThread:0] server.ZooKeeperServer: Established session 0x14d651aaec2 with negotiated timeout 400 for client /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d651aaec2, negotiated timeout = 400
2015-05-17 20:39:19,249 INFO [M:0;localhost:49807] regionserver.HRegionServer: ClusterId : 6ad7eddd-2886-4ff0-b377-a2ff42c8632f
Re: HBase failing to restart in single-user mode
Wait. Benoit, you mean restart the laptop or stop/start HBase? I agree that the contents of /tmp are not stable across a system reboot; across stop/start of the HBase process there should be no problems. Should. For what it's worth, on the Mac and local mode testing, I usually use $HBASE_HOME/data. This is usually not on /tmp.

On Monday, May 18, 2015, anil gupta anilgupt...@gmail.com wrote:
Hi Benoit, I think you need to move the directory out of /tmp and give it a shot. /tmp/hbase-${user.name}/zk will get cleaned up during restart.
~Anil

On Mon, May 18, 2015 at 9:45 PM, tsuna tsuna...@gmail.com wrote:
I added this to hbase-site.xml:

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hbase-${user.name}/zk</value>
  </property>

Didn’t change anything. Once I kill/shutdown HBase, it won’t come back up.

On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com wrote:
Same for me, I had faced similar issues, especially on my virtual machines, since I would restart them more often than my host machine. Moving ZK from /tmp, which could get cleared on reboots, fixed the issue for me.
Thanks,
Viral

On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:
I noticed similar ZK related issues but those went away after changing the ZK directory to a permanent directory along with the HBase root directory. Both point now to a location in my home folder and restarts work fine now. Not much help but wanted to at least state that.
Lars
Sent from my iPhone
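The fix this thread converges on can be sketched as an hbase-site.xml fragment. The property names are real HBase settings; the paths under the home directory are illustrative assumptions — any directory that survives reboots and /tmp cleanup will do:

```xml
<configuration>
  <!-- Keep HBase's root data out of /tmp so it survives reboots
       and periodic /tmp cleanup (example path, not prescriptive) -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///Users/${user.name}/hbase-data</value>
  </property>
  <!-- Same for the embedded ZooKeeper's data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/Users/${user.name}/hbase-data/zk</value>
  </property>
</configuration>
```

Stack's $HBASE_HOME/data suggestion works the same way: the point is simply that both directories live somewhere persistent.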
Re: HBase Block locality always 0
Sorry if I'm asking a silly question... Are you sure your RSs and Datanodes are all up and running? Are you sure they are collocated?

Datanode on l-hbase[26-31].data.cn8 and regionserver on l-hbase[25-31].data.cn8

Could be that your only live RS is on l-hbase25.data.cn8, which would cause that behavior... Btw, why is the 25th not collocated with a datanode?
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications on Hadoop / HBase

On Fri, May 15, 2015 at 8:12 PM, Louis Hust louis.h...@gmail.com wrote:
Hi Esteban,
Hadoop version 2.2.0, r1537062. So I do not know why it always writes to other datanodes instead of the local datanode. Is there some log for the HDFS write policy? And now the cluster is not working healthily, with heavy networking.

2015-05-15 1:28 GMT+08:00 Esteban Gutierrez este...@cloudera.com:
Hi Louis,
Locality 0 is not right for a cluster of that size with 3 replicas per block, unless all RSs cannot connect to the local DN and somehow the DN local to the RS is always excluded from the pipeline. In Hadoop 2.0-alpha there was a bug (HDFS-3224) that caused the NN to report a DN as both live and dead if the storage ID was changed in a single volume (e.g. after replacing one drive), and that caused fs.getFileBlockLocations() to report fewer blocks for calculating the HDFS locality index. Unless your cluster is using Hadoop 2.0-alpha I won't worry too much about that. Regarding the logs, it's odd that the JN is taking about 1.5 seconds just to send less than 200 bytes. Perhaps some I/O contention issue is going on in your cluster?
thanks,
esteban.
--
Cloudera, Inc.

On Thu, May 14, 2015 at 5:48 AM, Louis Hust louis.h...@gmail.com wrote:
Hi Esteban,
Each region server has about 122 regions, and the data is large. The HDFS replica count is the default 3, and the namenode has some WARNs like below.

{log}
2015-05-14 20:45:37,463 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1503ms to send a batch of 3 edits (179 bytes) to remote journal 192.168.44.29:8485
{/log}

The regionserver's log seems normal:

{log}
2015-05-14 20:46:59,890 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~44.4 M/46586984, currentsize=0/0 for region qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee. in 3141ms, sequenceid=7493455453, compaction requested=true
2015-05-14 20:46:59,890 INFO [regionserver60020-smallCompactions-1431462564717] regionserver.HRegion: Starting compaction on m in region qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee.
{/log}

Any idea?

2015-05-13 1:26 GMT+08:00 Esteban Gutierrez este...@cloudera.com:
Hi,
How many regions do you have per RS? One possibility is that you have very little data in your cluster and regions have moved around, so there are no blocks in the DN local to the RS. Another possibility is that you have one replica configured and regions moved too, which makes it even harder to have local blocks in the DN next to the RS. Lastly, it could be some other problem where the HDFS pipeline has excluded the local DN. Have you seen any exceptions in the RSs or the NameNode that might be interesting?
thanks,
esteban.
--
Cloudera, Inc.

On Tue, May 12, 2015 at 2:59 AM, 娄帅 louis.hust...@gmail.com wrote:
Hi all,
I am maintaining an HBase 0.96.0 cluster, but from the web UI of the HBase regionservers I see block locality is 0 for all regionservers. Datanode on l-hbase[26-31].data.cn8 and regionserver on l-hbase[25-31].data.cn8. Any idea?
Optimizing compactions on super-low-cost HW
Hi, we are using extremely cheap HW: 2 HDD 7200 RPM, 4*2 cores (Hyperthreading), 32GB RAM. We hit serious I/O performance issues. We have a more or less even distribution of read/write requests, and the same for data size.

ServerName | Requests Per Second | Read Request Count | Write Request Count
node01.domain.com,60020,1430172017193 | 195 | 171871826 | 16761699
node02.domain.com,60020,1426925053570 | 24 | 34314930 | 16006603
node03.domain.com,60020,1430860939797 | 22 | 32054801 | 16913299
node04.domain.com,60020,1431975656065 | 33 | 1765121 | 253405
node05.domain.com,60020,1430484646409 | 27 | 42248883 | 16406280
node07.domain.com,60020,1426776403757 | 27 | 36324492 | 16299432
node08.domain.com,60020,1426775898757 | 26 | 38507165 | 13582109
node09.domain.com,60020,1430440612531 | 27 | 34360873 | 15080194
node11.domain.com,60020,1431989669340 | 28 | 44307 | 13466
node12.domain.com,60020,1431927604238 | 30 | 5318096 | 2020855
node13.domain.com,60020,1431372874221 | 29 | 31764957 | 15843688
node14.domain.com,60020,1429640630771 | 41 | 36300097 | 13049801

ServerName | Num. Stores | Num. Storefiles | Storefile Size | Uncompressed Storefile Size | Index Size | Bloom Size
node01.domain.com,60020,1430172017193 | 82 | 186 | 1052080m | 76496mb | 641849k | 310111k
node02.domain.com,60020,1426925053570 | 82 | 179 | 1062730m | 79713mb | 649610k | 318854k
node03.domain.com,60020,1430860939797 | 82 | 179 | 1036597m | 76199mb | 627346k | 307136k
node04.domain.com,60020,1431975656065 | 82 | 400 | 1034624m | 76405mb | 655954k | 289316k
node05.domain.com,60020,1430484646409 | 82 | 185 | 807m | 81474mb | 688136k | 334127k
node07.domain.com,60020,1426776403757 | 82 | 164 | 1023217m | 74830mb | 631774k | 296169k
node08.domain.com,60020,1426775898757 | 81 | 171 | 1086446m | 79933mb | 681486k | 312325k
node09.domain.com,60020,1430440612531 | 81 | 160 | 1073852m | 77874mb | 658924k | 309734k
node11.domain.com,60020,1431989669340 | 81 | 166 | 1006322m | 75652mb | 664753k | 264081k
node12.domain.com,60020,1431927604238 | 82 | 188 | 1050229m | 75140mb | 652970k | 304137k
node13.domain.com,60020,1431372874221 | 82 | 178 | 937557m | 70042mb | 601684k | 257607k
node14.domain.com,60020,1429640630771 | 82 | 145 | 949090m | 69749mb | 592812k | 266677k

When compaction starts, a random node gets 100% I/O, with I/O wait lasting seconds, even tens of seconds. What are the approaches to optimize minor and major compactions when you are I/O bound?
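A few levers commonly pulled when compactions saturate slow disks can be sketched as an hbase-site.xml fragment. The property names exist in 0.9x-era HBase; the values are illustrative assumptions, not tuned recommendations:

```xml
<configuration>
  <!-- Disable time-triggered major compactions; run them instead
       from cron via the shell during a quiet window -->
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
  <!-- Compact fewer files per minor compaction so each I/O burst is shorter -->
  <property>
    <name>hbase.hstore.compaction.max</name>
    <value>7</value>
  </property>
  <!-- Declare an off-peak window (hours in server-local time) and allow
       more aggressive compaction selection only inside it -->
  <property>
    <name>hbase.offpeak.start.hour</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.offpeak.end.hour</name>
    <value>6</value>
  </property>
  <property>
    <name>hbase.hstore.compaction.ratio.offpeak</name>
    <value>5.0</value>
  </property>
</configuration>
```

With the periodic major compaction disabled, something like `echo "major_compact 'mytable'" | hbase shell` from a cron job (table name hypothetical) keeps major compactions inside the off-peak window.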
RE: How to set Timeout for get/scan operations without impacting others
Hi Ted,
Thanks. HBase version is: HBase 0.98.0.2.1.2.0-402-hadoop2. Data block encoding: DATA_BLOCK_ENCODING = 'DIFF'. I tried to run the HFile tool to scan, and it looks good though:

hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/10afed9b44024d02992cfd0409686658
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
2015-05-18 18:34:33,406 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Scanning - hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/10afed9b44024d02992cfd0409686658
2015-05-18 18:34:33,800 INFO [main] hfile.CacheConfig: Allocating LruBlockCache with maximum size 386.7 M
2015-05-18 18:34:34,032 INFO [main] compress.CodecPool: Got brand-new decompressor [.snappy]
Scanned kv count - 13387493

Any thought or suggestion? Also, if it is a corrupted file, do you have guidance or a link showing how to fix that?
Thanks,
Mike

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, May 19, 2015 9:29 AM
To: Fang, Mike
Cc: user@hbase.apache.org; Dai, Kevin; Huang, Jianshi
Subject: Re: How to set Timeout for get/scan operations without impacting others

bq. Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00

Looks like you have some corrupted HFile(s) in your cluster - which should be fixed first. Which hbase release are you using? Do you use data block encoding? You can use http://hbase.apache.org/book.html#_hfile_tool to do some investigation.
Cheers
RE: How to set Timeout for get/scan operations without impacting others
Hi Ted,
Thanks for your information. My application queries HBase, and for some of the queries it just hangs there and throws an exception after several minutes (5-8 minutes). As a workaround, I tried to set the timeout to a shorter time, so my app won't hang for minutes but only for several seconds. I tried to set both timeouts to 1000 (1 s), but it still hangs for several minutes. Is this expected? I'd appreciate it if you know how I could fix the exception.

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d, compression=snappy, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=addrv#34005240#US,_28409,_822|addre/F|rval#null|cust#1158923121468951849|addre#1095283883|1/EOUT_T_SRD:~/143098200/Put, lastKey=addrv#38035AC7#US,_60449,_4684|addre/F|rval#null|cust#1335211720509289817|addre#697997140|1/EOUT_T_SRD:~/143098200/Put, avgKeyLen=122, avgValueLen=187, entries=105492830, length=6880313695, cur=null] to key addrv#34B97AEC#FR,_06110,_41 route des breguieres|addre/F|rval#/EOUT_T_SRD:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:165)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:176)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1847)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3716)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1890)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1876)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1853)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3090)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28861)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Failed to read compressed block at 1253175503, onDiskSizeWithoutHeader=66428, preReadHeaderSize=33, header.length=33, header bytes: DATABLKE\x00\x003\x00\x00\xC3\xC9\x00\x00\x00\x01r\xC4-\xDF\x01\x00\x00@\x00\x00\x00P
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1451)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:238)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:153)
    ... 15 more
Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
    at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:165)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:239)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1448

Thanks,
Mike

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, May 18, 2015 11:55 PM
To: user@hbase.apache.org
Cc: Fang, Mike; Dai, Kevin
Subject: Re: How to set Timeout for get/scan operations without impacting others

hbase.client.operation.timeout is used by HBaseAdmin operations, by RegionReplicaFlushHandler and by various HTable operations (including Get). hbase.rpc.timeout is for the RPC layer to define how long HBase client applications take for a remote call to time out. It uses pings to check connections
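The two settings Ted describes can be put in the client-side hbase-site.xml. A minimal sketch; the millisecond values are illustrative assumptions, and note that short client timeouts only bound the wait — the corrupted HFile still has to be repaired on the server side:

```xml
<configuration>
  <!-- Upper bound for a single RPC (default is 60000 ms) -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>10000</value>
  </property>
  <!-- Upper bound for a whole client operation, retries included -->
  <property>
    <name>hbase.client.operation.timeout</name>
    <value>30000</value>
  </property>
  <!-- Fewer retries keeps the worst-case total wait bounded -->
  <property>
    <name>hbase.client.retries.number</name>
    <value>3</value>
  </property>
</configuration>
```

Because retries multiply the per-RPC timeout, lowering hbase.rpc.timeout alone can still leave a client waiting for minutes; capping the retry count (or the overall operation timeout) is what actually bounds the total wait.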
Re: HMaster restart with error
But from the log:

2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection

What was the HMaster doing between 12:16:40 and 12:17:11? That's about 30 s.

2015-05-18 22:23 GMT+08:00 Ted Yu yuzhih...@gmail.com:
The exception originated from the Web UI, corresponding to HBaseAdmin.listTables(). At that moment, the master was unable to process the request - it needed some more time.
Cheers

On Sun, May 17, 2015 at 11:03 PM, Louis Hust louis.h...@gmail.com wrote:
Yes, Ted, can you tell me what the following exception means in l-namenode1.log?

2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection

Does this mean the cluster was not operational?

2015-05-18 11:45 GMT+08:00 Ted Yu yuzhih...@gmail.com:
After l-namenode1 became active master, it assigned regions:

2015-05-15 12:16:40,432 INFO [master:l-namenode1:6] master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff state=OFFLINE, ts=1431663400432, server=null} to {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=l-hbase31.data.cn8.qunar.com,60020,1431462584879}

However, l-hbase31 went down:

2015-05-15 12:16:40,508 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Splitting logs for l-hbase31.data.cn8.qunar.com,60020,1427789773001 before assignment.
l-namenode1 was restarted : 2015-05-15 12:20:25,322 INFO [main] util.VersionInfo: HBase 0.96.0-hadoop2 2015-05-15 12:20:25,323 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434 However, it went down due to zookeeper session expiration: 2015-05-15 12:20:25,580 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master It started again after that and AssignmentManager did a lot of assignments. Looks like the cluster was operational this time. Cheers On Sun, May 17, 2015 at 8:24 AM, Ted Yu yuzhih...@gmail.com wrote: bq. the backup master take over at 2015-05-15 12:16:40,024 ? The switch of active master should be earlier than 12:16:40,024 - shortly after 12:15:58 l-namenode1 would do some initialization (such as waiting for region servers count to settle) after it became active master. I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but the download progress was very slow. Will try downloading later in the day. Do you have access to pastebin ? Cheers On Sun, May 17, 2015 at 2:07 AM, Louis Hust louis.h...@gmail.com wrote: Hi, ted, Thanks for your reply!! I found the log in l-namenode2.dba.cn8 during the restarting progress: 2015-05-15 12:11:36,540 INFO [master:l-namenode2:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running. So this means the HMaster ready for handle request at 12:11:36? 
The backup master is l-namenode1.dba.cn8 and you can get the log at : http://pan.baidu.com/s/1eQlKXj0 After the l-namenode2.dba.cn8 is stopped by me at 12:15:58, the backup master l-namenode1 take over, and i found log: 2015-05-15 12:16:40,024 INFO [master:l-namenode1:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of 2147483647, master is running. So the backup master take over at 2015-05-15 12:16:40,024 ? But it seems the l-namenode2 not working normally with the exception in log: 2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001 2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667) at
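The roughly 30-second gap asked about in this thread can be checked mechanically from the two log timestamps. A small sketch (timestamps copied from the log lines above):

```python
from datetime import datetime

# HBase log timestamps use "yyyy-MM-dd HH:mm:ss,SSS"; the comma-separated
# milliseconds map onto %f after swapping the comma for a microsecond field.
FMT = "%Y-%m-%d %H:%M:%S,%f"

shutdown_done = datetime.strptime("2015-05-15 12:16:40,522", FMT)
master_check = datetime.strptime("2015-05-15 12:17:11,301", FMT)

gap = (master_check - shutdown_done).total_seconds()
print(f"gap between log lines: {gap:.3f} s")  # prints: gap between log lines: 30.779 s
```

So the window in question is 30.779 seconds, consistent with Louis's estimate of "about 30 s".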
Re: HBase Block locality always 0
Hi Alex,
Maybe the block locality display is wrong? Because I checked some region files and found some replicas on the same machine!

2015-05-19 7:18 GMT+08:00 Alex Baranau alex.barano...@gmail.com:
Sorry if I'm asking a silly question... Are you sure your RSs and Datanodes are all up and running? Are you sure they are collocated?
Re: HMaster restart with error
But it seems l-namenode2 is not working normally, with this exception in the log:

2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:6-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)

Does the exception mean the HMaster is not working normally, or something else?

2015-05-17 11:06 GMT+08:00 Ted Yu yuzhih...@gmail.com:

bq. the HMaster is handling two region server down, and not ready to handle client request?

I didn't mean that - for a functioning master,
How to set Timeout for get/scan operations without impacting others
Hi, I need to set a tight timeout for get/scan operations, and I think the HBase client already supports it. I found three related keys:

- hbase.client.operation.timeout
- hbase.rpc.timeout
- hbase.client.retries.number

What's the difference between hbase.client.operation.timeout and hbase.rpc.timeout? My understanding is that hbase.rpc.timeout has a larger scope than hbase.client.operation.timeout, so setting hbase.client.operation.timeout is safer. Am I correct? And are there any other property keys I can use? -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http://huangjs.github.com/
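Not an authoritative answer, but one rough mental model (an assumption to verify against the client source, not something the docs state here): hbase.rpc.timeout bounds each individual RPC attempt, hbase.client.retries.number bounds the number of attempts, and hbase.client.operation.timeout caps the whole client-side operation across retries. A pure-Java sketch of that worst-case budget arithmetic; the method name is mine, not an HBase API, and retry backoff pauses are ignored for simplicity:

```java
public class TimeoutBudget {
    // Worst-case client-side wait when every attempt burns a full RPC timeout.
    // Sketch only: real clients also add backoff pauses between retries.
    static long worstCaseMillis(long rpcTimeoutMs, int retries, long operationTimeoutMs) {
        long allAttempts = rpcTimeoutMs * (retries + 1L);   // initial try + retries
        return Math.min(allAttempts, operationTimeoutMs);   // operation timeout caps the total
    }

    public static void main(String[] args) {
        // 2s per RPC, 2 retries, 5s overall cap: the 5s cap wins over 3 x 2s = 6s.
        System.out.println(worstCaseMillis(2000, 2, 5000)); // prints 5000
    }
}
```

Under this model the operation timeout is the safer knob for bounding total latency, since it caps the sum regardless of how retries and per-RPC timeouts combine.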
Default hbase.ipc.server.callqueue.scan.ratio is 0, is this right?
When I start a new cluster (package hbase-1.0.1-bin.tar.gz), this error occurs:

2015-05-18 17:21:09,514 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
    at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2496)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2511)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2494)
    ... 5 more
Caused by: java.lang.IllegalArgumentException: Queue size is <= 0, must be at least 1
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.getBalancer(RpcExecutor.java:177)
    at org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.<init>(RWQueueRpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.<init>(RWQueueRpcExecutor.java:95)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.<init>(SimpleRpcScheduler.java:134)
    at org.apache.hadoop.hbase.regionserver.SimpleRpcSchedulerFactory.create(SimpleRpcSchedulerFactory.java:46)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:792)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:492)

I found the probable reason: the default value of hbase.ipc.server.callqueue.scan.ratio is 0.

In hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler:
line 44: public static final String CALL_QUEUE_SCAN_SHARE_CONF_KEY = "hbase.ipc.server.callqueue.scan.ratio";
line 123: float callqScanShare = conf.getFloat(CALL_QUEUE_SCAN_SHARE_CONF_KEY, 0); // default is 0
line 134: callExecutor = new RWQueueRpcExecutor("RW.default", handlerCount, numCallQueues, callqReadShare, callqScanShare, maxQueueLength, conf, abortable, BoundedPriorityBlockingQueue.class, callPriority);

In hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RWQueueRpcExecutor:
line 116: int numScanQueues = Math.max(0, (int) Math.floor(numReadQueues * scanShare)); // numScanQueues is 0
line 133: this.scanBalancer = getBalancer(numScanQueues);

In hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor#getBalancer:
line 177: Preconditions.checkArgument(queueSize > 0, "Queue size is <= 0, must be at least 1");

The queueSize is 0, so it throws: Caused by: java.lang.IllegalArgumentException: Queue size is <= 0, must be at least 1

My question is: can this config, hbase.ipc.server.callqueue.scan.ratio, be '0'? Or is there another reason for this fault? Thanks
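The failing arithmetic quoted above can be reproduced in isolation. A pure-Java sketch mirroring those lines (not the actual HBase classes, just the same math and check):

```java
public class ScanQueueSketch {
    // Mirrors RWQueueRpcExecutor line 116: scan queues are carved out of the read queues.
    static int numScanQueues(int numReadQueues, float scanShare) {
        return Math.max(0, (int) Math.floor(numReadQueues * scanShare));
    }

    // Mirrors the Preconditions.checkArgument in RpcExecutor.getBalancer (line 177).
    static void checkQueueSize(int queueSize) {
        if (queueSize <= 0) {
            throw new IllegalArgumentException("Queue size is <= 0, must be at least 1");
        }
    }

    public static void main(String[] args) {
        int scanQueues = numScanQueues(3, 0.0f); // default scan ratio 0 -> 0 scan queues
        System.out.println(scanQueues); // prints 0
        try {
            checkQueueSize(scanQueues);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same message as the region server crash
        }
    }
}
```

So with a read/write split enabled but the scan ratio left at its default of 0, zero scan queues are created and the balancer precondition fires, which matches the reported crash.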