Re: HMaster restart with error

2015-05-18 Thread Ted Yu
The exception originated from the Web UI, in a call corresponding to
HBaseAdmin.listTables().
At that moment, the master was unable to process the request - it needed some
more time.
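
If a client (or the UI) hits that window, retrying is usually enough. Here is a
minimal sketch, assuming the stock 0.96-era client API (the retry count and
sleep interval are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ListTablesWithRetry {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Right after a failover the new master may still be initializing,
    // so retry a few times instead of failing on the first attempt.
    for (int attempt = 1; attempt <= 5; attempt++) {
      try {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          HTableDescriptor[] tables = admin.listTables();
          System.out.println("Master is up; " + tables.length + " tables found");
          return;
        } finally {
          admin.close();
        }
      } catch (MasterNotRunningException e) {
        System.out.println("Master not ready (attempt " + attempt + "), retrying...");
        Thread.sleep(5000L);
      }
    }
  }
}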

Cheers

On Sun, May 17, 2015 at 11:03 PM, Louis Hust louis.h...@gmail.com wrote:

 Yes, Ted, can you tell me what the following exception means in
 l-namenode1.log?

 2015-05-15 12:16:40,522 INFO
  [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
 handler.ServerShutdownHandler: Finished processing of shutdown of
 l-hbase31.data.cn8.qunar.com,60020,1427789773001
 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
 client.HConnectionManager$HConnectionImplementation: Checking master
 connection

 Does this mean the cluster was not operational?


 2015-05-18 11:45 GMT+08:00 Ted Yu yuzhih...@gmail.com:

  After l-namenode1 became active master, it assigned regions:
 
  2015-05-15 12:16:40,432 INFO  [master:l-namenode1:6]
  master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
  state=OFFLINE, ts=1431663400432, server=null} to
  {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
  l-hbase31.data.cn8.qunar.com,60020,1431462584879}
 
  However, l-hbase31 went down:
 
  2015-05-15 12:16:40,508 INFO
   [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
  handler.ServerShutdownHandler: Splitting logs for
  l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.
 
  l-namenode1 was restarted:
 
  2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase
 0.96.0-hadoop2
  2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
  https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
 
  However, it went down due to zookeeper session expiration:
 
  2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker:
 Can't
  get or delete the master znode
  org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /hbase/master
 
  It started again after that and AssignmentManager did a lot of
 assignments.
 
  Looks like the cluster was operational this time.
 
  Cheers
 
  On Sun, May 17, 2015 at 8:24 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   bq. the backup master took over at 2015-05-15 12:16:40,024 ?
  
   The switch of active master should be earlier than 12:16:40,024 -
 shortly
   after 12:15:58
  
   l-namenode1 would do some initialization (such as waiting for region
   servers count to settle) after it became active master.
  
   I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
   the download progress was very slow.
  
   Will try downloading later in the day.
  
   Do you have access to pastebin?
  
   Cheers
  
   On Sun, May 17, 2015 at 2:07 AM, Louis Hust louis.h...@gmail.com
  wrote:
  
   Hi, ted,
  
   Thanks for your reply!!
  
   I found the log in l-namenode2.dba.cn8 during the restart process:
   2015-05-15 12:11:36,540 INFO  [master:l-namenode2:6]
   master.ServerManager: Finished waiting for region servers count to
  settle;
   checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
   2147483647, master is running.
  
   So this means the HMaster was ready to handle requests at 12:11:36?
  
   The backup master is l-namenode1.dba.cn8 and you can get the log at:
  
   http://pan.baidu.com/s/1eQlKXj0
  
   After l-namenode2.dba.cn8 was stopped by me at 12:15:58,
   the backup master l-namenode1 took over, and I found this log:
  
   2015-05-15 12:16:40,024 INFO  [master:l-namenode1:6]
   master.ServerManager: Finished waiting for region servers count to
  settle;
   checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
   2147483647, master is running.
  
   So the backup master took over at 2015-05-15 12:16:40,024?
  
   But it seems l-namenode2 was not working normally, given the exception in
   the log:
  
   2015-05-15 12:16:40,522 INFO
[MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
   handler.ServerShutdownHandler: Finished processing of shutdown of
   l-hbase31.data.cn8.qunar.com,60020,1427789773001
   2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
   client.HConnectionManager$HConnectionImplementation: Checking master
   connection
   com.google.protobuf.ServiceException: java.net.ConnectException:
   Connection
   refused
   at
  
  
 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
   at
  
  
 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
   at
  
  
 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
   at
  
  
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
   at
  
  
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)

Re: Where can I find the document for specific version of hbase api ?

2015-05-18 Thread Ted Yu
Xiaobo:
Can you download the source tarball for the release you're using?

You can find all the API information from the source code.

Cheers

On Mon, May 18, 2015 at 1:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

 Hi,


 http://hbase.apache.org/apidocs/  shows the latest version, but where can I
 find the documentation for a specific version such as 0.98.5?


 Thanks,


Re: HBase failing to restart in single-user mode

2015-05-18 Thread Viral Bajaria
Same for me: I faced similar issues, especially on my virtual machines,
since I would restart them more often than my host machine.

Moving ZK out of /tmp, which can get cleared on reboots, fixed the issue for
me.

Thanks,
Viral


On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:

 I noticed similar ZK-related issues, but those went away after changing the
 ZK directory to a permanent directory, along with the HBase root directory.
 Both now point to a location in my home folder, and restarts work fine.
 Not much help, but I wanted to at least state that.

 Lars

 Sent from my iPhone

  On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:
 
  Hi all,
  For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the
  latest version from branch-1.0 and use the following config:
 
   <configuration>
   <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
   </property>
   <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
   </property>
   <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
   </property>
   <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
   </property>
   <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
   </property>
   <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
   </property>
   <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
   </property>
   </configuration>
 
  Since at least a month ago (perhaps longer, I don’t remember exactly)
  I can’t restart HBase.  The very first time it starts up fine, but
  subsequent startup attempts all fail with:
 
  2015-05-17 20:39:19,024 INFO  [RpcServer.responder] ipc.RpcServer:
  RpcServer.responder: starting
  2015-05-17 20:39:19,024 INFO  [RpcServer.listener,port=49809]
  ipc.RpcServer: RpcServer.listener,port=49809: starting
  2015-05-17 20:39:19,029 INFO  [main] http.HttpRequestLog: Http request
  log for http.requests.regionserver is not defined
  2015-05-17 20:39:19,030 INFO  [main] http.HttpServer: Added global
  filter 'safety'
  (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context regionserver
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context static
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context logs
  2015-05-17 20:39:19,033 INFO  [main] http.HttpServer: Jetty bound to
 port 49811
  2015-05-17 20:39:19,033 INFO  [main] mortbay.log: jetty-6.1.26
  2015-05-17 20:39:19,157 INFO  [main] mortbay.log: Started
  SelectChannelConnector@0.0.0.0:49811
  2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
  zookeeper.RecoverableZooKeeper: Process
  identifier=hconnection-0x4f708099 connecting to ZooKeeper
  ensemble=localhost:2181
  2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
  zookeeper.ZooKeeper: Initiating client connection,
  connectString=localhost:2181 sessionTimeout=1
  watcher=hconnection-0x4f7080990x0, quorum=localhost:2181,
  baseZNode=/hbase
  2015-05-17 20:39:19,223 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Opening socket connection to server localhost/127.0.0.1:2181. Will not
  attempt to authenticate using SASL (unknown error)
  2015-05-17 20:39:19,223 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Socket connection established to localhost/127.0.0.1:2181, initiating
  session
  2015-05-17 20:39:19,223 INFO
  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
  server.NIOServerCnxnFactory: Accepted socket connection from
  /127.0.0.1:49812
  2015-05-17 20:39:19,223 INFO
  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer:
  Client attempting to establish new session at /127.0.0.1:49812
  2015-05-17 20:39:19,224 INFO  [SyncThread:0] server.ZooKeeperServer:
  Established session 0x14d651aaec2 with negotiated timeout 400
  for client /127.0.0.1:49812
  2015-05-17 20:39:19,224 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Session establishment complete on server localhost/127.0.0.1:2181,
  sessionid = 0x14d651aaec2, negotiated timeout = 400
  2015-05-17 20:39:19,249 INFO  [M:0;localhost:49807]
  regionserver.HRegionServer: ClusterId :
  6ad7eddd-2886-4ff0-b377-a2ff42c8632f
  2015-05-17 20:39:49,208 ERROR [main] master.HMasterCommandLine: Master
 exiting
  java.lang.RuntimeException: Master not active after 30 seconds
 at
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:194)

Where can I find the document for specific version of hbase api ?

2015-05-18 Thread guxiaobo1982
Hi,


http://hbase.apache.org/apidocs/  shows the latest version, but where can I find 
the documentation for a specific version such as 0.98.5?


Thanks,

Re: How to set Timeout for get/scan operations without impacting others

2015-05-18 Thread Ted Yu
hbase.client.operation.timeout is used by HBaseAdmin operations, by
RegionReplicaFlushHandler, and by various HTable operations (including Get).

hbase.rpc.timeout tells the RPC layer how long an HBase client should wait
for a remote call before timing out. The client uses pings to check
connections, but will eventually throw a TimeoutException.
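
For example, here is a minimal client-side sketch (the table and row names are
hypothetical) that tightens all three settings before opening a table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TightTimeouts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 2000);              // budget per RPC
    conf.setInt("hbase.client.operation.timeout", 5000); // budget per operation, retries included
    conf.setInt("hbase.client.retries.number", 3);
    HTable table = new HTable(conf, "mytable");
    try {
      Result r = table.get(new Get(Bytes.toBytes("rowkey")));
      System.out.println(r);
    } finally {
      table.close();
    }
  }
}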

FYI

On Sun, May 17, 2015 at 11:11 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 Hi,

 I need to set tight timeouts for get/scan operations, and I think the HBase
 client already supports this.

 I found three related keys:

 - hbase.client.operation.timeout
 - hbase.rpc.timeout
 - hbase.client.retries.number

 What's the difference between hbase.client.operation.timeout and
 hbase.rpc.timeout?
 My understanding is that hbase.rpc.timeout has larger scope than hbase.
 client.operation.timeout, so setting hbase.client.operation.timeout  is
 safer. Am I correct?

 And are there any other property keys I can use?

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github & Blog: http://huangjs.github.com/



Re: Re: How to know the root reason to cause RegionServer OOM?

2015-05-18 Thread Sean Busbey
On Mon, May 18, 2015 at 11:47 AM, Andrew Purtell apurt...@apache.org
wrote:

 You need to not overcommit memory on servers running JVMs for HDFS and
 HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx
 parameter, the maximum heap size, for all JVMs that will be concurrently
 executing on the server. The total should be less than the total amount of
 RAM available on the server. Additionally you will want to reserve ~1GB for
 the OS. Finally, set vm.swappiness=0 in /etc/sysctl.conf to prevent
 unnecessary paging.


On 3.5+ kernels you have to set vm.swappiness=1 if you still want to allow
paging, to avoid the OOM killer.

-- 
Sean


Re: Where can I find the document for specific version of hbase api ?

2015-05-18 Thread Sean Busbey
Thanks for pinging us on this. There's currently an open jira for properly
providing access to 0.98, 1.0, and 1.1 specific javadocs[1].

Unfortunately, no one has had the time to take care of things yet. You can
follow that ticket if you'd like to know when there's movement.

For now, your only option is to use the source tarball and build the site
for that version, as Ted mentioned.

The command for that is

  $ mvn -DskipTests clean package site

[1]: https://issues.apache.org/jira/browse/HBASE-13140

On Mon, May 18, 2015 at 3:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

 Hi,


  http://hbase.apache.org/apidocs/  shows the latest version, but where can I
  find the documentation for a specific version such as 0.98.5?


 Thanks,




-- 
Sean


Re: Re: How to know the root reason to cause RegionServer OOM?

2015-05-18 Thread Andrew Purtell
You need to not overcommit memory on servers running JVMs for HDFS and
HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx
parameter, the maximum heap size, for all JVMs that will be concurrently
executing on the server. The total should be less than the total amount of
RAM available on the server. Additionally you will want to reserve ~1GB for
the OS. Finally, set vm.swappiness=0 in /etc/sysctl.conf to prevent
unnecessary paging.
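
For example (illustrative numbers only): on a 32 GB server you might run a 16
GB RegionServer heap, a 4 GB DataNode heap, and 8 GB total of MR child JVMs.
That commits 28 GB, leaving about 3 GB of headroom on top of the ~1 GB reserved
for the OS; committing, say, 24 GB + 8 GB + 4 GB = 36 GB on the same box would
overcommit and invite the OOM killer.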


On Sun, May 17, 2015 at 8:08 PM, David chen c77...@163.com wrote:

 The snippet from /var/log/messages is as follows; I am sure that the killed
 process (22827) is the RegionServer.
 ..
 May 14 12:00:38 localhost kernel: Mem-Info:
 May 14 12:00:38 localhost kernel: Node 0 DMA per-cpu:
 May 14 12:00:38 localhost kernel: CPU0: hi:0, btch:   1 usd:   0
 ..
 May 14 12:00:38 localhost kernel: CPU   39: hi:0, btch:   1 usd:   0
 May 14 12:00:38 localhost kernel: Node 0 DMA32 per-cpu:
 May 14 12:00:38 localhost kernel: CPU0: hi:  186, btch:  31 usd:  30
 ..
 May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:   8
 May 14 12:00:38 localhost kernel: Node 0 Normal per-cpu:
 May 14 12:00:38 localhost kernel: CPU0: hi:  186, btch:  31 usd:   5
 ..
 May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:  20
 May 14 12:00:38 localhost kernel: Node 1 Normal per-cpu:
 May 14 12:00:38 localhost kernel: CPU0: hi:  186, btch:  31 usd:   7
 ..
 May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:  10
 May 14 12:00:38 localhost kernel: active_anon:7993118 inactive_anon:48001
 isolated_anon:0
 May 14 12:00:38 localhost kernel: active_file:855 inactive_file:960
 isolated_file:0
 May 14 12:00:38 localhost kernel: unevictable:0 dirty:0 writeback:0
 unstable:0
 May 14 12:00:38 localhost kernel: free:39239 slab_reclaimable:14043
 slab_unreclaimable:27993
 May 14 12:00:38 localhost kernel: mapped:48750 shmem:75053
 pagetables:20540 bounce:0
 May 14 12:00:38 localhost kernel: Node 0 DMA free:15732kB min:40kB
 low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB
 inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
 present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
 slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB
 unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
 all_unreclaimable? yes
 May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 3211 16088 16088
 May 14 12:00:38 localhost kernel: Node 0 DMA32 free:60388kB min:8968kB
 low:11208kB high:13452kB active_anon:2811676kB inactive_anon:72kB
 active_file:0kB inactive_file:788kB unevictable:0kB isolated(anon):0kB
 isolated(file):0kB present:3288224kB mlocked:0kB dirty:0kB writeback:44kB
 mapped:156kB shmem:8232kB slab_reclaimable:10652kB
 slab_unreclaimable:5144kB kernel_stack:56kB pagetables:4252kB unstable:0kB
 bounce:0kB writeback_tmp:0kB pages_scanned:1312 all_unreclaimable? yes
 May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 12877 12877
 May 14 12:00:38 localhost kernel: Node 0 Normal free:35772kB min:35964kB
 low:44952kB high:53944kB active_anon:13062472kB inactive_anon:4864kB
 active_file:1268kB inactive_file:1504kB unevictable:0kB isolated(anon):0kB
 isolated(file):0kB present:13186560kB mlocked:0kB dirty:0kB writeback:92kB
 mapped:6172kB shmem:51928kB slab_reclaimable:22732kB
 slab_unreclaimable:73204kB kernel_stack:16240kB pagetables:38040kB
 unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10268
 all_unreclaimable? yes
 May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
 May 14 12:00:38 localhost kernel: Node 1 Normal free:45064kB min:45132kB
 low:56412kB high:67696kB active_anon:16098324kB inactive_anon:187068kB
 active_file:2192kB inactive_file:1548kB unevictable:0kB isolated(anon):0kB
 isolated(file):0kB present:16547840kB mlocked:0kB dirty:116kB writeback:0kB
 mapped:188672kB shmem:240052kB slab_reclaimable:22788kB
 slab_unreclaimable:33624kB kernel_stack:7352kB pagetables:39868kB
 unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12064
 all_unreclaimable? yes
 May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
 May 14 12:00:38 localhost kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB
 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
 May 14 12:00:38 localhost kernel: Node 0 DMA32: 659*4kB 576*8kB 485*16kB
 338*32kB 208*64kB 106*128kB 27*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
 60636kB
 May 14 12:00:38 localhost kernel: Node 0 Normal: 1166*4kB 579*8kB 337*16kB
 203*32kB 106*64kB 61*128kB 3*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
 37568kB
 May 14 12:00:38 localhost kernel: Node 1 Normal: 668*4kB 405*8kB 422*16kB
 259*32kB 176*64kB 67*128kB 7*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
 43608kB
 May 14 12:00:38 localhost kernel: 78257 total pagecache pages
 May 14 12:00:38 localhost kernel: 0 pages in swap cache
 May 14 12:00:38 localhost kernel: Swap cache stats: add 0, delete 0, find
 0/0

Re: Where can I find the document for specific version of hbase api ?

2015-05-18 Thread Nick Dimiduk
You don't need to build from the src tgz; the bin tgz contains a docs
directory, wherein you'll find both the public-facing javadocs (@Public
annotated classes) and the full javadocs, in apidocs and devapidocs
respectively. The whole site and book are there too, but our release policy is
to copy the site and book from master. The javadocs, though, are generated by
the build of that release.
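
For example, unpacking an hbase-*-bin.tar.gz should leave the javadocs under
docs/apidocs and docs/devapidocs of the extracted directory (the exact layout
may vary slightly between releases).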

-n

On Mon, May 18, 2015 at 10:47 AM, Sean Busbey bus...@cloudera.com wrote:

 Thanks for pinging us on this. There's currently an open jira for properly
 providing access to 0.98, 1.0, and 1.1 specific javadocs[1].

 Unfortunately, no one has had the time to take care of things yet. You can
 follow that ticket if you'd like to know when there's movement.

 For now, your only option is to use the source tarball and build the site
 for that version, as Ted mentioned.

 The command for that is

   $ mvn -DskipTests clean package site

 [1]: https://issues.apache.org/jira/browse/HBASE-13140

 On Mon, May 18, 2015 at 3:33 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:

  Hi,
 
 
   http://hbase.apache.org/apidocs/  shows the latest version, but where can I
   find the documentation for a specific version such as 0.98.5?
 
 
  Thanks,




 --
 Sean



Load data into hbase

2015-05-18 Thread Omer, Farah
How should I go about creating and loading a bunch of lookup tables in HBase?
These are the typical RDBMS kind of data, where the data is row-oriented. All
the data is coming from a flat file that is again row-oriented.
How best can I load this data into HBase? I first created the table in Hive,
mapped to the HBase table:



CREATE TABLE CITY_CTR_SLS (
id string,
CUST_CITY_ID INT,
CALL_CTR_ID INT,
TOT_DOLLAR_SALES FLOAT,
TOT_UNIT_SALES FLOAT,
TOT_COST FLOAT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,ints:CUST_CITY_ID,ints:CALL_CTR_ID,floats:TOT_DOLLAR_SALES,floats:TOT_UNIT_SALES,floats:TOT_COST"
  )
TBLPROPERTIES("hbase.table.name" = "hbase_CITY_CTR_SLS1");



When I run the following command to load data into the hive table, I get an 
error about mismatched columns (because of the additional ID column that HBase 
needs):



[ash-r101-14l.mstrprime.com:21000]  INSERT INTO CITY_CTR_SLS select * from 
wh2.CITY_CTR_SLS; ...(wh2.city_ctr_sls already 
exists)

Query: insert INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS

ERROR: AnalysisException: Target table 'hbase_temp.city_ctr_sls' has more 
columns (6) than the SELECT / VALUES clause returns (5)

[ash-r101-14l.mstrprime.com:21000] 

Any pointers? Thanks.
Farah




Re: Load data into hbase

2015-05-18 Thread Shahab Yunus
Lots of options, depending upon the specifics of your use case:

In addition to Hive...

You can use Sqoop
http://www.dummies.com/how-to/content/importing-data-into-hbase-with-sqoop.html

You can use Pig
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.7/bk_user-guide/content/user-guide-hbase-import-2.html

If the data is delimiter separated then importTsv
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
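
And if you'd rather load directly with the plain Java client, here is a
minimal sketch (the flat-file format and the row-key scheme are assumptions;
the table and family names come from your DDL, and values are written as
strings to match the Hive SerDe's default string storage):

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CityCtrSlsLoader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "hbase_CITY_CTR_SLS1");
    table.setAutoFlush(false); // buffer puts client-side for throughput
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = in.readLine()) != null) {
      // Assumed order: CUST_CITY_ID,CALL_CTR_ID,TOT_DOLLAR_SALES,TOT_UNIT_SALES,TOT_COST
      String[] f = line.split(",");
      // Synthesize the row key the Hive mapping expects (:key); the
      // city|callctr scheme here is just one hypothetical choice.
      Put put = new Put(Bytes.toBytes(f[0] + "|" + f[1]));
      put.add(Bytes.toBytes("ints"), Bytes.toBytes("CUST_CITY_ID"), Bytes.toBytes(f[0]));
      put.add(Bytes.toBytes("ints"), Bytes.toBytes("CALL_CTR_ID"), Bytes.toBytes(f[1]));
      put.add(Bytes.toBytes("floats"), Bytes.toBytes("TOT_DOLLAR_SALES"), Bytes.toBytes(f[2]));
      put.add(Bytes.toBytes("floats"), Bytes.toBytes("TOT_UNIT_SALES"), Bytes.toBytes(f[3]));
      put.add(Bytes.toBytes("floats"), Bytes.toBytes("TOT_COST"), Bytes.toBytes(f[4]));
      table.put(put);
    }
    in.close();
    table.flushCommits();
    table.close();
  }
}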

Regards,
Shahab

On Mon, May 18, 2015 at 3:33 PM, Omer, Farah fo...@microstrategy.com
wrote:

 How should I go about creating and loading a bunch of lookup tables in
 HBase? These are the typical RDBMS kind of data, where the data is
 row-oriented. All the data is coming from a flat file that is again
 row-oriented.
 How best can I load this data into HBase? I first created the table in
 Hive, mapped to the HBase table:



 CREATE TABLE CITY_CTR_SLS (
 id string,
 CUST_CITY_ID INT,
 CALL_CTR_ID INT,
 TOT_DOLLAR_SALES FLOAT,
 TOT_UNIT_SALES FLOAT,
 TOT_COST FLOAT)
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES (
   "hbase.columns.mapping" =
 ":key,ints:CUST_CITY_ID,ints:CALL_CTR_ID,floats:TOT_DOLLAR_SALES,floats:TOT_UNIT_SALES,floats:TOT_COST"
   )
 TBLPROPERTIES("hbase.table.name" = "hbase_CITY_CTR_SLS1");



 When I run the following command to load data into the hive table, I get
 an error about mismatched columns (because of the additional ID column that
 HBase needs):



 [ash-r101-14l.mstrprime.com:21000]  INSERT INTO CITY_CTR_SLS select *
 from wh2.CITY_CTR_SLS; ...(wh2.city_ctr_sls
 already exists)

 Query: insert INTO CITY_CTR_SLS select * from wh2.CITY_CTR_SLS

 ERROR: AnalysisException: Target table 'hbase_temp.city_ctr_sls' has more
 columns (6) than the SELECT / VALUES clause returns (5)

 [ash-r101-14l.mstrprime.com:21000] 

 Any pointers? Thanks.
 Farah





Re: How to set Timeout for get/scan operations without impacting others

2015-05-18 Thread Ted Yu
bq. Caused by: java.io.IOException: Invalid HFile block magic:
\x00\x00\x00\x00\x00\x00\x00\x00

Looks like you have some corrupted HFile(s) in your cluster - which should
be fixed first.

Which hbase release are you using?
Do you use data block encoding?

You can use http://hbase.apache.org/book.html#_hfile_tool to do some
investigation.

Cheers

On Mon, May 18, 2015 at 6:19 PM, Fang, Mike chuf...@paypal.com wrote:

  Hi Ted,



 Thanks for your information.

  My application queries HBase, and for some of the queries it just hangs
  there and throws an exception after several minutes (5-8 minutes). As a
  workaround, I tried to set the timeout to a shorter time, so my app won’t
  hang for minutes but only for several seconds. I tried to set both timeouts
  to 1000 (1s), but it still hangs for several minutes. Is this expected?



  I'd appreciate it if you know how I could fix the exception.



 Caused by:
 org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
 java.io.IOException: Could not seek StoreFileScanner[HFileScanner for
 reader
 reader=hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d,
 compression=snappy, cacheConf=CacheConfig:enabled [cacheDataOnRead=true]
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false]
 [cacheCompressed=false],
 firstKey=addrv#34005240#US,_28409,_822|addre/F|rval#null|cust#1158923121468951849|addre#1095283883|1/EOUT_T_SRD:~/143098200/Put,
 lastKey=addrv#38035AC7#US,_60449,_4684|addre/F|rval#null|cust#1335211720509289817|addre#697997140|1/EOUT_T_SRD:~/143098200/Put,
 avgKeyLen=122, avgValueLen=187, entries=105492830, length=6880313695,
 cur=null] to key addrv#34B97AEC#FR,_06110,_41 route des
 breguieres|addre/F|rval#/EOUT_T_SRD:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0

 at
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:165)

 at
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)

 at
 org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)

 at
 org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1847)

 at
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3716)

 at
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1890)

 at
 org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1876)

 at
 org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1853)

 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3090)

 at
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28861)

 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)

 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)

 at
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)

 at
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)

 at
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)

 at java.lang.Thread.run(Thread.java:724)

 Caused by: java.io.IOException: Failed to read compressed block at
 1253175503, onDiskSizeWithoutHeader=66428, preReadHeaderSize=33,
 header.length=33, header bytes:
 DATABLKE\x00\x003\x00\x00\xC3\xC9\x00\x00\x00\x01r\xC4-\xDF\x01\x00\x00@
 \x00\x00\x00P

 at
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1451)

 at
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)

 at
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)

 at
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)

 at
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)

 at
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)

 at
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:238)

 at
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:153)

 ... 15 more

 Caused by: java.io.IOException: Invalid HFile block magic:
 \x00\x00\x00\x00\x00\x00\x00\x00

 at
 org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)

 at
 org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:165)

 at
 org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:239)

 at
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1448



 Thanks,

 Mike

 

(info) How can i load data from CDH4.3.0 to CDH5.4.0 in Hbase

2015-05-18 Thread dong.yajun
Hello list,

Is there a way to load existing data (HFiles) from CDH4.3.0 to CDH5.4.0?

We use the completebulkload utility, referencing this link:
http://hbase.apache.org/0.94/book/ops_mgt.html#completebulkload

the command: hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/ IM_ItemPrice

The region *096518a1aa5823c4aec9477d7b1b63cf* contains several column
family names: BaseInfo / of / ol / q4s, etc.

But it seems not to work; see the output after the command below:

15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/
*BaseInfo*/88fc1240fa8f4e31aa27469b7bd66750
first=e04116151155a5fae8dfa37281df89304c5e62219c31a761024cfac80f8e204c
last=baa9f23ae8fd718aa888f814665e19d04fcdee09de9926c8690db00af905
15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/
*ol*/2a886066311343f98737ad2e4e804260
first=e045a94bb684ce8bbb5cf34b9e0dd939c03946bd445711204b30c17d72b55874
last=7f5766b39d218e772dc357d9588b7a7363c03b3b7a07be7bcbb41dc267b3
15/05/19 09:57:59 INFO mapreduce.LoadIncrementalHFiles: Trying to load
hfile=hdfs://nameservice1/tmp/IM_ItemPrice/096518a1aa5823c4aec9477d7b1b63cf/
*of*/c01e895d483b4c86beb4eeae503e8fa9
first=e046037883152a81168507bf9faefc8ba716f15a6a028b81cbf83e2894896ec3
last=fff93051701920c2848f091369e484b685f42ed7a1378ffcb4e9dddf8bcd7ef7

15/05/19 10:09:17 INFO client.RpcRetryingCaller: Call exception, tries=10,
retries=35, started=677650 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:10:32 INFO client.RpcRetryingCaller: Call exception, tries=11,
retries=35, started=752931 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:11:48 INFO client.RpcRetryingCaller: Call exception, tries=12,
retries=35, started=828151 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:13:03 INFO client.RpcRetryingCaller: Call exception, tries=13,
retries=35, started=903409 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:14:18 INFO client.RpcRetryingCaller: Call exception, tries=14,
retries=35, started=978634 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
15/05/19 10:15:33 INFO client.RpcRetryingCaller: Call exception, tries=15,
retries=35, started=1054003 ms ago, cancelled=false, msg=row '' on table
'IM_ItemPrice' at
region=IM_ItemPrice,,1432000677144.36c13c3160de2e67e4fdb1d77c3c9ade.,
hostname=ssspark03,60020,1431768671791, seqNum=2
..


Any suggestions?

-- 
*Ric Dong*


Re: HBase failing to restart in single-user mode

2015-05-18 Thread anil gupta
Hi Benoit,
I think you need to move the directory out of /tmp and give it a shot.
/tmp/hbase-${user.name}/zk will get cleaned up during restart.


~Anil

On Mon, May 18, 2015 at 9:45 PM, tsuna tsuna...@gmail.com wrote:

 I added this to hbase-site.xml:

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hbase-${user.name}/zk</value>
  </property>

 Didn’t change anything.  Once I kill/shutdown HBase, it won’t come back up.

 On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com
 wrote:
   Same for me: I faced similar issues, especially on my virtual machines,
   since I would restart them more often than my host machine.
  
   Moving ZK out of /tmp, which can get cleared on reboots, fixed the issue
   for me.
 
  Thanks,
  Viral
 
 
  On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com
 wrote:
 
   I noticed similar ZK-related issues, but those went away after changing
   the ZK directory to a permanent directory, along with the HBase root
   directory. Both now point to a location in my home folder, and restarts
   work fine. Not much help, but I wanted to at least state that.
 
  Lars
 
  Sent from my iPhone
 
   On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:
  
   Hi all,
   For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the
   latest version from branch-1.0 and use the following config:
  
    <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>file:///tmp/hbase-${user.name}</value>
    </property>
    <property>
      <name>hbase.online.schema.update.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>zookeeper.session.timeout</name>
      <value>30</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.tickTime</name>
      <value>200</value>
    </property>
    <property>
      <name>hbase.zookeeper.dns.interface</name>
      <value>lo0</value>
    </property>
    <property>
      <name>hbase.regionserver.dns.interface</name>
      <value>lo0</value>
    </property>
    <property>
      <name>hbase.master.dns.interface</name>
      <value>lo0</value>
    </property>
    </configuration>
  
   Since at least a month ago (perhaps longer, I don’t remember exactly)
   I can’t restart HBase.  The very first time it starts up fine, but
   subsequent startup attempts all fail with:
  
   2015-05-17 20:39:19,024 INFO  [RpcServer.responder] ipc.RpcServer:
   RpcServer.responder: starting
   2015-05-17 20:39:19,024 INFO  [RpcServer.listener,port=49809]
   ipc.RpcServer: RpcServer.listener,port=49809: starting
   2015-05-17 20:39:19,029 INFO  [main] http.HttpRequestLog: Http request
   log for http.requests.regionserver is not defined
   2015-05-17 20:39:19,030 INFO  [main] http.HttpServer: Added global
   filter 'safety'
   (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
   2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
   static_user_filter
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
   to context regionserver
   2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
   static_user_filter
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
   to context static
   2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
   static_user_filter
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
   to context logs
   2015-05-17 20:39:19,033 INFO  [main] http.HttpServer: Jetty bound to
  port 49811
   2015-05-17 20:39:19,033 INFO  [main] mortbay.log: jetty-6.1.26
   2015-05-17 20:39:19,157 INFO  [main] mortbay.log: Started
   SelectChannelConnector@0.0.0.0:49811
   2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
   zookeeper.RecoverableZooKeeper: Process
   identifier=hconnection-0x4f708099 connecting to ZooKeeper
   ensemble=localhost:2181
   2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
   zookeeper.ZooKeeper: Initiating client connection,
   connectString=localhost:2181 sessionTimeout=1
   watcher=hconnection-0x4f7080990x0, quorum=localhost:2181,
   baseZNode=/hbase
   2015-05-17 20:39:19,223 INFO
   [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
   Opening socket connection to server localhost/127.0.0.1:2181. Will
 not
   attempt to authenticate using SASL (unknown error)
   2015-05-17 20:39:19,223 INFO
   [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
   Socket connection established to localhost/127.0.0.1:2181, initiating
   session
   2015-05-17 20:39:19,223 INFO
   [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
   server.NIOServerCnxnFactory: Accepted socket connection from
   /127.0.0.1:49812
   2015-05-17 20:39:19,223 INFO
   [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer:
   Client attempting to establish new session at /127.0.0.1:49812
   2015-05-17 20:39:19,224 INFO  [SyncThread:0] server.ZooKeeperServer:
   Established session 0x14d651aaec2 with negotiated timeout 400
   for client /127.0.0.1:49812
   2015-05-17 

Re: HBase failing to restart in single-user mode

2015-05-18 Thread tsuna
I added this to hbase-site.xml:

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/tmp/hbase-${user.name}/zk</value>
</property>

Didn’t change anything.  Once I kill/shutdown HBase, it won’t come back up.

On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com wrote:
 Same for me: I faced similar issues, especially on my virtual machines,
 since I would restart them more often than my host machine.

 Moving ZK out of /tmp, which can get cleared on reboots, fixed the issue for
 me.

 Thanks,
 Viral


 On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:

  I noticed similar ZK-related issues, but those went away after changing the
  ZK directory to a permanent directory, along with the HBase root directory.
  Both now point to a location in my home folder, and restarts work fine.
  Not much help, but I wanted to at least state that.

 Lars

 Sent from my iPhone

  On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:
 
  Hi all,
  For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the
  latest version from branch-1.0 and use the following config:
 
  <configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
  </configuration>
 
  Since at least a month ago (perhaps longer, I don’t remember exactly)
  I can’t restart HBase.  The very first time it starts up fine, but
  subsequent startup attempts all fail with:
 
  2015-05-17 20:39:19,024 INFO  [RpcServer.responder] ipc.RpcServer:
  RpcServer.responder: starting
  2015-05-17 20:39:19,024 INFO  [RpcServer.listener,port=49809]
  ipc.RpcServer: RpcServer.listener,port=49809: starting
  2015-05-17 20:39:19,029 INFO  [main] http.HttpRequestLog: Http request
  log for http.requests.regionserver is not defined
  2015-05-17 20:39:19,030 INFO  [main] http.HttpServer: Added global
  filter 'safety'
  (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context regionserver
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context static
  2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
  static_user_filter
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
  to context logs
  2015-05-17 20:39:19,033 INFO  [main] http.HttpServer: Jetty bound to
 port 49811
  2015-05-17 20:39:19,033 INFO  [main] mortbay.log: jetty-6.1.26
  2015-05-17 20:39:19,157 INFO  [main] mortbay.log: Started
  SelectChannelConnector@0.0.0.0:49811
  2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
  zookeeper.RecoverableZooKeeper: Process
  identifier=hconnection-0x4f708099 connecting to ZooKeeper
  ensemble=localhost:2181
  2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
  zookeeper.ZooKeeper: Initiating client connection,
  connectString=localhost:2181 sessionTimeout=1
  watcher=hconnection-0x4f7080990x0, quorum=localhost:2181,
  baseZNode=/hbase
  2015-05-17 20:39:19,223 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Opening socket connection to server localhost/127.0.0.1:2181. Will not
  attempt to authenticate using SASL (unknown error)
  2015-05-17 20:39:19,223 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Socket connection established to localhost/127.0.0.1:2181, initiating
  session
  2015-05-17 20:39:19,223 INFO
  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
  server.NIOServerCnxnFactory: Accepted socket connection from
  /127.0.0.1:49812
  2015-05-17 20:39:19,223 INFO
  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer:
  Client attempting to establish new session at /127.0.0.1:49812
  2015-05-17 20:39:19,224 INFO  [SyncThread:0] server.ZooKeeperServer:
  Established session 0x14d651aaec2 with negotiated timeout 400
  for client /127.0.0.1:49812
  2015-05-17 20:39:19,224 INFO
  [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn:
  Session establishment complete on server localhost/127.0.0.1:2181,
  sessionid = 0x14d651aaec2, negotiated timeout = 400
  2015-05-17 20:39:19,249 INFO  [M:0;localhost:49807]
  regionserver.HRegionServer: ClusterId :
  6ad7eddd-2886-4ff0-b377-a2ff42c8632f
 

Re: HBase failing to restart in single-user mode

2015-05-18 Thread Nick Dimiduk
Wait. Benoit, you mean restart the laptop, or stop/start HBase? I agree that
the contents of /tmp are not stable across a system reboot, but across a
stop/start of the HBase process there should be no problems. Should.

For what it's worth, for local-mode testing on the Mac I usually use
$HBASE_HOME/data. This is usually not on /tmp.

On Monday, May 18, 2015, anil gupta anilgupt...@gmail.com wrote:

 Hi Benoit,
 I think you need to move the directory out of /tmp and give it a shot.
  /tmp/hbase-${user.name}/zk will get cleaned up during restart.


 ~Anil

  On Mon, May 18, 2015 at 9:45 PM, tsuna tsuna...@gmail.com
 wrote:

  I added this to hbase-site.xml:
 
   <property>
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/tmp/hbase-${user.name}/zk</value>
   </property>
 
  Didn’t change anything.  Once I kill/shutdown HBase, it won’t come back
 up.
 
   On Mon, May 18, 2015 at 1:14 AM, Viral Bajaria viral.baja...@gmail.com
  wrote:
    Same for me: I faced similar issues, especially on my virtual machines,
    since I would restart them more often than my host machine.
   
    Moving ZK out of /tmp, which can get cleared on reboots, fixed the issue
    for me.
  
   Thanks,
   Viral
  
  
    On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com
  wrote:
  
    I noticed similar ZK-related issues, but those went away after changing
    the ZK directory to a permanent directory, along with the HBase root
    directory. Both now point to a location in my home folder, and restarts
    work fine. Not much help, but I wanted to at least state that.
  
   Lars
  
   Sent from my iPhone
  
     On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:
   
Hi all,
For testing on my laptop (OSX with JDK 1.7.0_45) I usually build the
latest version from branch-1.0 and use the following config:
   
     <configuration>
     <property>
       <name>hbase.rootdir</name>
       <value>file:///tmp/hbase-${user.name}</value>
     </property>
     <property>
       <name>hbase.online.schema.update.enable</name>
       <value>true</value>
     </property>
     <property>
       <name>zookeeper.session.timeout</name>
       <value>30</value>
     </property>
     <property>
       <name>hbase.zookeeper.property.tickTime</name>
       <value>200</value>
     </property>
     <property>
       <name>hbase.zookeeper.dns.interface</name>
       <value>lo0</value>
     </property>
     <property>
       <name>hbase.regionserver.dns.interface</name>
       <value>lo0</value>
     </property>
     <property>
       <name>hbase.master.dns.interface</name>
       <value>lo0</value>
     </property>
     </configuration>
   
Since at least a month ago (perhaps longer, I don’t remember
 exactly)
I can’t restart HBase.  The very first time it starts up fine, but
subsequent startup attempts all fail with:
   
2015-05-17 20:39:19,024 INFO  [RpcServer.responder] ipc.RpcServer:
RpcServer.responder: starting
2015-05-17 20:39:19,024 INFO  [RpcServer.listener,port=49809]
ipc.RpcServer: RpcServer.listener,port=49809: starting
2015-05-17 20:39:19,029 INFO  [main] http.HttpRequestLog: Http
 request
log for http.requests.regionserver is not defined
2015-05-17 20:39:19,030 INFO  [main] http.HttpServer: Added global
filter 'safety'
(class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
static_user_filter
   
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
to context regionserver
2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
static_user_filter
   
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2015-05-17 20:39:19,031 INFO  [main] http.HttpServer: Added filter
static_user_filter
   
  
 
 (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2015-05-17 20:39:19,033 INFO  [main] http.HttpServer: Jetty bound to
   port 49811
2015-05-17 20:39:19,033 INFO  [main] mortbay.log: jetty-6.1.26
2015-05-17 20:39:19,157 INFO  [main] mortbay.log: Started
SelectChannelConnector@0.0.0.0:49811
2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
zookeeper.RecoverableZooKeeper: Process
identifier=hconnection-0x4f708099 connecting to ZooKeeper
ensemble=localhost:2181
2015-05-17 20:39:19,222 INFO  [M:0;localhost:49807]
zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=1
watcher=hconnection-0x4f7080990x0, quorum=localhost:2181,
baseZNode=/hbase
2015-05-17 20:39:19,223 INFO
[M:0;localhost:49807-SendThread(localhost:2181)]
 zookeeper.ClientCnxn:
Opening socket connection to server localhost/127.0.0.1:2181. Will
  not
attempt to authenticate using SASL (unknown error)
2015-05-17 20:39:19,223 INFO
[M:0;localhost:49807-SendThread(localhost:2181)]
 zookeeper.ClientCnxn:
Socket connection 

Re: HBase Block locality always 0

2015-05-18 Thread Alex Baranau
Sorry if I'm asking a silly question... Are you sure your RSs and Datanodes
are all up and running? Are you sure they are collocated?

 Datanode on l-hbase[26-31].data.cn8 and regionserver on
 l-hbase[25-31].data.cn8,

Could be that your only live RS is on l-hbase25.data.cn8, which would cause
that behavior... Btw, why is the 25th not collocated with a datanode?
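
If you want to double-check from the HDFS side, here is a minimal sketch (the
HFile path and the RS hostname are passed as arguments) using
FileSystem.getFileBlockLocations(), which is also what the locality index is
derived from, as mentioned below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocalityCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path hfile = new Path(args[0]);  // e.g. an HFile under /hbase/data/...
    String rsHost = args[1];         // hostname of the regionserver to check
    FileStatus st = fs.getFileStatus(hfile);
    BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
    int local = 0;
    for (BlockLocation b : blocks) {
      // Count blocks that have at least one replica on the RS host.
      for (String host : b.getHosts()) {
        if (host.equals(rsHost)) {
          local++;
          break;
        }
      }
    }
    System.out.println(local + "/" + blocks.length + " blocks are local to " + rsHost);
  }
}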

Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase

On Fri, May 15, 2015 at 8:12 PM, Louis Hust louis.h...@gmail.com wrote:

 Hi, Esteban,

 Hadoop version 2.2.0, r1537062.
 So I do not know why it always writes to another datanode instead of the
 local datanode. Is there some log for the HDFS write policy? And right now
 the cluster is not working healthily, with heavy network traffic.

 2015-05-15 1:28 GMT+08:00 Esteban Gutierrez este...@cloudera.com:

  Hi Louis,
 
  Locality 0 is not right for a cluster of that size and having 3 replicas
  per block unless all RS cannot connect to the local DN and somehow the
  local DN to the RS is always excluded from the pipeline. In Hadoop
  2.0-alpha there was a bug (HDFS-3224) that caused the NN to report a DN
 as
  live and dead if the storage ID was changed in a single volume (e.g.
 after
  replacing one drive) and that caused fs.getFileBlockLocations() to report
   fewer blocks for calculating the HDFS locality index. Unless your cluster
   is using Hadoop 2.0-alpha, I wouldn't worry too much about that.
 
   Regarding the logs, it's odd that the JN is taking about 1.5 seconds just
   to send less than 200 bytes. Perhaps some IO contention issue is going on
   in your cluster?
 
  thanks,
  esteban.
 
  --
  Cloudera, Inc.
 
 
  On Thu, May 14, 2015 at 5:48 AM, Louis Hust louis.h...@gmail.com
 wrote:
 
   Hi, Esteban
  
    Each region server has about 122 regions, and the data is large. The HDFS
    replication factor is the default 3, and the namenode has some WARNs like
    the one below.
  
   {log}
   2015-05-14 20:45:37,463 WARN
   org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took
 1503ms
  to
   send a batch of 3 edits (179 bytes) to remote journal
 192.168.44.29:8485
   {/log}
  
   Regionserver's log seems normal:
  
   {log}
   2015-05-14 20:46:59,890 INFO  [Thread-15] regionserver.HRegion:
 Finished
   memstore flush of ~44.4 M/46586984, currentsize=0/0 for region
  
  
 
 qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee.
   in 3141ms, sequenceid=7493455453, compaction requested=true
   2015-05-14 20:46:59,890 INFO
[regionserver60020-smallCompactions-1431462564717]
 regionserver.HRegion:
   Starting compaction on m in region
  
  
 
 qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee.
   {/log}
  
   Any idea?
  
  
  
   2015-05-13 1:26 GMT+08:00 Esteban Gutierrez este...@cloudera.com:
  
Hi,
   
     How many regions do you have per RS? One possibility is that you have
     very little data in your cluster, regions have moved around, and there
     are no blocks in the DN local to the RS. Another possibility is that you
     have one replica configured and regions moved too, which makes it even
     harder to have some local blocks in the DN local to the RS. Lastly, it
     could be some other problem where the HDFS pipeline has excluded the
     local DN. Have you seen any exceptions in the RSs or the NameNode that
     might be interesting?
   
thanks,
esteban.
   
   
   
--
Cloudera, Inc.
   
   
On Tue, May 12, 2015 at 2:59 AM, 娄帅 louis.hust...@gmail.com wrote:
   
 Hi, all,

  I am maintaining an HBase 0.96.0 cluster, but in the web UI of the HBase
  regionservers, I saw that block locality is 0 for all regionservers.

 Datanode on l-hbase[26-31].data.cn8 and regionserver on
 l-hbase[25-31].data.cn8,

 Any idea?

   
  
 



Optimizing compactions on super-low-cost HW

2015-05-18 Thread Serega Sheypak
Hi, we are using extremely cheap HW:
2 HDD 7200
4*2 cores (Hyper-Threading)
32GB RAM

We have hit serious I/O performance issues.
We have a more or less even distribution of read/write requests, and the same
for data size.

ServerName Request Per Second Read Request Count Write Request Count
node01.domain.com,60020,1430172017193 195 171871826 16761699
node02.domain.com,60020,1426925053570 24 34314930 16006603
node03.domain.com,60020,1430860939797 22 32054801 16913299
node04.domain.com,60020,1431975656065 33 1765121 253405
node05.domain.com,60020,1430484646409 27 42248883 16406280
node07.domain.com,60020,1426776403757 27 36324492 16299432
node08.domain.com,60020,1426775898757 26 38507165 13582109
node09.domain.com,60020,1430440612531 27 34360873 15080194
node11.domain.com,60020,1431989669340 28 44307 13466
node12.domain.com,60020,1431927604238 30 5318096 2020855
node13.domain.com,60020,1431372874221 29 31764957 15843688
node14.domain.com,60020,1429640630771 41 36300097 13049801

ServerName Num. Stores Num. Storefiles Storefile Size Uncompressed Storefile
Size Index Size Bloom Size
node01.domain.com,60020,1430172017193 82 186 1052080m 76496mb 641849k
310111k
node02.domain.com,60020,1426925053570 82 179 1062730m 79713mb 649610k
318854k
node03.domain.com,60020,1430860939797 82 179 1036597m 76199mb 627346k
307136k
node04.domain.com,60020,1431975656065 82 400 1034624m 76405mb 655954k
289316k
node05.domain.com,60020,1430484646409 82 185 807m 81474mb 688136k
334127k
node07.domain.com,60020,1426776403757 82 164 1023217m 74830mb 631774k
296169k
node08.domain.com,60020,1426775898757 81 171 1086446m 79933mb 681486k
312325k
node09.domain.com,60020,1430440612531 81 160 1073852m 77874mb 658924k
309734k
node11.domain.com,60020,1431989669340 81 166 1006322m 75652mb 664753k
264081k
node12.domain.com,60020,1431927604238 82 188 1050229m 75140mb 652970k
304137k
node13.domain.com,60020,1431372874221 82 178 937557m 70042mb 601684k 257607k
node14.domain.com,60020,1429640630771 82 145 949090m 69749mb 592812k 266677k


When compaction starts, a random node goes to 100% I/O utilization, with I/O
waits of seconds, even tens of seconds.

What are the approaches to optimizing minor and major compactions when you
are I/O bound?


RE: How to set Timeout for get/scan operations without impacting others

2015-05-18 Thread Fang, Mike
Hi Ted,

Thanks.
Hbase version is: HBase 0.98.0.2.1.2.0-402-hadoop2
Data block encoding: DATA_BLOCK_ENCODING = 'DIFF'

I tried to run the hfile tool to scan, and it looks good though:

hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f 
hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/10afed9b44024d02992cfd0409686658
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
2015-05-18 18:34:33,406 INFO  [main] Configuration.deprecation: fs.default.name 
is deprecated. Instead, use fs.defaultFS
Scanning - 
hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/10afed9b44024d02992cfd0409686658
2015-05-18 18:34:33,800 INFO  [main] hfile.CacheConfig: Allocating 
LruBlockCache with maximum size 386.7 M
2015-05-18 18:34:34,032 INFO  [main] compress.CodecPool: Got brand-new 
decompressor [.snappy]
Scanned kv count - 13387493

Any thoughts or suggestions?
Also, if it is a corrupted file, do you have guidance or a link showing how to fix it?

Thanks,
Mike
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, May 19, 2015 9:29 AM
To: Fang, Mike
Cc: user@hbase.apache.org; Dai, Kevin; Huang, Jianshi
Subject: Re: How to set Timeout for get/scan operations without impacting others

bq. Caused by: java.io.IOException: Invalid HFile block magic: 
\x00\x00\x00\x00\x00\x00\x00\x00

Looks like you have some corrupted HFile(s) in your cluster - which should be 
fixed first.

Which hbase release are you using ?
Do you use data block encoding ?

You can use http://hbase.apache.org/book.html#_hfile_tool to do some 
investigation.

Cheers

On Mon, May 18, 2015 at 6:19 PM, Fang, Mike chuf...@paypal.com wrote:
Hi Ted,

Thanks for your information.
My application queries HBase, and for some of the queries it just hangs there 
and throws an exception after several minutes (5-8 minutes). As a workaround, 
I tried to set the timeout to a shorter time, so my app won’t hang for minutes 
but only for several seconds. I tried to set both timeouts to 1000 (1s), but 
it still hangs for several minutes. Is this expected?

Appreciate it if you know how I could fix the exception.

Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader 
reader=hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d,
 compression=snappy, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=false], 
firstKey=addrv#34005240#US,_28409,_822|addre/F|rval#null|cust#1158923121468951849|addre#1095283883|1/EOUT_T_SRD:~/143098200/Put,
 
lastKey=addrv#38035AC7#US,_60449,_4684|addre/F|rval#null|cust#1335211720509289817|addre#697997140|1/EOUT_T_SRD:~/143098200/Put,
 avgKeyLen=122, avgValueLen=187, entries=105492830, length=6880313695, 
cur=null] to key addrv#34B97AEC#FR,_06110,_41 route des 
breguieres|addre/F|rval#/EOUT_T_SRD:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:165)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
at 
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1847)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3716)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1890)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1876)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1853)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3090)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28861)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Failed to read compressed block at 1253175503, 
onDiskSizeWithoutHeader=66428, preReadHeaderSize=33, header.length=33, header 
bytes: 
DATABLKE\x00\x003\x00\x00\xC3\xC9\x00\x00\x00\x01r\xC4-\xDF\x01\x00\x00@\x00\x00\x00P
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1451)

RE: How to set Timeout for get/scan operations without impacting others

2015-05-18 Thread Fang, Mike
Hi Ted,

Thanks for your information.
My application queries HBase, and some of the queries just hang and throw an 
exception after several minutes (5-8 minutes). As a workaround, I tried to set 
the timeout to a shorter value, so my app would hang for only a few seconds 
instead of minutes. I set both timeouts to 1000 (1 s), but it still hangs for 
several minutes. Is this expected?

Appreciate it if you know how I could fix the exception.

Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader 
reader=hdfs://xxx/hbase/data/data/default/xxx/af7898973c510425fabb7c814ac8ba04/EOUT_T_SRD/125acceb75d84724a089701c590a4d3d,
 compression=snappy, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=false], 
firstKey=addrv#34005240#US,_28409,_822|addre/F|rval#null|cust#1158923121468951849|addre#1095283883|1/EOUT_T_SRD:~/143098200/Put,
 
lastKey=addrv#38035AC7#US,_60449,_4684|addre/F|rval#null|cust#1335211720509289817|addre#697997140|1/EOUT_T_SRD:~/143098200/Put,
 avgKeyLen=122, avgValueLen=187, entries=105492830, length=6880313695, 
cur=null] to key addrv#34B97AEC#FR,_06110,_41 route des 
breguieres|addre/F|rval#/EOUT_T_SRD:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:165)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:176)
at 
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1847)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3716)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1890)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1876)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1853)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3090)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28861)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Failed to read compressed block at 1253175503, 
onDiskSizeWithoutHeader=66428, preReadHeaderSize=33, header.length=33, header 
bytes: 
DATABLKE\x00\x003\x00\x00\xC3\xC9\x00\x00\x00\x01r\xC4-\xDF\x01\x00\x00@\x00\x00\x00P
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1451)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:238)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:153)
... 15 more
Caused by: java.io.IOException: Invalid HFile block magic: 
\x00\x00\x00\x00\x00\x00\x00\x00
at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:165)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:239)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1448

Thanks,
Mike
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, May 18, 2015 11:55 PM
To: user@hbase.apache.org
Cc: Fang, Mike; Dai, Kevin
Subject: Re: How to set Timeout for get/scan operations without impacting others

hbase.client.operation.timeout is used by HBaseAdmin operations, by 
RegionReplicaFlushHandler, and by various HTable operations (including Get).

hbase.rpc.timeout is for the RPC layer: it defines how long an HBase client 
waits for a remote call before timing out. It uses pings to check connections
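
For illustration, here is a minimal client-side sketch of where these keys are
applied (the table and row names are made up, and the values are only examples):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TimeoutExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Bound each individual remote call (ms).
      conf.setInt("hbase.rpc.timeout", 1000);
      // Bound the whole client operation, across retries (ms).
      conf.setInt("hbase.client.operation.timeout", 1000);
      // Fewer retries means the client gives up sooner instead of hanging.
      conf.setInt("hbase.client.retries.number", 2);

      HTable table = new HTable(conf, "my_table"); // hypothetical table name
      try {
        Result result = table.get(new Get(Bytes.toBytes("some-row")));
        System.out.println("cells returned: " + result.size());
      } finally {
        table.close();
      }
    }
  }

Note that these settings only make the client fail faster; if the server-side
seek is failing on a corrupted HFile block, as in the stack trace above, the
corruption itself still has to be fixed on the cluster.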

Re: HMaster restart with error

2015-05-18 Thread Louis Hust
But from the log:

2015-05-15 12:16:40,522 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
handler.ServerShutdownHandler: Finished processing of shutdown of
l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
client.HConnectionManager$HConnectionImplementation: Checking master
connection

What was the HMaster doing between 12:16:40 and 12:17:11? That's about 30 seconds.

2015-05-18 22:23 GMT+08:00 Ted Yu yuzhih...@gmail.com:

 The exception originated from Web UI corresponding
 to HBaseAdmin.listTables().
 At that moment, master was unable to process the request - it needed some
 more time.

 Cheers

 On Sun, May 17, 2015 at 11:03 PM, Louis Hust louis.h...@gmail.com wrote:

  Yes, Ted, can you tell me what the following exception means in
  l-namenode1.log?
 
  2015-05-15 12:16:40,522 INFO
   [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
  handler.ServerShutdownHandler: Finished processing of shutdown of
  l-hbase31.data.cn8.qunar.com,60020,1427789773001
  2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
  client.HConnectionManager$HConnectionImplementation: Checking master
  connection
 
  Does this mean the cluster was not operational?
 
 
  2015-05-18 11:45 GMT+08:00 Ted Yu yuzhih...@gmail.com:
 
   After l-namenode1 became active master , it assigned regions:
  
   2015-05-15 12:16:40,432 INFO  [master:l-namenode1:6]
   master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
   state=OFFLINE, ts=1431663400432, server=null} to
   {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
   l-hbase31.data.cn8.qunar.com,60020,1431462584879}
  
   However, l-hbase31 went down:
  
   2015-05-15 12:16:40,508 INFO
[MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
   handler.ServerShutdownHandler: Splitting logs for
   l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.
  
   l-namenode1 was restarted :
  
   2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase
  0.96.0-hadoop2
   2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
   https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
  
   However, it went down due to zookeeper session expiration:
  
   2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker:
  Can't
   get or delete the master znode
   org.apache.zookeeper.KeeperException$SessionExpiredException:
   KeeperErrorCode = Session expired for /hbase/master
  
   It started again after that and AssignmentManager did a lot of
  assignments.
  
   Looks like the cluster was operational this time.
  
   Cheers
  
   On Sun, May 17, 2015 at 8:24 AM, Ted Yu yuzhih...@gmail.com wrote:
  
bq. the backup master take over at 2015-05-15 12:16:40,024 ?
   
The switch of active master should be earlier than 12:16:40,024 -
  shortly
after 12:15:58
   
l-namenode1 would do some initialization (such as waiting for region
servers count to settle) after it became active master.
   
I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home)
 but
the download progress was very slow.
   
Will try downloading later in the day.
   
Do you have access to pastebin ?
   
Cheers
   
On Sun, May 17, 2015 at 2:07 AM, Louis Hust louis.h...@gmail.com
   wrote:
   
Hi, ted,
   
Thanks for your reply!!
   
  I found this log in l-namenode2.dba.cn8 during the restart:
2015-05-15 12:11:36,540 INFO  [master:l-namenode2:6]
master.ServerManager: Finished waiting for region servers count to
   settle;
checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
2147483647, master is running.
   
  So this means the HMaster was ready to handle requests at 12:11:36?
   
The backup master is l-namenode1.dba.cn8 and you can get the log at
 :
   
http://pan.baidu.com/s/1eQlKXj0
   
  After l-namenode2.dba.cn8 was stopped by me at 12:15:58,
  the backup master l-namenode1 took over, and I found this log:
   
2015-05-15 12:16:40,024 INFO  [master:l-namenode1:6]
master.ServerManager: Finished waiting for region servers count to
   settle;
checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
2147483647, master is running.
   
  So the backup master took over at 2015-05-15 12:16:40,024?
   
  But it seems l-namenode2 was not working normally, given the exception in
  the log:
   
2015-05-15 12:16:40,522 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
handler.ServerShutdownHandler: Finished processing of shutdown of
l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
client.HConnectionManager$HConnectionImplementation: Checking master
connection
com.google.protobuf.ServiceException: java.net.ConnectException:
Connection
refused
at
org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)

Re: HBase Block locality always 0

2015-05-18 Thread Louis Hust
Hi, Alex,

Maybe the block locality display is wrong? Because I checked some region files
and found some replicas on the same machine!
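
For reference, here is a minimal sketch of double-checking what the UI reports
with the HDFS client API (the path argument is illustrative, not a real file):

  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockLocalityCheck {
    public static void main(String[] args) throws Exception {
      // args[0] should be an HFile path under /hbase/data on this cluster.
      Configuration conf = new Configuration(); // picks up core-site.xml/hdfs-site.xml
      FileSystem fs = FileSystem.get(conf);
      FileStatus status = fs.getFileStatus(new Path(args[0]));
      // One BlockLocation per HDFS block, listing the datanodes holding replicas.
      BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
      for (BlockLocation block : blocks) {
        System.out.println("offset=" + block.getOffset()
            + " hosts=" + Arrays.toString(block.getHosts()));
      }
      fs.close();
    }
  }

If the hosts printed for a region's files include the datanode collocated with
the serving regionserver, the 0% locality shown in the UI would indeed look
like a reporting problem.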

2015-05-19 7:18 GMT+08:00 Alex Baranau alex.barano...@gmail.com:

 Sorry if I'm asking a silly question... Are you sure your RSs and Datanodes
 are all up and running? Are you sure they are collocated?

  Datanode on l-hbase[26-31].data.cn8 and regionserver on
  l-hbase[25-31].data.cn8,

 Could be that your only live RS is on l-hbase25.data.cn8, which would cause
 that behavior... Btw, why 25th is not collocated with datanode?

 Alex Baranau
 --
 http://cdap.io - open source framework to build and run data applications
 on Hadoop & HBase

 On Fri, May 15, 2015 at 8:12 PM, Louis Hust louis.h...@gmail.com wrote:

  Hi, Esteban,
 
  Hadoop version 2.2.0, r1537062.
  So I do not know why it always writes to other datanodes instead of the
  local datanode. Is there a log for the HDFS write policy? The cluster is
  now unhealthy, with heavy network traffic.
 
  2015-05-15 1:28 GMT+08:00 Esteban Gutierrez este...@cloudera.com:
 
   Hi Louis,
  
   Locality 0 is not right for a cluster of that size with 3 replicas per
   block, unless all RSs cannot connect to their local DN and somehow the
   local DN is always excluded from the write pipeline. In Hadoop 2.0-alpha
   there was a bug (HDFS-3224) that caused the NN to report a DN as both live
   and dead if the storage ID was changed in a single volume (e.g. after
   replacing one drive), and that caused fs.getFileBlockLocations() to report
   fewer blocks for calculating the HDFS locality index. Unless your cluster
   is using Hadoop 2.0-alpha I wouldn't worry too much about that.

   Regarding the logs, it's odd that the JN is taking about 1.5 seconds just
   to send less than 200 bytes. Perhaps some I/O contention issue is going on
   in your cluster?
  
   thanks,
   esteban.
  
   --
   Cloudera, Inc.
  
  
   On Thu, May 14, 2015 at 5:48 AM, Louis Hust louis.h...@gmail.com
  wrote:
  
Hi, Esteban
   
Each region server has about 122 regions, and the data is large. The HDFS
replica count is the default of 3, and the namenode has some WARNs like the
one below.
   
{log}
2015-05-14 20:45:37,463 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1503ms to send a batch of 3 edits (179 bytes) to remote journal 192.168.44.29:8485
{/log}
   
Regionserver's log seems normal:
   
{log}
2015-05-14 20:46:59,890 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~44.4 M/46586984, currentsize=0/0 for region qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee. in 3141ms, sequenceid=7493455453, compaction requested=true
2015-05-14 20:46:59,890 INFO  [regionserver60020-smallCompactions-1431462564717] regionserver.HRegion: Starting compaction on m in region qmq_backup,0066485937885860620cb396a3e65c6c9de92cae9aa29,1412429632233.65684ef65f58cb3e27986ca38d397bee.
{/log}
   
Any idea?
   
   
   
2015-05-13 1:26 GMT+08:00 Esteban Gutierrez este...@cloudera.com:
   
 Hi,

 How many regions do you have per RS? One possibility is that you have very
 little data in your cluster and regions have moved around, so there are no
 blocks in the DN local to the RS. Another possibility is that you have only
 one replica configured and regions moved too, which makes it even harder to
 have local blocks in the DN local to the RS. Lastly, it could be some other
 problem where the HDFS pipeline has excluded the local DN. Have you seen any
 exceptions in the RSs or the NameNode that might be interesting?

 thanks,
 esteban.



 --
 Cloudera, Inc.


 On Tue, May 12, 2015 at 2:59 AM, 娄帅 louis.hust...@gmail.com
 wrote:

  Hi, all,
 
  I am maintaining an HBase 0.96.0 cluster, but on the web UI of the HBase
  regionservers I see a block locality of 0 for every regionserver.
 
  Datanode on l-hbase[26-31].data.cn8 and regionserver on
  l-hbase[25-31].data.cn8,
 
  Any idea?
 

   
  
 



Re: HMaster restart with error

2015-05-18 Thread Louis Hust
Yes, Ted, can you tell me what the following exception means in
l-namenode1.log?

2015-05-15 12:16:40,522 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
handler.ServerShutdownHandler: Finished processing of shutdown of
l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
client.HConnectionManager$HConnectionImplementation: Checking master
connection

Does this mean the cluster was not operational?


2015-05-18 11:45 GMT+08:00 Ted Yu yuzhih...@gmail.com:

 After l-namenode1 became active master , it assigned regions:

 2015-05-15 12:16:40,432 INFO  [master:l-namenode1:6]
 master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
 state=OFFLINE, ts=1431663400432, server=null} to
 {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
 l-hbase31.data.cn8.qunar.com,60020,1431462584879}

 However, l-hbase31 went down:

 2015-05-15 12:16:40,508 INFO
  [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
 handler.ServerShutdownHandler: Splitting logs for
 l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.

 l-namenode1 was restarted :

 2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase 0.96.0-hadoop2
 2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
 https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434

 However, it went down due to zookeeper session expiration:

 2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker: Can't
 get or delete the master znode
 org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for /hbase/master

 It started again after that and AssignmentManager did a lot of assignments.

 Looks like the cluster was operational this time.

 Cheers

 On Sun, May 17, 2015 at 8:24 AM, Ted Yu yuzhih...@gmail.com wrote:

  bq. the backup master take over at 2015-05-15 12:16:40,024 ?
 
  The switch of active master should be earlier than 12:16:40,024 - shortly
  after 12:15:58
 
  l-namenode1 would do some initialization (such as waiting for region
  servers count to settle) after it became active master.
 
  I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
  the download progress was very slow.
 
  Will try downloading later in the day.
 
  Do you have access to pastebin ?
 
  Cheers
 
  On Sun, May 17, 2015 at 2:07 AM, Louis Hust louis.h...@gmail.com
 wrote:
 
  Hi, ted,
 
  Thanks for your reply!!
 
  I found this log in l-namenode2.dba.cn8 during the restart:
  2015-05-15 12:11:36,540 INFO  [master:l-namenode2:6]
  master.ServerManager: Finished waiting for region servers count to
 settle;
  checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
  2147483647, master is running.
 
  So this means the HMaster was ready to handle requests at 12:11:36?
 
  The backup master is l-namenode1.dba.cn8 and you can get the log at :
 
  http://pan.baidu.com/s/1eQlKXj0
 
  After l-namenode2.dba.cn8 was stopped by me at 12:15:58,
  the backup master l-namenode1 took over, and I found this log:
 
  2015-05-15 12:16:40,024 INFO  [master:l-namenode1:6]
  master.ServerManager: Finished waiting for region servers count to
 settle;
  checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
  2147483647, master is running.
 
  So the backup master took over at 2015-05-15 12:16:40,024?
 
  But it seems l-namenode2 was not working normally, given the exception in
  the log:
 
  2015-05-15 12:16:40,522 INFO
   [MASTER_SERVER_OPERATIONS-l-namenode1:6-0]
  handler.ServerShutdownHandler: Finished processing of shutdown of
  l-hbase31.data.cn8.qunar.com,60020,1427789773001
  2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
  client.HConnectionManager$HConnectionImplementation: Checking master
  connection
  com.google.protobuf.ServiceException: java.net.ConnectException:
  Connection
  refused
  at
  org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
  at
  org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
  at
  org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
  at
  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
  at
  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
  at
  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
 
  Does the exception mean the HMaster was not working normally, or something else?
 
 
 
  2015-05-17 11:06 GMT+08:00 Ted Yu yuzhih...@gmail.com:
 
   bq. the HMaster is handling two region server down, and not ready to
  handle
   client request?
  
   I didn't mean that - for a functioning master, 

How to set Timeout for get/scan operations without impacting others

2015-05-18 Thread Jianshi Huang
Hi,

I need to set a tight timeout for get/scan operations, and I think the HBase
client already supports it.

I found three related keys:

- hbase.client.operation.timeout
- hbase.rpc.timeout
- hbase.client.retries.number

What's the difference between hbase.client.operation.timeout and
hbase.rpc.timeout?
My understanding is that hbase.rpc.timeout has a larger scope than
hbase.client.operation.timeout, so setting hbase.client.operation.timeout is
safer. Am I correct?

Are there any other property keys I can use?

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Default hbase.ipc.server.callqueue.scan.ratio is 0, is this right?

2015-05-18 Thread lagend
When I start a new cluster (package: hbase-1.0.1-bin.tar.gz), the following error occurs:
2015-05-18 17:21:09,514 ERROR [main] regionserver.HRegionServerCommandLine: 
Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class 
org.apache.hadoop.hbase.regionserver.HRegionServer
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2496)
at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2511)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2494)
... 5 more
Caused by: java.lang.IllegalArgumentException: Queue size is <= 0, must be at least 1
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.getBalancer(RpcExecutor.java:177)
at 
org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.init(RWQueueRpcExecutor.java:133)
at 
org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.init(RWQueueRpcExecutor.java:95)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.init(SimpleRpcScheduler.java:134)
at 
org.apache.hadoop.hbase.regionserver.SimpleRpcSchedulerFactory.create(SimpleRpcSchedulerFactory.java:46)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.init(RSRpcServices.java:792)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:575)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:492)


I found the probable reason:
in hbase-default.xml, the default value of hbase.ipc.server.callqueue.read.ratio is 0


hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler#init
line 44.    public static final String CALL_QUEUE_SCAN_SHARE_CONF_KEY = "hbase.ipc.server.callqueue.scan.ratio";
line 123.   float callqScanShare = conf.getFloat(CALL_QUEUE_SCAN_SHARE_CONF_KEY, 0); // default is 0
line 134.   callExecutor = new RWQueueRpcExecutor("RW.default", handlerCount, numCallQueues,
                callqReadShare, callqScanShare, maxQueueLength, conf, abortable,
                BoundedPriorityBlockingQueue.class, callPriority);
 
hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RWQueueRpcExecutor#init
line 116.   int numScanQueues = Math.max(0, (int) Math.floor(numReadQueues * scanShare)); // numScanQueues is 0
line 133.   this.scanBalancer = getBalancer(numScanQueues);




hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor#getBalancer
line 177.   Preconditions.checkArgument(queueSize > 0, "Queue size is <= 0, must be at least 1");
The queueSize is 0, so it throws an IllegalArgumentException:
Caused by: java.lang.IllegalArgumentException: Queue size is <= 0, must be at least 1
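
To make the failure concrete, here is a small standalone sketch of the
arithmetic quoted above (the queue count is illustrative; this is not the
actual scheduler code):

  public class CallQueueMath {
    public static void main(String[] args) {
      int numReadQueues = 2;  // hypothetical number of read call queues
      float scanShare = 0.0f; // hbase.ipc.server.callqueue.scan.ratio = 0 (the default)

      // line 116 above: with scanShare == 0, this is always 0
      int numScanQueues = Math.max(0, (int) Math.floor(numReadQueues * scanShare));

      // line 177 above: getBalancer() rejects a zero-sized queue group
      if (numScanQueues <= 0) {
        throw new IllegalArgumentException("Queue size is <= 0, must be at least 1");
      }
    }
  }

So it appears that once RWQueueRpcExecutor is in use (i.e. the read ratio is
non-zero), the scan ratio cannot be left at 0 in this release without tripping
this precondition.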


My question is: can this config, hbase.ipc.server.callqueue.scan.ratio, be '0'?
Or is there another reason for this fault?
Thanks