Re: Bulk loading disadvantages

2012-07-27 Thread Bijeet Singh
Anil,

The two directories in question here are -

  1. the HDFS location where the MapReduce job creates the HFiles
  2. the directory pointed to by hbase.rootdir in your HBase configuration;
     the default value is /hbase. Inside the HBase root directory, there
     are per-table subdirectories.

So for the kind of comparison that you mentioned, you need to look in the
directory hbase.rootdir/table-name and the
directory where you are creating the HFiles.
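The size comparison described above amounts to summing the bytes under each directory and checking that the totals match. A minimal local-filesystem stand-in (plain Java; on an actual cluster you would run `hadoop fs -du -s` on the two HDFS paths instead — the class name and paths here are illustrative, not from this thread):

```java
import java.io.File;

public class DirSize {
    // Recursively sum the sizes of all regular files under f.
    static long sizeOf(File f) {
        if (f.isFile()) return f.length();
        long total = 0;
        File[] children = f.listFiles();
        if (children != null)
            for (File c : children) total += sizeOf(c);
        return total;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: DirSize <hfile-output-dir> <table-dir>");
            return;
        }
        long src = sizeOf(new File(args[0])); // e.g. the HFile output dir
        long dst = sizeOf(new File(args[1])); // e.g. hbase.rootdir/table-name
        System.out.println(src == dst
            ? "sizes match: data appears copied, not moved"
            : "sizes differ");
    }
}
```

If the totals match while the source directory still exists, the HFiles were copied rather than moved.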

Bijeet



On Fri, Jul 27, 2012 at 9:10 AM, Anil Gupta anilgupt...@gmail.com wrote:

 Hi Sever,

 That's very interesting. Which Hadoop and HBase versions are you using?
 I am going to run bulk loads tomorrow. If you can tell me which
 directories in HDFS you compared with /hbase/$table, then I will try to
 check the same.

 Best Regards,
 Anil

 On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu 
 fundatureanu.se...@gmail.com wrote:

  On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu lakka...@gmail.com
 wrote:
 
 
  For the bulkloading process, the HBase documentation mentions that in
  a 2nd stage the appropriate Region Server adopts the HFile, moving it
  into its storage directory and making the data available to clients.
  But from my experience the files also remain in the original location
  from where they are adopted. So I guess the data is actually copied
  into the HBase directory right? This means that, compared to the
  online importing, when bulk loading you essentially need twice the
  disk space on HDFS, right?
 
 
   Yes, if you are generating HFiles on one cluster and loading into a
   separate HBase cluster. If they are co-located, it's just an HDFS mv.
 
  Hmm, both the HFile generation and the HBase cluster runs on top of
  the same HDFS cluster. I did a du on both the source HDFS directory
  and the destination /hbase directory and I got the same sizes (+-
  few bytes). I deleted the source directory from HDFS and then scanned
  the table without any problems. Maybe there is a config parameter I'm
  missing?
 
  Sever



Re: Cluster load

2012-07-27 Thread Khang Pham
Hi,

By "node" do you mean a regionserver node?

If you are referring to a RegionServer node: you can go to the HBase master
web interface, master:65510/master.jsp, to see the load for each
regionserver. That's the overall load. If you want to see the load per node
per table, you will need to query the .META. table (column: info:server).



--K
On Fri, Jul 27, 2012 at 9:07 AM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Is there a way to see how much data does each node have per Hbase table?

 On Thu, Jul 26, 2012 at 5:53 PM, syed kather in.ab...@gmail.com wrote:

  First check whether the data in HBase is consistent by running hbck
  (bin/hbase hbck). If all the regions are consistent, then check the
  number of splits for the table in question at localhost:60010.
   On Jul 27, 2012 4:02 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
 
   I added new regions and the performance didn't improve. I think it is
   still the load balancing issue. I want to ensure that my rows are
   getting distributed across the cluster. What I see is this:
  
   Could you please tell me what's the best way to see the load?
  
  
   [root@dsdb4 ~]# hadoop fs -lsr /hbase/SESSION_TIMELINE1/
  
   drwxr-xr-x - root root 3 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs/hlog.1343334723359
  
   drwxr-xr-x - root root 0 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/S_T_MTX
  
   -rwxr-xr-x 3 root root 764 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.regioninfo
  
   drwxr-xr-x - root root 3 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs/hlog.1343334723093
  
   drwxr-xr-x - root root 0 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/S_T_MTX
  
   -rwxr-xr-x 3 root root 764 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.regioninfo
  
   drwxr-xr-x - root root 3 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs/hlog.1343334723240
  
   drwxr-xr-x - root root 0 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/S_T_MTX
  
   -rwxr-xr-x 3 root root 764 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.regioninfo
  
   drwxr-xr-x - root root 3 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs/hlog.1343334723171
  
   drwxr-xr-x - root root 0 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/S_T_MTX
  
   -rwxr-xr-x 3 root root 764 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.regioninfo
  
   drwxr-xr-x - root root 3 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.oldlogs/hlog.1343334723397
  
   drwxr-xr-x - root root 0 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/S_T_MTX
  
   -rwxr-xr-x 3 root root 762 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.regioninfo
  
   drwxr-xr-x - root root 4 2012-07-26 13:57
   /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a
  
   drwxr-xr-x - root root 0 2012-07-26 13:59
   /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.tmp
  
   drwxr-xr-x - root root 1 2012-07-26 13:32
   /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs
  
   -rwxr-xr-x 3 root root 124 2012-07-26 13:32
  
  
 
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs/hlog.1343334723004
  
   drwxr-xr-x - root root 2 2012-07-26 13:59
   /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX
  
   -rwxr-xr-x 3 root root 1993369 2012-07-26 13:59
  
  
 
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717
  
   

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-07-27 Thread yangyang
NNever nneverwei@... writes:

 
 I'm sorry, I made a mistake. My protobuf version is 2.4.0.a.jar; I got it
 from HBase 0.94's lib.
 
 2012/6/6 Amit Sela amits@...
 
  Do you mean protobuf-java-2.4.04.jar, or is there a new version like the
  protobuf-java-2.4.9.a.jar you wrote?
 
 
  On Mon, Jun 4, 2012 at 6:09 AM, NNever nneverwei@... wrote:
 
   Hi Amit, I met this error on the client side when I upgraded 0.92.1 to
   0.94. I just put protobuf-java-2.4.9.a.jar on the classpath and the
   problem was solved.
  
   If you're sure you have protobuf on your CLASSPATH when the job runs,
   have you tried just restarting M/R or even Hadoop?
   I have met some strange ClassNotFoundExceptions when running MapReduce
   before; after a restart everything returned to normal.
  
 

I found a way: just export the Hadoop classpath like this:

export HADOOP_CLASSPATH=/home/chianyu/hbase-0.94.0/hbase-0.94.0.jar:/home/chianyu/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/chianyu/hbase-0.94.0/lib/zookeeper-3.4.3.jar:/home/chianyu/hbase-0.94.0/lib/protobuf-java-2.4.0a.jar

Then everything works.




Re: Load balancer repeatedly close and open region in the same regionserver.

2012-07-27 Thread Howard
Thanks Suraj Varma, I have put the logs on pastebin.com.

master log: http://pastebin.com/QWv3K9HQ
regionserver log: http://pastebin.com/LM27ui72

Because there are a lot of "Region is not online" messages in the
regionserver log, I have filtered them out of the paste.
The following is the per-minute count of "Region is not online" messages;
starting at 23:16, there are a lot of access failures because the region is
not online.
--d70285c1a12dec9289ce9290c9349a79
 1 23:16
103 23:36
142 23:37
169 23:38
 94 23:39
120 23:40
 39 23:41
110 23:42
104 23:43
114 23:44
 90 23:45
121 23:46
104 23:47
 74 23:48
 96 23:49
100 23:50
125 23:51
 59 23:52
113 23:53
134 23:54
127 23:55
131 23:56
119 23:57
 82 23:58
165 23:59

and the region d70285c1a12dec9289ce9290c9349a79 is moved between two
regionservers again and again by the balancer. Starting at 23:36, the region
is moved from regionserver 192.168.18.40 to 192.168.18.40 and fails.


2012/7/19 Suraj Varma svarma...@gmail.com

 You can use pastebin.com or similar services to cut/paste your logs.
 --S

 On Tue, Jul 17, 2012 at 7:11 PM, Howard rj03...@gmail.com wrote:
  this problem happened only once. Because it happened two days ago, I
  remember I checked the master-status and only ever saw regions pending
  open in "Regions in Transition"; I did not see two regionservers on the
  same server.

  "Sent CLOSE to 192.168.0.2,60020,1342017399608" - what
  does 60020,1342017399608 mean? Is there some document that can help with
  reading the source code?
  If you still need me to upload the log, how do I upload it?
  Sorry, I am new to HBase.
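For reference, the server name asked about above has the form host,port,startcode: the regionserver's hostname or IP, its RPC port (60020 by default), and the millisecond timestamp at which that regionserver process started. A small illustrative parser (a sketch, not HBase's own ServerName class):

```java
public class ServerNameDemo {
    // Split a server name like "192.168.0.2,60020,1342017399608" into its
    // three components: host, RPC port, and start code (start timestamp).
    static String[] parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3)
            throw new IllegalArgumentException("expected host,port,startcode");
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("192.168.0.2,60020,1342017399608");
        System.out.println("host=" + p[0]
            + " port=" + p[1]
            + " startcode=" + p[2]);
    }
}
```

The start code is what lets the master tell apart two regionserver processes that ran on the same host and port at different times.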
 
  2012/7/17 Ted Yu yuzhih...@gmail.com
 
  Howard:
  Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out
  yesterday ?
  I guess you have noticed the following toward the end of log snippet:
 
  2012-07-16 00:17:50,774 DEBUG
  org.apache.hadoop.hbase.
  master.handler.OpenedRegionHandler: Handling OPENED
  event for
 
 
 trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956.
  from 192.168.1.2,60020,1342017399608; deleting unassigned node
 
  As Ram pointed out, there might be two region server processes running
 on
  192.168.1.2
 
  Please verify whether that was the case.
 
  Cheers
 
  On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
    From the logs I can see that though the servers are the same, their
    start codes are different.
    We need to analyse the previous logs also. Please file a JIRA and, if
    possible, attach the logs to it.
  
   Thanks Howard.
  
   Regards
   Ram
  
-Original Message-
From: Howard [mailto:rj03...@gmail.com]
Sent: Tuesday, July 17, 2012 7:32 PM
To: user@hbase.apache.org
Subject: Re: Load balancer repeatedly close and open region in the
 same
regionserver.
   
Hi Ted Yu, thanks for your reply:
The HBase and Hadoop versions are HBase 0.94.0, r1332822 and Hadoop
0.20.2-cdh3u1, rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638.
The following is a detailed log about the same region,
trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956,
and it repeats again and again:
2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster:
balance
   
 hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
e1b7e956.,
src=192.168.1.2,60020,1342017399608,
dest=192.168.1.2,60020,1342002082592
2012-07-16 00:12:49,843 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting
 unassignment
of
region
   
 trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
e956.
(offlining)
2012-07-16 00:12:49,843 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:6-0x4384d0a47f40068 Creating unassigned node for
93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state
2012-07-16 00:12:49,845 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
192.168.1.2,60020,1342017399608 for region
   
 trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
e956.
2012-07-16 00:12:50,555 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_CLOSED,
 server=192.168.1.2,60020,1342017399608,
region=93caf5147d40f5dd4625e160e1b7e956
2012-07-16 00:12:50,555 DEBUG
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling
CLOSED
event for 93caf5147d40f5dd4625e160e1b7e956
2012-07-16 00:12:50,555 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
   
 was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
e1b7e956.
state=CLOSED, ts=1342368770556,
 server=192.168.1.2,60020,1342017399608
2012-07-16 00:12:50,555 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:6-0x4384d0a47f40068 Creating (or updating) unassigned
 

Re: Bulk loading disadvantages

2012-07-27 Thread Sever Fundatureanu
Hi Anil,

I am using HBase 0.94.0 with Hadoop 1.0.0. The directories are indeed
the ones mentioned by Bijeet. I can also add that I am doing the 2nd
stage programmatically by calling doBulkLoad(org.apache.hadoop.fs.Path
sourceDir, HTable table) on a LoadIncrementalHFiles object.

Best,
Sever


On Fri, Jul 27, 2012 at 5:40 AM, Anil Gupta anilgupt...@gmail.com wrote:
 Hi Sever,

 That's very interesting. Which Hadoop and HBase versions are you using?
 I am going to run bulk loads tomorrow. If you can tell me which
 directories in HDFS you compared with /hbase/$table, then I will try to
 check the same.

 Best Regards,
 Anil

 On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu 
 fundatureanu.se...@gmail.com wrote:

 On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu lakka...@gmail.com wrote:


 For the bulkloading process, the HBase documentation mentions that in
 a 2nd stage the appropriate Region Server adopts the HFile, moving it
 into its storage directory and making the data available to clients.
 But from my experience the files also remain in the original location
 from where they are adopted. So I guess the data is actually copied
 into the HBase directory right? This means that, compared to the
 online importing, when bulk loading you essentially need twice the
 disk space on HDFS, right?


 Yes, if you are generating HFiles on one cluster and loading into a
 separate hbase cluster. If they are co-located, its just a hdfs mv.

 Hmm, both the HFile generation and the HBase cluster runs on top of
 the same HDFS cluster. I did a du on both the source HDFS directory
 and the destination /hbase directory and I got the same sizes (+-
 few bytes). I deleted the source directory from HDFS and then scanned
 the table without any problems. Maybe there is a config parameter I'm
 missing?

 Sever



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: fundatureanu.se...@gmail.com


Re: Bulk loading disadvantages

2012-07-27 Thread Sever Fundatureanu
After digging a bit I've found my problem comes from the following
lines in the Store class:

void bulkLoadHFile(String srcPathStr) throws IOException {
  Path srcPath = new Path(srcPathStr);

  // Move the file if it's on another filesystem
  FileSystem srcFs = srcPath.getFileSystem(conf);
  if (!srcFs.equals(fs)) {
    LOG.info("File " + srcPath + " on different filesystem than " +
        "destination store - moving to this filesystem.");
    Path tmpPath = getTmpPath();
    FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
    LOG.info("Copied to temporary path on dst filesystem: " + tmpPath);
    srcPath = tmpPath;
  }

The equality for the 2 filesystems fails in my case and I get the following log:

2012-07-27 14:47:25,321 INFO
org.apache.hadoop.hbase.regionserver.Store: File
hdfs://fs0.cm.cluster:8020/user/sfu200/outputBsbm/string2Id/F/e6cf2d1b69354e268b79597bf3855357
on different filesystem than destination store - moving to this
filesystem.
2012-07-27 14:47:27,286 INFO
org.apache.hadoop.hbase.regionserver.Store: Copied to temporary path
on dst filesystem:
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
2012-07-27 14:47:27,286 DEBUG
org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
to 
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F/c4bbf70a6654422db81884f15f34c712
2012-07-27 14:47:27,297 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: HFile Bloom filter
type for c4bbf70a6654422db81884f15f34c712: NONE, but ROW specified in
column family configuration
2012-07-27 14:47:27,297 INFO
org.apache.hadoop.hbase.regionserver.Store: Moved hfile
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
into store directory
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F
- updating store file list.
2012-07-27 14:47:27,297 INFO
org.apache.hadoop.hbase.regionserver.Store: Successfully loaded store
file 
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
into store F (new location:
hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F/c4bbf70a6654422db81884f15f34c712)

In my hbase-site.xml I have:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://fs0.cm.cluster:8020/hbase</value>
  <description>The directory shared by RegionServers.</description>
</property>

and in my hdfs-site.xml I have:
<property>
  <name>fs.default.name</name>
  <value>hdfs://fs0.cm.cluster:8020</value>
</property>

As you can see they point to the same namenode. So I really don't
understand why the above check fails..
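One possible explanation — an assumption, not something confirmed in this thread — is that the two FileSystem handles compare by reference identity rather than by URI, so two instances pointing at the same namenode can still fail an equals() check. A minimal plain-Java sketch of that failure mode (FsHandle is a stand-in class, not Hadoop's FileSystem):

```java
// A handle class that, like a class relying on Object.equals(), compares
// by reference identity: two distinct instances are never "equal" even
// when they wrap the exact same URI.
final class FsHandle {
    final String uri;
    FsHandle(String uri) { this.uri = uri; }
    // Deliberately no equals()/hashCode() override.
}

public class EqualsDemo {
    public static void main(String[] args) {
        FsHandle a = new FsHandle("hdfs://fs0.cm.cluster:8020");
        FsHandle b = new FsHandle("hdfs://fs0.cm.cluster:8020");
        System.out.println("a.equals(b)     = " + a.equals(b));      // false
        System.out.println("same underlying = " + a.uri.equals(b.uri)); // true
    }
}
```

Whether this is actually what happens here depends on the Hadoop version and how its FileSystem instance cache hands out objects; comparing the filesystem URIs instead of the instances would sidestep the problem.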

Regards,
Sever

On Fri, Jul 27, 2012 at 1:17 PM, Sever Fundatureanu
fundatureanu.se...@gmail.com wrote:
 Hi Anil,

 I am using HBase 0.94.0 with Hadoop 1.0.0. The directories are indeed
 the ones mentioned by Bijeet. I can also add that I am doing the 2nd
 stage programmatically by calling doBulkLoad(org.apache.hadoop.fs.Path
 sourceDir, HTable table) on a LoadIncrementalHFiles object.

 Best,
 Sever


 On Fri, Jul 27, 2012 at 5:40 AM, Anil Gupta anilgupt...@gmail.com wrote:
 Hi Sever,

 That's very interesting. Which Hadoop and HBase versions are you using?
 I am going to run bulk loads tomorrow. If you can tell me which
 directories in HDFS you compared with /hbase/$table, then I will try to
 check the same.

 Best Regards,
 Anil

 On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu 
 fundatureanu.se...@gmail.com wrote:

 On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu lakka...@gmail.com 
 wrote:


 For the bulkloading process, the HBase documentation mentions that in
 a 2nd stage the appropriate Region Server adopts the HFile, moving it
 into its storage directory and making the data available to clients.
 But from my experience the files also remain in the original location
 from where they are adopted. So I guess the data is actually copied
 into the HBase directory right? This means that, compared to the
 online importing, when bulk loading you essentially need twice the
 disk space on HDFS, right?


 Yes, if you are generating HFiles on one cluster and loading into a
 separate hbase cluster. If they are co-located, its just a hdfs mv.

 Hmm, both the HFile generation and the HBase cluster runs on top of
 the same HDFS cluster. I did a du on both the source HDFS directory
 and the destination /hbase directory and I got the same sizes (+-
 few bytes). I deleted the source directory from HDFS and then scanned
 the table without any problems. Maybe there is a config parameter I'm
 missing?

 Sever



 --
 Sever Fundatureanu

 Vrije Universiteit Amsterdam
 E-mail: fundatureanu.se...@gmail.com



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: 

Re: Load balancer repeatedly close and open region in the same regionserver.

2012-07-27 Thread Ted Yu
bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40

Have you checked whether there were two region server processes running on
192.168.18.40 ?

Cheers

On Fri, Jul 27, 2012 at 2:43 AM, Howard rj03...@gmail.com wrote:

 Thanks Suraj Varma, I have put the logs on pastebin.com.

 master log: http://pastebin.com/QWv3K9HQ
 regionserver log: http://pastebin.com/LM27ui72

 Because there are a lot of "Region is not online" messages in the
 regionserver log, I have filtered them out of the paste.
 The following is the per-minute count of "Region is not online" messages;
 starting at 23:16, there are a lot of access failures because the region
 is not online.
 --d70285c1a12dec9289ce9290c9349a79
  1 23:16
 103 23:36
 142 23:37
 169 23:38
  94 23:39
 120 23:40
  39 23:41
 110 23:42
 104 23:43
 114 23:44
  90 23:45
 121 23:46
 104 23:47
  74 23:48
  96 23:49
 100 23:50
 125 23:51
  59 23:52
 113 23:53
 134 23:54
 127 23:55
 131 23:56
 119 23:57
  82 23:58
 165 23:59

 and the region d70285c1a12dec9289ce9290c9349a79 is moved between two
 regionservers again and again by the balancer. Starting at 23:36, the
 region is moved from regionserver 192.168.18.40 to 192.168.18.40 and
 fails.


 2012/7/19 Suraj Varma svarma...@gmail.com

  You can use pastebin.com or similar services to cut/paste your logs.
  --S
 
  On Tue, Jul 17, 2012 at 7:11 PM, Howard rj03...@gmail.com wrote:
   this problem happened only once. Because it happened two days ago, I
   remember I checked the master-status and only ever saw regions pending
   open in "Regions in Transition"; I did not see two regionservers on
   the same server.

   "Sent CLOSE to 192.168.0.2,60020,1342017399608" - what
   does 60020,1342017399608 mean? Is there some document that can help
   with reading the source code?
   If you still need me to upload the log, how do I upload it?
   Sorry, I am new to HBase.
  
   2012/7/17 Ted Yu yuzhih...@gmail.com
  
   Howard:
   Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out
   yesterday ?
   I guess you have noticed the following toward the end of log snippet:
  
   2012-07-16 00:17:50,774 DEBUG
   org.apache.hadoop.hbase.
   master.handler.OpenedRegionHandler: Handling OPENED
   event for
  
  
 
 trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956.
   from 192.168.1.2,60020,1342017399608; deleting unassigned node
  
   As Ram pointed out, there might be two region server processes running
  on
   192.168.1.2
  
   Please verify whether that was the case.
  
   Cheers
  
   On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
 From the logs I can see that though the servers are the same, their
 start codes are different.
 We need to analyse the previous logs also. Please file a JIRA and, if
 possible, attach the logs to it.

 Thanks Howard.
   
Regards
Ram
   
 -Original Message-
 From: Howard [mailto:rj03...@gmail.com]
 Sent: Tuesday, July 17, 2012 7:32 PM
 To: user@hbase.apache.org
 Subject: Re: Load balancer repeatedly close and open region in the
  same
 regionserver.

 Hi Ted Yu, thanks for your reply:
 The HBase and Hadoop versions are HBase 0.94.0, r1332822 and Hadoop
 0.20.2-cdh3u1, rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638.
 The following is a detailed log about the same region,
 trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956,
 and it repeats again and again:
 2012-07-16 00:12:49,843 INFO
 org.apache.hadoop.hbase.master.HMaster:
 balance

  hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
 e1b7e956.,
 src=192.168.1.2,60020,1342017399608,
 dest=192.168.1.2,60020,1342002082592
 2012-07-16 00:12:49,843 DEBUG
 org.apache.hadoop.hbase.master.AssignmentManager: Starting
  unassignment
 of
 region

  trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
 e956.
 (offlining)
 2012-07-16 00:12:49,843 DEBUG
 org.apache.hadoop.hbase.zookeeper.ZKAssign:
 master:6-0x4384d0a47f40068 Creating unassigned node for
 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state
 2012-07-16 00:12:49,845 DEBUG
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
 192.168.1.2,60020,1342017399608 for region

  trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
 e956.
 2012-07-16 00:12:50,555 DEBUG
 org.apache.hadoop.hbase.master.AssignmentManager: Handling
 transition=RS_ZK_REGION_CLOSED,
  server=192.168.1.2,60020,1342017399608,
 region=93caf5147d40f5dd4625e160e1b7e956
 2012-07-16 00:12:50,555 DEBUG
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler:
 Handling
 CLOSED
 event for 93caf5147d40f5dd4625e160e1b7e956
 2012-07-16 00:12:50,555 DEBUG
 

Re: Bulk loading disadvantages

2012-07-27 Thread Alex Baranau
 Another problem is with data locality immediately after bulk loading
 through MR.

You might find this recent discussion about that useful: [1]

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

[1] The start is here:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201207.mbox/%3CCAA7+SiBcu_yB45=wearkcpdw1hgnksuv4cevxhjf8k5yrwv...@mail.gmail.com%3E
but then the thread gets broken due to FWD/RE being added to the subject.
You can also find it here:
http://search-hadoop.com/?q=bulk+import+and+data+locality

On Fri, Jul 27, 2012 at 9:46 AM, Sever Fundatureanu 
fundatureanu.se...@gmail.com wrote:

 After digging a bit I've found my problem comes from the following
 lines in the Store class:

 void bulkLoadHFile(String srcPathStr) throws IOException {
   Path srcPath = new Path(srcPathStr);

   // Move the file if it's on another filesystem
   FileSystem srcFs = srcPath.getFileSystem(conf);
   if (!srcFs.equals(fs)) {
     LOG.info("File " + srcPath + " on different filesystem than " +
         "destination store - moving to this filesystem.");
     Path tmpPath = getTmpPath();
     FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
     LOG.info("Copied to temporary path on dst filesystem: " + tmpPath);
     srcPath = tmpPath;
   }

 The equality for the 2 filesystems fails in my case and I get the
 following log:

 2012-07-27 14:47:25,321 INFO
 org.apache.hadoop.hbase.regionserver.Store: File

 hdfs://fs0.cm.cluster:8020/user/sfu200/outputBsbm/string2Id/F/e6cf2d1b69354e268b79597bf3855357
 on different filesystem than destination store - moving to this
 filesystem.
 2012-07-27 14:47:27,286 INFO
 org.apache.hadoop.hbase.regionserver.Store: Copied to temporary path
 on dst filesystem:

 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
 2012-07-27 14:47:27,286 DEBUG
 org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file

 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
 to
 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F/c4bbf70a6654422db81884f15f34c712
 2012-07-27 14:47:27,297 INFO
 org.apache.hadoop.hbase.regionserver.StoreFile: HFile Bloom filter
 type for c4bbf70a6654422db81884f15f34c712: NONE, but ROW specified in
 column family configuration
 2012-07-27 14:47:27,297 INFO
 org.apache.hadoop.hbase.regionserver.Store: Moved hfile

 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
 into store directory

 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F
 - updating store file list.
 2012-07-27 14:47:27,297 INFO
 org.apache.hadoop.hbase.regionserver.Store: Successfully loaded store
 file
 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/.tmp/90f6b193e6fd48ba8e814c968179abb9
 into store F (new location:

 hdfs://fs0.cm.cluster:8020/hbase/String2Id_bsbm/9028c6a70b30a089b4312c622729e98e/F/c4bbf70a6654422db81884f15f34c712)

 In my hbase-site.xml I have:
 <property>
   <name>hbase.rootdir</name>
   <value>hdfs://fs0.cm.cluster:8020/hbase</value>
   <description>The directory shared by RegionServers.</description>
 </property>

 and in my hdfs-site.xml I have:
 <property>
   <name>fs.default.name</name>
   <value>hdfs://fs0.cm.cluster:8020</value>
 </property>

 As you can see they point to the same namenode. So I really don't
 understand why the above check fails..

 Regards,
 Sever

 On Fri, Jul 27, 2012 at 1:17 PM, Sever Fundatureanu
 fundatureanu.se...@gmail.com wrote:
  Hi Anil,
 
  I am using HBase 0.94.0 with Hadoop 1.0.0. The directories are indeed
  the ones mentioned by Bijeet. I can also add that I am doing the 2nd
  stage programmatically by calling doBulkLoad(org.apache.hadoop.fs.Path
  sourceDir, HTable table) on a LoadIncrementalHFiles object.
 
  Best,
  Sever
 
 
  On Fri, Jul 27, 2012 at 5:40 AM, Anil Gupta anilgupt...@gmail.com
 wrote:
  Hi Sever,
 
  That's very interesting. Which Hadoop and HBase versions are you using?
  I am going to run bulk loads tomorrow. If you can tell me which
  directories in HDFS you compared with /hbase/$table, then I will try to
  check the same.
 
  Best Regards,
  Anil
 
  On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu 
 fundatureanu.se...@gmail.com wrote:
 
  On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu lakka...@gmail.com
 wrote:
 
 
  For the bulkloading process, the HBase documentation mentions that in
  a 2nd stage the appropriate Region Server adopts the HFile, moving
 it
  into its storage directory and making the data available to clients.
  But from my experience the files also remain in the original location
  from where they are adopted. So I guess the data is actually copied
  into the HBase directory right? This means that, compared to the
  online importing, 

Re: Cluster load

2012-07-27 Thread Alex Baranau
From what you posted above, I guess one of the regions
(0a5f6fadd0435898c6f4cf11daa9895a - note that it has two files of roughly
2 MB each [1], while the other regions are empty) is getting hit with the
writes. You may want to run the flush 'mytable' command from the hbase
shell before looking at HDFS - this way you make sure your data is flushed
to HDFS (and not still sitting in the MemStores).

You may want to check the START/END keys of this region (via master web ui
or in .META.). Then you can compare with the keys generated by your app.
This should give you some info about what's going on.
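If that comparison shows that sequential keys (e.g. timestamps) all fall inside the one hot region's key range, a common mitigation is to prefix the row key with a small hash-derived salt so writes spread across pre-split regions. A minimal sketch (the bucket count and key format are assumptions about the app, not details from this thread):

```java
public class SaltedKey {
    static final int BUCKETS = 8; // match your number of pre-split regions

    // Prefix the key with a deterministic two-digit bucket so that
    // monotonically increasing keys scatter across BUCKETS key ranges
    // instead of always landing in the last region.
    static String salt(String rowKey) {
        int bucket = Math.abs(rowKey.hashCode() % BUCKETS);
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // Sequential timestamp keys now spread across buckets.
        for (long ts = 1343334723000L; ts < 1343334723004L; ts++)
            System.out.println(salt(Long.toString(ts)));
    }
}
```

The trade-off is that a plain range scan over the original key order now requires one scan per bucket, so this only makes sense when write throughput matters more than contiguous scans.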

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

[1]

-rwxr-xr-x 3 root root 1993369 2012-07-26 13:59
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717

-rwxr-xr-x 3 root root 2003372 2012-07-26 13:57
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/7665015246030620502

On Fri, Jul 27, 2012 at 3:16 AM, Khang Pham khang...@gmail.com wrote:

 Hi,

 By "node" do you mean a regionserver node?

 If you are referring to a RegionServer node: you can go to the HBase
 master web interface, master:65510/master.jsp, to see the load for each
 regionserver. That's the overall load. If you want to see the load per
 node per table, you will need to query the .META. table (column:
 info:server).



 --K
 On Fri, Jul 27, 2012 at 9:07 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  Is there a way to see how much data does each node have per Hbase table?
 
  On Thu, Jul 26, 2012 at 5:53 PM, syed kather in.ab...@gmail.com wrote:
 
   First check whether the data in HBase is consistent by running hbck
   (bin/hbase hbck). If all the regions are consistent, then check the
   number of splits for the table in question at localhost:60010.
On Jul 27, 2012 4:02 AM, Mohit Anchlia mohitanch...@gmail.com
  wrote:
  
 I added new regions and the performance didn't improve. I think it is
 still the load balancing issue. I want to ensure that my rows are
 getting distributed across the cluster. What I see is this:
   
Could you please tell me what's the best way to see the load?
   
   
[root@dsdb4 ~]# hadoop fs -lsr /hbase/SESSION_TIMELINE1/
   
drwxr-xr-x - root root 3 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641
   
drwxr-xr-x - root root 1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs
   
-rwxr-xr-x 3 root root 124 2012-07-26 13:32
   
   
  
 
 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs/hlog.1343334723359
   
drwxr-xr-x - root root 0 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/S_T_MTX
   
-rwxr-xr-x 3 root root 764 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.regioninfo
   
drwxr-xr-x - root root 3 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854
   
drwxr-xr-x - root root 1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs
   
-rwxr-xr-x 3 root root 124 2012-07-26 13:32
   
   
  
 
 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs/hlog.1343334723093
   
drwxr-xr-x - root root 0 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/S_T_MTX
   
-rwxr-xr-x 3 root root 764 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.regioninfo
   
drwxr-xr-x - root root 3 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba
   
drwxr-xr-x - root root 1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs
   
-rwxr-xr-x 3 root root 124 2012-07-26 13:32
   
   
  
 
 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs/hlog.1343334723240
   
drwxr-xr-x - root root 0 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/S_T_MTX
   
-rwxr-xr-x 3 root root 764 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.regioninfo
   
drwxr-xr-x - root root 3 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a
   
drwxr-xr-x - root root 1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs
   
-rwxr-xr-x 3 root root 124 2012-07-26 13:32
   
   
  
 
 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs/hlog.1343334723171
   
drwxr-xr-x - root root 0 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/S_T_MTX
   
-rwxr-xr-x 3 root root 764 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.regioninfo
   
drwxr-xr-x - root root 3 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf
   
drwxr-xr-x - root root 1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.oldlogs
   
-rwxr-xr-x 3 root root 

Re: Bloom Filter

2012-07-27 Thread Alex Baranau
Very good explanation (and food for thought) about using bloom filters in
HBase in the answers here:
http://www.quora.com/How-are-bloom-filters-used-in-HBase.

Should we link to it from the Apache HBase book (ref guide)?

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Thu, Jul 26, 2012 at 8:38 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen mdngu...@gmail.com
 wrote:

  Mohit,
 
  According to HBase: The Definitive Guide,
 
  The row+column Bloom filter is useful when you cannot batch updates for a
  specific row, and end up with store files which all contain parts of the
  row. The more specific row+column filter can then identify which of the
  files contain the data you are requesting. Obviously, if you always load
  the entire row, this filter is once again hardly useful, as the region
  server will need to load the matching block out of each file anyway.
  Since
  the row+column filter will require more storage, you need to do the math
 to
  determine whether it is worth the extra resources.
 

 Thanks! I have time-series data, so I am thinking I should enable bloom
 filters for rows only

 
 
 ~ Minh
 
  On Thu, Jul 26, 2012 at 4:30 PM, Mohit Anchlia mohitanch...@gmail.com
  wrote:
 
   Is it advisable to enable bloom filters on the column family?
  
   Also, why is it called global kill switch?
  
   Bloom Filter Configuration
 2.9.1. io.hfile.bloom.enabled global kill switch
  
   io.hfile.bloom.enabled in Configuration serves as the kill switch in
 case
   something goes wrong. Default = true.
  
 




-- 
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr


Re: Cluster load

2012-07-27 Thread syed kather
Alex Baranau,

Can you please tell me how you found that it has 2GB of data from
0a5f6fadd0435898c6f4cf11daa9895a? I am very interested to know.
Thanks and Regards,
S SYED ABDUL KATHER



On Fri, Jul 27, 2012 at 7:51 PM, Alex Baranau alex.barano...@gmail.comwrote:

 From what you posted above, I guess one of the regions
 (0a5f6fadd0435898c6f4cf11daa9895a,
 note that it has 2 files 2GB each [1], while other regions are empty) is
 getting hit with writes. You may want to run flush 'mytable' command from
 hbase shell before looking at hdfs - this way you make sure your data is
 flushed to hdfs (and not hanged in Memstores).

 You may want to check the START/END keys of this region (via master web ui
 or in .META.). Then you can compare with the keys generated by your app.
 This should give you some info about what's going on.

 Alex Baranau
 --
 Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
 Solr

 [1]

 -rwxr-xr-x 3 root root 1993369 2012-07-26 13:59

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717

 -rwxr-xr-x 3 root root 2003372 2012-07-26 13:57

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/7665015246030620502


Re: Cluster load

2012-07-27 Thread Alex Baranau
-rwxr-xr-x 3 root root 1993369 2012-07-26 13:59
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717

1993369 is the size. Oh sorry - it is 2MB, not 2GB. Yeah, that doesn't
tell a lot. It looks like all the data is in the Memstore. As I said, you should
try flushing the table, so that you can see where the data was written.

Of course it is always great to set up monitoring and see what is going on ;)

Anyhow, the piece pasted above, means:

table:SESSION_TIMELINE1, region: 0a5f6fadd0435898c6f4cf11daa9895a,
 columnFamily: S_T_MTX, hfile(created by memstore flush): 1566523617482885717,
size: 1993369 bytes.

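In other words, the components can be read straight off the HDFS path. A small plain-Java sketch splitting the path from the listing above; the layout assumed is /&lt;hbase.rootdir&gt;/&lt;table&gt;/&lt;region&gt;/&lt;family&gt;/&lt;hfile&gt;, as described:

```java
public class Main {
    public static void main(String[] args) {
        // Path layout under hbase.rootdir, as described above:
        // /<rootdir>/<table>/<region-encoded-name>/<column-family>/<hfile>
        String path = "/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a"
                + "/S_T_MTX/1566523617482885717";
        String[] p = path.split("/");   // p[0] is "" because of the leading '/'
        System.out.println("table=" + p[2]
                + " region=" + p[3]
                + " family=" + p[4]
                + " hfile=" + p[5]);
    }
}
```
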
btw, 2MB looks weird: a very small flush size (in this case; in other cases
this may happen - long story). Maybe compression does very well :)

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Fri, Jul 27, 2012 at 10:52 AM, syed kather in.ab...@gmail.com wrote:

 Alex Baranau,

 Can please tell how did you found it has 2GB of data from
 0a5f6fadd0435898c6f4cf11daa9895a . I am pretty much intrested to know it
 .
 Thanks and Regards,
 S SYED ABDUL KATHER




Re: Cluster load

2012-07-27 Thread syed kather
Thank you so much for the valuable information. I have not yet used any
monitoring tool. Can you please suggest a good monitoring tool?

Syed Abdul kather
send from Samsung S3
On Jul 27, 2012 11:37 PM, Alex Baranau alex.barano...@gmail.com wrote:

 -rwxr-xr-x 3 root root 1993369 2012-07-26 13:59

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717

 1993369 is the size. Oh sorry. It is 2MB, not 2GB. Yeah, that doesn't
 tell a lot. Looks like all data is in Memstore. As I said, you should try
 flushing the table, so that you can see where data was written.

 Of course it is always great to setup monitoring and see what is going on
 ;)

 Anyhow, the piece pasted above, means:

 table:SESSION_TIMELINE1, region: 0a5f6fadd0435898c6f4cf11daa9895a,
  columnFamily: S_T_MTX, hfile(created by memstore flush):
 1566523617482885717,
 size: 1993369 bytes.

 btw, 2MB looks weird: very small flush size (in this case, in other cases
 this may happen - long story). May be compression does very well :)

 Alex Baranau
 --
 Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
 Solr


Re: Cluster load

2012-07-27 Thread Alex Baranau
You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other
third-party tools like [3] (I'm a little biased here;)).

[0] http://hbase.apache.org/book.html#hbase_metrics
[1] http://hbase.apache.org/metrics.html
[2] http://wiki.apache.org/hadoop/GangliaMetrics
[3] http://sematext.com/spm/hbase-performance-monitoring/index.html

Note that metrics values may seem a bit ugly/weird: as they say, you have
to refer to Lars' book HBase: The Definitive Guide to understand how some of
them are calculated. There's ongoing work on revising metrics; they should
look much better in upcoming releases.

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Fri, Jul 27, 2012 at 2:21 PM, syed kather in.ab...@gmail.com wrote:

 Thank you so much for your valuable information. I had not yet used any
 monitoring tool .. can please suggest me a good monitor tool .

 Syed Abdul kather
 send from Samsung S3

Question about heartbeat between HBase masters and Zookeeper

2012-07-27 Thread Yongcheng Li
I am testing my multi-master HBase configuration and want to know how the HBase
master keeps its ZooKeeper session alive. Does the master (version 0.94 or later)
send heartbeat messages to ZooKeeper periodically? If yes, how often does it
send them? If not, how does it prevent the session from expiring?

Thanks!

Yongcheng Li


Re: Bloom Filter

2012-07-27 Thread Stack
On Fri, Jul 27, 2012 at 4:25 PM, Alex Baranau alex.barano...@gmail.com wrote:
 Should we put the link to it from Apache HBase book (ref guide)?


I added link.  Will show next time we push the site.
St.Ack


Re: Question about heartbeat between HBase masters and Zookeeper

2012-07-27 Thread Jean-Daniel Cryans
It's all handled by the zookeeper library, check out their
documentation: http://zookeeper.apache.org/doc/trunk/

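For reference, the knob on the HBase side is the session timeout the master's ZooKeeper client negotiates; the ZooKeeper client library sends pings automatically within that window. A sketch of setting it in hbase-site.xml - the value shown is illustrative, not a recommendation:

```xml
<!-- hbase-site.xml -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; the ZooKeeper client keeps the session alive
       with automatic pings well inside this window -->
  <value>90000</value>
</property>
```
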
J-D

On Fri, Jul 27, 2012 at 11:53 AM, Yongcheng Li yongcheng...@sas.com wrote:
 I am testing my multi-master HBase configuration and want to know how the HBase
 master keeps its ZooKeeper session alive. Does the master (version 0.94 or
 later) send heartbeat messages to ZooKeeper periodically? If yes, how often
 does it send them? If not, how does it prevent the session from expiring?

 Thanks!

 Yongcheng Li


Re: Bloom Filter

2012-07-27 Thread Mohit Anchlia
On Fri, Jul 27, 2012 at 7:25 AM, Alex Baranau alex.barano...@gmail.comwrote:

 Very good explanation (and food for thinking) about using bloom filters in
 HBase in answers here:
 http://www.quora.com/How-are-bloom-filters-used-in-HBase.

 Should we put the link to it from Apache HBase book (ref guide)?


Thanks this is helpful




Re: Cluster load

2012-07-27 Thread Mohit Anchlia
On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau alex.barano...@gmail.comwrote:

 You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other
 third-party tools like [3] (I'm a little biased here;)).

 [0] http://hbase.apache.org/book.html#hbase_metrics
 [1] http://hbase.apache.org/metrics.html
 [2] http://wiki.apache.org/hadoop/GangliaMetrics
 [3] http://sematext.com/spm/hbase-performance-monitoring/index.html

 Note, that metrics values may seem a bit ugly/weird: as they say, you have
 to refer to Lars' book HBase in Action to understand how some of them
 calculated. There's an ongoing work towards revising metrics, they should
 look much better in next releases.



I did flush, but I am still seeing all my keys go to the first region even
though my keys have 0-9 as the first character. Is there an easy way to see
why that might be? hbase shell scan only shows values in hex.

  SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435 column=info:regioninfo,
timestamp=1343334723073, value=REGION = {NAME =
'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa989
 898c6f4cf11daa9895a.   5a.', STARTKEY = '',
ENDKEY = '0', ENCODED = 0a5f6fadd0435898c6f4cf11daa9895a, TABLE = {{NAME
= 'SESSION_TIMELINE1', FAMILIES = [{NA
ME = 'S_T_MTX',
BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ',
VERSIONS = '1', TTL = '2147483647', BLOCKSIZE =
'65536', IN_MEMORY =
'false', BLOCKCACHE = 'true'}]}}
 SESSION_TIMELINE1,0,1343334722986.79e03d78a784 column=info:regioninfo,
timestamp=1343334723116, value=REGION = {NAME =
'SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39
 601e8daa88aa85c39854.  854.', STARTKEY = '0',
ENDKEY = '1', ENCODED = 79e03d78a784601e8daa88aa85c39854, TABLE = {{NAME
= 'SESSION_TIMELINE1', FAMILIES = [{
NAME = 'S_T_MTX',
BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ',
VERSIONS = '1', TTL = '2147483647', BLOCKSIZE =
 '65536', IN_MEMORY =
'false', BLOCKCACHE = 'true'}]}}
 SESSION_TIMELINE1,1,1343334722987.1f0735a7e085 column=info:regioninfo,
timestamp=1343334723154, value=REGION = {NAME =
'SESSION_TIMELINE1,1,1343334722987.1f0735a7e08504357d0bca07e6772
 04357d0bca07e6772a75.  a75.', STARTKEY = '1',
ENDKEY = '2', ENCODED = 1f0735a7e08504357d0bca07e6772a75, TABLE = {{NAME
= 'SESSION_TIMELINE1', FAMILIES = [{
NAME = 'S_T_MTX',
BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ',
VERSIONS = '1', TTL = '2147483647', BLOCKSIZE =
 '65536', IN_MEMORY =
'false', BLOCKCACHE = 'true'}]}}

drwxr-xr-x   - root root  0 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/S_T_MTX
-rwxr-xr-x   3 root root762 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.regioninfo
drwxr-xr-x   - root root  4 2012-07-26 13:57
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a
drwxr-xr-x   - root root  0 2012-07-27 16:10
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.tmp
drwxr-xr-x   - root root  1 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs
-rwxr-xr-x   3 root root124 2012-07-26 13:32
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs/hlog.1343334723004
drwxr-xr-x   - root root  2 2012-07-27 16:10
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX
-rwxr-xr-x   3 root root   20249146 2012-07-26 17:54
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/5004103053282833292
-rwxr-xr-x   3 root root1400171 2012-07-27 16:10
/hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/8545160324229826840


Re: Cluster load

2012-07-27 Thread Alex Baranau
Can you scan your table and show one record?

I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0},
which I mentioned in the other thread. I.e. it looks like the first region holds
records whose key starts with any byte up to '0', which is (byte) 48. Hence, if
you set the first byte of your key to anything from (byte) 0 to (byte) 9, all of
them will fall into the first region, which holds records with first bytes up to
(byte) 48.

Could you check that?

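To make the distinction concrete: the character '0' encodes as byte 48, while the raw bytes 0 through 9 all sort before it as unsigned values. A minimal plain-Java check (no HBase dependency; Bytes.toBytes("0") produces the same UTF-8 encoding String.getBytes does here):

```java
public class Main {
    public static void main(String[] args) {
        // A split key created from the string "0" is the UTF-8 encoding
        // of the character '0', i.e. a single byte with value 48.
        byte charZero = "0".getBytes(java.nio.charset.StandardCharsets.UTF_8)[0];

        // Raw bytes 0..9 (i.e. new byte[] {(byte) n}) are all smaller than 48
        // when compared as unsigned bytes, so keys prefixed with them sort
        // before the '0' boundary and land in the first region.
        boolean allBefore = true;
        for (int b = 0; b <= 9; b++) {
            if (b >= (charZero & 0xff)) allBefore = false;
        }
        System.out.println(charZero + " " + allBefore);
    }
}
```

If the scan shows your keys starting with raw bytes 0x00-0x09 rather than characters '0'-'9', this is the explanation for all writes hitting the first region.
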
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Fri, Jul 27, 2012 at 7:24 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau alex.barano...@gmail.com
 wrote:

  You can read metrics [0] from JMX directly [1] or use Ganglia [2] or
 other
  third-party tools like [3] (I'm a little biased here;)).
 
  [0] http://hbase.apache.org/book.html#hbase_metrics
  [1] http://hbase.apache.org/metrics.html
  [2] http://wiki.apache.org/hadoop/GangliaMetrics
  [3] http://sematext.com/spm/hbase-performance-monitoring/index.html
 
  Note that metric values may seem a bit ugly/weird: as they say, you have
  to refer to Lars' book HBase in Action to understand how some of them are
  calculated. There's ongoing work on revising metrics; they should look
  much better in upcoming releases.
 
 

 I did flush, but I am still seeing all my keys go to the first region even
 though my keys have 0-9 as the first character. Is there an easy way to see
 why that might be? hbase shell scan only shows values in hex.

  SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa9895a.
 column=info:regioninfo, timestamp=1343334723073, value=REGION => {NAME =>
 'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa9895a.',
 STARTKEY => '', ENDKEY => '0', ENCODED => 0a5f6fadd0435898c6f4cf11daa9895a,
 TABLE => {{NAME => 'SESSION_TIMELINE1', FAMILIES => [{NAME => 'S_T_MTX',
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',
 VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
  SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39854.
 column=info:regioninfo, timestamp=1343334723116, value=REGION => {NAME =>
 'SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39854.',
 STARTKEY => '0', ENDKEY => '1', ENCODED => 79e03d78a784601e8daa88aa85c39854,
 TABLE => {{NAME => 'SESSION_TIMELINE1', FAMILIES => [{NAME => 'S_T_MTX',
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',
 VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
  SESSION_TIMELINE1,1,1343334722987.1f0735a7e08504357d0bca07e6772a75.
 column=info:regioninfo, timestamp=1343334723154, value=REGION => {NAME =>
 'SESSION_TIMELINE1,1,1343334722987.1f0735a7e08504357d0bca07e6772a75.',
 STARTKEY => '1', ENDKEY => '2', ENCODED => 1f0735a7e08504357d0bca07e6772a75,
 TABLE => {{NAME => 'SESSION_TIMELINE1', FAMILIES => [{NAME => 'S_T_MTX',
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',
 VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

 drwxr-xr-x   - root root  0 2012-07-26 13:32
 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/S_T_MTX
 -rwxr-xr-x   3 root root762 2012-07-26 13:32
 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.regioninfo
 drwxr-xr-x   - root root  4 2012-07-26 13:57
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a
 drwxr-xr-x   - root root  0 2012-07-27 16:10
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.tmp
 drwxr-xr-x   - root root  1 2012-07-26 13:32
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs
 -rwxr-xr-x   3 root root124 2012-07-26 13:32

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs/hlog.1343334723004
 drwxr-xr-x   - root root  2 2012-07-27 16:10
 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX
 -rwxr-xr-x   3 root root   20249146 2012-07-26 17:54

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/5004103053282833292
 -rwxr-xr-x   3 root root1400171 2012-07-27 16:10

 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/8545160324229826840


Re: Cluster load

2012-07-27 Thread Mohit Anchlia
On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com wrote:

 Can you scan your table and show one record?

 I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0},
 which I mentioned in the other thread. I.e. it looks like the first region
 holds records whose key starts with any byte up to '0', which is (byte) 48.
 Hence, if you set the first byte of your key to anything from (byte) 0 to
 (byte) 9, all of them will fall into the first region, which holds records
 with prefixes (byte) 0 up to (byte) 48.

 Could you check that?


I thought that if I pass Bytes.toBytes("0") it really means that row keys
starting with "0" will go in that region. Here is my code that creates a
row key and the splits using the admin util. I am also including the output
of an hbase shell scan after the code.

public static byte[][] splitRegionsSessionTimeline(int start, int end) {
 byte[][] splitKeys = new byte[end][];
 // the first region starting with empty key will be created
 // automatically
 for (int i = 0; i < splitKeys.length; i++) {
  splitKeys[i] = Bytes.toBytes(String.valueOf(i));
 }
 return splitKeys;
}

public static byte[] getRowKey(MetricType metricName, Long timestamp,
Short bucketNo, char rowDelim) {
  byte[] result = null;
  int rowSize = getRowSize();
  ByteBuffer b = ByteBuffer.allocate(rowSize);
  // Bucket No 0-9, chosen randomly
  b.putShort(bucketNo);
  // Row Delimiter
  b.putChar(rowDelim);
  b.putShort(metricName.getId());
  long reverseOrderEpoch = getReverseBaseTimeStamp(metricName, timestamp);
  b.putLong(reverseOrderEpoch);
  result = b.array();
  return result;
}
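A minimal, self-contained sketch (the class and method names are hypothetical, not from the posted code) of why every key produced this way lands in the first region: `putShort(bucketNo)` writes big-endian, so the first byte is 0x00 for any bucket 0-9, which sorts before the ASCII split key "0" (0x30):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating the byte-level mismatch.
public class RowKeyPrefixDemo {

    // First byte emitted by ByteBuffer.putShort: big-endian by default, so
    // the high-order byte comes first -- 0x00 for any bucketNo in 0..255.
    public static byte firstByteOfRowKey(short bucketNo) {
        ByteBuffer b = ByteBuffer.allocate(2);
        b.putShort(bucketNo);
        return b.array()[0];
    }

    // First byte of a split key built as Bytes.toBytes(String.valueOf(i)):
    // the ASCII digit, i.e. 0x30..0x39 for i in 0..9.
    public static byte firstByteOfSplitKey(int i) {
        return String.valueOf(i).getBytes(StandardCharsets.UTF_8)[0];
    }

    public static void main(String[] args) {
        for (short bucket = 0; bucket < 10; bucket++) {
            System.out.printf("bucket %d: row key starts with 0x%02X, "
                    + "split key starts with 0x%02X%n",
                    bucket, firstByteOfRowKey(bucket), firstByteOfSplitKey(bucket));
        }
        // 0x00 sorts before 0x30, so every row key falls into the
        // first region [ '', "0" ).
    }
}
```

Under the same assumption, one way to line the two up would be to build the bucket prefix with Bytes.toBytes(String.valueOf(bucketNo)) instead of putShort, or to generate the split keys as raw bytes, e.g. new byte[] {(byte) i}.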

from hbase shell scan table:

  \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gK,
 timestamp=1343350528865, value=1343350646443
  \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gL,
 timestamp=1343350528866, value=1343350646444
  \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gU,
 timestamp=1343350528874, value=1343350646453
  \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gZ,
 timestamp=1343350528880, value=1343350646458


Re: Cluster load

2012-07-27 Thread Alex Baranau
Yeah, your row keys start with \x00, which is (byte) 0. That is not the
same as '0' (which is (byte) 48). You know what to fix now ;)
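For illustration, a plain-JDK sketch of the difference (no HBase dependency; Bytes.toBytes("0") yields the same bytes as "0".getBytes(UTF_8) here, and the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the string "0" encodes to the single byte 48 (0x30),
// while (byte) 0 is the NUL byte 0x00 -- two very different key prefixes.
public class ZeroByteDemo {
    public static void main(String[] args) {
        byte ascii = "0".getBytes(StandardCharsets.UTF_8)[0];
        byte raw = (byte) 0;
        System.out.println("\"0\".getBytes()[0] = " + ascii); // 48
        System.out.println("(byte) 0           = " + raw);    // 0
    }
}
```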

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr


On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com
 wrote:

  Can you scan your table and show one record?
 
  I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0},
  which I mentioned in the other thread. I.e. it looks like the first region
  holds records whose key starts with any byte up to '0', which is (byte) 48.
  Hence, if you set the first byte of your key to anything from (byte) 0 to
  (byte) 9, all of them will fall into the first region, which holds records
  with prefixes (byte) 0 up to (byte) 48.
 
  Could you check that?
 
 
 I thought that if I pass Bytes.toBytes("0") it really means that row keys
 starting with "0" will go in that region. Here is my code that creates a
 row key and the splits using the admin util. I am also including the output
 of an hbase shell scan after the code.

 public static byte[][] splitRegionsSessionTimeline(int start, int end) {
  byte[][] splitKeys = new byte[end][];
  // the first region starting with empty key will be created
  // automatically
  for (int i = 0; i < splitKeys.length; i++) {
   splitKeys[i] = Bytes.toBytes(String.valueOf(i));
  }
  return splitKeys;
 }
  public static byte [] getRowKey(MetricType metricName, Long timestamp,
 Short bucketNo, char rowDelim){
   byte [] result = null;
   int rowSize = getRowSize();
   ByteBuffer b = ByteBuffer.allocate(rowSize);
   // Bucket No 0-9, chosen randomly
   b.putShort(bucketNo);
   //Row Delimiter
   b.putChar(rowDelim);
   b.putShort(metricName.getId());
   long reverseOrderEpoch = getReverseBaseTimeStamp(metricName,timestamp);
   b.putLong(reverseOrderEpoch);
   result = b.array();
   return result;
 }

 from hbase shell scan table:

   \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gK,
  timestamp=1343350528865, value=1343350646443
   \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gL,
  timestamp=1343350528866, value=1343350646444
   \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gU,
  timestamp=1343350528874, value=1343350646453
   \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9F column=S_T_MTX:\x00\x00gZ,
  timestamp=1343350528880, value=1343350646458

  Alex Baranau
  --
  Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch
 -
  Solr
 
  On Fri, Jul 27, 2012 at 7:24 PM, Mohit Anchlia mohitanch...@gmail.com
  wrote:
 
   On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau 
 alex.barano...@gmail.com
   wrote:
  
You can read metrics [0] from JMX directly [1] or use Ganglia [2] or
   other
third-party tools like [3] (I'm a little biased here;)).
   
[0] http://hbase.apache.org/book.html#hbase_metrics
[1] http://hbase.apache.org/metrics.html
[2] http://wiki.apache.org/hadoop/GangliaMetrics
[3] http://sematext.com/spm/hbase-performance-monitoring/index.html
   
Note, that metrics values may seem a bit ugly/weird: as they say, you
   have
to refer to Lars' book HBase in Action to understand how some of them
calculated. There's an ongoing work towards revising metrics, they
  should
look much better in next releases.
   
   
  
   I did flush still what I am seeing is that all my keys are still going
 to
   the first region even though my keys have 0-9 as the first character.
 Is
   there a easy way to see why that might be? hbase shell scan only shows
   value in hex.
  
 SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435
 column=info:regioninfo,
   timestamp=1343334723073, value=REGION = {NAME =
   'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa989
898c6f4cf11daa9895a.   5a.', STARTKEY = '',
   ENDKEY = '0', ENCODED = 0a5f6fadd0435898c6f4cf11daa9895a, TABLE =
  {{NAME
   = 'SESSION_TIMELINE1', FAMILIES = [{NA
   ME = 'S_T_MTX',
   BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ',
   VERSIONS = '1', TTL = '2147483647', BLOCKSIZE =
   '65536', IN_MEMORY =
   'false', BLOCKCACHE = 'true'}]}}
SESSION_TIMELINE1,0,1343334722986.79e03d78a784 column=info:regioninfo,
   timestamp=1343334723116, value=REGION = {NAME =
   'SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39
601e8daa88aa85c39854.  854.', STARTKEY = '0',
   ENDKEY = '1', ENCODED = 79e03d78a784601e8daa88aa85c39854, TABLE =
  {{NAME
   = 'SESSION_TIMELINE1', FAMILIES = [{
   NAME = 'S_T_MTX',
   BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ',
   VERSIONS = '1', TTL = '2147483647', BLOCKSIZE =