Using HBase data

2014-04-15 Thread Shashidhar Rao
Hi,

I am starting to think about a new project using Hadoop and HBase as my
persistent store, but I am quite confused as to how to use the HBase data.

1. Can this HBase data be used in web applications, i.e. retrieving the
data and showing it on a web page?

Can somebody please suggest how HBase data is used by other companies?

Some use-case links would certainly be helpful.

Regards
Shashi


Re: replication verifyrep

2014-04-15 Thread Hansi Klose
Hi Jean-Daniel,

thank you for your answer and for bringing some light into the darkness.

 You can see the bad rows listed in the user logs for your MR job.

Which log do you mean? The output from the command line?
I only see the counts of GOODROWS and BADROWS.
Are the bad rows which were not replicated listed in that log?

Regards Hansi

 Sent: Monday, 14 April 2014, 19:25
 From: Jean-Daniel Cryans jdcry...@apache.org
 To: user@hbase.apache.org user@hbase.apache.org
 Subject: Re: replication verifyrep

 Yeah you should use endtime, it was fixed as part of
 https://issues.apache.org/jira/browse/HBASE-10395.
 
 You can see the bad rows listed in the user logs for your MR job.
 
 J-D
 
 
 On Mon, Apr 14, 2014 at 3:06 AM, Hansi Klose hansi.kl...@web.de wrote:
 
  Hi,
 
  I wrote a little script which should control the running replication.
 
  The script is triggered by cron and executes the following command, with
  the current timestamp as endtime and
  a starttime = endtime - 10800000 milliseconds, so the time frame is 3
  hours.
 
  hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927
  --endtime=1397228401927 --families=t 1 tablename 2>&1
 
  After a few runs the script found some BADROWS.
 
  14/04/11 17:04:05 INFO mapred.JobClient: BADROWS=176
  14/04/11 17:04:05 INFO mapred.JobClient: GOODROWS=2
 
  I executed the same command 20 minutes later in the shell and got:
 
  hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927
  --endtime=1397228401927 --families=t 1 tablename 2>&1
  14/04/11 17:21:03 INFO mapred.JobClient: BADROWS=178
 
  After that I ran the command with the same starttime but the current
  timestamp as endtime, so the time frame is larger. And now I got:
 
  hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927
  --endtime=1397230074876 --families=t 1 tablename 2>&1
  14/04/11 17:28:28 INFO mapred.JobClient: GOODROWS=184
 
  Is there something wrong with the command?
  In our metrics I could not see that there was an issue at that time.
 
  We are a little bit confused about the endtime. All the documents talk
  about stoptime, but we found that the job configuration has no parameter
  called stoptime. We found verifyrep.startTime, which holds the value of
  the starttime from our command, and verifyrep.endTime, which is always 0
  when we use stoptime in the command. So we decided to use endtime.
 
  Even in the code
  http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.html
  they use: static long endTime = Long.MAX_VALUE;
 
  Which name is the right one: endtime or stoptime?
 
  We use CDH 4.2.0.
 
  Regards Hansi
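As a sanity check on the quoted verifyrep command: the two timestamps used there really are three hours apart, i.e. 10,800,000 milliseconds (a quick sketch, using the exact values from the command):

```python
# Epoch timestamps (in milliseconds) taken from the verifyrep command above.
starttime = 1397217601927
endtime = 1397228401927

window_ms = endtime - starttime
print(window_ms)                     # window size in milliseconds
print(window_ms / (1000 * 60 * 60))  # window size in hours
```

This prints 10800000 and 3.0, confirming the three-hour window.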
 
 


Weird behavior splitting regions

2014-04-15 Thread Guillermo Ortiz
I have a table in HBase sized around 96 GB.

I generated 4 regions of 30 GB. After some time, the table started to split
because the max size for a region is 1 GB (I just realized that; I'm going
to change it or create more pre-splits).

There are two things that I don't understand. How is it creating the
splits? Right now I have 130 regions and growing. The problem is the size
of the new regions:

1.7 M    /hbase/filters/4ddbc34a2242e44c03121ae4608788a2
1.6 G    /hbase/filters/548bdcec79cfe9a99fa57cb18f801be2
3.1 G    /hbase/filters/58b50df089bd9d4d1f079f53238e060d
2.5 M    /hbase/filters/5a0d6d5b3b8faf67889ac5f5c2947c4f
1.9 G    /hbase/filters/5b0a35b5735a473b7e804c4b045ce374
883.4 M  /hbase/filters/5b49c68e305b90d87b3c64a0eee60b8c
1.7 M    /hbase/filters/5d43fd7ea9808ab7d2f2134e80fbfae7
632.4 M  /hbase/filters/5f04c7cd450d144f88fb4c7cff0796a2

Some of the new regions are just a few KBytes! Why are they so small?
And when does HBase decide to split? Because it only started to split two
hours after the table was created.

To be clear: I created the table and inserted the data once; I don't insert
new data or modify it afterwards.


Another interesting point is why there are major compactions:
2014-04-15 11:33:47,400 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming compacted file at
hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/.tmp/df90c260cb4e4256a153dd178244f04c
to
hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/d/df90c260cb4e4256a153dd178244f04c
2014-04-15 11:33:47,407 INFO
org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded ROWCOL
(CompoundBloomFilter) metadata for df90c260cb4e4256a153dd178244f04c
2014-04-15 11:33:47,416 INFO org.apache.hadoop.hbase.regionserver.Store:
Completed major compaction of 1 file(s) in d of
filters,51,1397554175140.ef994715505054299ede8c48c600cea4. into
df90c260cb4e4256a153dd178244f04c, size=789.1 M; total size for store is
789.1 M
2014-04-15 11:33:47,416 INFO
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest:
completed compaction:
regionName=filters,51,1397554175140.ef994715505054299ede8c48c600cea4.,
storeName=d, fileCount=1, fileSize=1.5 G, priority=6, time=414761474510060;
duration=7sec

I thought major compaction happens just once a day and compacts many files
per region. The data is always the same here; I don't inject new data.


I'm working with 0.94.6 (CDH 4.4). I'm going to change the size of the
regions, but I would like to understand why these things happen.

Thank you.


Re: Weird behavior splitting regions

2014-04-15 Thread Bharath Vissapragada
 Some of the new regions are just a few KBytes! Why are they so small?
 And when does HBase decide to split? Because it only started to split two
 hours after the table was created.

When HBase does a split, it doesn't actually split at the disk/file level.
It's just a metadata operation which creates new regions containing
reference files that still point to the old HFiles. That is why you find
KB-sized regions.

 I thought major compaction happens just once a day and compacts many files
 per region. The data is always the same here; I don't inject new data.

IIRC, minor compactions sometimes get promoted to major compactions based
on certain criteria, but I'll leave that for others to answer!



On Tue, Apr 15, 2014 at 3:15 PM, Guillermo Ortiz konstt2...@gmail.com wrote:


-- 
Bharath Vissapragada
http://www.cloudera.com


Re: Weird behavior splitting regions

2014-04-15 Thread divye sheth
The default split policy in HBase 0.94.x is
IncreasingToUpperBoundRegionSplitPolicy, rather than
ConstantSizeRegionSplitPolicy, which was the default in older versions of
HBase.

Please refer to the link below to understand how
IncreasingToUpperBoundRegionSplitPolicy works (check the auto-splitting
section):
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
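Roughly, the policy linked above can be sketched as follows. This is an illustrative model, not HBase code: the function name is mine, the formula follows the 0.94-era IncreasingToUpperBoundRegionSplitPolicy as I understand it, and the default sizes (128 MB flush size, 10 GB max file size) are assumptions to verify against your own configuration:

```python
def size_to_check(table_regions_on_server: int,
                  flush_size: int = 128 * 1024**2,    # hbase.hregion.memstore.flush.size
                  max_file_size: int = 10 * 1024**3   # hbase.hregion.max.filesize
                  ) -> int:
    """Split threshold for a store: the memstore flush size times the cube
    of the number of regions of this table on the server, capped at the
    configured max file size."""
    if table_regions_on_server == 0:
        return max_file_size
    return min(max_file_size, flush_size * table_regions_on_server ** 3)

# The threshold grows quickly with the region count, then hits the cap.
for n in range(1, 6):
    print(n, size_to_check(n) // 1024**2, "MB")
```

This is why early splits happen at small sizes (128 MB for the first region of a table on a server, 1 GB at two regions) and only later approach the configured maximum.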

Hope this answers your question

Thanks
Divye Sheth



On Tue, Apr 15, 2014 at 3:36 PM, Bharath Vissapragada bhara...@cloudera.com wrote:




Re: Using HBase data

2014-04-15 Thread Ted Yu
Please take a look at 
https://m.facebook.com/UsingHbase?id=191660807562816&refsrc=https%3A%2F%2Fwww.facebook.com%2FUsingHbase

On Apr 14, 2014, at 11:20 PM, Shashidhar Rao raoshashidhar...@gmail.com wrote:



Re: Weird behavior splitting regions

2014-04-15 Thread Guillermo Ortiz
I read the article; that's why I asked the question, because I didn't
understand the result I got.

Oh, yes, that's true. So silly of me.
I think some of the files are pretty small because the table has two
families and one of them is much smaller than the other one, so it has been
split many times. The big regions reach a size close to 1 GB, but the
smaller regions end up pretty small because they have been split a lot of
times.

What I don't know is why HBase decides to split the table so late: not when
I create the table pre-split, but two hours later or whenever. Anyway,
that's my mistake; I'm just curious about it.


2014-04-15 12:17 GMT+02:00 divye sheth divs.sh...@gmail.com:



All regions stay on two nodes out of 18 nodes

2014-04-15 Thread Tao Xiao
I am using HDP 2.0.6, which has 18 nodes (region servers). One of my HBase
tables has 50 regions, but I found that all 50 regions stay on just two
nodes, not spread evenly across the 18 nodes. I did not pre-create splits,
so this table gradually split into 50 regions by itself.

I'd like to know why all the regions stay on just two nodes rather than the
18 nodes of the cluster, and how to spread the regions evenly across all
the region servers. Thanks.


Re: All regions stay on two nodes out of 18 nodes

2014-04-15 Thread divye sheth
Check whether the HBase balancer is on. In the hbase shell:

  balance_switch true

Then run the balancer, also from the hbase shell:

  balancer

If the balancer command returns false, check for any regions in transition
on the HMaster UI, or check the HMaster logs.

Thanks
Divye Sheth


On Tue, Apr 15, 2014 at 5:10 PM, Tao Xiao xiaotao.cs@gmail.com wrote:




Re: All regions stay on two nodes out of 18 nodes

2014-04-15 Thread Ted Yu
Is load balancer enabled ?

Can you grep this table in master log and pastebin what you found ?

Cheers

On Apr 15, 2014, at 4:40 AM, Tao Xiao xiaotao.cs@gmail.com wrote:



Re: hbase exception: Could not reseek StoreFileScanner

2014-04-15 Thread Ted Yu
bq. HFileScanner for reader reader=hdfs://192.168.11.150:8020/hbase/vc2.in_link/6b879cb43205cdae084a280c38fab34a/cf/4dc235709de44f53b2484d2903f1bb75

You can find, from the master log, which region server hosted region
6b879cb43205cdae084a280c38fab34a.
Then you can check the region server log on that server.


On Mon, Apr 14, 2014 at 10:11 PM, Li Li fancye...@gmail.com wrote:

 Where do I find the server log? I mean, there are many region servers;
 should I check them one by one?

 On Tue, Apr 15, 2014 at 12:31 PM, lars hofhansl la...@apache.org wrote:
  Thanks.
  Was there anything in the server logs at that time?
  The client does not report the full stack trace.
 
  I have not seen this one before. I assume HDFS was running at the time...
 
 
  -- Lars
 
 
 
  
   From: Li Li fancye...@gmail.com
  To: user@hbase.apache.org; lars hofhansl la...@apache.org
  Sent: Monday, April 14, 2014 9:09 PM
  Subject: Re: hbase exception: Could not reseek StoreFileScanner
 
 
  Version 0.94.11, r1513697, Wed Aug 14 04:54:46 UTC 2013
 
 
  On Tue, Apr 15, 2014 at 12:03 PM, lars hofhansl la...@apache.org
 wrote:
  Hi Li,
 
  please always tell us which version of HBase/Hadoop you are using and
 what it is that you were trying to do.
  Thanks.
 
  -- Lars
 
 
 
  
   From: Li Li fancye...@gmail.com
  To: user@hbase.apache.org
  Sent: Monday, April 14, 2014 5:32 PM
  Subject: hbase exception: Could not reseek StoreFileScanner
 
 
  Mon Apr 14 23:54:40 CST 2014,
  org.apache.hadoop.hbase.client.HTable$9@14923f6b, java.io.IOException:
  java.io.IOException: Could not reseek StoreFileScanner[HFileScanner
  for reader reader=hdfs://192.168.11.150:8020/hbase/vc2.in_link/6
  b879cb43205cdae084a280c38fab34a/cf/4dc235709de44f53b2484d2903f1bb75,
  compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true]
  [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
  [cacheBloomsOnWrite=false] [cacheEvictOn Close=false]
  [cacheCompressed=false],
 
 firstKey=\xE82\x14\xFF/\xF04\xA4\xBC\xB0X\xEB\xB4\xE9\xD1\x11\x93h\xD3\xAA\xC4\xAB\x99\xC3\x09\x874\x16VZ\x05\x10/cf:an/1397117856840/Put,
  lastKey=\xF0\x1F\xA7\xF7u\x9E.\xB2\x8EZ\xD5\xEB\xD6h\x03
 
 W\x0F\x8A\xA0\x9B\x0A\xE8\xEC\x9ELu5o\xFE\x03\xCE/cf:an/1397131734218/Put,
  avgKeyLen=48, avgValueLen=14, entries=3712302, length=260849569,
 
 cur=\xEC5cA\xF1\x03Y\x01!\xD6\x86\x15\x13\xD6\xC9\xBDb:#A\x08\x86\x14j\xA0)\xA8\x85\x11\xDC
  F/cf:an/1397454753471/Maximum/vlen=0/ts=0] to key
 
 \xEC5cA\xF1\x03Y\x01!\xD6\x86\x15\x13\xD6\xC9\xBDb:#A\x08\x86\x14j\xA0)\xA8\x85\x11\xDCF/cf:an/LATEST_TIMESTAMP/Maximum/vlen=0/ts=0
  3968 Mon Apr 14 23:55:50 CST 2014,
  org.apache.hadoop.hbase.client.HTable$9@14923f6b, java.io.IOException:
  java.io.IOException: Could not reseek StoreFileScanner[HFileScanner
  for reader reader=hdfs://192.168.11.150:8020/hbase/vc2.in_link/6
  b879cb43205cdae084a280c38fab34a/cf/4dc235709de44f53b2484d2903f1bb75,
  compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true]
  [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
  [cacheBloomsOnWrite=false] [cacheEvictOn Close=false]
  [cacheCompressed=false],
 
 firstKey=\xE82\x14\xFF/\xF04\xA4\xBC\xB0X\xEB\xB4\xE9\xD1\x11\x93h\xD3\xAA\xC4\xAB\x99\xC3\x09\x874\x16VZ\x05\x10/cf:an/1397117856840/Put,
  lastKey=\xF0\x1F\xA7\xF7u\x9E.\xB2\x8EZ\xD5\xEB\xD6h\x03
 
 W\x0F\x8A\xA0\x9B\x0A\xE8\xEC\x9ELu5o\xFE\x03\xCE/cf:an/1397131734218/Put,
  avgKeyLen=48, avgValueLen=14, entries=3712302, length=260849569,
 
 cur=\xEC5cA\xF1\x03Y\x01!\xD6\x86\x15\x13\xD6\xC9\xBDb:#A\x08\x86\x14j\xA0)\xA8\x85\x11\xDC
  F/cf:an/1397454753471/Maximum/vlen=0/ts=0] to key
 
 \xEC5cA\xF1\x03Y\x01!\xD6\x86\x15\x13\xD6\xC9\xBDb:#A\x08\x86\x14j\xA0)\xA8\x85\x11\xDCF/cf:an/LATEST_TIMESTAMP/Maximum/vlen=0/ts=0
  3969
  3970 at
 
 org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:188)
  3971 at
  org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:918)
  ...
 
  3975 14-04-14 23:58:14,993 ERROR Thread-8
  com.founder.extractor.ExtractWorker Failed after attempts=14,
  exceptions:
  3976 Mon Apr 14 23:51:52 CST 2014,
  org.apache.hadoop.hbase.client.HTable$9@17b86244, java.io.IOException:
  java.io.IOException: Could not reseek StoreFileScanner[HFileScanner
  for reader reader=hdfs://192.168.11.150:8020/hbase/vc2.in_link/6
  b879cb43205cdae084a280c38fab34a/cf/4dc235709de44f53b2484d2903f1bb75,
  compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true]
  

Re: Re: replication verifyrep

2014-04-15 Thread Jean-Daniel Cryans
On Tue, Apr 15, 2014 at 12:17 AM, Hansi Klose hansi.kl...@web.de wrote:

 Hi Jean-Daniel,

 thank you for your answer and for bringing some light into the darkness.


You're welcome!


  You can see the bad rows listed in the user logs for your MR job.

 Which log do you mean? The output from the command line?
 I only see the counts of GOODROWS and BADROWS.
 Are the bad rows which were not replicated listed in that log?


You started VerifyReplication via hadoop jar, so it's a MapReduce job. Go
to your JobTracker's web UI and you should see your jobs there; open one of
them, click on one of the completed map tasks, and look at its log. The bad
rows are listed in that output.

J-D


RE: endpoint coprocessor

2014-04-15 Thread Bogala, Chandra Reddy


Thanks, Ted. I have added the below coprocessor to my table and tried to
invoke it using a Java client, but the call fails with the error below,
even though I can see the coprocessor in the describe table output.



Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.UnknownProtocolException):
 org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for name AggregateService in region 
test3,,1397469869214.c73698dce0d5b91d29d42a9f9e194965.

at 
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:5070)



From describe table:

--

'test3', {TABLE_ATTRIBUTES => {coprocessor$1 =>
'hdfs://xxx.com:8020/user///hbase-server-0.98.1-hadoop2.jar|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'},
{NAME => 'cf'



Thanks,

Chandra





-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, April 10, 2014 5:36 PM
To: user@hbase.apache.org
Cc: user@hbase.apache.org
Subject: Re: endpoint coprocessor



Here is a reference implementation for aggregation :

http://search-hadoop.com/c/HBase:hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java||Hbase+aggregation+endpoint



You can find it in hbase source code.

Cheers



On Apr 10, 2014, at 4:29 AM, Bogala, Chandra Reddy 
chandra.bog...@gs.commailto:chandra.bog...@gs.com wrote:



 Hi,

 I am planning to write an endpoint coprocessor to calculate top-N results
 for my use case, but I got confused between the old APIs and the new APIs.

 I followed the links below and tried to implement it, but it looks like
 the APIs have changed a lot; I don't see many of these classes in the
 HBase jars. We are using HBase 0.96.

 Can anyone point me to the latest documentation/APIs? And, if possible,
 sample code to calculate top N.



 https://blogs.apache.org/hbase/entry/coprocessor_introduction

 https://www.youtube.com/watch?v=xHvJhuGGOKc



 Thanks,

 Chandra






Re: HBase atomic append functionality (not just client)

2014-04-15 Thread Sergey Shelukhin
Hmm... wouldn't MVCC prevent seeing a partial append?
Append is just a put at the end, the way it is currently implemented.
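As a toy model of the semantics discussed in this thread (plain Python, not HBase code; the class and method names are mine): appends take a per-row lock, so concurrent appends can never corrupt each other's writes, while reads take no lock and may therefore observe a row between two updates, i.e. "partially complete":

```python
import threading

class ToyRegion:
    """Per-row locking for appends, lock-free reads (modeled on the
    HRegion comment quoted below)."""
    def __init__(self):
        self.rows = {}    # row -> {column: value}
        self.locks = {}   # row -> lock guarding writes to that row
        self._meta = threading.Lock()

    def _row_lock(self, row):
        with self._meta:
            return self.locks.setdefault(row, threading.Lock())

    def append(self, row, column, value):
        # Writers serialize per row, so stored values are never corrupted.
        with self._row_lock(row):
            cells = self.rows.setdefault(row, {})
            cells[column] = cells.get(column, b"") + value

    def get(self, row):
        # Readers take no lock: a concurrent get may see one column already
        # appended to but not another ("partially complete" reads).
        return dict(self.rows.get(row, {}))

region = ToyRegion()
threads = [threading.Thread(target=region.append, args=(b"r1", b"cf:a", b"x"))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(region.get(b"r1")[b"cf:a"] == b"x" * 100)  # all 100 appends survived
```

The point of the sketch is the asymmetry: write-write atomicity comes from the row lock, while read-write isolation would need MVCC or read locking, which is the distinction Vladimir quotes from HRegion.java below.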


On Mon, Apr 14, 2014 at 10:41 AM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:

 From HRegion.java:

 "Appends performed are done under row lock but reads do not take locks
 out so this can be seen partially complete by gets and scans."

 Appends are partially atomic (you can get partial reads, but you will
 never get corrupted writes) and they are implemented on the server side.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: GSK Chaitanya [gskchaitany...@gmail.com]
 Sent: Monday, April 14, 2014 10:05 AM
 To: user@hbase.apache.org; d...@hbase.apache.org
 Subject: HBase atomic append functionality (not just client)

 Mighty Hbase users and developers,

 I have few questions and I'd really appreciate it if someone can clarify
 them.

 1) I want to know if HBase inherently supports *atomic append*
 functionality like *get* and *put*. For my work, I would be using
 OpenTSDB, which is a layer on top of AsyncHBase, and AsyncHBase doesn't
 work with the HBase client (which supports *atomic append*).

 2) If I understand correctly, the atomic append of the HBase client
 internally does a get and a put instead of actually appending to the end
 of the cell. If that's the case, I wonder how this functionality is of
 much use in terms of performance. In our case, we would like a very
 lightweight append. I'd like to know if there are any plans to add this
 feature to HBase main in the near future.

 Thanks,
 Chaitanya





Re: All regions stay on two nodes out of 18 nodes

2014-04-15 Thread Tao Xiao
The command balance_switch true returns true, but the command balancer
returns false. I checked the HMaster UI and found some regions in
transition, but they belong to other tables, not this one.

This table's name is E_MP_DAY_READ. I grepped for it in the master log and
found only the following lines:

2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:6-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:6-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,37915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:6-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:6-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,30497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:6-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,38188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:6-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
because it has been opened in a04.jsepc.com,60020,1397548219084

So few log lines about it; that looks strange, doesn't it?


BTW, I was able to spread the regions of this table evenly across the whole
cluster after I shut down the two region servers where the regions of this
table originally resided.


2014-04-15 19:47 GMT+08:00 Ted Yu yuzhih...@gmail.com:
