Re: HBase Region Size of 2.5 TB

2016-08-26 Thread Ted Yu
From IncreasingToUpperBoundRegionSplitPolicy#configureForRegion():

initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1);

...

if (initialSize <= 0) {

  initialSize = 2 * conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
      HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
}

If you haven't changed the value for
"hbase.increasing.policy.initial.size", the last two lines should have been
executed.

initialSize would be 2GB in that case according to the config you listed.
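For reference, the threshold this policy checks before splitting grows with the cube of the table's region count on the region server, capped by the configured max file size: roughly min(initialSize * R^3, maxFileSize). A small sketch of that arithmetic, paraphrased from the 1.x policy code, so treat it as an approximation rather than the exact implementation:

```java
public class SplitThreshold {
    static final long GB = 1024L * 1024 * 1024;

    // Approximate split threshold of IncreasingToUpperBoundRegionSplitPolicy:
    // min(initialSize * R^3, maxFileSize), where R is the number of regions
    // of the table hosted on this region server.
    static long sizeToCheck(int regionCount, long initialSize, long maxFileSize) {
        return Math.min(maxFileSize,
            initialSize * regionCount * regionCount * regionCount);
    }

    public static void main(String[] args) {
        long initialSize = 2 * GB;   // 2 x the 1 GB memstore flush size above
        long maxFileSize = 10 * GB;  // the 10 GB max file size listed above
        // With one region the split point is 2 GB; with two or more it is
        // already capped at 10 GB -- far below the observed 2.5 TB regions.
        System.out.println(sizeToCheck(1, initialSize, maxFileSize) / GB); // 2
        System.out.println(sizeToCheck(2, initialSize, maxFileSize) / GB); // 10
    }
}
```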


FYI

On Fri, Aug 26, 2016 at 3:23 PM, yeshwanth kumar 
wrote:

> Hi we are using  CDH 5.7 HBase 1.2
>
> we are doing a performance testing over HBase through regular Load, which
> has 4 Region Servers.
>
> Input Data is compressed binary files around 2TB, which we process and
> write as Key-Value pairs to HBase.
> the output data size in  HBase is almost 4 times around 8TB, because we are
> writing as text.
> this process is a Map-Reduce Job,
>
> when we are doing the load, we observed there's a lot of GC happening on
> Region Server's ,so we changed couple of  parameters to decrease the GC
> time.
>
> we increased the flush size to 128MB to 1 GB and compactionThreshold to 50
> and  regionserver.maxlogs to 42
> following are the configuration we changed from default.
>
>
> hbase.hregion.memstore.flush.size = 1 GB
> hbase.hstore.max.filesize=10GB
> hbase.hregion.preclose.flush.size= 50 MB
>
> hbase.hstore.compactionThreshold=50
> hbase.regionserver.maxlogs=42
>
> after the load, we observed that HBase table has only 4 regions with each
> of size around 2.5 TB
>
> i am trying to understand, what configuration parameter caused this issue.
>
> i was going through this article
> http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
>
> Region split policy in our HBase is
> org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
> according to Region Split policy, Region Server should create regions when
> the region size limit is exceeded.
> can some one explain me the root cause.
>
>
> Thanks,
> Yeshwanth
>


HBase Region Size of 2.5 TB

2016-08-26 Thread yeshwanth kumar
Hi, we are using CDH 5.7 (HBase 1.2).

We are doing performance testing of HBase through a regular load on a
cluster with 4 region servers.

The input data is compressed binary files of around 2 TB, which we process
and write as key-value pairs to HBase. The output data size in HBase is
almost 4 times larger, around 8 TB, because we are writing it as text.
This process is a MapReduce job.

During the load we observed a lot of GC happening on the region servers,
so we changed a couple of parameters to decrease the GC time.

We increased the flush size from 128 MB to 1 GB, the compactionThreshold
to 50, and regionserver.maxlogs to 42.
The following are the configurations we changed from the defaults:


hbase.hregion.memstore.flush.size = 1 GB
hbase.hstore.max.filesize=10GB
hbase.hregion.preclose.flush.size= 50 MB

hbase.hstore.compactionThreshold=50
hbase.regionserver.maxlogs=42

After the load, we observed that the HBase table has only 4 regions, each
around 2.5 TB in size.

I am trying to understand which configuration parameter caused this issue.

I was going through this article:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

The region split policy in our HBase is
org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy.
According to the region split policy, the region server should split regions
when the region size limit is exceeded.
Can someone explain the root cause?


Thanks,
Yeshwanth


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hey, we have approximately 294 regions across 42 region servers.

Manish

On Fri, Aug 26, 2016 at 3:05 PM, Ted Yu  wrote:

> I currently don't have concrete numbers but the impact is not big.
>
> How many regions are there in the table(s) ?
>
> Cheers
>
> On Fri, Aug 26, 2016 at 2:57 PM, Manish Maheshwari 
> wrote:
>
> > Thanks Ted. I looked into using JMX. Unfortunately it requires us to
> > restart HBase after the config changes. In the production environment we
> > are unable to do so. The table size is small. Around 9.6 TB. We have
> around
> > 42 nodes each with 10 TB storage. The scan will take time, but would
> need a
> > HBase restart.
> >
> > We will enable JMX at the next opportunity for restart. In general the
> > impact on JMX would be less than 2-3% on HBase performance?
> >
> > Thanks,
> > Manish
> >
> >
> > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:
> >
> > > Have you looked at /jmx endpoint on the servers ?
> > > Below is a sample w.r.t. the metrics that would be of interest to you:
> > >
> > >
> > > "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount": 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops": 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min": 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max": 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean": 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median": 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile": 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile": 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile": 0.0,
> > > "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount": 0,
> > > "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount": 0,
> > > "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount": 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops": 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min": 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max": 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean": 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median": 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile": 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile": 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile": 0.0,
> > > "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount": 0,
> > > "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount": 0,
> > >
> > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <
> mylogi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > I understand the region crash/migration/splitting impact. Currently
> we
> > > have
> > > > hotspotting on few region servers. I am trying to collect the row
> stats
> > > at
> > > > region server and region levels to see how bad the skew of the data
> is.
> > > >
> > > > Manish
> > > >
> > > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu 
> wrote:
> > > >
> > > > > Can you elaborate on your use case ?
> > > > >
> > > > > Suppose row A is on server B, after you retrieve row A, the region
> > for
> > > > row
> > > > > A gets moved to server C (load balancer or server crash). Server B
> > > would
> > > > no
> > > > > longer be relevant.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <
> > > mylogi...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I looked at the HBase Count functionality to count rows in a
> Table.
> > > Is
> > > > > > there a way that we can count the number of rows in Regions &
> > Region
> > > > > > Servers? When we use a HBase scan, we dont get the Region ID or
> > > Region
> > > > > > Server of the row. Is there a way to do this via Scans?
>
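One way to get per-region row counts from a plain scan, as the question above asks, is to fetch the table's region start keys once and bucket each scanned row key into the region whose start key is the greatest key less than or equal to it. A minimal sketch with strings standing in for byte[] row keys (the keys and region names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RowsPerRegion {
    // Count rows per region: regionStartKeys maps each region's start key to
    // its name; a row key belongs to the region with the greatest start key
    // that is <= the row key (TreeMap.floorEntry).
    static TreeMap<String, Long> countRows(TreeMap<String, String> regionStartKeys,
                                           List<String> scannedRowKeys) {
        TreeMap<String, Long> counts = new TreeMap<>();
        for (String rowKey : scannedRowKeys) {
            String region = regionStartKeys.floorEntry(rowKey).getValue();
            counts.merge(region, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-A");   // the first region has an empty start key
        regions.put("m", "region-B");  // second region starts at "m"
        List<String> rows = new ArrayList<>();
        rows.add("apple");
        rows.add("mango");
        rows.add("zebra");
        System.out.println(countRows(regions, rows)); // {region-A=1, region-B=2}
    }
}
```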

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Ted.

On Fri, Aug 26, 2016 at 3:16 PM, Ted Yu  wrote:

> For #1, please look at the following method in HTable.java :
>
>   public NavigableMap getRegionLocations() throws
> IOException {
>
> Cheers
>
> On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari 
> wrote:
>
> > Thanks Rahul.
> >
> > 1 - I understand the idea of listing the usage on each of the disks that
> we
> > have HBase running on for that table. However how do I map the Nodes to
> > Regions. I looked at RegionLocator - getStartEndKeys. But these just give
> > me the values and not the Hostnames where each region is currently
> running.
> > Is there a way to map the Region to the Node?
> >
> > 2 - Some of our row sizes vary quite a bit depending on the number of
> > updates to the row. This will give us a rough idea of the size of the
> > Region, but not the number of Rows. Is there a way to get both..
> Apologies
> > if I am bothering too much..
> >
> > Thanks,
> > Manish
> >
> >
> >
> >
> >
> > On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani  >
> > wrote:
> >
> > > If you want to see which regionservers are currently hot, then jmx
> would
> > be
> > > the best way to get that data.
> > >
> > > If you want to see overall what is hot, you can do this without the use
> > of
> > > a scan (it will be a pretty decent estimate)
> > >
> > > you can do:
> > >
> > > hdfs dfs -du /hbase/data/default//
> > >
> > > with that data you can create a Map
> > >
> > > Then you can use the RegionLocator to find which region resides on
> which
> > > machine.
> > >
> > > That will tell you the overall skew of your data in terms of raw bytes.
> > >
> > > Should be a pretty decent estimate and a lot faster than scanning your
> > > table provided your table / cluster is sufficiently large.
> > >
> > > hope that helps.
> > > rahul
> > >
> > > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:
> > >
> > > > Have you looked at /jmx endpoint on the servers ?
> > > > Below is a sample w.r.t. the metrics that would be of interest to
> you:
> > > >
> > > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <
> > mylogi...@gmail.com
> > > >
> > > > wrote:
> > > >

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
For #1, please look at the following method in HTable.java :

  public NavigableMap getRegionLocations() throws
IOException {
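getRegionLocations() conceptually yields region-to-server pairs; inverting that map gives the per-server region list being asked about. A trivial sketch over plain strings (the region hashes and hostnames are invented; in real code the pairs would come from the client API above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionsPerHost {
    // Invert a region -> hostname map (the shape getRegionLocations() gives
    // you, conceptually) into hostname -> list of regions it serves.
    static Map<String, List<String>> byHost(Map<String, String> regionToHost) {
        Map<String, List<String>> hosts = new TreeMap<>();
        for (Map.Entry<String, String> e : regionToHost.entrySet()) {
            hosts.computeIfAbsent(e.getValue(), h -> new ArrayList<>())
                 .add(e.getKey());
        }
        return hosts;
    }

    public static void main(String[] args) {
        Map<String, String> regionToHost = new TreeMap<>();
        regionToHost.put("6659ba3f", "rs1.example.com");
        regionToHost.put("f9965e20", "rs1.example.com");
        regionToHost.put("823a39a2", "rs2.example.com");
        System.out.println(byHost(regionToHost));
        // {rs1.example.com=[6659ba3f, f9965e20], rs2.example.com=[823a39a2]}
    }
}
```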

Cheers

On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari 
wrote:

> Thanks Rahul.
>
> 1 - I understand the idea of listing the usage on each of the disks that we
> have HBase running on for that table. However how do I map the Nodes to
> Regions. I looked at RegionLocator - getStartEndKeys. But these just give
> me the values and not the Hostnames where each region is currently running.
> Is there a way to map the Region to the Node?
>
> 2 - Some of our row sizes vary quite a bit depending on the number of
> updates to the row. This will give us a rough idea of the size of the
> Region, but not the number of Rows. Is there a way to get both.. Apologies
> if I am bothering too much..
>
> Thanks,
> Manish
>
>
>
>
>
> On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani 
> wrote:
>
> > If you want to see which regionservers are currently hot, then jmx would
> be
> > the best way to get that data.
> >
> > If you want to see overall what is hot, you can do this without the use
> of
> > a scan (it will be a pretty decent estimate)
> >
> > you can do:
> >
> > hdfs dfs -du /hbase/data/default//
> >
> > with that data you can create a Map
> >
> > Then you can use the RegionLocator to find which region resides on which
> > machine.
> >
> > That will tell you the overall skew of your data in terms of raw bytes.
> >
> > Should be a pretty decent estimate and a lot faster than scanning your
> > table provided your table / cluster is sufficiently large.
> >
> > hope that helps.
> > rahul
> >
> > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:
> >
> > > Have you looked at /jmx endpoint on the servers ?
> > > Below is a sample w.r.t. the metrics that would be of interest to you:
> > >
> > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <
> mylogi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > I understand the region crash/migration/splitting impact. Currently
> we
> > > have
> > > > hotspotting on few region servers. I am trying to collect the row
> stats
> > > at
> > > > region server and region levels to see how bad the skew of the data
> is.
> > > >
> > > > Manish
> > > >
> > > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu 
> wrote:
> > > >
> >

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Rahul.

1 - I understand the idea of listing the usage on each of the disks that we
have HBase running on for that table. However, how do I map the nodes to
regions? I looked at RegionLocator's getStartEndKeys, but that just gives
me the key values and not the hostnames where each region is currently
running. Is there a way to map a region to its node?

2 - Some of our row sizes vary quite a bit depending on the number of
updates to the row. This approach will give us a rough idea of the size of
each region, but not the number of rows. Is there a way to get both?
Apologies if I am bothering too much.

Thanks,
Manish





On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani 
wrote:

> If you want to see which regionservers are currently hot, then jmx would be
> the best way to get that data.
>
> If you want to see overall what is hot, you can do this without the use of
> a scan (it will be a pretty decent estimate)
>
> you can do:
>
> hdfs dfs -du /hbase/data/default//
>
> with that data you can create a Map
>
> Then you can use the RegionLocator to find which region resides on which
> machine.
>
> That will tell you the overall skew of your data in terms of raw bytes.
>
> Should be a pretty decent estimate and a lot faster than scanning your
> table provided your table / cluster is sufficiently large.
>
> hope that helps.
> rahul
>
> On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:
>
> > Have you looked at /jmx endpoint on the servers ?
> > Below is a sample w.r.t. the metrics that would be of interest to you:
> >
> > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari  >
> > wrote:
> >
> > > Hi Ted,
> > >
> > > I understand the region crash/migration/splitting impact. Currently we
> > have
> > > hotspotting on few region servers. I am trying to collect the row stats
> > at
> > > region server and region levels to see how bad the skew of the data is.
> > >
> > > Manish
> > >
> > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:
> > >
> > > > Can you elaborate on your use case ?
> > > >
> > > > Suppose row A is on server B, after you retrieve row A, the region
> for
> > > row
> > > > A gets moved to server C (load balancer or server crash). Server B
> > would
> > > no
> > > > longer be relevant.
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <
> > mylogi...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I looked at the HBase Count functionality to count rows in a Table.
> > Is
> > > > > there a w

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
I currently don't have concrete numbers but the impact is not big.

How many regions are there in the table(s) ?

Cheers

On Fri, Aug 26, 2016 at 2:57 PM, Manish Maheshwari 
wrote:

> Thanks Ted. I looked into using JMX. Unfortunately it requires us to
> restart HBase after the config changes. In the production environment we
> are unable to do so. The table size is small. Around 9.6 TB. We have around
> 42 nodes each with 10 TB storage. The scan will take time, but would need a
> HBase restart.
>
> We will enable JMX at the next opportunity for restart. In general the
> impact on JMX would be less than 2-3% on HBase performance?
>
> Thanks,
> Manish
>
>
> On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:
>
> > Have you looked at /jmx endpoint on the servers ?
> > Below is a sample w.r.t. the metrics that would be of interest to you:
> >
> > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari  >
> > wrote:
> >
> > > Hi Ted,
> > >
> > > I understand the region crash/migration/splitting impact. Currently we
> > have
> > > hotspotting on few region servers. I am trying to collect the row stats
> > at
> > > region server and region levels to see how bad the skew of the data is.
> > >
> > > Manish
> > >
> > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:
> > >
> > > > Can you elaborate on your use case ?
> > > >
> > > > Suppose row A is on server B, after you retrieve row A, the region
> for
> > > row
> > > > A gets moved to server C (load balancer or server crash). Server B
> > would
> > > no
> > > > longer be relevant.
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <
> > mylogi...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I looked at the HBase Count functionality to count rows in a Table.
> > Is
> > > > > there a way that we can count the number of rows in Regions &
> Region
> > > > > Servers? When we use a HBase scan, we dont get the Region ID or
> > Region
> > > > > Server of the row. Is there a way to do this via Scans?
> > > > >
> > > > > Thanks,
> > > > > Manish
> > > > >
> > > >
> > >
> >
>


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Ted. I looked into using JMX. Unfortunately it requires us to
restart HBase after the config changes, and in the production environment
we are unable to do so. The table size is small, around 9.6 TB. We have
around 42 nodes, each with 10 TB of storage. The scan will take time, but
JMX would need an HBase restart.

We will enable JMX at the next opportunity for a restart. In general, would
the impact of JMX on HBase performance be less than 2-3%?

Thanks,
Manish


On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:

> Have you looked at /jmx endpoint on the servers ?
> Below is a sample w.r.t. the metrics that would be of interest to you:
>
> On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari 
> wrote:
>
> > Hi Ted,
> >
> > I understand the region crash/migration/splitting impact. Currently we
> have
> > hotspotting on few region servers. I am trying to collect the row stats
> at
> > region server and region levels to see how bad the skew of the data is.
> >
> > Manish
> >
> > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:
> >
> > > Can you elaborate on your use case ?
> > >
> > > Suppose row A is on server B, after you retrieve row A, the region for
> > row
> > > A gets moved to server C (load balancer or server crash). Server B
> would
> > no
> > > longer be relevant.
> > >
> > > Cheers
> > >
> > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <
> mylogi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I looked at the HBase Count functionality to count rows in a Table.
> Is
> > > > there a way that we can count the number of rows in Regions & Region
> > > > Servers? When we use a HBase scan, we dont get the Region ID or
> Region
> > > > Server of the row. Is there a way to do this via Scans?
> > > >
> > > > Thanks,
> > > > Manish
> > > >
> > >
> >
>


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread rahul gidwani
If you want to see which regionservers are currently hot, JMX would be
the best way to get that data.

If you want to see what is hot overall, you can do it without a scan
(it will be a pretty decent estimate).

You can run:

hdfs dfs -du /hbase/data/default//

With that data you can build a Map of region name to size in bytes.

Then you can use the RegionLocator to find which region resides on which
machine.

That will tell you the overall skew of your data in terms of raw bytes.

It should be a pretty decent estimate and a lot faster than scanning your
table, provided your table / cluster is sufficiently large.

hope that helps.
rahul
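
[Editor's note: the steps above can be sketched in plain Java. The region
names, sizes, and server names below are made-up placeholders; in practice
the sizes come from parsing `hdfs dfs -du` output and the region-to-server
mapping comes from the HBase `RegionLocator` API.]

```java
import java.util.HashMap;
import java.util.Map;

public class RegionSkew {
    // Aggregate per-region sizes into per-server totals, given a
    // region -> server mapping (in practice from RegionLocator).
    static Map<String, Long> computeServerBytes(Map<String, Long> regionSizes,
                                                Map<String, String> regionToServer) {
        Map<String, Long> serverBytes = new HashMap<>();
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            serverBytes.merge(regionToServer.get(e.getKey()), e.getValue(), Long::sum);
        }
        return serverBytes;
    }

    public static void main(String[] args) {
        // region -> size in bytes, as parsed from `hdfs dfs -du` output
        Map<String, Long> regionSizes = new HashMap<>();
        regionSizes.put("region-a", 2_500L);
        regionSizes.put("region-b", 100L);
        regionSizes.put("region-c", 150L);

        // region -> hosting regionserver, as reported by RegionLocator
        Map<String, String> regionToServer = new HashMap<>();
        regionToServer.put("region-a", "rs1");
        regionToServer.put("region-b", "rs1");
        regionToServer.put("region-c", "rs2");

        // A heavily imbalanced result here points at hotspotting by raw bytes.
        System.out.println(computeServerBytes(regionSizes, regionToServer));
    }
}
```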

On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu  wrote:

> Have you looked at /jmx endpoint on the servers ?
> Below is a sample w.r.t. the metrics that would be of interest to you:
>
>
> [per-region metric sample snipped; the full listing appears in Ted's original message below]
>
> On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari 
> wrote:
>
> > Hi Ted,
> >
> > I understand the region crash/migration/splitting impact. Currently we
> have
> > hotspotting on few region servers. I am trying to collect the row stats
> at
> > region server and region levels to see how bad the skew of the data is.
> >
> > Manish
> >
> > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:
> >
> > > Can you elaborate on your use case ?
> > >
> > > Suppose row A is on server B, after you retrieve row A, the region for
> > row
> > > A gets moved to server C (load balancer or server crash). Server B
> would
> > no
> > > longer be relevant.
> > >
> > > Cheers
> > >
> > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <
> mylogi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I looked at the HBase Count functionality to count rows in a Table.
> Is
> > > > there a way that we can count the number of rows in Regions & Region
> > > > Servers? When we use a HBase scan, we dont get the Region ID or
> Region
> > > > Server of the row. Is there a way to do this via Scans?
> > > >
> > > > Thanks,
> > > > Manish
> > > >
> > >
> >
>


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
Have you looked at the /jmx endpoint on the servers?
Below is a sample of the metrics that would be of interest to you:


"Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops" : 0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min" : 0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max" : 0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean" : 0.0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median" : 0.0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile" : 0.0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile" : 0.0,

"Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile" : 0.0,


"Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount" : 0,

"Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount" : 0,

"Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount" : 0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops" : 0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min" : 0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max" : 0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean" : 0.0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median" : 0.0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile" : 0.0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile" : 0.0,

"Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile" : 0.0,


"Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount" : 0,

"Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount" : 0,

On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari 
wrote:

> Hi Ted,
>
> I understand the region crash/migration/splitting impact. Currently we have
> hotspotting on few region servers. I am trying to collect the row stats at
> region server and region levels to see how bad the skew of the data is.
>
> Manish
>
> On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:
>
> > Can you elaborate on your use case ?
> >
> > Suppose row A is on server B, after you retrieve row A, the region for
> row
> > A gets moved to server C (load balancer or server crash). Server B would
> no
> > longer be relevant.
> >
> > Cheers
> >
> > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari  >
> > wrote:
> >
> > > Hi,
> > >
> > > I looked at the HBase Count functionality to count rows in a Table. Is
> > > there a way that we can count the number of rows in Regions & Region
> > > Servers? When we use a HBase scan, we dont get the Region ID or Region
> > > Server of the row. Is there a way to do this via Scans?
> > >
> > > Thanks,
> > > Manish
> > >
> >
>


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hi Ted,

I understand the region crash/migration/splitting impact. Currently we have
hotspotting on few region servers. I am trying to collect the row stats at
region server and region levels to see how bad the skew of the data is.

Manish

On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu  wrote:

> Can you elaborate on your use case ?
>
> Suppose row A is on server B, after you retrieve row A, the region for row
> A gets moved to server C (load balancer or server crash). Server B would no
> longer be relevant.
>
> Cheers
>
> On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari 
> wrote:
>
> > Hi,
> >
> > I looked at the HBase Count functionality to count rows in a Table. Is
> > there a way that we can count the number of rows in Regions & Region
> > Servers? When we use a HBase scan, we dont get the Region ID or Region
> > Server of the row. Is there a way to do this via Scans?
> >
> > Thanks,
> > Manish
> >
>


[DISCUSS] 0.98 branch disposition

2016-08-26 Thread Andrew Purtell
Greetings,

HBase 0.98.0 was released in February of 2014. We have had 21 releases in 2
1/2 years at a fairly regular cadence, a terrific run for any software
product. However as 0.98 RM I think it's now time to discuss winding down
0.98. I want to give you notice of this as far in advance as possible
(having come to this decision only this week). We have several more recent
releases at this point that are quite stable, a superset of 0.98
functionality, and have been proven in deployments. It's wise not to take
on unnecessary risk by upgrading from a particular version, but in the case
of 0.98, it's getting to be that time.

If you have not yet, I would encourage you to take a few moments to
participate in our fully anonymous usage survey:
https://www.surveymonkey.com/r/NJFKKGW . According to results received so
far, the versions of HBase in production use break down as:

   - 0.94 - 19%
   - 0.96 - 2%
   - *0.98 - 23%*
   - 1.0 - 20%
   - 1.1 - 34%
   - 1.2 - 23%

These figures add up to more than 100% because, I expect, some respondents
run more than one version.

For those 23% still on 0.98 (and the 2% on 0.96) it's time to start
seriously thinking about an upgrade to 1.1 or later. The upgrade process
can be done in a rolling manner. We consider 1.1 (and 1.2 for that matter)
to be stable and ready for production.

As 0.98 RM, my plan is to continue active maintenance at a roughly monthly
release cadence through December of this year. However in January 2017 I
plan to tender my resignation as 0.98 RM and, hopefully, take that active
role forward to more recent code not so full of dust and cobwebs and more
interesting to develop and maintain. Unless someone else steps up to take
on that task this will end regular 0.98 releases. I do not expect anyone to
take on that role, frankly. Of course we can still make occasional 0.98
releases on demand. Any committer can wrangle the bits and the PMC can
entertain a vote. (If you can conscript a committer to assist with
releasing I don't think you even need to be a committer to function as RM
for a release.) Anyway, concurrent with my resignation as 0.98 RM I expect
the project to discuss and decide an official position on 0.98 support. It
is quite possible we will announce that position to be an end of life.


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
Can you elaborate on your use case ?

Suppose row A is on server B; after you retrieve row A, the region for row
A gets moved to server C (load balancer or server crash). Server B would no
longer be relevant.

Cheers

On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari 
wrote:

> Hi,
>
> I looked at the HBase Count functionality to count rows in a Table. Is
> there a way that we can count the number of rows in Regions & Region
> Servers? When we use a HBase scan, we dont get the Region ID or Region
> Server of the row. Is there a way to do this via Scans?
>
> Thanks,
> Manish
>


HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hi,

I looked at the HBase Count functionality to count rows in a Table. Is
there a way that we can count the number of rows in Regions & Region
Servers? When we use an HBase scan, we don't get the Region ID or Region
Server of the row. Is there a way to do this via Scans?

Thanks,
Manish


Re: Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Ted Yu
Looks like the image didn't go through.

Can you pastebin the error ?

Cheers

On Fri, Aug 26, 2016 at 7:28 AM, Manjeet Singh 
wrote:

> Adding
> I am getting below error on truncating the table
>
> [image: Inline image 1]
>
> On Fri, Aug 26, 2016 at 7:56 PM, Manjeet Singh  > wrote:
>
>> Hi All
>>
>> I am using wide table approach where I have might have more  1,00,
>> column qualifier
>>
>> I am getting problem as below
>> Heap size problem by using scan on shell , as a solution I increase java
>> heap size by using cloudera manager to 4 GB
>>
>>
>> second I have below Native API code It took very long time to process can
>> any one help me on same?
>>
>>
>>
>>
>>
>> public static ArrayList getColumnQualifyerByPrefixScan(String
>> rowKey, String prefix) {
>>
>> ArrayList list = null;
>> try {
>>
>> FilterList filterList = new FilterList(FilterList.Operator
>> .MUST_PASS_ALL);
>> Filter filterB = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
>> new BinaryPrefixComparator(Bytes.toBytes(prefix)));
>> filterList.addFilter(filterB);
>>
>> list = new ArrayList();
>>
>> Get get1 = new Get(rowKey.getBytes());
>> get1.setFilter(filterList);
>> Result rs1 = hTable.get(get1);
>> int i = 0;
>> for (KeyValue kv : rs1.raw()) {
>> list.add(new String(kv.getQualifier()) + " ");
>> }
>> } catch (Exception e) {
>> //System.out.println(e.getMessage());
>>
>> }
>> return list;
>> }
>>
>> Thanks
>> Manjeet
>> --
>> luv all
>>
>
>
>
> --
> luv all
>


Re: Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Manjeet Singh
Adding
I am getting below error on truncating the table

[image: Inline image 1]

On Fri, Aug 26, 2016 at 7:56 PM, Manjeet Singh 
wrote:

> Hi All
>
> I am using wide table approach where I have might have more  1,00,
> column qualifier
>
> I am getting problem as below
> Heap size problem by using scan on shell , as a solution I increase java
> heap size by using cloudera manager to 4 GB
>
>
> second I have below Native API code It took very long time to process can
> any one help me on same?
>
>
>
>
>
> public static ArrayList getColumnQualifyerByPrefixScan(String
> rowKey, String prefix) {
>
> ArrayList list = null;
> try {
>
> FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
> Filter filterB = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
> new BinaryPrefixComparator(Bytes.toBytes(prefix)));
> filterList.addFilter(filterB);
>
> list = new ArrayList();
>
> Get get1 = new Get(rowKey.getBytes());
> get1.setFilter(filterList);
> Result rs1 = hTable.get(get1);
> int i = 0;
> for (KeyValue kv : rs1.raw()) {
> list.add(new String(kv.getQualifier()) + " ");
> }
> } catch (Exception e) {
> //System.out.println(e.getMessage());
>
> }
> return list;
> }
>
> Thanks
> Manjeet
> --
> luv all
>



-- 
luv all


Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Manjeet Singh
Hi All

I am using a wide-table approach where a single row might have more than
1,00, column qualifiers.

I am running into two problems.

First, a heap-size problem when scanning from the shell; as a workaround I
increased the Java heap size to 4 GB via Cloudera Manager.

Second, the native API code below takes a very long time to run. Can
anyone help with it?


import java.util.ArrayList;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

public static ArrayList<String> getColumnQualifyerByPrefixScan(String rowKey,
        String prefix) {
    ArrayList<String> list = new ArrayList<>();
    try {
        // Keep only qualifiers that start with the given prefix.
        FilterList filterList =
                new FilterList(FilterList.Operator.MUST_PASS_ALL);
        Filter filterB = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryPrefixComparator(Bytes.toBytes(prefix)));
        filterList.addFilter(filterB);

        Get get1 = new Get(Bytes.toBytes(rowKey));
        get1.setFilter(filterList);
        Result rs1 = hTable.get(get1); // hTable: an already-open table handle
        for (KeyValue kv : rs1.raw()) {
            list.add(Bytes.toString(kv.getQualifier()) + " ");
        }
    } catch (Exception e) {
        // Swallowing the exception hides failures; at least log it.
        e.printStackTrace();
    }
    return list;
}

Thanks
Manjeet
-- 
luv all


Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Ted Yu
Can you take a look at the replication bridge [0] Jeffrey wrote ?
It used both client library versions through JarJar [1] to avoid name
collision.

[0]: https://github.com/hortonworks/HBaseReplicationBridgeServer
[1]: https://code.google.com/p/jarjar/
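
[Editor's note: the same class-relocation trick can be done at build time
with the Maven Shade plugin. A sketch only; the relocated package prefix
`shaded094` is arbitrary, and 0.94's transitive Hadoop dependencies would
need relocating as well.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move the 0.94 client classes out of the way of the 1.2.2 client -->
          <relocation>
            <pattern>org.apache.hadoop.hbase</pattern>
            <shadedPattern>shaded094.org.apache.hadoop.hbase</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```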


On Fri, Aug 26, 2016 at 12:26 AM, Enrico Olivelli - Diennea <
enrico.olive...@diennea.com> wrote:

> Thank you Dima for your quick answer.
>
> Do you think that it would be possible to create a shaded version of the
> 0.94  client (with all the dependencies) and let it live inside the same
> JVM of a pure 1.2.2 client ?
>
> My real need is to copy data from a 0.94 cluster to a new 1.2.2
> installation, but temporary continuing to read from 0.94 in order not to
> provide down time
>
> what is the best way to copy data from a 0.94 cluster to a new cluster of
> different hbase major versions ?
>
> can you give me some link ?
>
> Thanks
> Enrico
>
>
> Il giorno ven, 26/08/2016 alle 00.04 -0700, Dima Spivak ha scritto:
>
> I would say no; 0.94 is not wire compatible with 1.2.2  because the former
> uses Hadoop IPC and the latter uses protocol buffers. Sorry, Enrico.
>
> On Friday, August 26, 2016, Enrico Olivelli - Diennea <
> enrico.olive...@diennea.com> wrote:
>
>
>
> Hi,
> I would like to connect to both a 0.94 hbase cluster and a 1.2.2 hbase
> cluster from the same JVM
> I think that 0.94 client code is not compatible with 1.2.2
>
> do you think it is possible ?
>
> Thank you
>
>
> --
> Enrico Olivelli
> Software Development Manager @Diennea
> Tel.: (+39) 0546 066100 - Int. 925
> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>
> MagNews - E-mail Marketing Solutions
> http://www.magnews.it
> Diennea - Digital Marketing Solutions
> http://www.diennea.com
>
>
> 
>
> Iscriviti alla nostra newsletter per rimanere aggiornato su digital ed
> email marketing! http://www.magnews.it/newsletter/
>
> The information in this email is confidential and may be legally
> privileged. If you are not the intended recipient please notify the sender
> immediately and destroy this email. Any unauthorized, direct or indirect,
> disclosure, copying, storage, distribution or other use is strictly
> forbidden.
>
>
>
>
>
>
>
>


Re: adding a column to exiting tables

2016-08-26 Thread Ted Yu
Switching to user@

http://hbase.apache.org/book.html#datamodel

By "column" I guess you mean column qualifier. New writes can simply
include the new column qualifier; no schema change is needed.

On the application side, when a retrieved row doesn't contain the new
column qualifier, you can interpret that as the default value of 2.

On Thu, Aug 25, 2016 at 1:07 PM, satishleo  wrote:

> Hi,
>
> I am a hbase newbie so posing this question, I need to add an integer
> column
> to hbase table with default value 2 and  also need to back fill the column
> value to 2 for the existing rows.
>
> could someone please let me know how to do this.
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.
> nabble.com/adding-a-column-to-exiting-tables-tp4082097.html
> Sent from the HBase Developer mailing list archive at Nabble.com.
>


Re: Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread sudhir patil
Great, thanks Ted.

On Aug 26, 2016 7:29 PM, "Ted Yu"  wrote:

> Replication between 0.98.6 and 1.2.0 should work.
>
> Thanks
>
> > On Aug 26, 2016, at 1:59 AM, spats  wrote:
> >
> >
> > Does hbase replication works between different versions 0.98.6 and 1.2.0?
> >
> > We are in the process of upgrading our clusters & during that time we
> want
> > to make sure if replication will work fine across clusters. It would be
> > really helpful if anyone can share about hbase replication with different
> > versions?
> >
> >
> >
> > --
> > View this message in context: http://apache-hbase.679495.n3.
> nabble.com/Hbase-replication-between-0-98-6-and-1-2-0-
> versions-tp4082106.html
> > Sent from the HBase User mailing list archive at Nabble.com.
>


Re: Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread Ted Yu
Replication between 0.98.6 and 1.2.0 should work. 

Thanks 

> On Aug 26, 2016, at 1:59 AM, spats  wrote:
> 
> 
> Does hbase replication works between different versions 0.98.6 and 1.2.0? 
> 
> We are in the process of upgrading our clusters & during that time we want
> to make sure if replication will work fine across clusters. It would be
> really helpful if anyone can share about hbase replication with different
> versions?
> 
> 
> 
> --
> View this message in context: 
> http://apache-hbase.679495.n3.nabble.com/Hbase-replication-between-0-98-6-and-1-2-0-versions-tp4082106.html
> Sent from the HBase User mailing list archive at Nabble.com.


Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread spats

Does hbase replication work between different versions, 0.98.6 and 1.2.0?

We are in the process of upgrading our clusters, and during that time we
want to make sure replication will work across them. It would be really
helpful if anyone could share their experience with hbase replication
between different versions.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Hbase-replication-between-0-98-6-and-1-2-0-versions-tp4082106.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Dima Spivak
Sadly, there is no "easy" way to do it (blame filesystem changes and the
rpc differences, among other things).

A while back, someone posted about how he was able to do a snapshot export
between 0.94 and 0.98 [1] but this is not officially supported. Perhaps
someone else has ideas?

1.
http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3cc18855da-cb0f-4500-8d35-c79f78106...@digitalenvoy.net%3E

On Friday, August 26, 2016, Enrico Olivelli - Diennea <
enrico.olive...@diennea.com> wrote:

> Thank you Dima for your quick answer.
>
> Do you think that it would be possible to create a shaded version of the
> 0.94  client (with all the dependencies) and let it live inside the same
> JVM of a pure 1.2.2 client ?
>
> My real need is to copy data from a 0.94 cluster to a new 1.2.2
> installation, but temporary continuing to read from 0.94 in order not to
> provide down time
>
> what is the best way to copy data from a 0.94 cluster to a new cluster of
> different hbase major versions ?
>
> can you give me some link ?
>
> Thanks
> Enrico
>
>
> Il giorno ven, 26/08/2016 alle 00.04 -0700, Dima Spivak ha scritto:
>
> I would say no; 0.94 is not wire compatible with 1.2.2  because the former
> uses Hadoop IPC and the latter uses protocol buffers. Sorry, Enrico.
>
> On Friday, August 26, 2016, Enrico Olivelli - Diennea <
> enrico.olive...@diennea.com  enrico.olive...@diennea.com >> wrote:
>
>
>
> Hi,
> I would like to connect to both a 0.94 hbase cluster and a 1.2.2 hbase
> cluster from the same JVM
> I think that 0.94 client code is not compatible with 1.2.2
>
> do you think it is possible ?
>
> Thank you
>
>
> --
> Enrico Olivelli
> Software Development Manager @Diennea
> Tel.: (+39) 0546 066100 - Int. 925
> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>
> MagNews - E-mail Marketing Solutions
> http://www.magnews.it
> Diennea - Digital Marketing Solutions
> http://www.diennea.com
>
>
> 
>
> Iscriviti alla nostra newsletter per rimanere aggiornato su digital ed
> email marketing! http://www.magnews.it/newsletter/
>
> The information in this email is confidential and may be legally
> privileged. If you are not the intended recipient please notify the sender
> immediately and destroy this email. Any unauthorized, direct or indirect,
> disclosure, copying, storage, distribution or other use is strictly
> forbidden.
>
>
>
>
>
>
>
>


-- 
-Dima


Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Enrico Olivelli - Diennea
Thank you Dima for your quick answer.

Do you think that it would be possible to create a shaded version of the 0.94  
client (with all the dependencies) and let it live inside the same JVM of a 
pure 1.2.2 client ?

My real need is to copy data from a 0.94 cluster to a new 1.2.2 installation, 
but temporarily continuing to read from 0.94 so as not to incur downtime.

what is the best way to copy data from a 0.94 cluster to a new cluster of 
different hbase major versions ?

can you give me some link ?

Thanks
Enrico


Il giorno ven, 26/08/2016 alle 00.04 -0700, Dima Spivak ha scritto:

I would say no; 0.94 is not wire compatible with 1.2.2  because the former
uses Hadoop IPC and the latter uses protocol buffers. Sorry, Enrico.

On Friday, August 26, 2016, Enrico Olivelli - Diennea <
enrico.olive...@diennea.com> wrote:



Hi,
I would like to connect to both a 0.94 hbase cluster and a 1.2.2 hbase
cluster from the same JVM
I think that 0.94 client code is not compatible with 1.2.2

do you think it is possible ?

Thank you









--
Enrico Olivelli
Software Development Manager @Diennea
Tel.: (+39) 0546 066100 - Int. 925
Viale G.Marconi 30/14 - 48018 Faenza (RA)

MagNews - E-mail Marketing Solutions
http://www.magnews.it
Diennea - Digital Marketing Solutions
http://www.diennea.com




Iscriviti alla nostra newsletter per rimanere aggiornato su digital ed email 
marketing! http://www.magnews.it/newsletter/

The information in this email is confidential and may be legally privileged. If 
you are not the intended recipient please notify the sender immediately and 
destroy this email. Any unauthorized, direct or indirect, disclosure, copying, 
storage, distribution or other use is strictly forbidden.


Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Dima Spivak
I would say no; 0.94 is not wire compatible with 1.2.2  because the former
uses Hadoop IPC and the latter uses protocol buffers. Sorry, Enrico.

On Friday, August 26, 2016, Enrico Olivelli - Diennea <
enrico.olive...@diennea.com> wrote:

> Hi,
> I would like to connect to both a 0.94 hbase cluster and a 1.2.2 hbase
> cluster from the same JVM
> I think that 0.94 client code is not compatible with 1.2.2
>
> do you think it is possible ?
>
> Thank you
>
>
> --
> Enrico Olivelli
> Software Development Manager @Diennea
> Tel.: (+39) 0546 066100 - Int. 925
> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>
> MagNews - E-mail Marketing Solutions
> http://www.magnews.it
> Diennea - Digital Marketing Solutions
> http://www.diennea.com
>
>
> 
>
> Iscriviti alla nostra newsletter per rimanere aggiornato su digital ed
> email marketing! http://www.magnews.it/newsletter/
>
> The information in this email is confidential and may be legally
> privileged. If you are not the intended recipient please notify the sender
> immediately and destroy this email. Any unauthorized, direct or indirect,
> disclosure, copying, storage, distribution or other use is strictly
> forbidden.
>


-- 
-Dima


Accessing different HBase versions from the same JVM

2016-08-26 Thread Enrico Olivelli - Diennea
Hi,
I would like to connect to both a 0.94 hbase cluster and a 1.2.2 hbase cluster 
from the same JVM
I think that 0.94 client code is not compatible with 1.2.2

do you think it is possible ?

Thank you


--
Enrico Olivelli
Software Development Manager @Diennea
Tel.: (+39) 0546 066100 - Int. 925
Viale G.Marconi 30/14 - 48018 Faenza (RA)

MagNews - E-mail Marketing Solutions
http://www.magnews.it
Diennea - Digital Marketing Solutions
http://www.diennea.com




Iscriviti alla nostra newsletter per rimanere aggiornato su digital ed email 
marketing! http://www.magnews.it/newsletter/

The information in this email is confidential and may be legally privileged. If 
you are not the intended recipient please notify the sender immediately and 
destroy this email. Any unauthorized, direct or indirect, disclosure, copying, 
storage, distribution or other use is strictly forbidden.