Re: distributing new regions immediately

2017-07-27 Thread jeff saremi
Very interesting. Thanks, Ted.


From: Ted Yu 
Sent: Thursday, July 27, 2017 2:13:25 PM
To: user@hbase.apache.org
Subject: Re: distributing new regions immediately

Since you're more concerned with write load, you can take a look at the
following parameter:

hbase.master.balancer.stochastic.writeRequestCost

The default value is 5, much smaller than the default value of the region count
cost (500).
Consider raising it so that the load balancer reacts more responsively to write load.
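A minimal sketch of why the multiplier matters. It assumes the stochastic balancer scores a cluster as a weighted sum of cost functions, each normalized to [0, 1]. The class name, the cost keys and the sample cost values are hypothetical; only the two default multipliers (5 and 500) come from the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified model (an assumption, not HBase source): each normalized cost
 * in [0, 1] is multiplied by its configured weight and the products are summed.
 */
final class BalancerCostSketch {
    // Weighted sum of normalized costs; keys are illustrative names only.
    static double totalCost(Map<String, Double> multipliers, Map<String, Double> normalizedCosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : normalizedCosts.entrySet()) {
            Double m = multipliers.get(e.getKey());
            if (m == null) {
                throw new IllegalArgumentException("no multiplier for " + e.getKey());
            }
            total += m * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical state: region counts almost even, writes heavily skewed.
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("regionCount", 0.02);
        costs.put("writeRequest", 0.9);

        Map<String, Double> defaults = Map.of("regionCount", 500.0, "writeRequest", 5.0);
        Map<String, Double> raised = Map.of("regionCount", 500.0, "writeRequest", 500.0);

        System.out.println("default multipliers: " + totalCost(defaults, costs));
        System.out.println("raised writeRequestCost: " + totalCost(raised, costs));
    }
}
```

With the default multipliers the region-count term (500 × 0.02 = 10) outweighs the write-skew term (5 × 0.9 = 4.5), so moves that fix write skew barely change the score; raising writeRequestCost to 500 makes the write term (450) dominate.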

On Thu, Jul 27, 2017 at 12:17 PM, jeff saremi 
wrote:

> We haven't done enough testing for me to say this with certainty but as we
> insert data and new regions get created, it could be a while before those
> regions are distributed. As such and if the data injection continues the
> load on the region server becomes overwhelming
>
> Is there a way to expedite the distribution of regions among available
> region servers?
>
> thanks
>
>


Re: distributing new regions immediately

2017-07-27 Thread jeff saremi
Thanks, Dima.


From: Dima Spivak 
Sent: Thursday, July 27, 2017 12:38:56 PM
To: user@hbase.apache.org
Subject: Re: distributing new regions immediately

Pre-splitting tables is typically how this is addressed in production.
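A minimal pre-splitting sketch, assuming row keys begin with a lowercase 8-digit hex prefix (for example, a hash of the natural key). It computes evenly spaced split points in the spirit of HBase's HexStringSplit algorithm, but it is not that class; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: evenly spaced split keys over an 8-hex-digit key prefix. */
final class PresplitSketch {
    private static final long MAX_PREFIX = 0xFFFFFFFFL; // "ffffffff"

    // Returns numRegions - 1 ascending split keys (for 4 regions: 3fffffff, 7fffffff, bfffffff).
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("numRegions must be at least 2");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // MAX_PREFIX * i fits in a long for any int numRegions
            long point = MAX_PREFIX * i / numRegions;
            splits[i - 1] = String.format("%08x", point).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] key : hexSplitKeys(4)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

The resulting `byte[][]` can be passed as the split keys when creating the table (for example via the `Admin.createTable(descriptor, splitKeys)` overload), so writes spread across region servers from the first insert. From the HBase shell, `create 't', 'f', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}` does something similar.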

On Thu, Jul 27, 2017 at 12:17 PM jeff saremi  wrote:

> We haven't done enough testing for me to say this with certainty but as we
> insert data and new regions get created, it could be a while before those
> regions are distributed. As such and if the data injection continues the
> load on the region server becomes overwhelming
>
> Is there a way to expedite the distribution of regions among available
> region servers?
>
> thanks
>
>
--
-Dima


Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

2017-07-27 Thread Robert Yokota
For those who are interested, here is yet another blog post on analyzing graphs
stored in HBase, this time with Apache Flink Gelly:

https://yokota.blog/2017/07/27/graph-analytics-on-hbase-with-hgraphdb-and-apache-flink-gelly/


distributing new regions immediately

2017-07-27 Thread jeff saremi
We haven't done enough testing for me to say this with certainty, but as we
insert data and new regions get created, it can be a while before those
regions are distributed. As a result, if data ingestion continues, the load
on the region server becomes overwhelming.

Is there a way to expedite the distribution of regions among the available
region servers?

thanks



Re: HBase GET operation max row size - partial results

2017-07-27 Thread Yu Li
AFAIK there is no max result size limit, and therefore no partial result, for a
get request. If we add such a feature in the future, we will add a release note
in JIRA. (We have actually implemented such a limit in our customized version,
and it requires coprocessors to handle it correctly; we may upstream the
feature later.)

Some more detailed information at the code level:

When we say "get uses scan internally", we mean it reuses the scan logic in
the HRegion class. But at the RPC service level in RSRpcServices, there are two
different methods (get and scan) for these two kinds of requests, and
currently you'll only find the max result size limit below in the scan code path:

{code}
long maxResultSize;
if (scanner.getMaxResultSize() > 0) {
  maxResultSize = Math.min(scanner.getMaxResultSize(), maxQuotaResultSize);
} else {
  maxResultSize = maxQuotaResultSize;
}
...
ScannerContext.Builder contextBuilder = ScannerContext.newBuilder(true);
// maxResultSize - either we can reach this much size for all cells (being read) data or sum
// of heap size occupied by cells (being read). Cell data means its key and value parts.
contextBuilder.setSizeLimit(sizeScope, maxResultSize, maxResultSize);
...
ScannerContext scannerContext = contextBuilder.build();
while (numOfResults < maxResults) {
  ...
  moreRows = scanner.nextRaw(values, scannerContext);
  ...
}
{code}
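To make the size check in that excerpt concrete, here is a toy model in plain Java. The class and method names are hypothetical and are not HBase classes; the sketch assumes the limit is checked after each cell is added, so a batch may end slightly above the limit and always contains at least one cell. According to the reply above, the get RPC path has no equivalent check, so a Get is not cut short this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not HBase source) of a size-limited scan batch. */
final class ScanSizeLimitSketch {
    // Mirrors the if/else above: a non-positive client limit means "unset".
    static long effectiveMaxResultSize(long clientMax, long quotaMax) {
        return clientMax > 0 ? Math.min(clientMax, quotaMax) : quotaMax;
    }

    // Returns the cell sizes accepted into one response batch.
    static List<Long> nextBatch(List<Long> cellSizes, long maxResultSize) {
        List<Long> batch = new ArrayList<>();
        long accumulated = 0;
        for (long size : cellSizes) {
            batch.add(size);
            accumulated += size;
            if (accumulated >= maxResultSize) {
                break; // limit reached: remaining cells would go into a later batch
            }
        }
        return batch;
    }
}
```

For example, with a 100-byte limit and three 60-byte cells, the first batch holds two cells and the third is left for the next call.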

Hope it helps.

Best Regards,
Yu

On 27 July 2017 at 09:54, Anoop John  wrote:

> You mean that within your RegionObserver you are doing the get? Within
> which hook? How are you doing the get? Can you paste
> that sample code?
>
> -Anoop-
>
> On Wed, Jul 26, 2017 at 8:02 PM, Veerraju Tadimeti 
> wrote:
> > Hi,
> >
> > If I use a GET operation, is there any chance of getting a partial result? If
> > yes, under what circumstances? Is there any way to reproduce it?
> >
> > I am using a GET operation in my coprocessor (to the same region) and adding
> > the result to the List. I am afraid there is some chance of a partial
> > result when using a GET operation, since GET uses a SCAN operation
> > internally.
> >
> > Thanks,
> > Raju,
> > (972)273-0155.
>