[jira] [Created] (HBASE-24544) Recommend upping zk jute.maxbuffer in all but minor installs

2020-06-11 Thread Michael Stack (Jira)
Michael Stack created HBASE-24544:
-

 Summary: Recommend upping zk jute.maxbuffer in all but minor installs
 Key: HBASE-24544
 URL: https://issues.apache.org/jira/browse/HBASE-24544
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Michael Stack


Add a doc note in upgrade and in zookeeper section recommending upping zk 
jute.maxbuffer to be above the default of 1M.

Here is jute.maxbuffer from zk doc.

{code}
jute.maxbuffer:
(Java system property: jute.maxbuffer)
This option can only be set as a Java system property. There is no zookeeper 
prefix on it. It specifies the maximum size of the data that can be stored in a 
znode. The default is 0xfffff, or just under 1M. If this option is changed, the 
system property must be set on all servers and clients otherwise problems will 
arise. This is really a sanity check. ZooKeeper is designed to store data on 
the order of kilobytes in size.
{code}

It seems easy enough to blow past the 1MB default. Here is one such scenario. A
peer is disabled so WALs back up on each RegionServer, or a bug makes it so we
don't clear WALs out from under the RegionServer promptly. Backed-up WALs get
into the hundreds, which is easy enough on a busy cluster. Next, there is a
power outage and the cluster crashes.

Recovery may require an SCP (ServerCrashProcedure) recovering hundreds of WALs.
As is, the way our SCP works, we can end up with a /hbase/splitWAL dir with
hundreds -- even thousands -- of WALs in it. The 1MB buffer limit in zk can't
carry listings this big.
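
To put rough numbers on the scenario above, here is a minimal sketch that estimates the serialized size of a child listing. It assumes each child name is encoded as a 4-byte length followed by its UTF-8 bytes, preceded by a 4-byte element count, and it ignores any response headers. The znode name in main() is made up for illustration; real names will differ, so treat the figures as order-of-magnitude only.

```java
import java.nio.charset.StandardCharsets;

public class ChildListingEstimate {
    // Default jute.maxbuffer per the ZooKeeper doc quoted above: 0xfffff bytes.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // Assumed encoding: 4-byte count, then per name a 4-byte length plus UTF-8 bytes.
    static long estimateListingBytes(String sampleName, int childCount) {
        int perName = 4 + sampleName.getBytes(StandardCharsets.UTF_8).length;
        return 4L + (long) perName * childCount;
    }

    // Largest child count whose estimated listing still fits in maxBuffer bytes.
    static int maxChildrenThatFit(String sampleName, int maxBuffer) {
        int perName = 4 + sampleName.getBytes(StandardCharsets.UTF_8).length;
        return (maxBuffer - 4) / perName;
    }

    public static void main(String[] args) {
        // Hypothetical URL-encoded WAL path used as a znode name (an assumption).
        String name = "hdfs%3A%2F%2Fnn%3A8020%2Fhbase%2FWALs%2F"
            + "rs1.example.com%2C16020%2C1591900000000-splitting%2F"
            + "rs1.example.com%252C16020%252C1591900000000.1591900001234";
        for (int n : new int[] {100, 1000, 10000}) {
            System.out.printf("%d children -> ~%d bytes%n", n, estimateListingBytes(name, n));
        }
        System.out.println("children that fit in the default buffer: "
            + maxChildrenThatFit(name, DEFAULT_JUTE_MAXBUFFER));
    }
}
```

The longer the WAL names, the fewer entries fit, which is why a directory with thousands of entries can cross the limit.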

Of note, the jute.maxbuffer needs to be set on the zk servers -- with restart 
so the change is noticed -- and on the client-side, in the hbase master at 
least.

This Jira is about documenting this long-standing problem; our doc does not
seem to mention it at all.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24543) ScheduledChore logging is too chatty, replace with metrics

2020-06-11 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24543:
---

 Summary: ScheduledChore logging is too chatty, replace with metrics
 Key: HBASE-24543
 URL: https://issues.apache.org/jira/browse/HBASE-24543
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Operability
Reporter: Andrew Kyle Purtell


ScheduledChore logs at DEBUG level the execution time of each chore. 

We used to log an average execution time across all chores every five minutes, 
which by consensus was judged to not be useful. Derived metrics like averages 
or histograms should be calculated per chore. So we modified the logging to 
dump the chore execution time each time it runs, to facilitate such 
calculations with the log aggregation and searching tool of choice. Per-chore 
execution logging is more useful, in that sense, but may be too chatty. This is 
not unexpected, but let me provide my observations so we can revisit this.

On the master, for example, this is logged every second:
{noformat}
2020-06-11 16:35:28,263 DEBUG [master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore: SplitLogManager Timeout Monitor execution time: 0 ms.
{noformat}

Does the value of these lines outweigh the cost of 86,400 log lines per day per 
master instance? (At least.)

On the regionserver it is somewhat better; these are logged every 10 seconds:
{noformat}
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
{noformat}

So that will be 17,280 log lines per day per regionserver. (At least.)
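
For reference, the two per-day figures follow from the chore periods shown in the log excerpts. A small sketch of the arithmetic, where the class and method names are made up for illustration:

```java
public class ChoreLogVolume {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Lines per day for `chores` chores that each log once every `periodSeconds`.
    static long linesPerDay(int chores, long periodSeconds) {
        return chores * (SECONDS_PER_DAY / periodSeconds);
    }

    public static void main(String[] args) {
        System.out.println(linesPerDay(1, 1));  // master: one chore every second, prints 86400
        System.out.println(linesPerDay(2, 10)); // regionserver: two chores every 10 s, prints 17280
    }
}
```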

Perhaps these should be moved to TRACE level. 

We should definitely replace this logging with histogram metrics. There should 
be a separate metric for each distinct chore classname, allocated as needed. 
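
A minimal sketch of what "a separate metric for each distinct chore classname, allocated as needed" could look like. This is not the HBase metrics API: the Histogram class, the bucket bounds, and the ChoreTimings name are all assumptions chosen to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ChoreTimings {
    // Inclusive upper bounds (ms) of the buckets; the last bucket is open-ended.
    private static final long[] BOUNDS_MS = {1, 10, 100, 1000};

    /** A tiny fixed-bucket histogram; a stand-in for a real metrics histogram. */
    public static final class Histogram {
        private final LongAdder[] buckets = new LongAdder[BOUNDS_MS.length + 1];

        Histogram() {
            for (int i = 0; i < buckets.length; i++) {
                buckets[i] = new LongAdder();
            }
        }

        void update(long millis) {
            int i = 0;
            while (i < BOUNDS_MS.length && millis > BOUNDS_MS[i]) {
                i++;
            }
            buckets[i].increment();
        }

        public long count(int bucket) {
            return buckets[bucket].sum();
        }
    }

    // One histogram per chore class name, created lazily on first use.
    private final Map<String, Histogram> byChore = new ConcurrentHashMap<>();

    public void record(String choreClassName, long millis) {
        byChore.computeIfAbsent(choreClassName, k -> new Histogram()).update(millis);
    }

    public Histogram get(String choreClassName) {
        return byChore.get(choreClassName);
    }
}
```

With something like this in place, each chore run costs one counter increment instead of one log line, and chores that never run never allocate a metric.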





[jira] [Resolved] (HBASE-24534) Delete reference off to Hadoop wiki's HBase FAQ

2020-06-11 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24534.
--
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> Delete reference off to Hadoop wiki's HBase FAQ
> ---
>
> Key: HBASE-24534
> URL: https://issues.apache.org/jira/browse/HBASE-24534
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> Our `faq.adoc` has a link off to [a 
> FAQ|https://cwiki.apache.org/confluence/display/HADOOP2/Hbase+FAQ] in the 
> Hadoop wiki, which is empty other than a pointer back to our book. Let's just 
> delete the reference to the hadoop wiki.





[jira] [Created] (HBASE-24542) Update anonymous git url on website

2020-06-11 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24542:


 Summary: Update anonymous git url on website
 Key: HBASE-24542
 URL: https://issues.apache.org/jira/browse/HBASE-24542
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Nick Dimiduk


Our [source repository page|https://hbase.apache.org/source-repository.html] 
lists the anonymous gitbox url using {{git://}} protocol. They tell me over on 
{{#asfinfra}} that gitbox has never supported the {{git://}} protocol.





[jira] [Resolved] (HBASE-24400) Automatically download CMake Dependencies

2020-06-11 Thread Bharath Vissapragada (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada resolved HBASE-24400.
--
Fix Version/s: master
   Resolution: Fixed

> Automatically download CMake Dependencies
> -
>
> Key: HBASE-24400
> URL: https://issues.apache.org/jira/browse/HBASE-24400
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Marc Parisi
>Assignee: Marc Parisi
>Priority: Major
> Fix For: master
>
>
> To improve the ability to build we should download and link a local version 
> of dependencies (in the build folder).
>  
> This will help with skew of versions and the ability to build the project.
>  
> This will help the build process in docker and allow people to develop 
> locally. This will also pave the way for future work to support





Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Andrew Purtell
That's unfortunate, but needs must, IMHO.

A potential benefit of also marking the impls LP(COPROC) is this captures
any implicit dependency on semantics and functionality of the
implementation classes not directly exposed in the hbase-metrics-api facade.

So, let's do both? (Facade improvement, raise to LP the impl classes)


On Thu, Jun 11, 2020 at 12:00 PM Geoffrey Jacoby  wrote:

> Couple points:
>
> 1. I like Andrew's proposed solution, and we should do it, but I'm not sure
> it's sufficient for Rushabh's purposes because of semver rules. Phoenix
> supports HBase 1.3 -1.5 (soon to add 1.6) and HBase 2.0 (soon to gain 2.1
> and 2.2, with 2.3 coming shortly after its release here.) If we add the new
> sizeHistogram and timeHistogram methods to hbase-metrics, they'll be
> available in Phoenix only in HBase 1.7 and 2.4. (since 2.3 is
> mostly-frozen)
>
>  Since Phoenix will be supporting earlier versions of both HBase branches
> for a good while, there will need to be a compatibility shim. And the
> older-version instance of the shim will probably need to access the classes
> directly. (Please correct me if I'm wrong, Rushabh or Andrew.) So it still
> might need a LimitedPrivate IA.
>
> 2. I agree with Nick that it's better to use LimitedPrivate.COPROC rather
> than LimitedPrivate.PHOENIX.
>
> Geoffrey
>
>
>
> On Thu, Jun 11, 2020 at 11:28 AM Josh Elser  wrote:
>
> > Sounds reasonable to me!
> >
> > On 6/11/20 1:06 PM, Andrew Purtell wrote:
> > > hbase-metrics-api is available for coprocessors already and interfaces
> > > within are already LimitedPrivate(COPROC). However, that package is
> > mostly
> > > interface and seems geared toward consuming metrics instantiated and
> > > registered via private stuff. Or, rather, I didn't see how Phoenix
> could
> > choose
> > > which of MutableSizeHistogram and MutableTimeHistogram to instantiate
> > using
> > > those interfaces, there is only Histogram
> MetricRegistry#histogram(String
> > > name). So I think it is also worth some time to review the utility of
> > > hbase-metrics-api and decide if more need be done there. Would the
> > addition
> > > of
> > >
> > > Histogram MetricRegistry#sizeHistogram(String name)
> > > Histogram MetricRegistry#timeHistogram(String name)
> > >
> > > achieve the desired objective instead?
> > >
> > >
> > > On Thu, Jun 11, 2020 at 9:16 AM Nick Dimiduk 
> > wrote:
> > >
> > >> I was just about to reply with the same -- Josh is faster :) +1 on
> > >> considering the full surface area of the APIs being exposed.
> > >>
> > >> I also wonder if exposing the metrics infrastructure is something of
> > >> interest more broadly than Phoenix. Seems like any coprocessor might
> > want
> > >> to provide or monitor some metric value.
> > >>
> > >> On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:
> > >>
> > >>> My only concern is that you can't just mark these two classes a
> > >>> LimitedPrivate for Phoenix -- you would also have to mark
> > >>> MutableRangeHistogram, MutableHistogram (and the rest of the class
> > >>> hierarchy) to make sure that we don't make it super confusing as to
> > what
> > >>> comes from LimitedPrivate classes and what is coming from Private
> > >> classes.
> > >>>
> > >>> Would it be better to just say: make
> > >>> ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib
> > >>> LimitedPrivate?
> > >>>
> > >>> Do you also need the stuff in
> > >>> hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push
> > >>> metrics back through the HBase metrics subsystem?
> > >>>
> > >>> Sorry for the late reply. Just want to make sure we open up the
> > >>> audience, we open it sufficiently.
> > >>>
> > >>> On 6/8/20 1:15 PM, Rushabh Shah wrote:
> >  Hi,
> >  Currently the IA for MutableSizeHistogram and MutableTimeHistogram
> is
> >  private. We want to use these classes in PHOENIX project and I
> thought
> > >> we
> >  can leverage the existing implementation from hbase histo
> > >> implementation.
> >  IIUC the private IA can't be used in other projects. Proposing to
> make
> > >> it
> >  LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please
> > suggest.
> >  Related jira: https://issues.apache.org/jira/browse/HBASE-24520
> > 
> > >>>
> > >>
> > >
> > >
> >
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Geoffrey Jacoby
Couple points:

1. I like Andrew's proposed solution, and we should do it, but I'm not sure
it's sufficient for Rushabh's purposes because of semver rules. Phoenix
supports HBase 1.3-1.5 (soon to add 1.6) and HBase 2.0 (soon to gain 2.1
and 2.2, with 2.3 coming shortly after its release here). If we add the new
sizeHistogram and timeHistogram methods to hbase-metrics, they'll be
available in Phoenix only in HBase 1.7 and 2.4 (since 2.3 is mostly-frozen).

 Since Phoenix will be supporting earlier versions of both HBase branches
for a good while, there will need to be a compatibility shim. And the
older-version instance of the shim will probably need to access the classes
directly. (Please correct me if I'm wrong, Rushabh or Andrew.) So it still
might need a LimitedPrivate IA.

2. I agree with Nick that it's better to use LimitedPrivate.COPROC rather
than LimitedPrivate.PHOENIX.
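
The compatibility shim described in point 1 could be sketched roughly as follows. Everything here is hypothetical: OldRegistry and NewRegistry are local stand-ins for registries from different release lines, and the String return values stand in for histogram objects. The only idea taken from the thread is "use the new factory method when it exists, otherwise fall back".

```java
import java.lang.reflect.Method;

public class HistogramShim {
    /** Hypothetical stand-in for a registry from an older release line. */
    public static class OldRegistry {
        public String histogram(String name) {
            return "plain:" + name;
        }
    }

    /** Hypothetical stand-in for a registry that has the proposed method. */
    public static class NewRegistry extends OldRegistry {
        public String sizeHistogram(String name) {
            return "size:" + name;
        }
    }

    // Call sizeHistogram(String) when the running registry has it; otherwise fall back.
    public static String sizeHistogram(Object registry, String name) {
        try {
            Method m;
            try {
                m = registry.getClass().getMethod("sizeHistogram", String.class);
            } catch (NoSuchMethodException e) {
                // Older version: a real shim would have to reach for the
                // implementation classes here, hence the LimitedPrivate concern.
                m = registry.getClass().getMethod("histogram", String.class);
            }
            return (String) m.invoke(registry, name);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The fallback branch is the part that motivates marking the implementation classes LimitedPrivate even if the facade is extended.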

Geoffrey



On Thu, Jun 11, 2020 at 11:28 AM Josh Elser  wrote:

> Sounds reasonable to me!
>
> On 6/11/20 1:06 PM, Andrew Purtell wrote:
> > hbase-metrics-api is available for coprocessors already and interfaces
> > within are already LimitedPrivate(COPROC). However, that package is
> mostly
> > interface and seems geared toward consuming metrics instantiated and
> > registered via private stuff. Or, rather, I didn't see how Phoenix could
> choose
> > which of MutableSizeHistogram and MutableTimeHistogram to instantiate
> using
> > those interfaces, there is only Histogram MetricRegistry#histogram(String
> > name). So I think it is also worth some time to review the utility of
> > hbase-metrics-api and decide if more need be done there. Would the
> addition
> > of
> >
> > Histogram MetricRegistry#sizeHistogram(String name)
> > Histogram MetricRegistry#timeHistogram(String name)
> >
> > achieve the desired objective instead?
> >
> >
> > On Thu, Jun 11, 2020 at 9:16 AM Nick Dimiduk 
> wrote:
> >
> >> I was just about to reply with the same -- Josh is faster :) +1 on
> >> considering the full surface area of the APIs being exposed.
> >>
> >> I also wonder if exposing the metrics infrastructure is something of
> >> interest more broadly than Phoenix. Seems like any coprocessor might
> want
> >> to provide or monitor some metric value.
> >>
> >> On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:
> >>
> >>> My only concern is that you can't just mark these two classes a
> >>> LimitedPrivate for Phoenix -- you would also have to mark
> >>> MutableRangeHistogram, MutableHistogram (and the rest of the class
> >>> hierarchy) to make sure that we don't make it super confusing as to
> what
> >>> comes from LimitedPrivate classes and what is coming from Private
> >> classes.
> >>>
> >>> Would it be better to just say: make
> >>> ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib
> >>> LimitedPrivate?
> >>>
> >>> Do you also need the stuff in
> >>> hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push
> >>> metrics back through the HBase metrics subsystem?
> >>>
> >>> Sorry for the late reply. Just want to make sure we open up the
> >>> audience, we open it sufficiently.
> >>>
> >>> On 6/8/20 1:15 PM, Rushabh Shah wrote:
>  Hi,
>  Currently the IA for MutableSizeHistogram and MutableTimeHistogram is
>  private. We want to use these classes in PHOENIX project and I thought
> >> we
>  can leverage the existing implementation from hbase histo
> >> implementation.
>  IIUC the private IA can't be used in other projects. Proposing to make
> >> it
>  LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please
> suggest.
>  Related jira: https://issues.apache.org/jira/browse/HBASE-24520
> 
> >>>
> >>
> >
> >
>


Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Josh Elser

Sounds reasonable to me!

On 6/11/20 1:06 PM, Andrew Purtell wrote:
> hbase-metrics-api is available for coprocessors already and interfaces
> within are already LimitedPrivate(COPROC). However, that package is mostly
> interface and seems geared toward consuming metrics instantiated and
> registered via private stuff. Or, rather, I didn't see how Phoenix could choose
> which of MutableSizeHistogram and MutableTimeHistogram to instantiate using
> those interfaces, there is only Histogram MetricRegistry#histogram(String
> name). So I think it is also worth some time to review the utility of
> hbase-metrics-api and decide if more need be done there. Would the addition
> of
>
> Histogram MetricRegistry#sizeHistogram(String name)
> Histogram MetricRegistry#timeHistogram(String name)
>
> achieve the desired objective instead?
>
> On Thu, Jun 11, 2020 at 9:16 AM Nick Dimiduk  wrote:
>
>> I was just about to reply with the same -- Josh is faster :) +1 on
>> considering the full surface area of the APIs being exposed.
>>
>> I also wonder if exposing the metrics infrastructure is something of
>> interest more broadly than Phoenix. Seems like any coprocessor might want
>> to provide or monitor some metric value.
>>
>> On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:
>>
>>> My only concern is that you can't just mark these two classes a
>>> LimitedPrivate for Phoenix -- you would also have to mark
>>> MutableRangeHistogram, MutableHistogram (and the rest of the class
>>> hierarchy) to make sure that we don't make it super confusing as to what
>>> comes from LimitedPrivate classes and what is coming from Private classes.
>>>
>>> Would it be better to just say: make
>>> ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib
>>> LimitedPrivate?
>>>
>>> Do you also need the stuff in
>>> hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push
>>> metrics back through the HBase metrics subsystem?
>>>
>>> Sorry for the late reply. Just want to make sure we open up the
>>> audience, we open it sufficiently.
>>>
>>> On 6/8/20 1:15 PM, Rushabh Shah wrote:
>>>> Hi,
>>>> Currently the IA for MutableSizeHistogram and MutableTimeHistogram is
>>>> private. We want to use these classes in PHOENIX project and I thought we
>>>> can leverage the existing implementation from hbase histo implementation.
>>>> IIUC the private IA can't be used in other projects. Proposing to make it
>>>> LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please suggest.
>>>> Related jira: https://issues.apache.org/jira/browse/HBASE-24520


Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Andrew Purtell
hbase-metrics-api is available for coprocessors already and interfaces
within are already LimitedPrivate(COPROC). However, that package is mostly
interface and seems geared toward consuming metrics instantiated and
registered via private stuff. Or, rather, I didn't see how Phoenix could choose
which of MutableSizeHistogram and MutableTimeHistogram to instantiate using
those interfaces, there is only Histogram MetricRegistry#histogram(String
name). So I think it is also worth some time to review the utility of
hbase-metrics-api and decide if more need be done there. Would the addition
of

Histogram MetricRegistry#sizeHistogram(String name)
Histogram MetricRegistry#timeHistogram(String name)

achieve the desired objective instead?
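
As a rough illustration of the question above, here is a sketch of a registry facade with the two additional factory methods. The Histogram and MetricRegistry types below are local stand-ins defined only for this example, not the real hbase-metrics-api interfaces, and the in-memory SimpleRegistry is an assumption made to keep the sketch runnable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FacadeSketch {
    /** Local stand-in for the facade's histogram type. */
    public interface Histogram {
        void update(long value);
        long getCount();
        String kind();
    }

    /** Local stand-in for the registry, with the two proposed factory methods. */
    public interface MetricRegistry {
        Histogram histogram(String name);
        Histogram sizeHistogram(String name); // proposed in this thread
        Histogram timeHistogram(String name); // proposed in this thread
    }

    static final class SimpleHistogram implements Histogram {
        private final String kind;
        private long count;

        SimpleHistogram(String kind) {
            this.kind = kind;
        }

        public synchronized void update(long value) {
            count++;
        }

        public synchronized long getCount() {
            return count;
        }

        public String kind() {
            return kind;
        }
    }

    /** In-memory registry: the caller picks the flavor, never the impl class. */
    public static final class SimpleRegistry implements MetricRegistry {
        private final Map<String, Histogram> metrics = new ConcurrentHashMap<>();

        private Histogram getOrCreate(String name, String kind) {
            return metrics.computeIfAbsent(name, k -> new SimpleHistogram(kind));
        }

        public Histogram histogram(String name) {
            return getOrCreate(name, "plain");
        }

        public Histogram sizeHistogram(String name) {
            return getOrCreate(name, "size");
        }

        public Histogram timeHistogram(String name) {
            return getOrCreate(name, "time");
        }
    }
}
```

The point of the design is that a coprocessor would depend only on the interfaces, so the implementation classes could stay private.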


On Thu, Jun 11, 2020 at 9:16 AM Nick Dimiduk  wrote:

> I was just about to reply with the same -- Josh is faster :) +1 on
> considering the full surface area of the APIs being exposed.
>
> I also wonder if exposing the metrics infrastructure is something of
> interest more broadly than Phoenix. Seems like any coprocessor might want
> to provide or monitor some metric value.
>
> On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:
>
> > My only concern is that you can't just mark these two classes a
> > LimitedPrivate for Phoenix -- you would also have to mark
> > MutableRangeHistogram, MutableHistogram (and the rest of the class
> > hierarchy) to make sure that we don't make it super confusing as to what
> > comes from LimitedPrivate classes and what is coming from Private
> classes.
> >
> > Would it be better to just say: make
> > ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib
> > LimitedPrivate?
> >
> > Do you also need the stuff in
> > hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push
> > metrics back through the HBase metrics subsystem?
> >
> > Sorry for the late reply. Just want to make sure we open up the
> > audience, we open it sufficiently.
> >
> > On 6/8/20 1:15 PM, Rushabh Shah wrote:
> > > Hi,
> > > Currently the IA for MutableSizeHistogram and MutableTimeHistogram is
> > > private. We want to use these classes in PHOENIX project and I thought
> we
> > > can leverage the existing implementation from hbase histo
> implementation.
> > > IIUC the private IA can't be used in other projects. Proposing to make
> it
> > > LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please suggest.
> > > Related jira: https://issues.apache.org/jira/browse/HBASE-24520
> > >
> >
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Nick Dimiduk
I was just about to reply with the same -- Josh is faster :) +1 on
considering the full surface area of the APIs being exposed.

I also wonder if exposing the metrics infrastructure is something of
interest more broadly than Phoenix. Seems like any coprocessor might want
to provide or monitor some metric value.

On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:

> My only concern is that you can't just mark these two classes a
> LimitedPrivate for Phoenix -- you would also have to mark
> MutableRangeHistogram, MutableHistogram (and the rest of the class
> hierarchy) to make sure that we don't make it super confusing as to what
> comes from LimitedPrivate classes and what is coming from Private classes.
>
> Would it be better to just say: make
> ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib
> LimitedPrivate?
>
> Do you also need the stuff in
> hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push
> metrics back through the HBase metrics subsystem?
>
> Sorry for the late reply. Just want to make sure we open up the
> audience, we open it sufficiently.
>
> On 6/8/20 1:15 PM, Rushabh Shah wrote:
> > Hi,
> > Currently the IA for MutableSizeHistogram and MutableTimeHistogram is
> > private. We want to use these classes in PHOENIX project and I thought we
> > can leverage the existing implementation from hbase histo implementation.
> > IIUC the private IA can't be used in other projects. Proposing to make it
> > LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please suggest.
> > Related jira: https://issues.apache.org/jira/browse/HBASE-24520
> >
>


Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LimitedPrivate

2020-06-11 Thread Josh Elser
My only concern is that you can't just mark these two classes as 
LimitedPrivate for Phoenix -- you would also have to mark 
MutableRangeHistogram, MutableHistogram (and the rest of the class 
hierarchy) to make sure that we don't make it super confusing as to what 
comes from LimitedPrivate classes and what is coming from Private classes.


Would it be better to just say: make 
./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib 
LimitedPrivate?


Do you also need the stuff in 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push 
metrics back through the HBase metrics subsystem?


Sorry for the late reply. Just want to make sure that if we open up the 
audience, we open it sufficiently.


On 6/8/20 1:15 PM, Rushabh Shah wrote:

> Hi,
> Currently the IA for MutableSizeHistogram and MutableTimeHistogram is
> private. We want to use these classes in PHOENIX project and I thought we
> can leverage the existing implementation from hbase histo implementation.
> IIUC the private IA can't be used in other projects. Proposing to make it
> LimitedPrivate and mark HBaseInterfaceAudience.PHOENIX. Please suggest.
> Related jira: https://issues.apache.org/jira/browse/HBASE-24520



[jira] [Created] (HBASE-24541) Add support to run LoadIncrementalHFiles in a distributed manner

2020-06-11 Thread Constantin-Catalin Luca (Jira)
Constantin-Catalin Luca created HBASE-24541:
---

 Summary: Add support to run LoadIncrementalHFiles in a distributed manner
 Key: HBASE-24541
 URL: https://issues.apache.org/jira/browse/HBASE-24541
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, Performance
Affects Versions: 1.4.0
Reporter: Constantin-Catalin Luca


LoadIncrementalHFiles takes a very long time to complete when running HBase on 
top of S3 and attempting to bulkload 500K-700K files.

The root cause of this is a combination of the higher latency of S3 (as 
compared to HDFS) and the calls made by LoadIncrementalHFiles to the 
underlying filesystem (each file is opened, a seek is made to the trailer 
offset at the end, and then the trailer is read).

Increasing the parallelism does not yield any significant improvement. This 
seems to stem from the fact that once the trailer is read the stream is not 
consumed to the end. This causes the underlying HTTP connection to be aborted 
and it cannot be re-used.
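
The per-file access pattern described above (open, seek to the trailer offset, read the trailer) can be sketched against a local file as below. The 8-byte trailer length and the file contents are made up for illustration; this is not the HFile trailer format. On a local disk these three steps are cheap, while on an object store each open-and-seek is likely to cost at least one round trip, which would add up over hundreds of thousands of files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TrailerRead {
    // Read the last `trailerLen` bytes of a file: open, seek to the trailer offset, read.
    public static byte[] readTrailer(Path file, int trailerLen) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long offset = raf.length() - trailerLen;
            if (offset < 0) {
                throw new IOException("file is shorter than the trailer");
            }
            raf.seek(offset);
            byte[] trailer = new byte[trailerLen];
            raf.readFully(trailer);
            return trailer;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("trailer-sketch", ".bin");
        try {
            Files.write(tmp, "data-data-data-TRAILER!".getBytes(StandardCharsets.US_ASCII));
            byte[] trailer = readTrailer(tmp, 8);
            System.out.println(new String(trailer, StandardCharsets.US_ASCII)); // prints TRAILER!
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```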

 

The proposed solution would be to also add support to run LoadIncrementalHFiles 
on multiple machines as a MapReduce job. 





Re: [DISCUSS]HBase2.1.0 is slower than HBase1.2.0

2020-06-11 Thread ramkrishna vasudevan
Oh great. Thanks for pointing that out. I think that is the exact place
where the perf bottleneck was found.

Regards
Ram


On Thu, Jun 11, 2020 at 4:29 PM 张铎(Duo Zhang)  wrote:

> Oh, good. I recall that there is a related issue but I just forget the
> title so I can not find it...
>
> Thanks for chiming in.
>
> On Thu, Jun 11, 2020 at 6:39 PM OpenInx  wrote:
>
> > Hi Zheng wang.
> >
> > Hope this issue will be helpful for you.
> > https://issues.apache.org/jira/browse/HBASE-21657
> > Thanks.
> >
> > On Tue, Jun 9, 2020 at 5:53 PM Anoop John  wrote:
> >
> > > Thanks for the detailed analysis and update zheng wang.
> > > >The code line below in StoreScanner.next() cost about 100ms in v2.1,
> and
> > > it added from v2.0, see HBASE-17647.
> > > So still there is some additional cost in 2.1 right? Do u have any
> other
> > > observation?  Are we doing more cell compares in 2.x?
> > >
> > > Anoop
> > >
> > >
> > > On Mon, Jun 8, 2020 at 1:50 AM zheng wang <18031...@qq.com> wrote:
> > >
> > > > Hi guys:
> > > >
> > > >
> > > > I did some test on my pc to find the reason as Jan Van Besien
> mentioned
> > > in
> > > > user channel.
> > > >
> > > >
> > > > #test env
> > > > OS : win10
> > > > JDK: 1.8
> > > > MEM: 8GB
> > > >
> > > >
> > > > #test data:
> > > > 1 million rows with only one columnfamily and one qualifier.
> > > >
> > > >
> > > > rowkey: rowkey-#index#
> > > > value: value-#index#
> > > >
> > > >
> > > > #test method:
> > > > just use client api to scan with default config several times, no pe,
> > no
> > > > ycsb
> > > >
> > > >
> > > > #test result(avg):
> > > > v1.2.0: 800ms
> > > > v2.1.0: 1050ms
> > > >
> > > >
> > > > So, it is sure that v2.1 is slower than v1.2, after this, i did some
> > > > statistics on regionserver.
> > > > Then i find the partly reason is related to the size estimated.
> > > >
> > > >
> > > > The code line below in StoreScanner.next() cost about 100ms in v2.1,
> > and
> > > > it added from v2.0, see HBASE-17647.
> > > > "int cellSize = PrivateCellUtil.estimatedSerializedSizeOf(cell);"
> > > >
> > > >
> > > > Should we support to disable the MaxResultSize limit(2MB by default
> > now)
> > > > to get more efficient if user exactly knows their data and could
> limit
> > > > results only by setBatch and setLimit?
> > >
> >
>


Re: [DISCUSS]HBase2.1.0 is slower than HBase1.2.0

2020-06-11 Thread Duo Zhang
Oh, good. I recall that there is a related issue, but I forgot the
title so I could not find it...

Thanks for chiming in.

On Thu, Jun 11, 2020 at 6:39 PM OpenInx  wrote:

> Hi Zheng wang.
>
> Hope this issue will be helpful for you.
> https://issues.apache.org/jira/browse/HBASE-21657
> Thanks.
>
> On Tue, Jun 9, 2020 at 5:53 PM Anoop John  wrote:
>
> > Thanks for the detailed analysis and update zheng wang.
> > >The code line below in StoreScanner.next() cost about 100ms in v2.1, and
> > it added from v2.0, see HBASE-17647.
> > So still there is some additional cost in 2.1 right? Do u have any other
> > observation?  Are we doing more cell compares in 2.x?
> >
> > Anoop
> >
> >
> > On Mon, Jun 8, 2020 at 1:50 AM zheng wang <18031...@qq.com> wrote:
> >
> > > Hi guys:
> > >
> > >
> > > I did some test on my pc to find the reason as Jan Van Besien mentioned
> > in
> > > user channel.
> > >
> > >
> > > #test env
> > > OS : win10
> > > JDK: 1.8
> > > MEM: 8GB
> > >
> > >
> > > #test data:
> > > 1 million rows with only one columnfamily and one qualifier.
> > >
> > >
> > > rowkey: rowkey-#index#
> > > value: value-#index#
> > >
> > >
> > > #test method:
> > > just use client api to scan with default config several times, no pe,
> no
> > > ycsb
> > >
> > >
> > > #test result(avg):
> > > v1.2.0: 800ms
> > > v2.1.0: 1050ms
> > >
> > >
> > > So, it is sure that v2.1 is slower than v1.2, after this, i did some
> > > statistics on regionserver.
> > > Then i find the partly reason is related to the size estimated.
> > >
> > >
> > > The code line below in StoreScanner.next() cost about 100ms in v2.1,
> and
> > > it added from v2.0, see HBASE-17647.
> > > "int cellSize = PrivateCellUtil.estimatedSerializedSizeOf(cell);"
> > >
> > >
> > > Should we support to disable the MaxResultSize limit(2MB by default
> now)
> > > to get more efficient if user exactly knows their data and could limit
> > > results only by setBatch and setLimit?
> >
>


Re: [DISCUSS]HBase2.1.0 is slower than HBase1.2.0

2020-06-11 Thread OpenInx
Hi Zheng wang.

Hope this issue will be helpful for you.
https://issues.apache.org/jira/browse/HBASE-21657
Thanks.

On Tue, Jun 9, 2020 at 5:53 PM Anoop John  wrote:

> Thanks for the detailed analysis and update zheng wang.
> >The code line below in StoreScanner.next() cost about 100ms in v2.1, and
> it added from v2.0, see HBASE-17647.
> So still there is some additional cost in 2.1 right? Do u have any other
> observation?  Are we doing more cell compares in 2.x?
>
> Anoop
>
>
> On Mon, Jun 8, 2020 at 1:50 AM zheng wang <18031...@qq.com> wrote:
>
> > Hi guys:
> >
> >
> > I did some test on my pc to find the reason as Jan Van Besien mentioned
> in
> > user channel.
> >
> >
> > #test env
> > OS : win10
> > JDK: 1.8
> > MEM: 8GB
> >
> >
> > #test data:
> > 1 million rows with only one columnfamily and one qualifier.
> >
> >
> > rowkey: rowkey-#index#
> > value: value-#index#
> >
> >
> > #test method:
> > just use client api to scan with default config several times, no pe, no
> > ycsb
> >
> >
> > #test result(avg):
> > v1.2.0: 800ms
> > v2.1.0: 1050ms
> >
> >
> > So, it is sure that v2.1 is slower than v1.2, after this, i did some
> > statistics on regionserver.
> > Then i find the partly reason is related to the size estimated.
> >
> >
> > The code line below in StoreScanner.next() cost about 100ms in v2.1, and
> > it added from v2.0, see HBASE-17647.
> > "int cellSize = PrivateCellUtil.estimatedSerializedSizeOf(cell);"
> >
> >
> > Should we support to disable the MaxResultSize limit(2MB by default now)
> > to get more efficient if user exactly knows their data and could limit
> > results only by setBatch and setLimit?
>
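
The trade-off raised in the quoted message can be sketched as follows: enforcing a maximum result size means estimating the size of every cell on the scan path, while disabling the limit lets the scanner skip that estimate. This is not HBase code. The Cell class, the estimatedSize formula, and the convention that a non-positive limit disables the check are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeLimitedBatch {
    /** Hypothetical cell: just a key and a value. */
    public static final class Cell {
        final byte[] key;
        final byte[] value;

        public Cell(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for a per-cell size estimate; it runs once for every cell scanned.
    static int estimatedSize(Cell c) {
        return 8 + c.key.length + c.value.length; // 8 bytes of assumed framing
    }

    // Collect cells until the running size estimate reaches maxResultSize.
    // A non-positive maxResultSize disables the limit and skips the estimate.
    public static List<Cell> nextBatch(List<Cell> source, long maxResultSize) {
        List<Cell> out = new ArrayList<>();
        long total = 0;
        for (Cell c : source) {
            out.add(c);
            if (maxResultSize > 0) {
                total += estimatedSize(c);
                if (total >= maxResultSize) {
                    break;
                }
            }
        }
        return out;
    }
}
```

Whether skipping the estimate recovers the roughly 100 ms reported in the thread would need to be measured; the sketch only shows where the per-cell cost sits.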