Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Ashish Singhi
Congratulations, Chia-Ping.

On Sat, Sep 30, 2017 at 3:49 AM, Misty Stanley-Jones 
wrote:

> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> join the HBase PMC and help make the project run smoothly. Chia-Ping became
> an HBase committer over 6 months ago, based on long-running participation in
> the HBase project, a consistent record of resolving HBase issues, and
> contributions to testing and performance.
>
> Thank you for stepping up to serve, Chia-Ping!
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or PMC
> member, you can always drop a note to priv...@hbase.apache.org to let us
> know!
>
> Thanks,
> Misty (on behalf of the HBase PMC)
>


[jira] [Created] (HBASE-18911) Unify Admin and AsyncAdmin's methods name

2017-09-29 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-18911:
--

 Summary: Unify Admin and AsyncAdmin's methods name
 Key: HBASE-18911
 URL: https://issues.apache.org/jira/browse/HBASE-18911
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18910) Backport HBASE-17292 "Add observer notification before bulk loaded hfile is moved to region directory" to 1.3

2017-09-29 Thread Guangxu Cheng (JIRA)
Guangxu Cheng created HBASE-18910:
-

 Summary: Backport HBASE-17292 "Add observer notification before 
bulk loaded hfile is moved to region directory" to 1.3
 Key: HBASE-18910
 URL: https://issues.apache.org/jira/browse/HBASE-18910
 Project: HBase
  Issue Type: Bug
Reporter: Guangxu Cheng
Assignee: Guangxu Cheng
 Fix For: 1.3.2


HBASE-18900 will backport HBASE-17290 to branch-1.3, but HBASE-17290 depends
on HBASE-17292, so this issue will backport HBASE-17292 to branch-1.3.





[jira] [Created] (HBASE-18909) Deprecate Admin's methods which used String regex

2017-09-29 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-18909:
--

 Summary: Deprecate Admin's methods which used String regex
 Key: HBASE-18909
 URL: https://issues.apache.org/jira/browse/HBASE-18909
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang








[jira] [Created] (HBASE-18908) Add Java 9 section to support matrix documentation

2017-09-29 Thread Mike Drob (JIRA)
Mike Drob created HBASE-18908:
-

 Summary: Add Java 9 section to support matrix documentation
 Key: HBASE-18908
 URL: https://issues.apache.org/jira/browse/HBASE-18908
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Mike Drob








Re: Performance issue in the Join query on the HBase tables

2017-09-29 Thread wenxing zheng
@Eric: I will take a look at Trafodion.

@Nick: As for Hive/Spark over snapshots, I just tried Hive over HBase
snapshots, and the select(count) query is much faster than Hive over HBase.
Since the HBase tables are all so big, how can I make the engine respect
data locality?

Thank you very much,



On Fri, Sep 29, 2017 at 10:22 PM, Nick Dimiduk  wrote:

> Have you considered running Hive/Spark over snapshots of your HBase tables?
>
> If you're seeing network saturation over HBase but not hdfs, makes me think
> data locality is not being honored. Might be worth investigating as well.
>
> On Fri, Sep 29, 2017 at 3:26 AM wenxing zheng 
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (rows are synced
> > from a MySQL DB via the binlog), and for each HBase table we have a
> > corresponding external table in Hive stored by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> > we always stay in sync with the production DB and get random access by
> > key.
> >
> > Now our business needs to do some analysis on those tables with join
> > queries. What is the best practice for this?
> >
> > From my experiments, I found that with Spark SQL on HBase or Hive, the
> > job ran very slowly and saturated the network bandwidth. But Hive SQL
> > works very well directly against HDFS files (after making a copy of the
> > data to HDFS files).
> >
> > I would appreciate any advice on what the problem might be and how to
> > optimize the job.
> > Regards, Wenxing
> >
>


Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Mike Drob
Well deserved, Chia-Ping!

On Fri, Sep 29, 2017 at 6:04 PM, Esteban Gutierrez 
wrote:

> Congrats  Chia-Ping! and Welcome!
>
> --
> Cloudera, Inc.
>
>
> On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang 
> wrote:
>
> > Congratulations!
> >
> > 2017-09-30 6:38 GMT+08:00 Andrew Purtell :
> >
> > > Congratulations, Chia-Ping! Welcome to the PMC.
> > >
> > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones  >
> > > wrote:
> > >
> > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed
> > > > to join the HBase PMC and help make the project run smoothly.
> > > > Chia-Ping became an HBase committer over 6 months ago, based on
> > > > long-running participation in the HBase project, a consistent record
> > > > of resolving HBase issues, and contributions to testing and
> > > > performance.
> > > >
> > > > Thank you for stepping up to serve, Chia-Ping!
> > > >
> > > > As a reminder, if anyone would like to nominate another person as a
> > > > committer or PMC member, even if you are not currently a committer or
> > > > PMC member, you can always drop a note to priv...@hbase.apache.org to
> > > > let us know!
> > > >
> > > > Thanks,
> > > > Misty (on behalf of the HBase PMC)
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >- A23, Crosstalk
> > >
> >
>


[jira] [Resolved] (HBASE-18559) Add histogram to MetricsConnection to track concurrent calls per server

2017-09-29 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-18559.

  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to 1.4 and up

> Add histogram to MetricsConnection to track concurrent calls per server
> ---
>
> Key: HBASE-18559
> URL: https://issues.apache.org/jira/browse/HBASE-18559
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Robert Yokota
>Assignee: Robert Yokota
>Priority: Minor
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
> Attachments: HBASE-18559.master.001.patch
>
>
> HBASE-16388 introduced a new configuration setting 
> "hbase.client.perserver.requests.threshold " to deal with slow region 
> servers.   I have back-ported the code for the new config setting to our 
> environment, but I don't feel comfortable setting it in production without 
> visibility into how the number of concurrent calls per server varies 
> (especially the current high water mark or max in production when the cluster 
> is healthy).  
> It is straightforward to pass the value for the concurrent calls per server 
> to a new histogram in MetricsConnection.  I will attach a patch that I am 
> using to gain a better understanding of how setting 
> "hbase.client.perserver.requests.threshold" will affect our production 
> environment.
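The idea in the description above can be sketched in plain Java. This is only an illustration under stated assumptions: the class ConcurrentCallsSketch, its method names, and the bucket bounds are hypothetical and are not taken from the attached patch or from the real MetricsConnection. It counts in-flight calls per server, records each observed count in a small histogram, and keeps the high water mark.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical stand-in for illustration; not the real MetricsConnection. */
final class ConcurrentCallsSketch {
    // Upper bounds of the histogram buckets (assumed values); one extra bucket catches larger counts.
    private static final int[] BOUNDS = {1, 2, 4, 8, 16, 32};

    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();
    private final AtomicLongArray buckets = new AtomicLongArray(BOUNDS.length + 1);
    private final AtomicInteger max = new AtomicInteger();

    /** Call when an RPC to the server starts; returns the concurrent count including this call. */
    int callStarted(String server) {
        int count = inFlight.computeIfAbsent(server, s -> new AtomicInteger()).incrementAndGet();
        buckets.incrementAndGet(bucketFor(count)); // one histogram sample per started call
        max.accumulateAndGet(count, Math::max);    // track the high water mark
        return count;
    }

    /** Call when the RPC completes, whether it succeeded or failed. */
    void callFinished(String server) {
        AtomicInteger counter = inFlight.get(server);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    int highWaterMark() {
        return max.get();
    }

    long samplesInBucket(int index) {
        return buckets.get(index);
    }

    /** Index of the first bucket whose upper bound is at least the given count. */
    static int bucketFor(int count) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (count <= BOUNDS[i]) {
                return i;
            }
        }
        return BOUNDS.length;
    }

    public static void main(String[] args) {
        ConcurrentCallsSketch metrics = new ConcurrentCallsSketch();
        metrics.callStarted("rs1");
        metrics.callStarted("rs1");
        metrics.callStarted("rs2");
        metrics.callFinished("rs1");
        System.out.println("high water mark: " + metrics.highWaterMark());
    }
}
```

Watching the high water mark on a healthy cluster is one way to pick a value for "hbase.client.perserver.requests.threshold" that sits safely above normal load.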





Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Stack
Welcome Chia-Ping. Keep up the great work.
S

On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
wrote:

> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> join the HBase PMC and help make the project run smoothly. Chia-Ping became
> an HBase committer over 6 months ago, based on long-running participation in
> the HBase project, a consistent record of resolving HBase issues, and
> contributions to testing and performance.
>
> Thank you for stepping up to serve, Chia-Ping!
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or PMC
> member, you can always drop a note to priv...@hbase.apache.org to let us
> know!
>
> Thanks,
> Misty (on behalf of the HBase PMC)
>


[jira] [Resolved] (HBASE-18436) Add client-side hedged read metrics

2017-09-29 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-18436.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.5.0
   1.4.0
   3.0.0
   2.0.0

Pushed to 1.4 and up

> Add client-side hedged read metrics
> ---
>
> Key: HBASE-18436
> URL: https://issues.apache.org/jira/browse/HBASE-18436
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yun Zhao
>Assignee: Yun Zhao
>Priority: Minor
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
> Attachments: HBASE-18436.master.001.patch
>
>
> Need some metrics to indicate read high availability.
> +hedgedReadOps -- the number of hedged reads that have occurred.
> +hedgedReadWin -- the number of hedged reads that returned faster than the
> original read.
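A minimal sketch of how two such counters could be maintained, assuming a hedged read means "start a backup read when the primary has not answered within a delay". The class HedgedReadSketch and its read method are hypothetical and are not the code from the attached patch; only the two counter names come from the description above. Error handling for failed reads is deliberately left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical stand-in for illustration; not the HBase or HDFS client code. */
final class HedgedReadSketch {
    final LongAdder hedgedReadOps = new LongAdder(); // hedged reads that were started
    final LongAdder hedgedReadWin = new LongAdder(); // hedged reads that beat the original read

    /** Pairs a result with the read that produced it. */
    private static final class Tagged<T> {
        final boolean fromHedge;
        final T value;

        Tagged(boolean fromHedge, T value) {
            this.fromHedge = fromHedge;
            this.value = value;
        }
    }

    <T> T read(CompletableFuture<T> primary,
               Supplier<CompletableFuture<T>> hedge,
               long hedgeDelayMs) {
        try {
            // Fast path: the original read answers within the delay, so no hedge is started.
            return primary.get(hedgeDelayMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slowPrimary) {
            hedgedReadOps.increment();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        CompletableFuture<Tagged<T>> original = primary.thenApply(v -> new Tagged<T>(false, v));
        CompletableFuture<Tagged<T>> backup = hedge.get().thenApply(v -> new Tagged<T>(true, v));
        @SuppressWarnings("unchecked")
        Tagged<T> winner = (Tagged<T>) CompletableFuture.anyOf(original, backup).join();
        if (winner.fromHedge) {
            hedgedReadWin.increment();
        }
        return winner.value;
    }

    public static void main(String[] args) {
        HedgedReadSketch metrics = new HedgedReadSketch();
        // The primary never completes here, so the hedge is started and wins.
        String value = metrics.read(new CompletableFuture<String>(),
                () -> CompletableFuture.completedFuture("from hedge"), 10);
        System.out.println(value + ", ops=" + metrics.hedgedReadOps.sum()
                + ", wins=" + metrics.hedgedReadWin.sum());
    }
}
```

The ratio of hedgedReadWin to hedgedReadOps gives a rough idea of how often hedging actually shortened a read.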





Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Esteban Gutierrez
Congrats  Chia-Ping! and Welcome!

--
Cloudera, Inc.


On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang  wrote:

> Congratulations!
>
> 2017-09-30 6:38 GMT+08:00 Andrew Purtell :
>
> > Congratulations, Chia-Ping! Welcome to the PMC.
> >
> > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
> > wrote:
> >
> > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed
> > > to join the HBase PMC and help make the project run smoothly. Chia-Ping
> > > became an HBase committer over 6 months ago, based on long-running
> > > participation in the HBase project, a consistent record of resolving
> > > HBase issues, and contributions to testing and performance.
> > >
> > > Thank you for stepping up to serve, Chia-Ping!
> > >
> > > As a reminder, if anyone would like to nominate another person as a
> > > committer or PMC member, even if you are not currently a committer or
> > > PMC member, you can always drop a note to priv...@hbase.apache.org to
> > > let us know!
> > >
> > > Thanks,
> > > Misty (on behalf of the HBase PMC)
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Huaxiang Sun
Congratulations Chia-Ping!

Huaxiang

> On Sep 29, 2017, at 3:52 PM, Guanghao Zhang  wrote:
> 
> Congratulations!
> 
> 2017-09-30 6:38 GMT+08:00 Andrew Purtell :
> 
>> Congratulations, Chia-Ping! Welcome to the PMC.
>> 
>> On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
>> wrote:
>> 
>>> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
>>> join the HBase PMC and help make the project run smoothly. Chia-Ping
>>> became an HBase committer over 6 months ago, based on long-running
>>> participation in the HBase project, a consistent record of resolving
>>> HBase issues, and contributions to testing and performance.
>>> 
>>> Thank you for stepping up to serve, Chia-Ping!
>>> 
>>> As a reminder, if anyone would like to nominate another person as a
>>> committer or PMC member, even if you are not currently a committer or PMC
>>> member, you can always drop a note to priv...@hbase.apache.org to let us
>>> know!
>>> 
>>> Thanks,
>>> Misty (on behalf of the HBase PMC)
>>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> Andrew
>> 
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>   - A23, Crosstalk
>> 



Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Guanghao Zhang
Congratulations!

2017-09-30 6:38 GMT+08:00 Andrew Purtell :

> Congratulations, Chia-Ping! Welcome to the PMC.
>
> On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
> wrote:
>
> > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> > join the HBase PMC and help make the project run smoothly. Chia-Ping
> > became an HBase committer over 6 months ago, based on long-running
> > participation in the HBase project, a consistent record of resolving
> > HBase issues, and contributions to testing and performance.
> >
> > Thank you for stepping up to serve, Chia-Ping!
> >
> > As a reminder, if anyone would like to nominate another person as a
> > committer or PMC member, even if you are not currently a committer or PMC
> > member, you can always drop a note to priv...@hbase.apache.org to let us
> > know!
> >
> > Thanks,
> > Misty (on behalf of the HBase PMC)
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


[jira] [Created] (HBASE-18907) Methods missing rpc timeout parameter in HTable

2017-09-29 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18907:
--

 Summary: Methods missing rpc timeout parameter in HTable
 Key: HBASE-18907
 URL: https://issues.apache.org/jira/browse/HBASE-18907
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


When revisiting HBASE-15645, I found that two methods in HTable omit the
rpcTimeout parameter when calling newCaller():
{code}
return rpcCallerFactory.newCaller().callWithRetries(callable, this.operationTimeout);
{code}
I checked branch-1.2

Other branch(es) may have the same problem
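To illustrate why the missing argument matters, here is a small self-contained sketch. The classes CallerFactory and Caller below are hypothetical stand-ins, not the real HBase RpcRetryingCallerFactory or RpcRetryingCaller, and the timeout arithmetic is an assumption: each attempt is capped by both the per-RPC timeout and whatever remains of the operation budget. With the no-argument overload, a per-table RPC timeout would not reach the caller.

```java
/** Hypothetical stand-ins for illustration; the real HBase classes may behave differently. */
final class RpcTimeoutSketch {

    static final class CallerFactory {
        private final int defaultRpcTimeoutMs;

        CallerFactory(int defaultRpcTimeoutMs) {
            this.defaultRpcTimeoutMs = defaultRpcTimeoutMs;
        }

        /** No-argument overload: falls back to the factory-wide default. */
        Caller newCaller() {
            return new Caller(defaultRpcTimeoutMs);
        }

        /** Overload that honours a caller-supplied (for example per-table) RPC timeout. */
        Caller newCaller(int rpcTimeoutMs) {
            return new Caller(rpcTimeoutMs);
        }
    }

    static final class Caller {
        final int rpcTimeoutMs;

        Caller(int rpcTimeoutMs) {
            this.rpcTimeoutMs = rpcTimeoutMs;
        }

        /** Timeout for the next attempt: the smaller of the RPC timeout and the remaining budget. */
        int attemptTimeout(int operationTimeoutMs, int elapsedMs) {
            int remaining = Math.max(1, operationTimeoutMs - elapsedMs);
            return Math.min(rpcTimeoutMs, remaining);
        }
    }

    public static void main(String[] args) {
        CallerFactory factory = new CallerFactory(60_000);
        int tableRpcTimeoutMs = 2_000;
        int operationTimeoutMs = 10_000;
        System.out.println("without rpcTimeout: "
                + factory.newCaller().attemptTimeout(operationTimeoutMs, 0));
        System.out.println("with rpcTimeout:    "
                + factory.newCaller(tableRpcTimeoutMs).attemptTimeout(operationTimeoutMs, 0));
    }
}
```

In this sketch the first attempt may block for the whole operation timeout when the RPC timeout is not passed, which is the kind of behavior the report is concerned about.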





Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Andrew Purtell
Congratulations, Chia-Ping! Welcome to the PMC.

On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
wrote:

> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> join the HBase PMC and help make the project run smoothly. Chia-Ping became
> an HBase committer over 6 months ago, based on long-running participation in
> the HBase project, a consistent record of resolving HBase issues, and
> contributions to testing and performance.
>
> Thank you for stepping up to serve, Chia-Ping!
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or PMC
> member, you can always drop a note to priv...@hbase.apache.org to let us
> know!
>
> Thanks,
> Misty (on behalf of the HBase PMC)
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Wei-Chiu Chuang
My sincere congratulations!

On Fri, Sep 29, 2017 at 3:22 PM, Ted Yu  wrote:

> Congratulations, Chia-Ping.
>
> On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
> wrote:
>
> > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> > join the HBase PMC and help make the project run smoothly. Chia-Ping
> > became an HBase committer over 6 months ago, based on long-running
> > participation in the HBase project, a consistent record of resolving
> > HBase issues, and contributions to testing and performance.
> >
> > Thank you for stepping up to serve, Chia-Ping!
> >
> > As a reminder, if anyone would like to nominate another person as a
> > committer or PMC member, even if you are not currently a committer or PMC
> > member, you can always drop a note to priv...@hbase.apache.org to let us
> > know!
> >
> > Thanks,
> > Misty (on behalf of the HBase PMC)
> >
>



-- 
A very happy Clouderan


Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Ted Yu
Congratulations, Chia-Ping.

On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones 
wrote:

> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
> join the HBase PMC and help make the project run smoothly. Chia-Ping became
> an HBase committer over 6 months ago, based on long-running participation in
> the HBase project, a consistent record of resolving HBase issues, and
> contributions to testing and performance.
>
> Thank you for stepping up to serve, Chia-Ping!
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or PMC
> member, you can always drop a note to priv...@hbase.apache.org to let us
> know!
>
> Thanks,
> Misty (on behalf of the HBase PMC)
>


Welcome Chia-Ping Tsai to the HBase PMC

2017-09-29 Thread Misty Stanley-Jones
The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to
join the HBase PMC and help make the project run smoothly. Chia-Ping became
an HBase committer over 6 months ago, based on long-running participation in
the HBase project, a consistent record of resolving HBase issues, and
contributions to testing and performance.

Thank you for stepping up to serve, Chia-Ping!

As a reminder, if anyone would like to nominate another person as a
committer or PMC member, even if you are not currently a committer or PMC
member, you can always drop a note to priv...@hbase.apache.org to let us
know!

Thanks,
Misty (on behalf of the HBase PMC)


Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Andrew Purtell
Construct a normal put or delete or batch mutation, add whatever extra
state you need in one or more operation attributes, and use a
regionobserver to extend normal processing to handle the extra state. I'm
curious what dispatching to extension code because of a custom cell type
buys you over dispatching to extension code because of the presence of an
attribute (or cell tag). For example, in security coprocessors we take
attribute data and attach it to the cell using cell tags. Later we check
for cell tag(s) to determine if we have to take special action when the
cell is accessed by a scanner, or during some operations (e.g. appends or
increments have to do extra handling for cell security tags).
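The flow described above can be sketched without any HBase dependency. Everything in this example is a hypothetical stand-in: Mutation, Cell, Observer, and the "visibility" attribute name are invented for illustration and are not the HBase Put, Cell, or RegionObserver APIs. The only detail borrowed from HBase is the shape of setAttribute(String, byte[]) on OperationWithAttributes. An observer hook copies attribute data onto the cell as a tag, and later code checks for the tag to decide whether special handling applies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins for illustration; not the HBase client or coprocessor API. */
final class AttributeDispatchSketch {

    /** A regular mutation carrying extra state in named attributes. */
    static final class Mutation {
        final String row;
        final Map<String, byte[]> attributes = new HashMap<>();

        Mutation(String row) {
            this.row = row;
        }

        Mutation setAttribute(String name, byte[] value) {
            attributes.put(name, value);
            return this;
        }
    }

    /** A stored cell with optional tags attached on the server side. */
    static final class Cell {
        final String row;
        final List<String> tags = new ArrayList<>();

        Cell(String row) {
            this.row = row;
        }
    }

    /** Server-side hook, invoked before the mutation is applied. */
    interface Observer {
        void prePut(Mutation mutation, Cell cell);
    }

    /** Copies the (assumed) "visibility" attribute onto the cell as a tag. */
    static final Observer VISIBILITY_OBSERVER = (mutation, cell) -> {
        byte[] value = mutation.attributes.get("visibility");
        if (value != null) {
            cell.tags.add("visibility=" + new String(value, StandardCharsets.UTF_8));
        }
    };

    /** Runs every observer over the mutation and returns the resulting cell. */
    static Cell apply(Mutation mutation, List<Observer> observers) {
        Cell cell = new Cell(mutation.row);
        for (Observer observer : observers) {
            observer.prePut(mutation, cell);
        }
        return cell;
    }

    /** Later accesses dispatch on the presence of a tag, not on a custom cell type. */
    static boolean needsSpecialHandling(Cell cell) {
        return !cell.tags.isEmpty();
    }

    public static void main(String[] args) {
        Mutation put = new Mutation("row1")
                .setAttribute("visibility", "secret".getBytes(StandardCharsets.UTF_8));
        Cell cell = apply(put, Collections.singletonList(VISIBILITY_OBSERVER));
        System.out.println(cell.tags + " special=" + needsSpecialHandling(cell));
    }
}
```

The point of the design is that the cell type stays ordinary; only the presence of an attribute or tag triggers the extension code.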


On Fri, Sep 29, 2017 at 2:43 PM, Chia-Ping Tsai  wrote:

> > Instead of a custom cell, could you use a regular cell with a custom
> > operation attribute (see OperationWithAttributes).
> Pardon me, I didn't get what you said.
>
>
>
> On 2017-09-30 04:31, Andrew Purtell  wrote:
> > Instead of a custom cell, could you use a regular cell with a custom
> > operation attribute (see OperationWithAttributes).
> >
> > On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai 
> wrote:
> >
> > > The custom cell helps us reduce memory consumption. We don't have our
> > > own serialization/deserialization mechanism, so transferring data from
> > > client to server requires several conversion phases (user data ->
> > > Put/Cell -> pb object). The cost of conversion is large when
> > > transferring bulk data. In fact, we also have a custom mutation to
> > > manage the memory usage of the inner cell collection.
> > >
> > > On 2017-09-30 02:43, Andrew Purtell  wrote:
> > > > What are the use cases for a custom cell? It seems a dangerously low
> > > level
> > > > thing to attempt and perhaps we should unwind support for it. But
> perhaps
> > > > there is a compelling justification.
> > > >
> > > >
> > > > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai <
> chia7...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for all comment.
> > > > >
> > > > > The problem i want to resolve is the valid code should be exposed
> as
> > > > > IA.Public. Otherwise, end user have to access the IA.Private class
> to
> > > build
> > > > > the custom cell.
> > > > >
> > > > > For example, I have a use case which plays a streaming role in our
> > > > > application. It
> > > > > applies the CellBuilder (HBASE-18519) to build custom cells. These
> cells
> > > > > have many same fields so they are put in shared-memory for
> avoiding GC
> > > > > pause. Everything is wonderful. However, we have to access the
> > > IA.Private
> > > > > class - KeyValue#Type - to get the valid code of Put.
> > > > >
> > > > > I believe there are many use cases of custom cell, and
> consequently it
> > > is
> > > > > worth adding a way to get the valid type via IA.Public class.
> > > Otherwise, it
> > > > > may imply that the custom cell is based on a unstable way, because
> the
> > > > > related code can be changed at any time.
> > > > > --
> > > > > Chia-Ping
> > > > >
> > > > > On 2017-09-29 00:49, Andrew Purtell  wrote:
> > > > > > I agree with Stack. Was typing up a reply to Anoop but let me
> move it
> > > > > down
> > > > > > here.
> > > > > >
> > > > > > The type code exposes some low level details of how our current
> > > stores
> > > > > are
> > > > > > architected. But what if in the future you could swap out HStore
> > > > > implements
> > > > > > Store with PStore implements Store, where HStore is backed by
> HFiles
> > > and
> > > > > > PStore is backed by Parquet? Just as a hypothetical example. I
> know
> > > there
> > > > > > would be larger issues if this were actually attempted. Bear with
> > > me. You
> > > > > > can imagine some different new Store implementation that has some
> > > > > > advantages but is not a design derived from the log structured
> merge
> > > tree
> > > > > > if you like. Most values from a new Cell.Type based on
> KeyValue.Type
> > > > > > wouldn't apply to cells from such a thing because they are
> > > particular to
> > > > > > how LSMs work. I'm sure such a project if attempted would make a
> > > number
> > > > > of
> > > > > > changes requiring a major version increment and low level details
> > > could
> > > > > be
> > > > > > unwound from Cell then, but if we could avoid doing it in the
> first
> > > > > place,
> > > > > > I think it would better for maintainability.
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> > > > > >
> > > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <
> > > chia7...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > hi folks,
> > > > > > > >
> > > > > > > > User is allowed to create custom cell but the valid code of
> type
> > > -
> > > > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we
> should
> > > > > expose
> > > > > > > > KeyValue#Type as Public Client. Three possible ways are shown
> > > below:
> > > > > > > > 1) Change declaration 

Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Chia-Ping Tsai
> Instead of a custom cell, could you use a regular cell with a custom
> operation attribute (see OperationWithAttributes).
Pardon me, I didn't get what you said.



On 2017-09-30 04:31, Andrew Purtell  wrote: 
> Instead of a custom cell, could you use a regular cell with a custom
> operation attribute (see OperationWithAttributes).
> 
> On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai  wrote:
> 
> > The custom cell helps us reduce memory consumption. We don't have our own
> > serialization/deserialization mechanism, so transferring data from
> > client to server requires several conversion phases (user data ->
> > Put/Cell -> pb object). The cost of conversion is large when transferring
> > bulk data. In fact, we also have a custom mutation to manage the memory
> > usage of the inner cell collection.
> >
> > On 2017-09-30 02:43, Andrew Purtell  wrote:
> > > What are the use cases for a custom cell? It seems a dangerously low
> > level
> > > thing to attempt and perhaps we should unwind support for it. But perhaps
> > > there is a compelling justification.
> > >
> > >
> > > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai 
> > > wrote:
> > >
> > > > Thanks for all comment.
> > > >
> > > > The problem i want to resolve is the valid code should be exposed as
> > > > IA.Public. Otherwise, end user have to access the IA.Private class to
> > build
> > > > the custom cell.
> > > >
> > > > For example, I have a use case which plays a streaming role in our
> > > > application. It
> > > > applies the CellBuilder (HBASE-18519) to build custom cells. These cells
> > > > have many same fields so they are put in shared-memory for avoiding GC
> > > > pause. Everything is wonderful. However, we have to access the
> > IA.Private
> > > > class - KeyValue#Type - to get the valid code of Put.
> > > >
> > > > I believe there are many use cases of custom cell, and consequently it
> > is
> > > > worth adding a way to get the valid type via IA.Public class.
> > Otherwise, it
> > > > may imply that the custom cell is based on a unstable way, because the
> > > > related code can be changed at any time.
> > > > --
> > > > Chia-Ping
> > > >
> > > > On 2017-09-29 00:49, Andrew Purtell  wrote:
> > > > > I agree with Stack. Was typing up a reply to Anoop but let me move it
> > > > down
> > > > > here.
> > > > >
> > > > > The type code exposes some low level details of how our current
> > stores
> > > > are
> > > > > architected. But what if in the future you could swap out HStore
> > > > implements
> > > > > Store with PStore implements Store, where HStore is backed by HFiles
> > and
> > > > > PStore is backed by Parquet? Just as a hypothetical example. I know
> > there
> > > > > would be larger issues if this were actually attempted. Bear with
> > me. You
> > > > > can imagine some different new Store implementation that has some
> > > > > advantages but is not a design derived from the log structured merge
> > tree
> > > > > if you like. Most values from a new Cell.Type based on KeyValue.Type
> > > > > wouldn't apply to cells from such a thing because they are
> > particular to
> > > > > how LSMs work. I'm sure such a project if attempted would make a
> > number
> > > > of
> > > > > changes requiring a major version increment and low level details
> > could
> > > > be
> > > > > unwound from Cell then, but if we could avoid doing it in the first
> > > > place,
> > > > > I think it would better for maintainability.
> > > > >
> > > > >
> > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> > > > >
> > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <
> > chia7...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > hi folks,
> > > > > > >
> > > > > > > User is allowed to create custom cell but the valid code of type
> > -
> > > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should
> > > > expose
> > > > > > > KeyValue#Type as Public Client. Three possible ways are shown
> > below:
> > > > > > > 1) Change declaration of KeyValue#Type from IA.Private to
> > IA.Public
> > > > > > > 2) Move KeyValue#Type into Cell.
> > > > > > > 3) Move KeyValue#Type to upper level
> > > > > > >
> > > > > > > Any suggestions?
> > > > > > >
> > > > > > >
> > > > > > What is the problem that we are trying to solve Chia-Ping? You
> > want to
> > > > make
> > > > > > Cells of a new Type?
> > > > > >
> > > > > > My first reaction is that KV#Type is particular to the KV
> > > > implementation.
> > > > > > Any new Cell implementation should not have to adopt the KeyValue
> > > > typing
> > > > > > mechanism.
> > > > > >
> > > > > > S
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > --
> > > > > > > Chia-Ping
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew
> > > > >
> > > > > Words like orphans lost among the crosstalk, meaning torn from
> > truth's
> > > > > decrepit hands
> > > > >- A23, Crosstalk
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best r

Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Andrew Purtell
Instead of a custom cell, could you use a regular cell with a custom
operation attribute (see OperationWithAttributes).

On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai  wrote:

> The custom cell helps us reduce memory consumption. We don't have our own
> serialization/deserialization mechanism, so transferring data from
> client to server requires several conversion phases (user data -> Put/Cell
> -> pb object). The cost of conversion is large when transferring bulk data.
> In fact, we also have a custom mutation to manage the memory usage of the
> inner cell collection.
>
> On 2017-09-30 02:43, Andrew Purtell  wrote:
> > What are the use cases for a custom cell? It seems a dangerously low
> level
> > thing to attempt and perhaps we should unwind support for it. But perhaps
> > there is a compelling justification.
> >
> >
> > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai 
> > wrote:
> >
> > > Thanks for all comment.
> > >
> > > The problem i want to resolve is the valid code should be exposed as
> > > IA.Public. Otherwise, end user have to access the IA.Private class to
> build
> > > the custom cell.
> > >
> > > For example, I have a use case which plays a streaming role in our
> > > application. It
> > > applies the CellBuilder (HBASE-18519) to build custom cells. These cells
> > > have many same fields so they are put in shared-memory for avoiding GC
> > > pause. Everything is wonderful. However, we have to access the
> IA.Private
> > > class - KeyValue#Type - to get the valid code of Put.
> > >
> > > I believe there are many use cases of custom cell, and consequently it
> is
> > > worth adding a way to get the valid type via IA.Public class.
> Otherwise, it
> > > may imply that the custom cell is based on a unstable way, because the
> > > related code can be changed at any time.
> > > --
> > > Chia-Ping
> > >
> > > On 2017-09-29 00:49, Andrew Purtell  wrote:
> > > > I agree with Stack. Was typing up a reply to Anoop but let me move it
> > > down
> > > > here.
> > > >
> > > > The type code exposes some low level details of how our current
> stores
> > > are
> > > > architected. But what if in the future you could swap out HStore
> > > implements
> > > > Store with PStore implements Store, where HStore is backed by HFiles
> and
> > > > PStore is backed by Parquet? Just as a hypothetical example. I know
> there
> > > > would be larger issues if this were actually attempted. Bear with
> me. You
> > > > can imagine some different new Store implementation that has some
> > > > advantages but is not a design derived from the log structured merge
> tree
> > > > if you like. Most values from a new Cell.Type based on KeyValue.Type
> > > > wouldn't apply to cells from such a thing because they are
> particular to
> > > > how LSMs work. I'm sure such a project if attempted would make a
> number
> > > of
> > > > changes requiring a major version increment and low level details
> could
> > > be
> > > > unwound from Cell then, but if we could avoid doing it in the first
> > > place,
> > > > I think it would better for maintainability.
> > > >
> > > >
> > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> > > >
> > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <
> chia7...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > hi folks,
> > > > > >
> > > > > > User is allowed to create custom cell but the valid code of type
> -
> > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should
> > > expose
> > > > > > KeyValue#Type as Public Client. Three possible ways are shown
> below:
> > > > > > 1) Change declaration of KeyValue#Type from IA.Private to
> IA.Public
> > > > > > 2) Move KeyValue#Type into Cell.
> > > > > > 3) Move KeyValue#Type to upper level
> > > > > >
> > > > > > Any suggestions?
> > > > > >
> > > > > >
> > > > > What is the problem that we are trying to solve Chia-Ping? You
> want to
> > > make
> > > > > Cells of a new Type?
> > > > >
> > > > > My first reaction is that KV#Type is particular to the KV
> > > implementation.
> > > > > Any new Cell implementation should not have to adopt the KeyValue
> > > typing
> > > > > mechanism.
> > > > >
> > > > > S
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > --
> > > > > > Chia-Ping
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> > > > decrepit hands
> > > >- A23, Crosstalk
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Chia-Ping Tsai
Custom cells help us reduce memory consumption. We don't have our own
serialization/deserialization mechanism, so moving data from client to
server takes several conversion phases (user data -> Put/Cell -> pb object). The
cost of these conversions is large when transferring bulk data. In fact, we also
have a custom mutation to manage the memory usage of its inner cell collection.

On 2017-09-30 02:43, Andrew Purtell  wrote: 
> What are the use cases for a custom cell? It seems a dangerously low level
> thing to attempt and perhaps we should unwind support for it. But perhaps
> there is a compelling justification.
> 
> 
> On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai 
> wrote:
> 
> > Thanks for all comment.
> >
> > The problem i want to resolve is the valid code should be exposed as
> > IA.Public. Otherwise, end user have to access the IA.Private class to build
> > the custom cell.
> >
> > For example, I have a use case which plays a streaming role in our
> > application. It
> > applies the CellBuilder(HBASE-18519) to build custom cells. These cells
> > have many same fields so they are put in shared-memory for avoiding GC
> > pause. Everything is wonderful. However, we have to access the IA.Private
> > class - KeyValue#Type - to get the valid code of Put.
> >
> > I believe there are many use cases of custom cell, and consequently it is
> > worth adding a way to get the valid type via IA.Public class. Otherwise, it
> > may imply that the custom cell is based on a unstable way, because the
> > related code can be changed at any time.
> > --
> > Chia-Ping
> >
> > On 2017-09-29 00:49, Andrew Purtell  wrote:
> > > I agree with Stack. Was typing up a reply to Anoop but let me move it
> > down
> > > here.
> > >
> > > The type code exposes some low level details of how our current stores
> > are
> > > architected. But what if in the future you could swap out HStore
> > implements
> > > Store with PStore implements Store, where HStore is backed by HFiles and
> > > PStore is backed by Parquet? Just as a hypothetical example. I know there
> > > would be larger issues if this were actually attempted. Bear with me. You
> > > can imagine some different new Store implementation that has some
> > > advantages but is not a design derived from the log structured merge tree
> > > if you like. Most values from a new Cell.Type based on KeyValue.Type
> > > wouldn't apply to cells from such a thing because they are particular to
> > > how LSMs work. I'm sure such a project if attempted would make a number
> > of
> > > changes requiring a major version increment and low level details could
> > be
> > > unwound from Cell then, but if we could avoid doing it in the first
> > place,
> > > I think it would better for maintainability.
> > >
> > >
> > > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> > >
> > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai 
> > > > wrote:
> > > >
> > > > > hi folks,
> > > > >
> > > > > User is allowed to create custom cell but the valid code of type -
> > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should
> > expose
> > > > > KeyValue#Type as Public Client. Three possible ways are shown below:
> > > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public
> > > > > 2) Move KeyValue#Type into Cell.
> > > > > 3) Move KeyValue#Type to upper level
> > > > >
> > > > > Any suggestions?
> > > > >
> > > > >
> > > > What is the problem that we are trying to solve Chia-Ping? You want to
> > make
> > > > Cells of a new Type?
> > > >
> > > > My first reaction is that KV#Type is particular to the KV
> > implementation.
> > > > Any new Cell implementation should not have to adopt the KeyValue
> > typing
> > > > mechanism.
> > > >
> > > > S
> > > >
> > > >
> > > >
> > > >
> > > > > --
> > > > > Chia-Ping
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >- A23, Crosstalk
> > >
> >
> 
> 
> 
> -- 
> Best regards,
> Andrew
> 
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
> 


[DISCUSS] deprecating o.a.h.h.regionserver.RowProcessor

2017-09-29 Thread Umesh Agashe
Hi,

Currently the Region.processRowsWithLocks() API takes
o.a.h.h.regionserver.RowProcessor as an argument, and the only implementation
of this class is MultiRowMutationProcessor. This implementation is internal
and is used from the HRegion.mutateRows...() methods.

The HRegion.processRowsWithLocks() implementation doesn't call coprocessor
hooks; instead it calls RowProcessor hooks at the appropriate points in
execution. Many of these hooks/methods have the same names and are called at
similar points during the course of execution, but they are not related!

The HRegion.batchMutate() methods call coprocessor hooks but not
RowProcessor hooks.

The internal implementation, MultiRowMutationProcessor, calls coprocessor
hooks from inside its own methods/hooks. But this cannot be expected of all
RowProcessor implementations.

In the case of the HRegion.batchMutate...() methods, CP mutations are merged
with input mutations and these merged mutations are applied to the WALEdit
fetched from CPs.

In the case of processRowsWithLocks(), mutations are fetched from the
RowProcessor instance and are applied to the WALEdit built by the RowProcessor.

The major inconsistency here is that one code path uses coprocessors while the
other uses RowProcessor. There are other minor inconsistencies along those
two code paths.

Proposed fix:

* Unify the two code paths.
* Deprecate RowProcessor and the Region.processRowsWithLocks() API that takes
RowProcessor as an argument.
* Provide an alternate API that doesn't take RowProcessor.
* Modify batchMutate...() to take additional arguments: rowsToLock
(byte[][]) and atomic/allOrNone (boolean).
* Remove MultiRowMutationProcessor. Make the HRegion.mutateRows() methods
use batchMutate().
* Make the new implementation of Region.processRowsWithLocks(), which doesn't
take RowProcessor as an argument, use batchMutate().

The suggestion is that coprocessors can be used to do the things RowProcessors
are doing.
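
To make the proposed shape concrete, here is a minimal sketch of a single
batchMutate() entry point that takes the rows to lock and an atomic flag, with
mutateRows() reduced to an atomic batch. Everything in it is an assumption for
illustration: ToyRegion and ToyMutation are hypothetical stand-ins, not HRegion
or Mutation, rows are plain Strings rather than byte[], a null value stands in
for a mutation that fails validation, and coprocessor hooks and the WAL are
left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy mutation (hypothetical): set a row to a value; null value = invalid. */
final class ToyMutation {
    final String row;
    final String value;

    ToyMutation(String row, String value) {
        this.row = row;
        this.value = value;
    }
}

/** Toy in-memory "region" (hypothetical; not HRegion). */
final class ToyRegion {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    /**
     * Single entry point for multi-row writes.
     *
     * @param rowsToLock rows to lock before anything is applied
     * @param atomic     if true, either all mutations apply or none do
     * @return one flag per mutation, true if that mutation was applied
     */
    boolean[] batchMutate(List<ToyMutation> mutations, List<String> rowsToLock, boolean atomic) {
        // Acquire locks in sorted order so concurrent callers cannot deadlock.
        List<ReentrantLock> held = new ArrayList<>();
        for (String row : new TreeSet<>(rowsToLock)) {
            ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
            lock.lock();
            held.add(lock);
        }
        try {
            // Validate first so an atomic batch can be rejected before any write.
            boolean[] valid = new boolean[mutations.size()];
            boolean allValid = true;
            for (int i = 0; i < valid.length; i++) {
                valid[i] = mutations.get(i).value != null;
                allValid &= valid[i];
            }
            if (atomic && !allValid) {
                return new boolean[mutations.size()]; // all false: nothing applied
            }
            for (int i = 0; i < valid.length; i++) {
                if (valid[i]) {
                    store.put(mutations.get(i).row, mutations.get(i).value);
                }
            }
            return valid;
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    /** What mutateRows() could reduce to under the proposal: an atomic batch. */
    boolean mutateRows(List<ToyMutation> mutations, List<String> rowsToLock) {
        for (boolean applied : batchMutate(mutations, rowsToLock, true)) {
            if (!applied) {
                return false;
            }
        }
        return true;
    }

    String get(String row) {
        return store.get(row);
    }
}
```

With one write path like this, any hooks would be invoked from a single place,
which is the consistency argument made above; whether the real batchMutate()
can absorb every RowProcessor use case is the open question for this thread.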

Related JIRAs: HBASE-18703, HBASE-18183

Let me know your thoughts.

Thanks,
Umesh


Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Andrew Purtell
What are the use cases for a custom cell? It seems a dangerously low level
thing to attempt and perhaps we should unwind support for it. But perhaps
there is a compelling justification.


On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai 
wrote:

> Thanks for all comment.
>
> The problem i want to resolve is the valid code should be exposed as
> IA.Public. Otherwise, end user have to access the IA.Private class to build
> the custom cell.
>
> For example, I have a use case which plays a streaming role in our
> application. It
> applies the CellBuilder(HBASE-18519) to build custom cells. These cells
> have many same fields so they are put in shared-memory for avoiding GC
> pause. Everything is wonderful. However, we have to access the IA.Private
> class - KeyValue#Type - to get the valid code of Put.
>
> I believe there are many use cases of custom cell, and consequently it is
> worth adding a way to get the valid type via IA.Public class. Otherwise, it
> may imply that the custom cell is based on a unstable way, because the
> related code can be changed at any time.
> --
> Chia-Ping
>
> On 2017-09-29 00:49, Andrew Purtell  wrote:
> > I agree with Stack. Was typing up a reply to Anoop but let me move it
> down
> > here.
> >
> > The type code exposes some low level details of how our current stores
> are
> > architected. But what if in the future you could swap out HStore
> implements
> > Store with PStore implements Store, where HStore is backed by HFiles and
> > PStore is backed by Parquet? Just as a hypothetical example. I know there
> > would be larger issues if this were actually attempted. Bear with me. You
> > can imagine some different new Store implementation that has some
> > advantages but is not a design derived from the log structured merge tree
> > if you like. Most values from a new Cell.Type based on KeyValue.Type
> > wouldn't apply to cells from such a thing because they are particular to
> > how LSMs work. I'm sure such a project if attempted would make a number
> of
> > changes requiring a major version increment and low level details could
> be
> > unwound from Cell then, but if we could avoid doing it in the first
> place,
> > I think it would better for maintainability.
> >
> >
> > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> >
> > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai 
> > > wrote:
> > >
> > > > hi folks,
> > > >
> > > > User is allowed to create custom cell but the valid code of type -
> > > > KeyValue#Type - is declared as IA.Private. As i see it, we should
> expose
> > > > KeyValue#Type as Public Client. Three possible ways are shown below:
> > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public
> > > > 2) Move KeyValue#Type into Cell.
> > > > 3) Move KeyValue#Type to upper level
> > > >
> > > > Any suggestions?
> > > >
> > > >
> > > What is the problem that we are trying to solve Chia-Ping? You want to
> make
> > > Cells of a new Type?
> > >
> > > My first reaction is that KV#Type is particular to the KV
> implementation.
> > > Any new Cell implementation should not have to adopt the KeyValue
> typing
> > > mechanism.
> > >
> > > S
> > >
> > >
> > >
> > >
> > > > --
> > > > Chia-Ping
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [DISCUSS] Becoming a Committer

2017-09-29 Thread Andrew Purtell
This conversation is in a good place. I apologize for the tone of my
earlier allergic reaction but not the content. I hope that is acceptable.


On Fri, Sep 29, 2017 at 11:01 AM, Mike Drob  wrote:

> To bounce off of what Yu Li said earlier - I see Hadoop has adopted very
> similar language to the Spark list:
> http://hadoop.apache.org/committer_criteria.html
>
> I especially like the examples at the bottom. They are four diverse paths,
> and there is no expectation that this is an exclusive list. If we were to
> write our own, I think it should reflect Andrew's highlighting of the
> non-professional contributor's path. And also important is to include the
> soft skills from Misty's list.
>
> Mike
>
> On Sat, Sep 23, 2017 at 5:07 PM, Stack  wrote:
>
> > Good discussion. Thanks Mike for kicking it off.
> >
> > The Misty list is great.
> >
> > I find myself giving double kudos for non-code or feature contribs; e.g.
> > stuff like test-fixing, patches that fix bugs found in production or
> > patches from operators that ease their day-to-day burden, voting on
> > releases, doc., (useful, encouraging, deep, helpful) review of the work
> of
> > others, etc. (I'd love it if someone took ownership of our website --
> hint,
> > hint).
> >
> > Sean has a dictum, paraphrasing, "...the fastest route to commitership is
> > doing what no one else wants to do" (Did I mangle that Busbey?), which I
> > like.
> >
> > While Andrew may have misjudged Mike Drob's original intent, I appreciate
> > his rallying to the cause of the non-professional contributor and his
> > reaction to (mis-perceived) call for quantification (For a classic on the
> > problems that arise when hard-and-fast rules, see [2]). I'm with him
> > defending PMC right to give 'spirit' and 'gut' precedence over 'rules'
> > (Often, it *is* just a case of you know it when you see it). And as per
> > Andy, if perceived injustice or bias, please write here or private@hbase.
> >
> > Lets keep dumping on this thread. We can then summarize and make it easy
> > for prospectives to find (can also add links to stuff such as the recent
> > Wang+Leblang talk at ApacheCon [1] and Andrew's write up for how to be a
> > committer on Hadoop as background).
> >
> > Thanks,
> > St.Ack
> >
> > 1.
> > https://apachecon2017.sched.com/event/9zv3/a-tale-of-two-
> > developers-finding-harmony-between-commercial-software-
> > development-and-the-apache-way-andrew-wang-alex-leblang-cloudera
> > 2.
> > https://books.google.com/books/about/Seeing_Like_a_
> > State.html?id=PqcPCgsr2u0C
> >
> >
> >
> >
> >
> > On Fri, Sep 22, 2017 at 4:08 PM, Zach York  >
> > wrote:
> >
> > > bq. As a
> > > relatively new member in the HBase community and a non-committer, once
> > the
> > > new member decides that he/ she wants to become a Committer, it will be
> > > helpful to have a list of PMC members that he/ she can communicate with
> > and
> > > get feedback from time to time. Feedback may include potential
> > adjustments
> > > and rough idea about progress towards the goal.
> > >
> > > This sounds like a good idea! Ideally, if you interact with the
> community
> > > often enough, you should be building connections, but it never hurts
> to
> > > have someone to check how they perceive your work.
> > >
> > > bq. For others, having
> > > this list of volunteer mentors, will surely help.
> > >
> > > Again I agree. This part is especially important as it is hard to judge
> > > your progress if you don't have someone at the same company to converse
> > > with.
> > >
> > > On Fri, Sep 22, 2017 at 3:38 PM, Umesh Agashe 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Thank you all for a good discussion here. Issues with both having and
> > NOT
> > > > having documented specific criteria are well articulated here. As a
> > > > relatively new member in the HBase community and a non-committer,
> once
> > > the
> > > > new member decides that he/ she wants to become a Committer, it will
> be
> > > > helpful to have a list of PMC members that he/ she can communicate
> with
> > > and
> > > > get feedback from time to time. Feedback may include potential
> > > adjustments
> > > > and rough idea about progress towards the goal. Paid professionals
> who
> > > are
> > > > working with PMC members, can talk to their colleagues. For others,
> > > having
> > > > this list of volunteer mentors, will surely help. IMHO, this will
> make
> > > > process a bit more transparent. I would like to know your thoughts on
> > > this.
> > > >
> > > > Thanks,
> > > > Umesh
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Sep 21, 2017 at 1:41 PM, Misty Stanley-Jones <
> mi...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > I feel like I inject this note into all discussions like this, but
> > I'm
> > > > > going to do it again. "Act like a committer" does not ONLY mean to
> > > > produce
> > > > > code for HBase. It means to support the project. This may mean any
> of
> > > the
> > > > > following, plus a long list of other things I

Re: [DISCUSS] Becoming a Committer

2017-09-29 Thread Mike Drob
To bounce off of what Yu Li said earlier - I see Hadoop has adopted very
similar language to the Spark list:
http://hadoop.apache.org/committer_criteria.html

I especially like the examples at the bottom. They are four diverse paths,
and there is no expectation that this is an exclusive list. If we were to
write our own, I think it should reflect Andrew's highlighting of the
non-professional contributor's path. And also important is to include the
soft skills from Misty's list.

Mike

On Sat, Sep 23, 2017 at 5:07 PM, Stack  wrote:

> Good discussion. Thanks Mike for kicking it off.
>
> The Misty list is great.
>
> I find myself giving double kudos for non-code or feature contribs; e.g.
> stuff like test-fixing, patches that fix bugs found in production or
> patches from operators that ease their day-to-day burden, voting on
> releases, doc., (useful, encouraging, deep, helpful) review of the work of
> others, etc. (I'd love it if someone took ownership of our website -- hint,
> hint).
>
> Sean has a dictum, paraphrasing, "...the fastest route to commitership is
> doing what no one else wants to do" (Did I mangle that Busbey?), which I
> like.
>
> While Andrew may have misjudged Mike Drob's original intent, I appreciate
> his rallying to the cause of the non-professional contributor and his
> reaction to (mis-perceived) call for quantification (For a classic on the
> problems that arise when hard-and-fast rules, see [2]). I'm with him
> defending PMC right to give 'spirit' and 'gut' precedence over 'rules'
> (Often, it *is* just a case of you know it when you see it). And as per
> Andy, if perceived injustice or bias, please write here or private@hbase.
>
> Lets keep dumping on this thread. We can then summarize and make it easy
> for prospectives to find (can also add links to stuff such as the recent
> Wang+Leblang talk at ApacheCon [1] and Andrew's write up for how to be a
> committer on Hadoop as background).
>
> Thanks,
> St.Ack
>
> 1.
> https://apachecon2017.sched.com/event/9zv3/a-tale-of-two-
> developers-finding-harmony-between-commercial-software-
> development-and-the-apache-way-andrew-wang-alex-leblang-cloudera
> 2.
> https://books.google.com/books/about/Seeing_Like_a_
> State.html?id=PqcPCgsr2u0C
>
>
>
>
>
> On Fri, Sep 22, 2017 at 4:08 PM, Zach York 
> wrote:
>
> > bq. As a
> > relatively new member in the HBase community and a non-committer, once
> the
> > new member decides that he/ she wants to become a Committer, it will be
> > helpful to have a list of PMC members that he/ she can communicate with
> and
> > get feedback from time to time. Feedback may include potential
> adjustments
> > and rough idea about progress towards the goal.
> >
> > This sounds like a good idea! Ideally, if you interact with the community
> > often enough, you should be building connections, but it never hurts to
> > have someone to check how they perceive your work.
> >
> > bq. For others, having
> > this list of volunteer mentors, will surely help.
> >
> > Again I agree. This part is especially important as it is hard to judge
> > your progress if you don't have someone at the same company to converse
> > with.
> >
> > On Fri, Sep 22, 2017 at 3:38 PM, Umesh Agashe 
> > wrote:
> >
> > > Hi,
> > >
> > > Thank you all for a good discussion here. Issues with both having and
> NOT
> > > having documented specific criteria are well articulated here. As a
> > > relatively new member in the HBase community and a non-committer, once
> > the
> > > new member decides that he/ she wants to become a Committer, it will be
> > > helpful to have a list of PMC members that he/ she can communicate with
> > and
> > > get feedback from time to time. Feedback may include potential
> > adjustments
> > > and rough idea about progress towards the goal. Paid professionals who
> > are
> > > working with PMC members, can talk to their colleagues. For others,
> > having
> > > this list of volunteer mentors, will surely help. IMHO, this will make
> > > process a bit more transparent. I would like to know your thoughts on
> > this.
> > >
> > > Thanks,
> > > Umesh
> > >
> > >
> > >
> > >
> > > On Thu, Sep 21, 2017 at 1:41 PM, Misty Stanley-Jones  >
> > > wrote:
> > >
> > > > I feel like I inject this note into all discussions like this, but
> I'm
> > > > going to do it again. "Act like a committer" does not ONLY mean to
> > > produce
> > > > code for HBase. It means to support the project. This may mean any of
> > the
> > > > following, plus a long list of other things I'm sure I'm not thinking
> > of
> > > > right now:
> > > >
> > > > - Contribute to the docs (yay!)
> > > > - Help fix and improve testing
> > > > - Participate in release candidate votes, even if non-binding
> > > > - Review other people's work
> > > > - Help newbies
> > > > - Answer questions
> > > > - Update the website
> > > > - File issues
> > > > - Mentor new contributors of all sorts
> > > > - Give talks about HBase
> > > > - Write blogs about HBase
> > > > - Participat

Re: Performance issue in the Join query on the HBase tables

2017-09-29 Thread Nick Dimiduk
Have you considered running Hive/Spark over snapshots of your HBase tables?

If you're seeing network saturation over HBase but not HDFS, that makes me
think data locality is not being honored. Might be worth investigating as well.

On Fri, Sep 29, 2017 at 3:26 AM wenxing zheng 
wrote:

> Dear all,
>
> I have 3 big HBase tables, which all have millions of rows(rows are synced
> from MySQL DB via Bin log) and for each HBase table, we have an external
> table on Hive correspondingly with the storage by
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> we can always keep sync up with the production DB and provides random
> access by key.
>
> Now our business needs to do some analysis on those tables with Join query.
> What's the best practice to make it?
>
> From my experiment, I found that with the Spark SQL on HBase or Hive, the
> job ran very slowly and will saturate the network bandwidth. But it works
> very well for the Hive SQL directly against Hive from HDFS files(make a
> copy of the data to HDFS files).
>
> Appreciated for any advice on what would be the problem here? and the way
> to optimize the job.
> Regards, Wenxing
>


RE: Performance issue in the Join query on the HBase tables

2017-09-29 Thread Eric Owhadi
Hi Wenxing,
From the use case you describe, you may want to take a look at Trafodion or 
EsgynDB (commercial version of Trafodion).
http://trafodion.incubator.apache.org/
Trafodion uses a very mature SQL engine on top of HBase/Hive, coming with 20
years of IP given away to open source by Hewlett-Packard 2 years ago.
It supports many different JOIN types (hash joins, nested joins, merge joins)
with optimized overflow-to-disk mechanisms over an optimized pipelined
architecture, full indexing capabilities, and an optimized row format that will
make your HBase table a lot faster than it is when using one cell per column.
From a SQL capability standpoint for analytics queries, Trafodion can run the
full set of 99 TPC-DS queries.
Hope this helps,
Eric




-Original Message-
From: wenxing zheng [mailto:wenxing.zh...@gmail.com] 
Sent: Friday, September 29, 2017 7:24 AM
To: dev@hbase.apache.org
Subject: Re: Performance issue in the Join query on the HBase tables

Thanks to Ted.

We haven't tried Phoenix yet. In the performance tests on the official Phoenix
site, I didn't find a report on Join queries. Not sure whether it's
much better or not.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu  wrote:

> Have you looked at Phoenix ?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng 
> 
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, which all have millions of rows(rows are
> synced
> > from MySQL DB via Bin log) and for each HBase table, we have an 
> > external table on Hive correspondingly with the storage by 
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is
> that
> > we can always keep sync up with the production DB and provides 
> > random access by key.
> >
> > Now our business needs to do some analysis on those tables with Join
> query.
> > What's the best practice to make it?
> >
> > From my experiment, I found that with the Spark SQL on HBase or 
> > Hive, the job ran very slowly and will saturate the network 
> > bandwidth. But it works very well for the Hive SQL directly against 
> > Hive from HDFS files(make a copy of the data to HDFS files).
> >
> > Appreciated for any advice on what would be the problem here? and 
> > the way to optimize the job.
> > Regards, Wenxing
> >
>


Re: Performance issue in the Join query on the HBase tables

2017-09-29 Thread wenxing zheng
Thanks to Ted.

We haven't tried Phoenix yet. In the performance tests on the official
Phoenix site, I didn't find a report on Join queries. Not sure
whether it's much better or not.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu  wrote:

> Have you looked at Phoenix ?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng 
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, which all have millions of rows(rows are
> synced
> > from MySQL DB via Bin log) and for each HBase table, we have an external
> > table on Hive correspondingly with the storage by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is
> that
> > we can always keep sync up with the production DB and provides random
> > access by key.
> >
> > Now our business needs to do some analysis on those tables with Join
> query.
> > What's the best practice to make it?
> >
> > From my experiment, I found that with the Spark SQL on HBase or Hive, the
> > job ran very slowly and will saturate the network bandwidth. But it works
> > very well for the Hive SQL directly against Hive from HDFS files(make a
> > copy of the data to HDFS files).
> >
> > Appreciated for any advice on what would be the problem here? and the way
> > to optimize the job.
> > Regards, Wenxing
> >
>


Re: Performance issue in the Join query on the HBase tables

2017-09-29 Thread Ted Yu
Have you looked at Phoenix ?

https://phoenix.apache.org/joins.html

On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng 
wrote:

> Dear all,
>
> I have 3 big HBase tables, which all have millions of rows(rows are synced
> from MySQL DB via Bin log) and for each HBase table, we have an external
> table on Hive correspondingly with the storage by
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> we can always keep sync up with the production DB and provides random
> access by key.
>
> Now our business needs to do some analysis on those tables with Join query.
> What's the best practice to make it?
>
> From my experiment, I found that with the Spark SQL on HBase or Hive, the
> job ran very slowly and will saturate the network bandwidth. But it works
> very well for the Hive SQL directly against Hive from HDFS files(make a
> copy of the data to HDFS files).
>
> Appreciated for any advice on what would be the problem here? and the way
> to optimize the job.
> Regards, Wenxing
>


Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread Anoop John
Yes, as Chia-Ping said, the problem he is trying to solve is a very basic
one. As long as we allow custom Cell creation (via the CellBuilder API),
allow Mutations to be built from Cells, and accept them through the
client-side APIs, we have to make the Type publicly accessible.
Otherwise the Cell-building APIs should not take a type byte.
We have to give users some way to make put/delete cells, etc.

Is the type really bound only to KV? We have getType in Cell too, right?
Maybe the full form of the type that we have in KV now is what confuses us
here? As Ram said, it also contains some internal types that the user
never needs to know about. Please correct me if I have said this the wrong way.

Good that Chia-Ping brought this up here. Either way, we have to
solve it and make the public API fully public.

-Anoop-
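
One way to read the point above is as a user-facing type enum that carries only
the put/delete codes and hides the internal markers. The sketch below is
hypothetical and standalone: UserCellType is not an HBase class, and the byte
values are an assumption that they match the codes KeyValue.Type uses today
(Put = 4, Delete = 8, and so on), with Minimum (0) and Maximum (255) left out
on purpose as internal.

```java
import java.util.Optional;

/**
 * Hypothetical public subset of cell type codes (not an HBase class).
 * Byte values are assumed to match the current KeyValue.Type codes.
 */
enum UserCellType {
    PUT((byte) 4),
    DELETE((byte) 8),
    DELETE_FAMILY_VERSION((byte) 10),
    DELETE_COLUMN((byte) 12),
    DELETE_FAMILY((byte) 14);

    private final byte code;

    UserCellType(byte code) {
        this.code = code;
    }

    /** Raw type byte a custom cell would report. */
    byte code() {
        return code;
    }

    /** Maps a raw type byte to a user-facing type; internal markers yield empty. */
    static Optional<UserCellType> fromCode(byte code) {
        for (UserCellType type : values()) {
            if (type.code == code) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}

class UserCellTypeDemo {
    public static void main(String[] args) {
        // A custom cell builder could take UserCellType.PUT instead of a raw byte.
        System.out.println(UserCellType.PUT.code());
        // An internal marker such as Minimum (0) is not representable here.
        System.out.println(UserCellType.fromCode((byte) 0).isPresent());
    }
}
```

A builder that accepts such an enum rather than a type byte would also address
the alternative raised above, that the Cell-building APIs should not take a
type byte at all.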

On Fri, Sep 29, 2017 at 2:27 PM, ramkrishna vasudevan
 wrote:
> Even if we are trying to move out I think only few of the types are really
> user readable. So we should be very careful here. So since we have
> CellBuilder way it is better we check what type of cells a user can build.
> I think for now the Cellbuilder is not client exposed?
> But again moving to Cell means it becomes public which is not right IMO and
> I thinks others here also agree to it.
>
> Regards
> Ram
>
> On Fri, Sep 29, 2017 at 10:50 AM, Chia-Ping Tsai 
> wrote:
>
>> Thanks for all comment.
>>
>> The problem i want to resolve is the valid code should be exposed as
>> IA.Public. Otherwise, end user have to access the IA.Private class to build
>> the custom cell.
>>
>> For example, I have a use case which plays a streaming role in our
>> application. It
>> applies the CellBuilder(HBASE-18519) to build custom cells. These cells
>> have many same fields so they are put in shared-memory for avoiding GC
>> pause. Everything is wonderful. However, we have to access the IA.Private
>> class - KeyValue#Type - to get the valid code of Put.
>>
>> I believe there are many use cases of custom cell, and consequently it is
>> worth adding a way to get the valid type via IA.Public class. Otherwise, it
>> may imply that the custom cell is based on a unstable way, because the
>> related code can be changed at any time.
>> --
>> Chia-Ping
>>
>> On 2017-09-29 00:49, Andrew Purtell  wrote:
>> > I agree with Stack. Was typing up a reply to Anoop but let me move it
>> down
>> > here.
>> >
>> > The type code exposes some low level details of how our current stores
>> are
>> > architected. But what if in the future you could swap out HStore
>> implements
>> > Store with PStore implements Store, where HStore is backed by HFiles and
>> > PStore is backed by Parquet? Just as a hypothetical example. I know there
>> > would be larger issues if this were actually attempted. Bear with me. You
>> > can imagine some different new Store implementation that has some
>> > advantages but is not a design derived from the log structured merge tree
>> > if you like. Most values from a new Cell.Type based on KeyValue.Type
>> > wouldn't apply to cells from such a thing because they are particular to
>> > how LSMs work. I'm sure such a project if attempted would make a number
>> of
>> > changes requiring a major version increment and low level details could
>> be
>> > unwound from Cell then, but if we could avoid doing it in the first
>> place,
>> > I think it would better for maintainability.
>> >
>> >
>> > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
>> >
>> > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai 
>> > > wrote:
>> > >
>> > > > hi folks,
>> > > >
>> > > > User is allowed to create custom cell but the valid code of type -
>> > > > KeyValue#Type - is declared as IA.Private. As i see it, we should
>> expose
>> > > > KeyValue#Type as Public Client. Three possible ways are shown below:
>> > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public
>> > > > 2) Move KeyValue#Type into Cell.
>> > > > 3) Move KeyValue#Type to upper level
>> > > >
>> > > > Any suggestions?
>> > > >
>> > > >
>> > > What is the problem that we are trying to solve Chia-Ping? You want to
>> make
>> > > Cells of a new Type?
>> > >
>> > > My first reaction is that KV#Type is particular to the KV
>> implementation.
>> > > Any new Cell implementation should not have to adopt the KeyValue
>> typing
>> > > mechanism.
>> > >
>> > > S
>> > >
>> > >
>> > >
>> > >
>> > > > --
>> > > > Chia-Ping
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Andrew
>> >
>> > Words like orphans lost among the crosstalk, meaning torn from truth's
>> > decrepit hands
>> >- A23, Crosstalk
>> >
>>


Performance issue in the Join query on the HBase tables

2017-09-29 Thread wenxing zheng
Dear all,

I have 3 big HBase tables, which all have millions of rows(rows are synced
from MySQL DB via Bin log) and for each HBase table, we have an external
table on Hive correspondingly with the storage by
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
we can always keep sync up with the production DB and provides random
access by key.

Now our business needs to do some analysis on those tables with Join query.
What's the best practice to make it?

From my experiments, I found that with Spark SQL on HBase or Hive, the
job runs very slowly and saturates the network bandwidth. But Hive SQL works
very well when run directly against HDFS files (after making a
copy of the data to HDFS files).

I would appreciate any advice on what the problem might be here, and on how
to optimize the job.
Regards, Wenxing


Re: [DISCUSS] Move Type out of KeyValue

2017-09-29 Thread ramkrishna vasudevan
Even if we do move it out, I think only a few of the types are really
user readable, so we should be very careful here. Since we have the
CellBuilder way, it is better that we check what types of cells a user can build.
I think for now the CellBuilder is not client exposed?
But again, moving it to Cell means it becomes public, which is not right IMO, and
I think others here also agree.

Regards
Ram

On Fri, Sep 29, 2017 at 10:50 AM, Chia-Ping Tsai 
wrote:

> Thanks for all the comments.
>
> The problem I want to resolve is that the valid type code should be exposed as
> IA.Public. Otherwise, end users have to access an IA.Private class to build
> a custom cell.
>
> For example, I have a use case which plays a streaming role in our
> application. It applies the CellBuilder (HBASE-18519) to build custom cells.
> These cells have many identical fields, so they are put in shared memory to
> avoid GC pauses. Everything is wonderful. However, we have to access the
> IA.Private class - KeyValue#Type - to get the valid code of Put.
>
> I believe there are many use cases for custom cells, and consequently it is
> worth adding a way to get the valid type via an IA.Public class. Otherwise, it
> may imply that custom cells are built on an unstable foundation, because the
> related code can be changed at any time.
> --
> Chia-Ping
>
> On 2017-09-29 00:49, Andrew Purtell  wrote:
> > I agree with Stack. Was typing up a reply to Anoop but let me move it down
> > here.
> >
> > The type code exposes some low level details of how our current stores are
> > architected. But what if in the future you could swap out HStore implements
> > Store with PStore implements Store, where HStore is backed by HFiles and
> > PStore is backed by Parquet? Just as a hypothetical example. I know there
> > would be larger issues if this were actually attempted. Bear with me. You
> > can imagine some different new Store implementation that has some
> > advantages but is not a design derived from the log structured merge tree
> > if you like. Most values from a new Cell.Type based on KeyValue.Type
> > wouldn't apply to cells from such a thing because they are particular to
> > how LSMs work. I'm sure such a project if attempted would make a number of
> > changes requiring a major version increment and low level details could be
> > unwound from Cell then, but if we could avoid doing it in the first place,
> > I think it would be better for maintainability.
> >
> >
> > On Thu, Sep 28, 2017 at 9:39 AM, Stack  wrote:
> >
> > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai 
> > > wrote:
> > >
> > > > hi folks,
> > > >
> > > > Users are allowed to create custom cells, but the valid type code -
> > > > KeyValue#Type - is declared as IA.Private. As I see it, we should expose
> > > > KeyValue#Type as Public Client. Three possible ways are shown below:
> > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public
> > > > 2) Move KeyValue#Type into Cell.
> > > > 3) Move KeyValue#Type to upper level
> > > >
> > > > Any suggestions?
> > > >
> > > >
> > > What is the problem that we are trying to solve, Chia-Ping? You want to make
> > > Cells of a new Type?
> > >
> > > My first reaction is that KV#Type is particular to the KV implementation.
> > > Any new Cell implementation should not have to adopt the KeyValue typing
> > > mechanism.
> > >
> > > S
> > >
> > >
> > >
> > >
> > > > --
> > > > Chia-Ping
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>
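
To make the trade-off in this thread concrete, here is a rough, self-contained
sketch of what a narrow public view of the type codes might look like. It is
not HBase code: the names CellTypeSketch, PublicType and SharedRowCell are
invented for illustration, and the byte values are assumed to match the codes
in KeyValue.Type, so please verify them against your HBase version. It exposes
only the few types a user can reasonably build, which is the concern raised
above, and a custom cell that shares its row array then gets the Put code
without touching an IA.Private class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase source: PublicType and SharedRowCell are
// invented names; the byte codes are assumed to match KeyValue.Type.
final class CellTypeSketch {

    /** Assumed public subset of type codes a user may put in a custom cell. */
    enum PublicType {
        PUT((byte) 4),
        DELETE((byte) 8),
        DELETE_COLUMN((byte) 12),
        DELETE_FAMILY((byte) 14);

        private final byte code;

        PublicType(byte code) {
            this.code = code;
        }

        byte code() {
            return code;
        }

        /** Maps a raw code back to a public type, rejecting all other codes. */
        static PublicType fromCode(byte code) {
            for (PublicType t : values()) {
                if (t.code == code) {
                    return t;
                }
            }
            throw new IllegalArgumentException("not a user-visible type code: " + code);
        }
    }

    /** Hypothetical custom cell: many cells reference one shared row array. */
    static final class SharedRowCell {
        private final byte[] sharedRow; // referenced, not copied
        private final byte[] value;
        private final PublicType type;

        SharedRowCell(byte[] sharedRow, byte[] value, PublicType type) {
            this.sharedRow = sharedRow;
            this.value = value;
            this.type = type;
        }

        byte[] row() {
            return sharedRow;
        }

        byte[] value() {
            return value;
        }

        byte typeByte() {
            return type.code();
        }
    }

    public static void main(String[] args) {
        byte[] row = "row-1".getBytes(StandardCharsets.UTF_8);
        SharedRowCell a = new SharedRowCell(row, new byte[] {1}, PublicType.PUT);
        SharedRowCell b = new SharedRowCell(row, new byte[] {2}, PublicType.PUT);
        System.out.println("same row array: " + (a.row() == b.row()));
        System.out.println("type code: " + a.typeByte());
    }
}
```

Whether such an enum should live in Cell, next to it, or somewhere else is
exactly the open question here; the sketch only shows that the public surface
can stay much smaller than the full set of internal codes.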