Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations, Chia-Ping. On Sat, Sep 30, 2017 at 3:49 AM, Misty Stanley-Jones wrote: > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > join > the HBase PMC, and help to make the project run smoothly. Chia-Ping became > an > HBase committer over 6 months ago, based on long-running participate in the > HBase project, a consistent record of resolving HBase issues, and > contributions > to testing and performance. > > Thank you for stepping up to serve, Chia-Ping! > > As a reminder, if anyone would like to nominate another person as a > committer or PMC member, even if you are not currently a committer or PMC > member, you can always drop a note to priv...@hbase.apache.org to let us > know! > > Thanks, > Misty (on behalf of the HBase PMC) >
[jira] [Created] (HBASE-18911) Unify Admin and AsyncAdmin's methods name
Guanghao Zhang created HBASE-18911:

    Summary: Unify Admin and AsyncAdmin's methods name
    Key: HBASE-18911
    URL: https://issues.apache.org/jira/browse/HBASE-18911
    Project: HBase
    Issue Type: Sub-task
    Reporter: Guanghao Zhang
    Assignee: Guanghao Zhang

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18910) Backport HBASE-17292 "Add observer notification before bulk loaded hfile is moved to region directory" to 1.3
Guangxu Cheng created HBASE-18910:

    Summary: Backport HBASE-17292 "Add observer notification before bulk loaded hfile is moved to region directory" to 1.3
    Key: HBASE-18910
    URL: https://issues.apache.org/jira/browse/HBASE-18910
    Project: HBase
    Issue Type: Bug
    Reporter: Guangxu Cheng
    Assignee: Guangxu Cheng
    Fix For: 1.3.2

HBASE-18900 will backport HBASE-17290 to branch-1.3, but HBASE-17290 depends on HBASE-17292, so this issue will backport HBASE-17292 to branch-1.3.
[jira] [Created] (HBASE-18909) Deprecate Admin's methods which used String regex
Guanghao Zhang created HBASE-18909:

    Summary: Deprecate Admin's methods which used String regex
    Key: HBASE-18909
    URL: https://issues.apache.org/jira/browse/HBASE-18909
    Project: HBase
    Issue Type: Sub-task
    Reporter: Guanghao Zhang
[jira] [Created] (HBASE-18908) Add Java 9 section to support matrix documentation
Mike Drob created HBASE-18908:

    Summary: Add Java 9 section to support matrix documentation
    Key: HBASE-18908
    URL: https://issues.apache.org/jira/browse/HBASE-18908
    Project: HBase
    Issue Type: Sub-task
    Components: documentation
    Reporter: Mike Drob
Re: Performance issue in the Join query on the HBase tables
@Eric: I will take a look at Trafodion.

@Nick: As for Hive/Spark over snapshots, I just tried Hive over HBase snapshots, and the select(count) is much faster than Hive over HBase. Since the HBase tables are all so big, how can we make the engine respect data locality?

Thank you very much,

On Fri, Sep 29, 2017 at 10:22 PM, Nick Dimiduk wrote:
> Have you considered running Hive/Spark over snapshots of your HBase tables?
>
> If you're seeing network saturation over HBase but not hdfs, makes me think
> data locality is not being honored. Might be worth investigating as well.
>
> On Fri, Sep 29, 2017 at 3:26 AM wenxing zheng wrote:
> > Dear all,
> >
> > I have 3 big HBase tables, which all have millions of rows (rows are synced
> > from MySQL DB via Bin log) and for each HBase table, we have an external
> > table on Hive correspondingly with the storage by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> > we can always keep sync up with the production DB and provides random
> > access by key.
> >
> > Now our business needs to do some analysis on those tables with Join query.
> > What's the best practice to make it?
> >
> > From my experiment, I found that with the Spark SQL on HBase or Hive, the
> > job ran very slowly and will saturate the network bandwidth. But it works
> > very well for the Hive SQL directly against Hive from HDFS files (make a
> > copy of the data to HDFS files).
> >
> > Appreciated for any advice on what would be the problem here? and the way
> > to optimize the job.
> > Regards, Wenxing
Re: Welcome Chia-Ping Tsai to the HBase PMC
Well deserved, Chia-Ping! On Fri, Sep 29, 2017 at 6:04 PM, Esteban Gutierrez wrote: > Congrats Chia-Ping! and Welcome! > > -- > Cloudera, Inc. > > > On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang > wrote: > > > Congratulations! > > > > 2017-09-30 6:38 GMT+08:00 Andrew Purtell : > > > > > Congratulations, Chia-Ping! Welcome to the PMC. > > > > > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones > > > > wrote: > > > > > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed > > to > > > > join > > > > the HBase PMC, and help to make the project run smoothly. Chia-Ping > > > became > > > > an > > > > HBase committer over 6 months ago, based on long-running participate > in > > > the > > > > HBase project, a consistent record of resolving HBase issues, and > > > > contributions > > > > to testing and performance. > > > > > > > > Thank you for stepping up to serve, Chia-Ping! > > > > > > > > As a reminder, if anyone would like to nominate another person as a > > > > committer or PMC member, even if you are not currently a committer or > > PMC > > > > member, you can always drop a note to priv...@hbase.apache.org to > let > > us > > > > know! > > > > > > > > Thanks, > > > > Misty (on behalf of the HBase PMC) > > > > > > > > > > > > > > > > -- > > > Best regards, > > > Andrew > > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > > decrepit hands > > >- A23, Crosstalk > > > > > >
[jira] [Resolved] (HBASE-18559) Add histogram to MetricsConnection to track concurrent calls per server
[ https://issues.apache.org/jira/browse/HBASE-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-18559.
    Resolution: Fixed
    Hadoop Flags: Reviewed

Pushed to 1.4 and up

> Add histogram to MetricsConnection to track concurrent calls per server
> ---
>
>     Key: HBASE-18559
>     URL: https://issues.apache.org/jira/browse/HBASE-18559
>     Project: HBase
>     Issue Type: Improvement
>     Components: Client
>     Reporter: Robert Yokota
>     Assignee: Robert Yokota
>     Priority: Minor
>     Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
>     Attachments: HBASE-18559.master.001.patch
>
> HBASE-16388 introduced a new configuration setting
> "hbase.client.perserver.requests.threshold" to deal with slow region
> servers. I have back-ported the code for the new config setting to our
> environment, but I don't feel comfortable setting it in production without
> visibility into how the number of concurrent calls per server varies
> (especially the current high water mark or max in production when the cluster
> is healthy).
> It is straightforward to pass the value for the concurrent calls per server
> to a new histogram in MetricsConnection. I will attach a patch that I am
> using to gain a better understanding of how setting
> "hbase.client.perserver.requests.threshold" will affect our production
> environment.
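The idea described in this issue can be sketched in a few lines. This is a minimal illustration, not the attached patch: it counts in-flight calls per server and records each observed value so that the high-water mark becomes visible. The names `PerServerCallTracker`, `callStarted`, `callFinished`, `maxObserved` and `meanObserved` are hypothetical, and the count/sum/max fields are only a stand-in for a real histogram such as the one in MetricsConnection.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker, not HBase code: records how many calls are in flight per server.
class PerServerCallTracker {
  private final ConcurrentHashMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();
  // Stand-in for a histogram: keeps only count, sum and max of the observed values.
  private final AtomicLong samples = new AtomicLong();
  private final AtomicLong sum = new AtomicLong();
  private final AtomicLong max = new AtomicLong();

  // Call when a request to `server` is sent; returns the concurrency seen at that moment.
  int callStarted(String server) {
    int current = inFlight.computeIfAbsent(server, s -> new AtomicInteger()).incrementAndGet();
    samples.incrementAndGet();
    sum.addAndGet(current);
    max.accumulateAndGet(current, Math::max);
    return current;
  }

  // Call when the response (or failure) for a request to `server` arrives.
  void callFinished(String server) {
    AtomicInteger counter = inFlight.get(server);
    if (counter != null) {
      counter.decrementAndGet();
    }
  }

  long maxObserved() {
    return max.get();
  }

  double meanObserved() {
    long n = samples.get();
    return n == 0 ? 0.0 : (double) sum.get() / n;
  }
}

class PerServerCallTrackerDemo {
  public static void main(String[] args) {
    PerServerCallTracker tracker = new PerServerCallTracker();
    tracker.callStarted("rs1");  // 1 in flight on rs1
    tracker.callStarted("rs1");  // 2 in flight on rs1
    tracker.callStarted("rs2");  // 1 in flight on rs2
    tracker.callFinished("rs1");
    tracker.callStarted("rs1");  // back to 2 in flight on rs1
    System.out.println("max=" + tracker.maxObserved() + " mean=" + tracker.meanObserved());
  }
}
```

With numbers like these collected on a healthy cluster, a value for "hbase.client.perserver.requests.threshold" could be chosen above the observed maximum rather than guessed.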
Re: Welcome Chia-Ping Tsai to the HBase PMC
Welcome Chia-Ping. Keep up the great work. S On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones wrote: > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > join > the HBase PMC, and help to make the project run smoothly. Chia-Ping became > an > HBase committer over 6 months ago, based on long-running participate in the > HBase project, a consistent record of resolving HBase issues, and > contributions > to testing and performance. > > Thank you for stepping up to serve, Chia-Ping! > > As a reminder, if anyone would like to nominate another person as a > committer or PMC member, even if you are not currently a committer or PMC > member, you can always drop a note to priv...@hbase.apache.org to let us > know! > > Thanks, > Misty (on behalf of the HBase PMC) >
[jira] [Resolved] (HBASE-18436) Add client-side hedged read metrics
[ https://issues.apache.org/jira/browse/HBASE-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-18436.
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Fix Version/s: 1.5.0, 1.4.0, 3.0.0, 2.0.0

Pushed to 1.4 and up

> Add client-side hedged read metrics
> ---
>
>     Key: HBASE-18436
>     URL: https://issues.apache.org/jira/browse/HBASE-18436
>     Project: HBase
>     Issue Type: Improvement
>     Reporter: Yun Zhao
>     Assignee: Yun Zhao
>     Priority: Minor
>     Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
>     Attachments: HBASE-18436.master.001.patch
>
> Need some metrics to indicate read high-availability.
> +hedgedReadOps -- the number of hedged reads that have occurred.
> +hedgedReadWin -- the number of hedged reads that returned faster than the
> original read.
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congrats Chia-Ping! and Welcome! -- Cloudera, Inc. On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang wrote: > Congratulations! > > 2017-09-30 6:38 GMT+08:00 Andrew Purtell : > > > Congratulations, Chia-Ping! Welcome to the PMC. > > > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones > > wrote: > > > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed > to > > > join > > > the HBase PMC, and help to make the project run smoothly. Chia-Ping > > became > > > an > > > HBase committer over 6 months ago, based on long-running participate in > > the > > > HBase project, a consistent record of resolving HBase issues, and > > > contributions > > > to testing and performance. > > > > > > Thank you for stepping up to serve, Chia-Ping! > > > > > > As a reminder, if anyone would like to nominate another person as a > > > committer or PMC member, even if you are not currently a committer or > PMC > > > member, you can always drop a note to priv...@hbase.apache.org to let > us > > > know! > > > > > > Thanks, > > > Misty (on behalf of the HBase PMC) > > > > > > > > > > > -- > > Best regards, > > Andrew > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > decrepit hands > >- A23, Crosstalk > > >
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations Chia-Ping! Huaxiang > On Sep 29, 2017, at 3:52 PM, Guanghao Zhang wrote: > > Congratulations! > > 2017-09-30 6:38 GMT+08:00 Andrew Purtell : > >> Congratulations, Chia-Ping! Welcome to the PMC. >> >> On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones >> wrote: >> >>> The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to >>> join >>> the HBase PMC, and help to make the project run smoothly. Chia-Ping >> became >>> an >>> HBase committer over 6 months ago, based on long-running participate in >> the >>> HBase project, a consistent record of resolving HBase issues, and >>> contributions >>> to testing and performance. >>> >>> Thank you for stepping up to serve, Chia-Ping! >>> >>> As a reminder, if anyone would like to nominate another person as a >>> committer or PMC member, even if you are not currently a committer or PMC >>> member, you can always drop a note to priv...@hbase.apache.org to let us >>> know! >>> >>> Thanks, >>> Misty (on behalf of the HBase PMC) >>> >> >> >> >> -- >> Best regards, >> Andrew >> >> Words like orphans lost among the crosstalk, meaning torn from truth's >> decrepit hands >> - A23, Crosstalk >>
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations! 2017-09-30 6:38 GMT+08:00 Andrew Purtell : > Congratulations, Chia-Ping! Welcome to the PMC. > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones > wrote: > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > > join > > the HBase PMC, and help to make the project run smoothly. Chia-Ping > became > > an > > HBase committer over 6 months ago, based on long-running participate in > the > > HBase project, a consistent record of resolving HBase issues, and > > contributions > > to testing and performance. > > > > Thank you for stepping up to serve, Chia-Ping! > > > > As a reminder, if anyone would like to nominate another person as a > > committer or PMC member, even if you are not currently a committer or PMC > > member, you can always drop a note to priv...@hbase.apache.org to let us > > know! > > > > Thanks, > > Misty (on behalf of the HBase PMC) > > > > > > -- > Best regards, > Andrew > > Words like orphans lost among the crosstalk, meaning torn from truth's > decrepit hands >- A23, Crosstalk >
[jira] [Created] (HBASE-18907) Methods missing rpc timeout parameter in HTable
Ted Yu created HBASE-18907:

    Summary: Methods missing rpc timeout parameter in HTable
    Key: HBASE-18907
    URL: https://issues.apache.org/jira/browse/HBASE-18907
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu

When revisiting HBASE-15645, I found that two methods miss the rpcTimeout parameter to newCaller() in HTable:
{code}
return rpcCallerFactory. newCaller().callWithRetries(callable, this.operationTimeout);
{code}
I checked branch-1.2. Other branch(es) may have the same problem.
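To illustrate the report, here is a self-contained model of the problem rather than HBase's actual classes: a factory whose no-argument `newCaller()` falls back to a default rpc timeout, next to an overload that takes the per-table value. `CallerFactory`, `Caller` and all timeout values are hypothetical stand-ins, and the "fixed" call assumes the real factory offers an overload that accepts an rpc timeout, as the report implies.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a retrying caller; it only remembers its rpc timeout.
class Caller<T> {
  final int rpcTimeoutMs;

  Caller(int rpcTimeoutMs) {
    this.rpcTimeoutMs = rpcTimeoutMs;
  }

  // Runs the callable once; a real caller would retry until operationTimeoutMs elapses.
  T callWithRetries(Callable<T> callable, int operationTimeoutMs) throws Exception {
    return callable.call();
  }
}

// Hypothetical stand-in for the caller factory.
class CallerFactory {
  private final int defaultRpcTimeoutMs;

  CallerFactory(int defaultRpcTimeoutMs) {
    this.defaultRpcTimeoutMs = defaultRpcTimeoutMs;
  }

  // No-argument variant: silently uses the factory-wide default.
  <T> Caller<T> newCaller() {
    return new Caller<>(defaultRpcTimeoutMs);
  }

  // Variant that honors a timeout chosen by the caller.
  <T> Caller<T> newCaller(int rpcTimeoutMs) {
    return new Caller<>(rpcTimeoutMs);
  }
}

class RpcTimeoutSketch {
  public static void main(String[] args) throws Exception {
    CallerFactory factory = new CallerFactory(60_000); // assumed default
    int tableRpcTimeoutMs = 5_000;                     // assumed value set on the table
    int operationTimeoutMs = 30_000;

    // Pattern from the report: the table's rpc timeout is ignored.
    Caller<String> buggy = factory.newCaller();
    // Intended pattern: pass the table's rpc timeout through.
    Caller<String> fixed = factory.newCaller(tableRpcTimeoutMs);

    System.out.println("buggy rpc timeout: " + buggy.rpcTimeoutMs);
    System.out.println("fixed rpc timeout: " + fixed.rpcTimeoutMs);
    System.out.println(fixed.callWithRetries(() -> "ok", operationTimeoutMs));
  }
}
```

In this model the only difference between the two callers is which timeout they carry, which is why the omission is easy to miss in review.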
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations, Chia-Ping! Welcome to the PMC. On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones wrote: > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > join > the HBase PMC, and help to make the project run smoothly. Chia-Ping became > an > HBase committer over 6 months ago, based on long-running participate in the > HBase project, a consistent record of resolving HBase issues, and > contributions > to testing and performance. > > Thank you for stepping up to serve, Chia-Ping! > > As a reminder, if anyone would like to nominate another person as a > committer or PMC member, even if you are not currently a committer or PMC > member, you can always drop a note to priv...@hbase.apache.org to let us > know! > > Thanks, > Misty (on behalf of the HBase PMC) > -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
Re: Welcome Chia-Ping Tsai to the HBase PMC
My sincere congratulations! On Fri, Sep 29, 2017 at 3:22 PM, Ted Yu wrote: > Congratulations, Chia-Ping. > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones > wrote: > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > > join > > the HBase PMC, and help to make the project run smoothly. Chia-Ping > became > > an > > HBase committer over 6 months ago, based on long-running participate in > the > > HBase project, a consistent record of resolving HBase issues, and > > contributions > > to testing and performance. > > > > Thank you for stepping up to serve, Chia-Ping! > > > > As a reminder, if anyone would like to nominate another person as a > > committer or PMC member, even if you are not currently a committer or PMC > > member, you can always drop a note to priv...@hbase.apache.org to let us > > know! > > > > Thanks, > > Misty (on behalf of the HBase PMC) > > > -- A very happy Clouderan
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations, Chia-Ping. On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones wrote: > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to > join > the HBase PMC, and help to make the project run smoothly. Chia-Ping became > an > HBase committer over 6 months ago, based on long-running participate in the > HBase project, a consistent record of resolving HBase issues, and > contributions > to testing and performance. > > Thank you for stepping up to serve, Chia-Ping! > > As a reminder, if anyone would like to nominate another person as a > committer or PMC member, even if you are not currently a committer or PMC > member, you can always drop a note to priv...@hbase.apache.org to let us > know! > > Thanks, > Misty (on behalf of the HBase PMC) >
Welcome Chia-Ping Tsai to the HBase PMC
The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed to join the HBase PMC, and help to make the project run smoothly. Chia-Ping became an HBase committer over 6 months ago, based on long-running participation in the HBase project, a consistent record of resolving HBase issues, and contributions to testing and performance.

Thank you for stepping up to serve, Chia-Ping!

As a reminder, if anyone would like to nominate another person as a committer or PMC member, even if you are not currently a committer or PMC member, you can always drop a note to priv...@hbase.apache.org to let us know!

Thanks,
Misty (on behalf of the HBase PMC)
Re: [DISCUSS] Move Type out of KeyValue
Construct a normal put or delete or batch mutation, add whatever extra state you need in one or more operation attributes, and use a RegionObserver to extend normal processing to handle the extra state. I'm curious what dispatching to extension code because of a custom cell type buys you over dispatching to extension code because of the presence of an attribute (or cell tag).

For example, in security coprocessors we take attribute data and attach it to the cell using cell tags. Later we check for cell tag(s) to determine if we have to take special action when the cell is accessed by a scanner, or during some operations (e.g. appends or increments have to do extra handling for cell security tags).

On Fri, Sep 29, 2017 at 2:43 PM, Chia-Ping Tsai wrote:
> > Instead of a custom cell, could you use a regular cell with a custom
> > operation attribute (see OperationWithAttributes).
> Pardon me, I didn't get what you said.
>
> On 2017-09-30 04:31, Andrew Purtell wrote:
> > Instead of a custom cell, could you use a regular cell with a custom
> > operation attribute (see OperationWithAttributes).
> >
> > On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai wrote:
> > > The custom cell help us to save memory consumption. We don't have own
> > > serialization/deserialization mechanism, hence to transform data from
> > > client to server needs many conversion phase (user data -> Put/Cell -> pb
> > > object). The cost of conversion is large in transferring bulk data. In
> > > fact, we also have custom mutation to manage the memory usage of inner cell
> > > collection.
> > >
> > > On 2017-09-30 02:43, Andrew Purtell wrote:
> > > > What are the use cases for a custom cell? It seems a dangerously low level
> > > > thing to attempt and perhaps we should unwind support for it. But perhaps
> > > > there is a compelling justification.
> > > > > > > > > > > > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai < > chia7...@apache.org> > > > > wrote: > > > > > > > > > Thanks for all comment. > > > > > > > > > > The problem i want to resolve is the valid code should be exposed > as > > > > > IA.Public. Otherwise, end user have to access the IA.Private class > to > > > build > > > > > the custom cell. > > > > > > > > > > For example, I have a use case which plays a streaming role in our > > > > > appliaction. It > > > > > applies the CellBuilder(HBASE-18519) to build custom cells. These > cells > > > > > have many same fields so they are put in shared-memory for > avoiding GC > > > > > pause. Everything is wonderful. However, we have to access the > > > IA.Private > > > > > class - KeyValue#Type - to get the valid code of Put. > > > > > > > > > > I believe there are many use cases of custom cell, and > consequently it > > > is > > > > > worth adding a way to get the valid type via IA.Public class. > > > Otherwise, it > > > > > may imply that the custom cell is based on a unstable way, because > the > > > > > related code can be changed at any time. > > > > > -- > > > > > Chia-Ping > > > > > > > > > > On 2017-09-29 00:49, Andrew Purtell wrote: > > > > > > I agree with Stack. Was typing up a reply to Anoop but let me > move it > > > > > down > > > > > > here. > > > > > > > > > > > > The type code exposes some low level details of how our current > > > stores > > > > > are > > > > > > architected. But what if in the future you could swap out HStore > > > > > implements > > > > > > Store with PStore implements Store, where HStore is backed by > HFiles > > > and > > > > > > PStore is backed by Parquet? Just as a hypothetical example. I > know > > > there > > > > > > would be larger issues if this were actually attempted. Bear with > > > me. 
You > > > > > > can imagine some different new Store implementation that has some > > > > > > advantages but is not a design derived from the log structured > merge > > > tree > > > > > > if you like. Most values from a new Cell.Type based on > KeyValue.Type > > > > > > wouldn't apply to cells from such a thing because they are > > > particular to > > > > > > how LSMs work. I'm sure such a project if attempted would make a > > > number > > > > > of > > > > > > changes requiring a major version increment and low level details > > > could > > > > > be > > > > > > unwound from Cell then, but if we could avoid doing it in the > first > > > > > place, > > > > > > I think it would better for maintainability. > > > > > > > > > > > > > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > > > > > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai < > > > chia7...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > hi folks, > > > > > > > > > > > > > > > > User is allowed to create custom cell but the valid code of > type > > > - > > > > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we > should > > > > > expose > > > > > > > > KeyValue#Type as Public Client. Three possible ways are shown > > > below: > > > > > > > > 1) Change declaration
Re: [DISCUSS] Move Type out of KeyValue
> Instead of a custom cell, could you use a regular cell with a custom > operation attribute (see OperationWithAttributes). Pardon me, I didn't get what you said. On 2017-09-30 04:31, Andrew Purtell wrote: > Instead of a custom cell, could you use a regular cell with a custom > operation attribute (see OperationWithAttributes). > > On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai wrote: > > > The custom cell help us to save memory consumption. We don't have own > > serialization/deserialization mechanism, hence to transform data from > > client to server needs many conversion phase (user data -> Put/Cell -> pb > > object). The cost of conversion is large in transferring bulk data. In > > fact, we also have custom mutation to manage the memory usage of inner cell > > collection. > > > > On 2017-09-30 02:43, Andrew Purtell wrote: > > > What are the use cases for a custom cell? It seems a dangerously low > > level > > > thing to attempt and perhaps we should unwind support for it. But perhaps > > > there is a compelling justification. > > > > > > > > > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai > > > wrote: > > > > > > > Thanks for all comment. > > > > > > > > The problem i want to resolve is the valid code should be exposed as > > > > IA.Public. Otherwise, end user have to access the IA.Private class to > > build > > > > the custom cell. > > > > > > > > For example, I have a use case which plays a streaming role in our > > > > appliaction. It > > > > applies the CellBuilder(HBASE-18519) to build custom cells. These cells > > > > have many same fields so they are put in shared-memory for avoiding GC > > > > pause. Everything is wonderful. However, we have to access the > > IA.Private > > > > class - KeyValue#Type - to get the valid code of Put. > > > > > > > > I believe there are many use cases of custom cell, and consequently it > > is > > > > worth adding a way to get the valid type via IA.Public class. 
> > Otherwise, it > > > > may imply that the custom cell is based on a unstable way, because the > > > > related code can be changed at any time. > > > > -- > > > > Chia-Ping > > > > > > > > On 2017-09-29 00:49, Andrew Purtell wrote: > > > > > I agree with Stack. Was typing up a reply to Anoop but let me move it > > > > down > > > > > here. > > > > > > > > > > The type code exposes some low level details of how our current > > stores > > > > are > > > > > architected. But what if in the future you could swap out HStore > > > > implements > > > > > Store with PStore implements Store, where HStore is backed by HFiles > > and > > > > > PStore is backed by Parquet? Just as a hypothetical example. I know > > there > > > > > would be larger issues if this were actually attempted. Bear with > > me. You > > > > > can imagine some different new Store implementation that has some > > > > > advantages but is not a design derived from the log structured merge > > tree > > > > > if you like. Most values from a new Cell.Type based on KeyValue.Type > > > > > wouldn't apply to cells from such a thing because they are > > particular to > > > > > how LSMs work. I'm sure such a project if attempted would make a > > number > > > > of > > > > > changes requiring a major version increment and low level details > > could > > > > be > > > > > unwound from Cell then, but if we could avoid doing it in the first > > > > place, > > > > > I think it would better for maintainability. > > > > > > > > > > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > > > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai < > > chia7...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > hi folks, > > > > > > > > > > > > > > User is allowed to create custom cell but the valid code of type > > - > > > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should > > > > expose > > > > > > > KeyValue#Type as Public Client. 
Three possible ways are shown > > below: > > > > > > > 1) Change declaration of KeyValue#Type from IA.Private to > > IA.Public > > > > > > > 2) Move KeyValue#Type into Cell. > > > > > > > 3) Move KeyValue#Type to upper level > > > > > > > > > > > > > > Any suggestions? > > > > > > > > > > > > > > > > > > > > What is the problem that we are trying to solve Chia-Ping? You > > want to > > > > make > > > > > > Cells of a new Type? > > > > > > > > > > > > My first reaction is that KV#Type is particular to the KV > > > > implementation. > > > > > > Any new Cell implementation should not have to adopt the KeyValue > > > > typing > > > > > > mechanism. > > > > > > > > > > > > S > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Best regards, > > > > > Andrew > > > > > > > > > > Words like orphans lost among the crosstalk, meaning torn from > > truth's > > > > > decrepit hands > > > > >- A23, Crosstalk > > > > > > > > > > > > > > > > > > > > > -- > > > Best r
Re: [DISCUSS] Move Type out of KeyValue
Instead of a custom cell, could you use a regular cell with a custom operation attribute (see OperationWithAttributes). On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai wrote: > The custom cell help us to save memory consumption. We don't have own > serialization/deserialization mechanism, hence to transform data from > client to server needs many conversion phase (user data -> Put/Cell -> pb > object). The cost of conversion is large in transferring bulk data. In > fact, we also have custom mutation to manage the memory usage of inner cell > collection. > > On 2017-09-30 02:43, Andrew Purtell wrote: > > What are the use cases for a custom cell? It seems a dangerously low > level > > thing to attempt and perhaps we should unwind support for it. But perhaps > > there is a compelling justification. > > > > > > On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai > > wrote: > > > > > Thanks for all comment. > > > > > > The problem i want to resolve is the valid code should be exposed as > > > IA.Public. Otherwise, end user have to access the IA.Private class to > build > > > the custom cell. > > > > > > For example, I have a use case which plays a streaming role in our > > > appliaction. It > > > applies the CellBuilder(HBASE-18519) to build custom cells. These cells > > > have many same fields so they are put in shared-memory for avoiding GC > > > pause. Everything is wonderful. However, we have to access the > IA.Private > > > class - KeyValue#Type - to get the valid code of Put. > > > > > > I believe there are many use cases of custom cell, and consequently it > is > > > worth adding a way to get the valid type via IA.Public class. > Otherwise, it > > > may imply that the custom cell is based on a unstable way, because the > > > related code can be changed at any time. > > > -- > > > Chia-Ping > > > > > > On 2017-09-29 00:49, Andrew Purtell wrote: > > > > I agree with Stack. Was typing up a reply to Anoop but let me move it > > > down > > > > here. 
> > > > > > > > The type code exposes some low level details of how our current > stores > > > are > > > > architected. But what if in the future you could swap out HStore > > > implements > > > > Store with PStore implements Store, where HStore is backed by HFiles > and > > > > PStore is backed by Parquet? Just as a hypothetical example. I know > there > > > > would be larger issues if this were actually attempted. Bear with > me. You > > > > can imagine some different new Store implementation that has some > > > > advantages but is not a design derived from the log structured merge > tree > > > > if you like. Most values from a new Cell.Type based on KeyValue.Type > > > > wouldn't apply to cells from such a thing because they are > particular to > > > > how LSMs work. I'm sure such a project if attempted would make a > number > > > of > > > > changes requiring a major version increment and low level details > could > > > be > > > > unwound from Cell then, but if we could avoid doing it in the first > > > place, > > > > I think it would better for maintainability. > > > > > > > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai < > chia7...@apache.org> > > > > > wrote: > > > > > > > > > > > hi folks, > > > > > > > > > > > > User is allowed to create custom cell but the valid code of type > - > > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should > > > expose > > > > > > KeyValue#Type as Public Client. Three possible ways are shown > below: > > > > > > 1) Change declaration of KeyValue#Type from IA.Private to > IA.Public > > > > > > 2) Move KeyValue#Type into Cell. > > > > > > 3) Move KeyValue#Type to upper level > > > > > > > > > > > > Any suggestions? > > > > > > > > > > > > > > > > > What is the problem that we are trying to solve Chia-Ping? You > want to > > > make > > > > > Cells of a new Type? 
> > > > > > > > > > My first reaction is that KV#Type is particular to the KV > > > implementation. > > > > > Any new Cell implementation should not have to adopt the KeyValue > > > typing > > > > > mechanism. > > > > > > > > > > S > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Andrew > > > > > > > > Words like orphans lost among the crosstalk, meaning torn from > truth's > > > > decrepit hands > > > >- A23, Crosstalk > > > > > > > > > > > > > > > -- > > Best regards, > > Andrew > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > decrepit hands > >- A23, Crosstalk > > > -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
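The suggestion in this message, a regular cell plus an operation attribute that server-side code dispatches on, can be sketched with a small self-contained model. Nothing below is HBase code: `SketchPut`, `SketchObserver` and the attribute name are hypothetical stand-ins. In HBase itself the attribute would be set through `setAttribute` on a Put (inherited from OperationWithAttributes) and read back with `getAttribute` inside a RegionObserver hook.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutation that carries operation attributes.
class SketchPut {
  final String row;
  private final Map<String, byte[]> attributes = new HashMap<>();

  SketchPut(String row) {
    this.row = row;
  }

  SketchPut setAttribute(String name, byte[] value) {
    attributes.put(name, value);
    return this;
  }

  byte[] getAttribute(String name) {
    return attributes.get(name);
  }
}

// Hypothetical stand-in for a server-side observer hook.
class SketchObserver {
  static final String ATTR = "example.extra-state"; // attribute name is an assumption

  // Returns a description of the path taken, so the dispatch is easy to see.
  String prePut(SketchPut put) {
    byte[] extra = put.getAttribute(ATTR);
    if (extra == null) {
      return "normal:" + put.row; // no attribute: normal processing
    }
    // Attribute present: extended processing can use the extra state.
    return "extended:" + put.row + ":" + new String(extra, StandardCharsets.UTF_8);
  }
}

class AttributeDispatchSketch {
  public static void main(String[] args) {
    SketchObserver observer = new SketchObserver();
    SketchPut plain = new SketchPut("row1");
    SketchPut tagged = new SketchPut("row2")
        .setAttribute(SketchObserver.ATTR, "stream-42".getBytes(StandardCharsets.UTF_8));
    System.out.println(observer.prePut(plain));
    System.out.println(observer.prePut(tagged));
  }
}
```

The point of the pattern is that the cell itself stays an ordinary cell, so no private type code is needed on the client; the extra state travels with the operation and the observer decides what to do with it.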
Re: [DISCUSS] Move Type out of KeyValue
The custom cell helps us reduce memory consumption. We don't have our own serialization/deserialization mechanism, so transferring data from client to server needs many conversion phases (user data -> Put/Cell -> pb object). The cost of conversion is large when transferring bulk data. In fact, we also have a custom mutation to manage the memory usage of the inner cell collection.

On 2017-09-30 02:43, Andrew Purtell wrote:
> What are the use cases for a custom cell? It seems a dangerously low level
> thing to attempt and perhaps we should unwind support for it. But perhaps
> there is a compelling justification.
>
> On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai wrote:
> > Thanks for all comment.
> >
> > The problem i want to resolve is the valid code should be exposed as
> > IA.Public. Otherwise, end user have to access the IA.Private class to build
> > the custom cell.
> >
> > For example, I have a use case which plays a streaming role in our
> > application. It
> > applies the CellBuilder(HBASE-18519) to build custom cells. These cells
> > have many same fields so they are put in shared-memory for avoiding GC
> > pause. Everything is wonderful. However, we have to access the IA.Private
> > class - KeyValue#Type - to get the valid code of Put.
> >
> > I believe there are many use cases of custom cell, and consequently it is
> > worth adding a way to get the valid type via IA.Public class. Otherwise, it
> > may imply that the custom cell is based on a unstable way, because the
> > related code can be changed at any time.
> > --
> > Chia-Ping
> >
> > On 2017-09-29 00:49, Andrew Purtell wrote:
> > > I agree with Stack. Was typing up a reply to Anoop but let me move it down
> > > here.
> > >
> > > The type code exposes some low level details of how our current stores are
> > > architected.
But what if in the future you could swap out HStore > > implements > > > Store with PStore implements Store, where HStore is backed by HFiles and > > > PStore is backed by Parquet? Just as a hypothetical example. I know there > > > would be larger issues if this were actually attempted. Bear with me. You > > > can imagine some different new Store implementation that has some > > > advantages but is not a design derived from the log structured merge tree > > > if you like. Most values from a new Cell.Type based on KeyValue.Type > > > wouldn't apply to cells from such a thing because they are particular to > > > how LSMs work. I'm sure such a project if attempted would make a number > > of > > > changes requiring a major version increment and low level details could > > be > > > unwound from Cell then, but if we could avoid doing it in the first > > place, > > > I think it would better for maintainability. > > > > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai > > > > wrote: > > > > > > > > > hi folks, > > > > > > > > > > User is allowed to create custom cell but the valid code of type - > > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should > > expose > > > > > KeyValue#Type as Public Client. Three possible ways are shown below: > > > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public > > > > > 2) Move KeyValue#Type into Cell. > > > > > 3) Move KeyValue#Type to upper level > > > > > > > > > > Any suggestions? > > > > > > > > > > > > > > What is the problem that we are trying to solve Chia-Ping? You want to > > make > > > > Cells of a new Type? > > > > > > > > My first reaction is that KV#Type is particular to the KV > > implementation. > > > > Any new Cell implementation should not have to adopt the KeyValue > > typing > > > > mechanism. 
> > > > > > > > S > > > > > > > > > > > > > > > > > > > > > -- > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Best regards, > > > Andrew > > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > > decrepit hands > > >- A23, Crosstalk > > > > > > > > > -- > Best regards, > Andrew > > Words like orphans lost among the crosstalk, meaning torn from truth's > decrepit hands >- A23, Crosstalk >
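The custom-cell use case described in this thread (many cells with identical fields, kept in one shared place to cut allocation and GC pressure) could look roughly like the sketch below. This is only an illustration: SharedHeader and LightCell are made-up names, not HBase classes, the sharing here is plain on-heap reference sharing rather than real shared memory, and the type code 4 for Put is an assumption about what KeyValue#Type uses.

```java
import java.nio.charset.StandardCharsets;

/** Toy illustration only; none of these types are HBase classes. */
class SharedFieldCells {

    /** Fields that many cells have in common, stored once and shared. */
    static final class SharedHeader {
        final byte[] row;
        final byte[] family;
        final byte typeCode;

        SharedHeader(byte[] row, byte[] family, byte typeCode) {
            this.row = row;
            this.family = family;
            this.typeCode = typeCode;
        }
    }

    /** A cell that stores only what is unique to it and borrows the rest. */
    static final class LightCell {
        private final SharedHeader header;
        private final byte[] qualifier;
        private final long timestamp;
        private final byte[] value;

        LightCell(SharedHeader header, byte[] qualifier, long timestamp, byte[] value) {
            this.header = header;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        byte[] getRow() { return header.row; }
        byte[] getFamily() { return header.family; }
        byte getTypeCode() { return header.typeCode; }
        byte[] getQualifier() { return qualifier; }
        long getTimestamp() { return timestamp; }
        byte[] getValue() { return value; }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: 4 is the code KeyValue#Type uses for Put.
        SharedHeader header = new SharedHeader(bytes("row-1"), bytes("cf"), (byte) 4);
        LightCell a = new LightCell(header, bytes("q1"), 1L, bytes("v1"));
        LightCell b = new LightCell(header, bytes("q2"), 2L, bytes("v2"));
        // Both cells hand back the very same row array, so it exists only once.
        System.out.println(a.getRow() == b.getRow());
    }
}
```

Note that the only place such a cell needs anything from KeyValue#Type is the single type code, which is exactly the value the thread is debating how to expose.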
[DISCUSS] deprecating o.a.h.h.regionserver.RowProcessor
Hi,

Currently the Region.processRowsWithLocks() API takes an o.a.h.h.regionserver.RowProcessor as an argument, and the only implementation of this class is MultiRowMutationProcessor. This implementation is internal and is used from the HRegion.mutateRows...() methods.

The HRegion.processRowsWithLocks() implementation doesn't call coprocessor hooks; instead it calls RowProcessor hooks at the appropriate points in execution. Many of these hooks/methods have the same names and are called at similar points during the course of execution, but they are not related! The HRegion.batchMutate() methods call coprocessor hooks but not RowProcessor hooks. The internal implementation, MultiRowMutationProcessor, calls coprocessor hooks from inside its own methods/hooks, but this cannot be expected of all RowProcessor implementations.

In the case of the HRegion.batchMutate...() methods, CP mutations are merged with input mutations and these merged mutations are applied to the WALEdit fetched from CPs. In the case of processRowsWithLocks(), mutations are fetched from the RowProcessor instance and are applied on the WALEdit built by the RowProcessor.

The major inconsistency here is that one code path uses coprocessors while the other uses RowProcessor. There are other minor inconsistencies along those two code paths.

Proposed fix:
* Unify the two code paths.
* Deprecate RowProcessor and the Region.processRowsWithLocks() API that takes a RowProcessor as an argument.
* Provide an alternate API that doesn't take a RowProcessor.
* Modify batchMutate...() to take additional arguments: rowsToLock (byte[][]) and atomic/allOrNone (boolean).
* Remove MultiRowMutationProcessor. Make the HRegion.mutateRows() methods use batchMutate().
* Make the new implementation of Region.processRowsWithLocks(), which doesn't take a RowProcessor as an argument, use batchMutate().

The suggestion is that coprocessors can be used to do the things RowProcessors are doing.

Related JIRAs: HBASE-18703, HBASE-18183

Let me know your thoughts.

Thanks,
Umesh
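To make the proposed batchMutate...() change a little more concrete, here is a rough, self-contained sketch of a single code path that takes rowsToLock and an atomic flag and merges hook-supplied mutations with the input. Everything in it is hypothetical: UnifiedBatchSketch, Mutation and Observer are toy stand-ins rather than HBase classes, rows are Strings instead of byte[] for readability, there is no WAL, and the rule "a mutation is rejected if its row is not among rowsToLock" is an assumption chosen just to give the atomic flag something to act on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a single mutation path; none of these types are HBase classes. */
class UnifiedBatchSketch {

    /** A put when value is non-null, a delete when value is null. */
    static final class Mutation {
        final String row;
        final String value;

        Mutation(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Stand-in for a coprocessor hook that may contribute extra mutations. */
    interface Observer {
        List<Mutation> preBatchMutate(List<Mutation> input);
    }

    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Observer observer;

    UnifiedBatchSketch(Observer observer) {
        this.observer = observer;
    }

    /**
     * Applies the input plus any observer-supplied mutations while holding the
     * given row locks. Returns one flag per merged mutation: true if applied.
     * A mutation is rejected when its row is not among rowsToLock; with
     * atomic == true a single rejection means nothing is applied.
     */
    boolean[] batchMutate(List<Mutation> input, String[] rowsToLock, boolean atomic) {
        // Lock rows in sorted order so concurrent callers cannot deadlock.
        TreeSet<String> rows = new TreeSet<>(Arrays.asList(rowsToLock));
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String row : rows) {
                ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            // Merge hook-supplied mutations with the caller's input.
            List<Mutation> merged = new ArrayList<>(input);
            merged.addAll(observer.preBatchMutate(input));

            boolean[] applied = new boolean[merged.size()];
            boolean allValid = true;
            for (int i = 0; i < merged.size(); i++) {
                applied[i] = rows.contains(merged.get(i).row);
                allValid &= applied[i];
            }
            if (atomic && !allValid) {
                return new boolean[merged.size()]; // all false: nothing applied
            }
            for (int i = 0; i < merged.size(); i++) {
                if (!applied[i]) {
                    continue;
                }
                Mutation m = merged.get(i);
                if (m.value == null) {
                    store.remove(m.row);
                } else {
                    store.put(m.row, m.value);
                }
            }
            return applied;
        } finally {
            // Release in reverse order of acquisition.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    String get(String row) {
        return store.get(row);
    }
}
```

In this toy model a mutateRows()-style call would be batchMutate(mutations, rows, true), while an ordinary batch would pass atomic = false and inspect the returned flags, so both entry points go through the same hook and the same locking logic.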
Re: [DISCUSS] Move Type out of KeyValue
What are the use cases for a custom cell? It seems a dangerously low level thing to attempt and perhaps we should unwind support for it. But perhaps there is a compelling justification. On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai wrote: > Thanks for all comment. > > The problem i want to resolve is the valid code should be exposed as > IA.Public. Otherwise, end user have to access the IA.Private class to build > the custom cell. > > For example, I have a use case which plays a streaming role in our > appliaction. It > applies the CellBuilder(HBASE-18519) to build custom cells. These cells > have many same fields so they are put in shared-memory for avoiding GC > pause. Everything is wonderful. However, we have to access the IA.Private > class - KeyValue#Type - to get the valid code of Put. > > I believe there are many use cases of custom cell, and consequently it is > worth adding a way to get the valid type via IA.Public class. Otherwise, it > may imply that the custom cell is based on a unstable way, because the > related code can be changed at any time. > -- > Chia-Ping > > On 2017-09-29 00:49, Andrew Purtell wrote: > > I agree with Stack. Was typing up a reply to Anoop but let me move it > down > > here. > > > > The type code exposes some low level details of how our current stores > are > > architected. But what if in the future you could swap out HStore > implements > > Store with PStore implements Store, where HStore is backed by HFiles and > > PStore is backed by Parquet? Just as a hypothetical example. I know there > > would be larger issues if this were actually attempted. Bear with me. You > > can imagine some different new Store implementation that has some > > advantages but is not a design derived from the log structured merge tree > > if you like. Most values from a new Cell.Type based on KeyValue.Type > > wouldn't apply to cells from such a thing because they are particular to > > how LSMs work. 
I'm sure such a project if attempted would make a number > of > > changes requiring a major version increment and low level details could > be > > unwound from Cell then, but if we could avoid doing it in the first > place, > > I think it would better for maintainability. > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai > > > wrote: > > > > > > > hi folks, > > > > > > > > User is allowed to create custom cell but the valid code of type - > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should > expose > > > > KeyValue#Type as Public Client. Three possible ways are shown below: > > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public > > > > 2) Move KeyValue#Type into Cell. > > > > 3) Move KeyValue#Type to upper level > > > > > > > > Any suggestions? > > > > > > > > > > > What is the problem that we are trying to solve Chia-Ping? You want to > make > > > Cells of a new Type? > > > > > > My first reaction is that KV#Type is particular to the KV > implementation. > > > Any new Cell implementation should not have to adopt the KeyValue > typing > > > mechanism. > > > > > > S > > > > > > > > > > > > > > > > -- > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > -- > > Best regards, > > Andrew > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > decrepit hands > >- A23, Crosstalk > > > -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
Re: [DISCUSS] Becoming a Committer
This conversation is in a good place. I apologize for the tone of my earlier allergic reaction but not the content. I hope that is acceptable. On Fri, Sep 29, 2017 at 11:01 AM, Mike Drob wrote: > To bounce off of what Yu Li said earlier - I see Hadoop has adopted very > similar language to the Spark list: > http://hadoop.apache.org/committer_criteria.html > > I especially like the examples at the bottom. They are four diverse paths, > and there is no expectation that this is an exclusive list. If we were to > write our own, I think it should reflect Andrew's highlighting of the > non-professional contributor's path. And also important is to include the > soft skills from Misty's list. > > Mike > > On Sat, Sep 23, 2017 at 5:07 PM, Stack wrote: > > > Good discussion. Thanks Mike for kicking it off. > > > > The Misty list is great. > > > > I find myself giving double kudos for non-code or feature contribs; e.g. > > stuff like test-fixing, patches that fix bugs found in production or > > patches from operators that ease their day-to-day burden, voting on > > releases, doc., (useful, encouraging, deep, helpful) review of the work > of > > others, etc. (I'd love it if someone took ownership of our website -- > hint, > > hint). > > > > Sean has a dictum, paraphrasing, "...the fastest route to commitership is > > doing what no one else wants to do" (Did I mangle that Busbey?), which I > > like. > > > > While Andrew may have misjudged Mike Drob's original intent, I appreciate > > his rallying to the cause of the non-professional contributor and his > > reaction to (mis-perceived) call for quantification (For a classic on the > > problems that arise when hard-and-fast rules, see [2]). I'm with him > > defending PMC right to give 'spirit' and 'gut' precedence over 'rules' > > (Often, it *is* just a case of you know it when you see it). And as per > > Andy, if perceived injustice or bias, please write here or private@hbase. > > > > Lets keep dumping on this thread. 
We can then summarize and make it easy > > for prospectives to find (can also add links to stuff such as the recent > > Wang+Leblang talk at ApacheCon [1] and Andrew's write up for how to be a > > committer on Hadoop as background). > > > > Thanks, > > St.Ack > > > > 1. > > https://apachecon2017.sched.com/event/9zv3/a-tale-of-two- > > developers-finding-harmony-between-commercial-software- > > development-and-the-apache-way-andrew-wang-alex-leblang-cloudera > > 2. > > https://books.google.com/books/about/Seeing_Like_a_ > > State.html?id=PqcPCgsr2u0C > > > > > > > > > > > > On Fri, Sep 22, 2017 at 4:08 PM, Zach York > > > wrote: > > > > > bq. As a > > > relatively new member in the HBase community and a non-committer, once > > the > > > new member decides that he/ she wants to become a Committer, it will be > > > helpful to have a list of PMC members that he/ she can communicate with > > and > > > get feedback from time to time. Feedback may include potential > > adjustments > > > and rough idea about progress towards the goal. > > > > > > This sounds like a good idea! Ideally, if you interact with the > community > > > often enough, you should be building connections, but it nevers hurts > to > > > have someone to check how they perceive your work. > > > > > > bq. For others, having > > > this list of volunteer mentors, will surely help. > > > > > > Again I agree. This part is especially important as it is hard to judge > > > your progress if you don't have someone at the same company to converse > > > with. > > > > > > On Fri, Sep 22, 2017 at 3:38 PM, Umesh Agashe > > > wrote: > > > > > > > Hi, > > > > > > > > Thank you all for a good discussion here. Issues with both having and > > NOT > > > > having documented specific criteria are well articulated here. 
As a > > > > relatively new member in the HBase community and a non-committer, > once > > > the > > > > new member decides that he/ she wants to become a Committer, it will > be > > > > helpful to have a list of PMC members that he/ she can communicate > with > > > and > > > > get feedback from time to time. Feedback may include potential > > > adjustments > > > > and rough idea about progress towards the goal. Paid professionals > who > > > are > > > > working with PMC members, can talk to their colleagues. For others, > > > having > > > > this list of volunteer mentors, will surely help. IMHO, this will > make > > > > process a bit more transparent. I would like to know your thoughts on > > > this. > > > > > > > > Thanks, > > > > Umesh > > > > > > > > > > > > > > > > > > > > On Thu, Sep 21, 2017 at 1:41 PM, Misty Stanley-Jones < > mi...@apache.org > > > > > > > wrote: > > > > > > > > > I feel like I inject this note into all discussions like this, but > > I'm > > > > > going to do it again. "Act like a committer" does not ONLY mean to > > > > produce > > > > > code for HBase. It means to support the project. This may mean any > of > > > the > > > > > following, plus a long list of other things I
Re: [DISCUSS] Becoming a Committer
To bounce off of what Yu Li said earlier - I see Hadoop has adopted very similar language to the Spark list: http://hadoop.apache.org/committer_criteria.html I especially like the examples at the bottom. They are four diverse paths, and there is no expectation that this is an exclusive list. If we were to write our own, I think it should reflect Andrew's highlighting of the non-professional contributor's path. And also important is to include the soft skills from Misty's list. Mike On Sat, Sep 23, 2017 at 5:07 PM, Stack wrote: > Good discussion. Thanks Mike for kicking it off. > > The Misty list is great. > > I find myself giving double kudos for non-code or feature contribs; e.g. > stuff like test-fixing, patches that fix bugs found in production or > patches from operators that ease their day-to-day burden, voting on > releases, doc., (useful, encouraging, deep, helpful) review of the work of > others, etc. (I'd love it if someone took ownership of our website -- hint, > hint). > > Sean has a dictum, paraphrasing, "...the fastest route to commitership is > doing what no one else wants to do" (Did I mangle that Busbey?), which I > like. > > While Andrew may have misjudged Mike Drob's original intent, I appreciate > his rallying to the cause of the non-professional contributor and his > reaction to (mis-perceived) call for quantification (For a classic on the > problems that arise when hard-and-fast rules, see [2]). I'm with him > defending PMC right to give 'spirit' and 'gut' precedence over 'rules' > (Often, it *is* just a case of you know it when you see it). And as per > Andy, if perceived injustice or bias, please write here or private@hbase. > > Lets keep dumping on this thread. We can then summarize and make it easy > for prospectives to find (can also add links to stuff such as the recent > Wang+Leblang talk at ApacheCon [1] and Andrew's write up for how to be a > committer on Hadoop as background). > > Thanks, > St.Ack > > 1. 
> https://apachecon2017.sched.com/event/9zv3/a-tale-of-two- > developers-finding-harmony-between-commercial-software- > development-and-the-apache-way-andrew-wang-alex-leblang-cloudera > 2. > https://books.google.com/books/about/Seeing_Like_a_ > State.html?id=PqcPCgsr2u0C > > > > > > On Fri, Sep 22, 2017 at 4:08 PM, Zach York > wrote: > > > bq. As a > > relatively new member in the HBase community and a non-committer, once > the > > new member decides that he/ she wants to become a Committer, it will be > > helpful to have a list of PMC members that he/ she can communicate with > and > > get feedback from time to time. Feedback may include potential > adjustments > > and rough idea about progress towards the goal. > > > > This sounds like a good idea! Ideally, if you interact with the community > > often enough, you should be building connections, but it nevers hurts to > > have someone to check how they perceive your work. > > > > bq. For others, having > > this list of volunteer mentors, will surely help. > > > > Again I agree. This part is especially important as it is hard to judge > > your progress if you don't have someone at the same company to converse > > with. > > > > On Fri, Sep 22, 2017 at 3:38 PM, Umesh Agashe > > wrote: > > > > > Hi, > > > > > > Thank you all for a good discussion here. Issues with both having and > NOT > > > having documented specific criteria are well articulated here. As a > > > relatively new member in the HBase community and a non-committer, once > > the > > > new member decides that he/ she wants to become a Committer, it will be > > > helpful to have a list of PMC members that he/ she can communicate with > > and > > > get feedback from time to time. Feedback may include potential > > adjustments > > > and rough idea about progress towards the goal. Paid professionals who > > are > > > working with PMC members, can talk to their colleagues. For others, > > having > > > this list of volunteer mentors, will surely help. 
IMHO, this will make > > > process a bit more transparent. I would like to know your thoughts on > > this. > > > > > > Thanks, > > > Umesh > > > > > > > > > > > > > > > On Thu, Sep 21, 2017 at 1:41 PM, Misty Stanley-Jones > > > > wrote: > > > > > > > I feel like I inject this note into all discussions like this, but > I'm > > > > going to do it again. "Act like a committer" does not ONLY mean to > > > produce > > > > code for HBase. It means to support the project. This may mean any of > > the > > > > following, plus a long list of other things I'm sure I'm not thinking > > of > > > > right now: > > > > > > > > - Contribute to the docs (yay!) > > > > - Help fix and improve testing > > > > - Participate in release candidate votes, even if non-binding > > > > - Review other people's work > > > > - Help newbies > > > > - Answer questions > > > > - Update the website > > > > - File issues > > > > - Mentor new contributors of all sorts > > > > - Give talks about HBase > > > > - Write blogs about HBase > > > > - Participat
Re: Performance issue in the Join query on the HBase tables
Have you considered running Hive/Spark over snapshots of your HBase tables? If you're seeing network saturation over HBase but not hdfs, makes me think data locality is not being honored. Might be worth investigating as well. On Fri, Sep 29, 2017 at 3:26 AM wenxing zheng wrote: > Dear all, > > I have 3 big HBase tables, which all have millions of rows(rows are synced > from MySQL DB via Bin log) and for each HBase table, we have an external > table on Hive correspondingly with the storage by > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that > we can always keep sync up with the production DB and provides random > access by key. > > Now our business needs to do some analysis on those tables with Join query. > What's the best practice to make it? > > From my experiment, I found that with the Spark SQL on HBase or Hive, the > job ran very slowly and will saturate the network bandwidth. But it works > very well for the Hive SQL directly against Hive from HDFS files(make a > copy of the data to HDFS files). > > Appreciated for any advice on what would be the problem here? and the way > to optimize the job. > Regards, Wenxing >
RE: Performance issue in the Join query on the HBase tables
Hi Wenxing,

From the use case you describe, you may want to take a look at Trafodion or EsgynDB (the commercial version of Trafodion). http://trafodion.incubator.apache.org/

Trafodion uses a very mature SQL engine on top of HBase/Hive, built on 20 years of IP that Hewlett-Packard gave to open source 2 years ago. It supports many different join types (hash joins, nested joins, merge joins) with optimized overflow-to-disk mechanisms over an optimized pipelined architecture, full indexing capabilities, and an optimized row format that will make your HBase table a lot faster than it is when using one cell per column. From a SQL capability standpoint for analytics queries, Trafodion can run the full set of 99 TPC-DS queries.

Hope this helps,
Eric

-----Original Message----- From: wenxing zheng [mailto:wenxing.zh...@gmail.com] Sent: Friday, September 29, 2017 7:24 AM To: dev@hbase.apache.org Subject: Re: Performance issue in the Join query on the HBase tables Thanks to Ted. We didn't try the phoneix yet. From the performance test on the official site of phoenix, I didn't find the report on the Join query. Not sure whether it's much better or not On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu wrote: > Have you looked at Phoenix ? > > https://phoenix.apache.org/joins.html > > On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng > > wrote: > > > Dear all, > > > > I have 3 big HBase tables, which all have millions of rows(rows are > synced > > from MySQL DB via Bin log) and for each HBase table, we have an > > external table on Hive correspondingly with the storage by > > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is > that > > we can always keep sync up with the production DB and provides > > random access by key. > > > > Now our business needs to do some analysis on those tables with Join > query. > > What's the best practice to make it? > > > > From my experiment, I found that with the Spark SQL on HBase or > > Hive, the job ran very slowly and will saturate the network > > bandwidth. 
But it works very well for the Hive SQL directly against > > Hive from HDFS files(make a copy of the data to HDFS files). > > > > Appreciated for any advice on what would be the problem here? and > > the way to optimize the job. > > Regards, Wenxing > > >
Re: Performance issue in the Join query on the HBase tables
Thanks, Ted. We haven't tried Phoenix yet. In the performance tests on the official Phoenix site, I didn't find a report on join queries, so I'm not sure whether it would be much better or not.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu wrote: > Have you looked at Phoenix ? > > https://phoenix.apache.org/joins.html > > On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng > wrote: > > > Dear all, > > > > I have 3 big HBase tables, which all have millions of rows(rows are > synced > > from MySQL DB via Bin log) and for each HBase table, we have an external > > table on Hive correspondingly with the storage by > > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is > that > > we can always keep sync up with the production DB and provides random > > access by key. > > > > Now our business needs to do some analysis on those tables with Join > query. > > What's the best practice to make it? > > > > From my experiment, I found that with the Spark SQL on HBase or Hive, the > > job ran very slowly and will saturate the network bandwidth. But it works > > very well for the Hive SQL directly against Hive from HDFS files(make a > > copy of the data to HDFS files). > > > > Appreciated for any advice on what would be the problem here? and the way > > to optimize the job. > > Regards, Wenxing > > >
Re: Performance issue in the Join query on the HBase tables
Have you looked at Phoenix ? https://phoenix.apache.org/joins.html On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng wrote: > Dear all, > > I have 3 big HBase tables, which all have millions of rows(rows are synced > from MySQL DB via Bin log) and for each HBase table, we have an external > table on Hive correspondingly with the storage by > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that > we can always keep sync up with the production DB and provides random > access by key. > > Now our business needs to do some analysis on those tables with Join query. > What's the best practice to make it? > > From my experiment, I found that with the Spark SQL on HBase or Hive, the > job ran very slowly and will saturate the network bandwidth. But it works > very well for the Hive SQL directly against Hive from HDFS files(make a > copy of the data to HDFS files). > > Appreciated for any advice on what would be the problem here? and the way > to optimize the job. > Regards, Wenxing >
Re: [DISCUSS] Move Type out of KeyValue
Yes, as Chia-Ping said, the problem he is trying to solve is a very basic one. As long as we allow custom Cell creation (via the CellBuilder API), allow Mutations to be built from Cells, and accept those through the client-side APIs, we have to make the Type publicly accessible. Otherwise the Cell-building APIs should not take a type byte. We have to give users some way to make put/delete cells, etc. Is the type really bound only to KV? We have getType in Cell too, right? Perhaps the full form of the type that we have in KV now is what is confusing us here. As Ram said, it also contains some internal types that the user never needs to know about. Please correct me if I am putting this the wrong way. Good that Chia-Ping brought this up here. Either way, we have to solve it and make the public API fully public.

-Anoop-

On Fri, Sep 29, 2017 at 2:27 PM, ramkrishna vasudevan wrote: > Even if we are trying to move out I think only few of the types are really > user readable. So we should be very careful here. So since we have > CellBuilder way it is better we check what type of cells a user can build. > I think for now the Cellbuilder is not client exposed? > But again moving to Cell means it becomes public which is not right IMO and > I thinks others here also agree to it. > > Regards > Ram > > On Fri, Sep 29, 2017 at 10:50 AM, Chia-Ping Tsai > wrote: > >> Thanks for all comment. >> >> The problem i want to resolve is the valid code should be exposed as >> IA.Public. Otherwise, end user have to access the IA.Private class to build >> the custom cell. >> >> For example, I have a use case which plays a streaming role in our >> appliaction. It >> applies the CellBuilder(HBASE-18519) to build custom cells. These cells >> have many same fields so they are put in shared-memory for avoiding GC >> pause. Everything is wonderful. However, we have to access the IA.Private >> class - KeyValue#Type - to get the valid code of Put. 
>> >> I believe there are many use cases of custom cell, and consequently it is >> worth adding a way to get the valid type via IA.Public class. Otherwise, it >> may imply that the custom cell is based on a unstable way, because the >> related code can be changed at any time. >> -- >> Chia-Ping >> >> On 2017-09-29 00:49, Andrew Purtell wrote: >> > I agree with Stack. Was typing up a reply to Anoop but let me move it >> down >> > here. >> > >> > The type code exposes some low level details of how our current stores >> are >> > architected. But what if in the future you could swap out HStore >> implements >> > Store with PStore implements Store, where HStore is backed by HFiles and >> > PStore is backed by Parquet? Just as a hypothetical example. I know there >> > would be larger issues if this were actually attempted. Bear with me. You >> > can imagine some different new Store implementation that has some >> > advantages but is not a design derived from the log structured merge tree >> > if you like. Most values from a new Cell.Type based on KeyValue.Type >> > wouldn't apply to cells from such a thing because they are particular to >> > how LSMs work. I'm sure such a project if attempted would make a number >> of >> > changes requiring a major version increment and low level details could >> be >> > unwound from Cell then, but if we could avoid doing it in the first >> place, >> > I think it would better for maintainability. >> > >> > >> > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: >> > >> > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai >> > > wrote: >> > > >> > > > hi folks, >> > > > >> > > > User is allowed to create custom cell but the valid code of type - >> > > > KeyValue#Type - is declared as IA.Private. As i see it, we should >> expose >> > > > KeyValue#Type as Public Client. Three possible ways are shown below: >> > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public >> > > > 2) Move KeyValue#Type into Cell. 
>> > > > 3) Move KeyValue#Type to upper level >> > > > >> > > > Any suggestions? >> > > > >> > > > >> > > What is the problem that we are trying to solve Chia-Ping? You want to >> make >> > > Cells of a new Type? >> > > >> > > My first reaction is that KV#Type is particular to the KV >> implementation. >> > > Any new Cell implementation should not have to adopt the KeyValue >> typing >> > > mechanism. >> > > >> > > S >> > > >> > > >> > > >> > > >> > > > -- >> > > > Chia-Ping >> > > > >> > > > >> > > >> > >> > >> > >> > -- >> > Best regards, >> > Andrew >> > >> > Words like orphans lost among the crosstalk, meaning torn from truth's >> > decrepit hands >> >- A23, Crosstalk >> > >>
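One way to read the points made in this message (users need some public way to say "put" or "delete", but the internal type codes should stay hidden) is a small public enum that covers only the user-facing types, with the builder accepting that enum instead of a raw byte. The sketch below is hypothetical: PublicCellType and TypedCellBuilderSketch are made-up names, not existing HBase classes, and the numeric codes are an assumption that they mirror what KeyValue#Type uses today.

```java
/** Hypothetical user-facing cell type; not an existing HBase class. */
enum PublicCellType {
    // Assumption: these codes mirror the ones KeyValue#Type uses.
    PUT((byte) 4),
    DELETE((byte) 8),
    DELETE_FAMILY_VERSION((byte) 10),
    DELETE_COLUMN((byte) 12),
    DELETE_FAMILY((byte) 14);

    private final byte code;

    PublicCellType(byte code) {
        this.code = code;
    }

    /** Code that an internal implementation could store on the cell. */
    byte getCode() {
        return code;
    }

    /** Maps a raw code back to a user-facing type, rejecting any other code. */
    static PublicCellType fromCode(byte code) {
        for (PublicCellType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        throw new IllegalArgumentException("Not a user-facing cell type code: " + code);
    }
}

/** Hypothetical builder fragment that accepts the enum instead of a raw byte. */
class TypedCellBuilderSketch {
    private byte typeCode;

    TypedCellBuilderSketch setType(PublicCellType type) {
        this.typeCode = type.getCode();
        return this;
    }

    byte getTypeCode() {
        return typeCode;
    }
}
```

With something along these lines, a caller would write new TypedCellBuilderSketch().setType(PublicCellType.PUT) and never touch an IA.Private class, while codes that are not user-facing are rejected by fromCode.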
Performance issue in the Join query on the HBase tables
Dear all,

I have 3 big HBase tables, each with millions of rows (the rows are synced from a MySQL DB via the binlog). For each HBase table we have a corresponding external table in Hive, stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that we always stay in sync with the production DB and get random access by key.

Now our business needs to do some analysis on those tables with join queries. What's the best practice for this?

From my experiments, I found that with Spark SQL on HBase or Hive, the job ran very slowly and saturated the network bandwidth. But it works very well with Hive SQL run directly against HDFS files (after making a copy of the data to HDFS files).

I would appreciate any advice on what the problem might be here, and on how to optimize the job.

Regards,
Wenxing
Re: [DISCUSS] Move Type out of KeyValue
Even if we do move it out, I think only a few of the types are really relevant to users, so we should be very careful here. Since we have the CellBuilder way, it is better that we check which types of cells a user can build. I think for now the CellBuilder is not exposed to clients? But again, moving it to Cell means it becomes public, which is not right IMO, and I think others here agree.

Regards
Ram

On Fri, Sep 29, 2017 at 10:50 AM, Chia-Ping Tsai wrote: > Thanks for all comment. > > The problem i want to resolve is the valid code should be exposed as > IA.Public. Otherwise, end user have to access the IA.Private class to build > the custom cell. > > For example, I have a use case which plays a streaming role in our > appliaction. It > applies the CellBuilder(HBASE-18519) to build custom cells. These cells > have many same fields so they are put in shared-memory for avoiding GC > pause. Everything is wonderful. However, we have to access the IA.Private > class - KeyValue#Type - to get the valid code of Put. > > I believe there are many use cases of custom cell, and consequently it is > worth adding a way to get the valid type via IA.Public class. Otherwise, it > may imply that the custom cell is based on a unstable way, because the > related code can be changed at any time. > -- > Chia-Ping > > On 2017-09-29 00:49, Andrew Purtell wrote: > > I agree with Stack. Was typing up a reply to Anoop but let me move it > down > > here. > > > > The type code exposes some low level details of how our current stores > are > > architected. But what if in the future you could swap out HStore > implements > > Store with PStore implements Store, where HStore is backed by HFiles and > > PStore is backed by Parquet? Just as a hypothetical example. I know there > > would be larger issues if this were actually attempted. Bear with me. 
You > > can imagine some different new Store implementation that has some > > advantages but is not a design derived from the log structured merge tree > > if you like. Most values from a new Cell.Type based on KeyValue.Type > > wouldn't apply to cells from such a thing because they are particular to > > how LSMs work. I'm sure such a project if attempted would make a number > of > > changes requiring a major version increment and low level details could > be > > unwound from Cell then, but if we could avoid doing it in the first > place, > > I think it would better for maintainability. > > > > > > On Thu, Sep 28, 2017 at 9:39 AM, Stack wrote: > > > > > On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai > > > wrote: > > > > > > > hi folks, > > > > > > > > User is allowed to create custom cell but the valid code of type - > > > > KeyValue#Type - is declared as IA.Private. As i see it, we should > expose > > > > KeyValue#Type as Public Client. Three possible ways are shown below: > > > > 1) Change declaration of KeyValue#Type from IA.Private to IA.Public > > > > 2) Move KeyValue#Type into Cell. > > > > 3) Move KeyValue#Type to upper level > > > > > > > > Any suggestions? > > > > > > > > > > > What is the problem that we are trying to solve Chia-Ping? You want to > make > > > Cells of a new Type? > > > > > > My first reaction is that KV#Type is particular to the KV > implementation. > > > Any new Cell implementation should not have to adopt the KeyValue > typing > > > mechanism. > > > > > > S > > > > > > > > > > > > > > > > -- > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > -- > > Best regards, > > Andrew > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > decrepit hands > >- A23, Crosstalk > > >