Re: [DISCUSS] More Shading

2017-04-11 Thread York, Zach
Should we allow dependent projects (such as Phoenix) to weigh in on this issue 
since they are likely going to be the ones that benefit/are affected?

Re: [DISCUSS] More Shading

2017-04-11 Thread York, Zach
+1 (non-binding)

This sounds like a good idea to me!

Zach

On 4/11/17, 9:48 AM, "saint@gmail.com on behalf of Stack"  wrote:

Let me revive this thread.

Recall, we are stuck on old or particular versions of critical libs. We are
unable to update because our versions will clash w/ versions from upstreamers
hadoop 2.7/2.8/3.0, spark, etc. We have a shaded client. We need to message
downstreamers that they should use it going forward. This will help new
adopters, but it will not inoculate our internals nor an existing context
where we'd like to be a compatible drop-in.

We could try hackery, filtering transitive includes up in poms for each
version of hadoop/spark that we support, but in the end it's a bunch of
effort, hard to test, and we are unable to dictate the CLASSPATH order in
all situations.
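
To make the pom hackery concrete, the filtering would be per-dependency
exclusions along these lines (a sketch; coordinates are illustrative, and
every exclusion must be repeated and kept in sync for each hadoop/spark
version we support):

    <!-- Illustrative only: exclude a clashing transitive guava from a
         hadoop dependency; even then we cannot control what other poms
         put ahead of us on the CLASSPATH. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
        </exclusion>
      </exclusions>
    </dependency>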

We could try some shading voodoo inline w/ the build. Because shading is a
post-package step, and because we are modularized and shading includes the
shaded classes in the artifact produced, we'd end up w/ multiple copies of
guava/netty/etc. classes: an instance per module that makes a reference.

Let's do Sean's idea of a pre-build step where we package and relocate
('shade') critical dependencies (going by the thread above, Ram, Anoop, and
Andy seem good w/ the general idea).

In implementation, we (The HBase PMC) would ask for a new repo [1]. In here
we'd create a new mvn project. This project would produce a single artifact
(jar) called hbase-dependencies, hbase-3rdparty, or hbase-shaded-3rdparty
libs. In it would be relocated core libs such as guava and netty (and maybe
protobuf). We'd publish this artifact and then have hbase depend on it,
changing all references to point at the relocation: e.g. rather than import
com.google.common.collect.Maps, we'd import
org.apache.hadoop.hbase.com.google.common.collect.Maps.
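
For concreteness, a minimal sketch of the shade-plugin relocation such a
project might carry (nothing here is settled: the guava target package
follows the example above, and the netty pattern is an illustrative guess):

    <!-- Sketch of maven-shade-plugin config for the proposed thirdparty
         project; relocated classes get bundled into its single jar. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>org.apache.hadoop.hbase.com.google.common</shadedPattern>
              </relocation>
              <relocation>
                <pattern>io.netty</pattern>
                <shadedPattern>org.apache.hadoop.hbase.io.netty</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>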

We (The HBase PMC) will have to make releases of this new artifact and vote
on them. I think it will be a relatively rare event.

I'd be up for doing the first cut if folks are game.

St.Ack


1. URL via Sean but for committers to view only: https://reporeq.apache.org/

On Sun, Oct 2, 2016 at 10:29 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> +1 for Sean's ideas. Bundling all the dependent libraries, shading them
> into one jar, and having HBase refer to it makes sense and should avoid
> some of the pain in terms of IDE usage. Stack's doc clearly talks about
> the IDE issues that we may get after this protobuf shading goes in. It may
> be difficult for newcomers and those who don't know the background of why
> it has to be like that.
>
> Regards
> Ram
>
> On Sun, Oct 2, 2016 at 10:51 AM, Stack  wrote:
>
> > On Sat, Oct 1, 2016 at 6:32 PM, Jerry He  wrote:
> >
> > > How is the proposed change going to impact the existing shaded-client
> > > and shaded-server modules, making them unnecessary so they go away?
> > >
> >
> > No. We still need the blanket shading of hbase client and server.
> >
> > This effort is about our internals. We have a mess of other components
> > all up inside us, such as HDFS, etc., each with their own sets of
> > dependencies, many of which we have in common. This project is about
> > making it so we can upgrade at a rate independent of when our
> > upstreamers choose to change.
> >
> >
> > > It doesn't seem so.  These modules are supposed to shade HBase and
> > > upstream from downstream users.
> > >
> >
> > Agree.
> >
> > Thanks for drawing out the difference between these two shading efforts,
> >
> > St.Ack
> >
> >
> >
> > > Thanks.
> > >
> > > Jerry
> > >
> > > On Sat, Oct 1, 2016 at 2:33 PM, Andrew Purtell
> > > <andrew.purt...@gmail.com> wrote:
> > >
> > > > > Sean has suggested a pre-build step where in another repo we'd
> > > > > make hbase shaded versions of critical libs, 'release' them
> > > > > (votes, etc.) and then have core depend on these. It'd be a bunch
> > > > > of work but would make the dev's life easier.
> > > >
> > > > So when we make changes that require updates to and rebuild of the
> > > > supporting libraries, as a developer I would make local changes,
> > > > install a snapshot of that into the local maven cache, then point
> > > > the HBase build at the snapshot, then do the other half of the work,
> > > > then push up to both?
> > > >
> > > > I think this could work.
> > >
> >
>
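
For the snapshot workflow described above, the HBase pom would temporarily
point at the locally installed snapshot; a sketch, with purely hypothetical
coordinates:

    <!-- Hypothetical coordinates: depend on a locally built snapshot of
         the pre-shaded thirdparty artifact while iterating, switching
         back to a released version before pushing. -->
    <dependency>
      <groupId>org.apache.hbase.thirdparty</groupId>
      <artifactId>hbase-shaded-3rdparty-libs</artifactId>
      <version>0.1.0-SNAPSHOT</version>
    </dependency>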




Re: release of HBase 1.3.1?

2017-04-03 Thread York, Zach
+1

I meant to send this a while ago.

On 4/3/17, 5:43 PM, "James Taylor"  wrote:

Hello,
We'd like to start supporting releases of Phoenix that work with the HBase
1.3 branch, but there's a committed fix on which we rely (HBASE-17587) for
Phoenix to function correctly. Is there a time frame for an HBase 1.3.1
release?
Thanks,
James




Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)

2017-02-17 Thread York, Zach
Thanks for the updates! I will review when I have time.

On 2/17/17, 4:16 PM, "Umesh Agashe"  wrote:

Hi,

Here is the doc that summarizes our discussion of why we think a top-down
approach requiring radical code changes, rather than an incremental, phased
(bottom-up) approach, will help us with the REDO of the FS directory layout.



https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j

Thanks,
Umesh


On Fri, Feb 17, 2017 at 12:57 PM, Stack  wrote:

> Notes from this morning's online meeting @10AM PST (please fill in any
> detail I missed):
>
> IN ATTENDANCE:
> Aman Poonia
> Umesh Agashe, Cloudera
> Stephen Tak, AMZ
> Zach York, AMZ
> Francis Liu, Yahoo!
> Ben Mau, Yahoo!
> Sean Busbey, Cloudera
> Ted Yu, HWX
> Appy (Apekshit Sharma), Cloudera
>
>
> BACKGROUND (St.Ack)
> Y! want to do millions of regions in a Cluster.
> Our current FS Layout heavily dependent on HDFS semantics (e.g. we depend
> heavily on HDFS rename doing atomic file and directory swaps); complicates
> being able to run on another FS.
> HBase is bound to a particular physical layout in the FS.
> Matteo Bertozzi's experience with HDFS/HBase on S3, and a general
> irritation with how FS ops are distributed all about the codebase, had him
> propose a logical tier with a radically simplified set of requirements on
> the underlying FS (block store?); atomic operations would be done by HBase
> rather than farmed out to the FS.
> Matteo is not w/ us anymore but he passed on the vision to Umesh.
>
> CURRENT STATE OF FS REDO PROJECT (Umesh)
> Currently it is shelved but hope to get back to it 'soon'.
> Spent a few months on FS REDO at end of last year.
> Initial approach was to abstract out three Interfaces (originally sketched
> by Matteo in [1]).
> Idea was to centralize all FS use in a few well-known locations.
> Then refactor all FS usage.
> Keep all meta data about tables, files, etc., in hbase:meta
> Idea was to slowly migrate over ops, tools etc., to the new Interface.
> This was a bottom-up approach, finding FS references, and moving
> references to one place.
> Soon found too many refs all over the code.
> Found that we might not get to the desired simple Interface because the
> API had to carry around baggage.
> Matteo had tried this approach in [1] and came to argue that this stepped
> migration would never arrive.
>
> So he started over w/ the ideal simple FS Interface, and the
> implementation seemed to flow smoothly.
> An in-memory POC that did simple file ops was posted a while back here [2].
>
> Given the two approaches taken above, experience indicates that the
> radical, top-down approach is more likely to succeed.
>
> WHY ARE PEOPLE INTERESTED IN FS REDO?
> Francis and Ben Mau, we want to be able to do 1M regions.
> St.Ack suggested that even small installs need to be able to do more,
> smaller regions.
> Zach is interested because he wants to optimize HBase over S3 (rename,
> consistency issues). Liked the idea of metadata up in the hbase:meta table
> and avoiding renames, etc.
>
> WHAT SHOULD WE DO?
> We have few resources. It is a big job (we've been talking about it a good
> while now). All docs are stale, missing the benefit of Umesh's recent
> explorations.
> Sean pointed out that before shelving, the idea was to try the PoC
> Interface against a new hbase operation other than simple file reading and
> writing (compactions?). If the PoC Interface survived in the new context,
> we'd then step back and write up a design.
> Seemed like as good a plan as any. Plan should talk about all the ways in
> which ops can go wrong.
> Thereafter, split up the work and bring over subsystems.
> It is looking like an hbase3 rather than an hbase2 project (though all
> hoped it could make an hbase2).
>
> TODOs
> We agreed to post these notes with pointers to current state of FS REDO
> (See below).
> Umesh and Stack to do up a one-pager on current PoC to be posted on this
> thread or up in the FS REDO issue (HBASE-14090).
> Keep up macro status on this thread.
>
> What else?
> Thanks,
> S
>
> 1. Matteo's original FS REDO suggested plan: https://docs.google.com/
> document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
> 2. Umesh's PoC: https://reviews.apache.org/r/55200/
> 3. HBASE-14090 is the parent issue for this project?
> 4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
> Matteo's ideas): https://docs.google.com/document/d/
> 10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
>
>
> On Fri, Feb 17, 2017 at 9:53 AM, Stack  

Re: Git Pull Request to HBase

2017-01-25 Thread York, Zach
I would recommend you use the dev-support/submit-patch.py script to create your 
patch.

Information can be found here: 
http://hbase.apache.org/book.html#submitting.patches.create

Thanks,
Zach

On 1/25/17, 11:34 AM, "Chetan Khatri"  wrote:

Hello,

I have submitted 3 PRs to the HBase Git repository, but I think that is not
what the HBase community expects; I need to submit a patch as an attachment
to the HBase JIRA.

Can anybody give me the steps for doing this?


Thanks.