Re: [DISCUSS] More Shading
Should we allow dependent projects (such as Phoenix) to weigh in on this issue, since they are likely going to be the ones that benefit/are affected?

On 4/11/17, 10:17 AM, "York, Zach" <zy...@amazon.com> wrote:

+1 (non-binding) This sounds like a good idea to me!

Zach

On 4/11/17, 9:48 AM, "saint@gmail.com on behalf of Stack" <st...@duboce.net> wrote:

Let me revive this thread. Recall, we are stuck on old or particular versions of critical libs. We are unable to update because our versions will clash w/ versions from upstreamers hadoop 2.7/2.8/3.0/spark, etc.

We have a shaded client. We need to message downstreamers that they should use it going forward. This will help going forward, but it will not inoculate our internals, nor an existing context where we'd like to be a compatible drop-in.

We could try hackery filtering transitive includes up in poms for each version of hadoop/spark that we support, but in the end it's a bunch of effort, hard to test, and we are unable to dictate the CLASSPATH order in all situations.

We could try some shading voodoo inline w/ the build. But because shading is a post-package step, and because we are modularized and shading includes the shaded classes in the artifact produced, we'd end up w/ multiple copies of guava/netty/etc. classes: an instance per module that makes a reference.

Let's do Sean's idea of a pre-build step where we package and relocate ('shade') critical dependencies (going by the thread above, Ram, Anoop, and Andy seem good w/ the general idea). In implementation, we (the HBase PMC) would ask for a new repo [1]. In here we'd create a new mvn project. This project would produce a single artifact (jar) called hbase-dependencies, hbase-3rdparty, or hbase-shaded-3rdparty-libs. In it would be relocated core libs such as guava and netty (and maybe protobuf). We'd publish this artifact and then have hbase depend on it, changing all references to point at the relocation: e.g. rather than import com.google.common.collect.Maps, we'd import org.apache.hadoop.hbase.com.google.common.collect.Maps.

We (the HBase PMC) will have to make releases of this new artifact and vote on them. I think it will be a relatively rare event.

I'd be up for doing the first cut if folks are game.

St.Ack

1. URL via Sean, but for committers to view only: https://reporeq.apache.org/

On Sun, Oct 2, 2016 at 10:29 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> +1 for Sean's ideas. Bundling all the dependent libraries, shading them
> into one jar, and having HBase refer to it makes sense and should avoid
> some of the pain in terms of IDE usage. Stack's doc clearly talks about
> the IDE issues that we may get after this protobuf shading goes in. It
> may be difficult for newcomers and those who don't know the background
> of why it has to be like that.
>
> Regards
> Ram
>
> On Sun, Oct 2, 2016 at 10:51 AM, Stack <st...@duboce.net> wrote:
>
>> On Sat, Oct 1, 2016 at 6:32 PM, Jerry He <jerry...@gmail.com> wrote:
>>
>>> How is the proposal going to impact the existing shaded-client and
>>> shaded-server modules? Will it make them unnecessary so they go away?
>>
>> No. We still need the blanket shading of hbase client and server.
>>
>> This effort is about our internals. We have a mess of other components
>> all up inside us, such as HDFS, etc., each with their own sets of
>> dependencies, many of which we have in common. This project is about
>> making it so we can upgrade at a rate independent of when our
>> upstreamers choose to change.
>>
>>> It doesn't seem so. These modules are supposed to shade HBase and
>>> upstream from downstream users.
>>
>> Agree.
>>
>> Thanks for drawing out the difference between these two shading efforts,
>>
>> St.Ack
>>
>>> Thanks.
>>>
>>> Jerry
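The relocation Stack describes is what the maven-shade-plugin's `<relocations>` configuration does. A minimal sketch of the rule the proposed third-party project's pom might carry — the shaded package prefix follows the e.g. in the mail, but the exact coordinates and the set of relocated libs are assumptions, not a decided configuration:

```xml
<!-- Sketch only: maven-shade-plugin relocation of the kind the proposed
     hbase-3rdparty project would use. Prefix follows the example in the
     mail above; treat the specifics as assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- import com.google.common.collect.Maps becomes
                 import org.apache.hadoop.hbase.com.google.common.collect.Maps -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hadoop.hbase.com.google.common</shadedPattern>
          </relocation>
          <relocation>
            <pattern>io.netty</pattern>
            <shadedPattern>org.apache.hadoop.hbase.io.netty</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With rules like these, the rewritten classes ship in one voted-on artifact, and core HBase imports the relocated names directly instead of re-shading per module.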
Re: [DISCUSS] More Shading
+1 (non-binding) This sounds like a good idea to me!

Zach

On 4/11/17, 9:48 AM, "saint@gmail.com on behalf of Stack" wrote:

[Stack's proposal is the same message quoted in full in the entry above; snipped here down to the part of the Oct 2016 thread not shown there.]

> On Sat, Oct 1, 2016 at 2:33 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
>> Sean has suggested a pre-build step where in another repo we'd make hbase
>> shaded versions of critical libs, 'release' them (votes, etc.) and then
>> have core depend on these. It'd be a bunch of work, but would make the
>> dev's life easier.
>
> So when we make changes that require updates to and a rebuild of the
> supporting libraries, as a developer I would make local changes, install
> a snapshot of that into the local maven cache, then point the HBase build
> at the snapshot, then do the other half of the work, then push up to both?
>
> I think this could work.
Re: release of HBase 1.3.1?
+1 I meant to send this a while ago.

On 4/3/17, 5:43 PM, "James Taylor" wrote:

Hello,

We'd like to start supporting releases of Phoenix that work with the HBase 1.3 branch, but there's a committed fix on which we rely (HBASE-17587) for Phoenix to function correctly. Is there a time frame for an HBase 1.3.1 release?

Thanks,
James
Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)
Thanks for the updates! I will review when I have time.

On 2/17/17, 4:16 PM, "Umesh Agashe" wrote:

Hi,

Here is the doc that summarizes our discussion about why we think a top-down approach requiring radical code changes, compared to an incremental, phased (bottom-up) approach, will help us REDO the FS directory layout.

https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j

Thanks,
Umesh

On Fri, Feb 17, 2017 at 12:57 PM, Stack wrote:
> Notes from this morning's online meeting @10AM PST (please fill in any
> detail I missed):
>
> IN ATTENDANCE:
> Aman Poonia
> Umesh Agashe, Cloudera
> Stephen Tak, AMZ
> Zach York, AMZ
> Francis Liu, Yahoo!
> Ben Mau, Yahoo!
> Sean Busbey, Cloudera
> Ted Yu, HWX
> Appy (Apekshit Sharma), Cloudera
>
> BACKGROUND (St.Ack)
> Y! want to do millions of regions in a cluster.
> Our current FS layout is heavily dependent on HDFS semantics (e.g. we
> depend heavily on HDFS rename doing atomic file and directory swaps);
> this complicates being able to run on another FS.
> HBase is bound to a particular physical layout in the FS.
> Matteo Bertozzi's experience with HDFS/HBase on S3, and a general
> irritation with how FS ops are distributed all about the codebase, had
> him propose a logical tier with a radically simplified set of
> requirements of the underlying FS (block store?); atomic operations
> would be done by HBase rather than farmed out to the FS.
> Matteo is not w/ us anymore, but he passed on the vision to Umesh.
>
> CURRENT STATE OF FS REDO PROJECT (Umesh)
> Currently it is shelved, but we hope to get back to it 'soon'.
> Spent a few months on FS REDO at the end of last year.
> The initial approach was to abstract out three Interfaces (originally
> sketched by Matteo in [1]).
> The idea was to centralize all FS use in a few well-known locations,
> then refactor all FS usage; keep all metadata about tables, files,
> etc., in hbase:meta; and slowly migrate over ops, tools, etc., to the
> new Interface.
> This was a bottom-up approach: finding FS references and moving them
> to one place.
> We soon found too many refs all over the code.
> Found that we might not get to the desired simple Interface because the
> API had to carry around baggage.
> Matteo had tried this approach in [1] and started to argue this stepped
> migration would never arrive.
>
> So we restarted over w/ the ideal simple FS Interface, and the
> implementation seemed to flow smoothly.
> An in-memory POC that did simple file ops was posted a while back here [2].
>
> Given the two approaches taken above, experience indicates that the
> radical, top-down approach is more likely to succeed.
>
> WHY ARE PEOPLE INTERESTED IN FS REDO?
> Francis and Ben Mau: we want to be able to do 1M regions.
> St.Ack suggested that even small installs need to be able to do more,
> smaller regions.
> Zach is interested because he wants to optimize HBase over S3 (rename,
> consistency issues). He liked the idea of metadata up in the hbase:meta
> table and avoiding renames, etc.
>
> WHAT SHOULD WE DO?
> We have few resources. It is a big job (we've been talking about it a
> good while now). All docs are stale, missing the benefit of Umesh's
> recent explorations.
> Sean pointed out that before shelving, the idea was to try the PoC
> Interface against a new hbase operation other than simple file reading
> and writing (compactions?). If the PoC Interface survived in the new
> context, we'd then step back and write up a design.
> Seemed like as good a plan as any. The plan should talk about all the
> ways in which ops can go wrong.
> Thereafter, split up the work and bring over subsystems.
> It is looking like an hbase3 rather than an hbase2 project (though all
> hoped it could make hbase2).
>
> TODOs
> We agreed to post these notes with pointers to the current state of FS
> REDO (see below).
> Umesh and Stack to do up a one-pager on the current PoC, to be posted on
> this thread or up in the FS REDO issue (HBASE-14090).
> Keep up macro status on this thread.
>
> What else?
> Thanks,
> S
>
> 1. Matteo's original FS REDO suggested plan:
> https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
> 2. Umesh's PoC: https://reviews.apache.org/r/55200/
> 3. HBASE-14090 is the parent issue for this project?
> 4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
> Matteo's ideas):
> https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
>
> On Fri, Feb 17, 2017 at 9:53 AM, Stack
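The 'ideal simple FS Interface' idea — region metadata kept in hbase:meta rather than encoded in directory layout, with atomicity owned by HBase instead of farmed out to filesystem rename — can be illustrated with a toy sketch. All names below are illustrative inventions, not the actual PoC API from [2]:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy sketch of a "simple FS" storage interface: callers never touch
// paths or renames directly; location is a metadata lookup.
interface RegionStorage {
    void createRegion(String table, String region);
    Optional<String> locateRegion(String table, String region);
    // A "move" is a metadata update only -- no directory rename required,
    // which is what makes an S3-like store workable.
    void moveRegion(String table, String region, String newLocation);
}

// In-memory implementation; the map stands in for metadata kept in hbase:meta.
class InMemoryRegionStorage implements RegionStorage {
    private final Map<String, String> meta = new HashMap<>();

    @Override
    public void createRegion(String table, String region) {
        meta.put(table + ":" + region, "/data/" + table + "/" + region);
    }

    @Override
    public Optional<String> locateRegion(String table, String region) {
        return Optional.ofNullable(meta.get(table + ":" + region));
    }

    @Override
    public void moveRegion(String table, String region, String newLocation) {
        meta.put(table + ":" + region, newLocation); // single metadata write
    }
}
```

The point of the exercise in the notes above is exactly this shape: if every FS touch goes through one narrow interface like this, swapping HDFS for a block store becomes an implementation detail rather than a codebase-wide refactor.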
Re: Git Pull Request to HBase
I would recommend you use the dev-support/submit-patch.py script to create your patch. Information can be found here:

http://hbase.apache.org/book.html#submitting.patches.create

Thanks,
Zach

On 1/25/17, 11:34 AM, "Chetan Khatri" wrote:

Hello,

I have submitted 3 PRs to the HBase Git repository, but I think that is not what the HBase community expects; I need to submit the patch as an attachment to the HBase JIRA. Can anybody give me the steps for the same?

Thanks.
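For the JIRA-attachment route, the plain-git workflow underneath (independent of the helper script) is the standard `git format-patch` one. A sketch, run here in a throwaway repo for illustration; in a real checkout you would branch from the upstream master, and HBASE-12345 is a placeholder for your actual JIRA id:

```shell
# Demonstration in a scratch repo; HBASE-12345 is a placeholder JIRA id.
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "Your Name"
echo "base" > file.txt && git add file.txt && git commit -qm "initial"
git branch -m master   # pin the base branch name for the example

# Topic branch named for the JIRA issue, one commit carrying the fix:
git checkout -qb HBASE-12345
echo "fix" >> file.txt && git add file.txt && git commit -qm "HBASE-12345 Describe the fix"

# Produce the patch file to attach to the JIRA:
git format-patch --stdout master > HBASE-12345.patch
```

The resulting HBASE-12345.patch is what gets attached to the JIRA issue; the submit-patch.py script Zach points at automates steps like these.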