Thanks for the updates! I will review when I have time.

On 2/17/17, 4:16 PM, "Umesh Agashe" <uaga...@cloudera.com> wrote:
Hi,

Here is the doc that summarizes our discussion about why we think a top-down approach requiring radical code changes, rather than an incremental, phased (bottom-up) approach, will help us REDO the FS directory layout.

https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j

Thanks,
Umesh

On Fri, Feb 17, 2017 at 12:57 PM, Stack <st...@duboce.net> wrote:

> Notes from this morning's online meeting @10AM PST (please fill in any
> detail I missed):
>
> IN ATTENDANCE:
> Aman Poonia
> Umesh Agashe, Cloudera
> Stephen Tak, AMZ
> Zach York, AMZ
> Francis Liu, Yahoo!
> Ben Mau, Yahoo!
> Sean Busbey, Cloudera
> Ted Yu, HWX
> Appy (Apekshit Sharma), Cloudera
>
>
> BACKGROUND (St.Ack)
> Y! wants to do millions of regions in a cluster.
> Our current FS layout is heavily dependent on HDFS semantics (e.g. we depend
> heavily on HDFS rename doing atomic file and directory swaps); this
> complicates being able to run on another FS.
> HBase is bound to a particular physical layout in the FS.
> Matteo Bertozzi's experience with HDFS/HBase on S3 and a general irritation
> with how FS ops are distributed all about the codebase had him propose a
> logical tier with a radically simplified set of requirements of the
> underlying FS (block store?); atomic operations would be done by HBase rather
> than farmed out to the FS.
> Matteo is not w/ us anymore but he passed on the vision to Umesh.
>
> CURRENT STATE OF FS REDO PROJECT (Umesh)
> Currently it is shelved but we hope to get back to it 'soon'.
> Spent a few months on FS REDO at the end of last year.
> Initial approach was to abstract out three Interfaces (originally sketched
> by Matteo in [1]).
> Idea was to centralize all FS use in a few well-known locations.
> Then refactor all FS usage.
> Keep all metadata about tables, files, etc., in hbase:meta.
> Idea was to slowly migrate over ops, tools, etc., to the new Interface.
> This was a bottom-up approach, finding FS references, and moving references
> to one place.
> Soon found too many refs all over the code.
> Found that we might not get to the desired simple Interface because the API
> had to carry around baggage.
> Matteo had tried this approach in [1] and started to argue this stepped
> migration would never arrive.
>
> So we restarted w/ the ideal Simple FS Interface, and the implementation
> seemed to flow smoothly.
> An in-memory POC that did simple file ops was posted a while back here [2].
>
> Given the two approaches taken above, experience indicates that the
> radical, top-down approach is more likely to succeed.
>
> WHY ARE PEOPLE INTERESTED IN FS REDO?
> Francis and Ben Mau: we want to be able to do 1M regions.
> St.Ack suggested that even small installs need to be able to do more,
> smaller regions.
> Zach is interested because he wants to optimize HBase over S3 (rename,
> consistency issues). Liked the idea of metadata up in the hbase:meta table
> and avoiding renames, etc.
>
> WHAT SHOULD WE DO?
> We have few resources. It is a big job (we've been talking about it a good
> while now). All docs are stale, missing the benefit of Umesh's recent
> explorations.
> Sean pointed out that before shelving, the idea was to try the PoC
> Interface against a new hbase operation other than simple file reading and
> writing (compactions?). If the PoC Interface survived in the new context,
> we'd then step back and write up a design.
> Seemed like as good a plan as any. The plan should talk about all the ways
> in which ops can go wrong.
> Thereafter, split up the work and bring over subsystems.
> It is looking like an hbase3 rather than an hbase2 project (though all
> hoped it could make hbase2).
>
> TODOs
> We agreed to post these notes with pointers to the current state of FS REDO
> (see below).
> Umesh and Stack to do up a one-pager on the current PoC to be posted on this
> thread or up in the FS REDO issue (HBASE-14090).
> Keep up macro status on this thread.
>
> What else?
> Thanks,
> S
>
> 1. Matteo's original FS REDO suggested plan:
> https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
> 2. Umesh's PoC: https://reviews.apache.org/r/55200/
> 3. HBASE-14090 is the parent issue for this project?
> 4. An old doc to evangelize the idea of an FS REDO (mostly upstreaming
> Matteo's ideas):
> https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
>
>
> On Fri, Feb 17, 2017 at 9:53 AM, Stack <st...@duboce.net> wrote:
>
> > I put up a hangout. If the above link doesn't work, try this
> > https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write
> > here if you can't get in.
> >
> > St.Ack
> >
> > On Tue, Feb 14, 2017 at 12:36 PM, Stack <st...@duboce.net> wrote:
> >
> >> A few folks want to have a quick chat about the state of the proposed FS
> >> redo project. The proposal is for 10AM, this Friday morning, PST. All
> >> interested parties are invited to join (shout if 10AM PST is untenable
> >> and suggest an alternative). Below is a Google Hangout link that comes
> >> alive Friday morning [1].
> >>
> >> One of us will keep notes and post a synopsis of the discussion back here
> >> and in the issue after the meeting is done.
> >>
> >> Suggest those who join try to do some background reading -- see
> >> HBASE-14439 -- so we are all around the same level of understanding when
> >> the meeting starts. Agenda will be basic intros, current state of the
> >> project (with update on most recent effort), and then expectations.
> >> Basic.
> >>
> >> Thanks,
> >> S
> >>
> >> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
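One idea that runs through the notes is that HBase, not the FS, should perform atomic operations, with file metadata kept in hbase:meta so that renames go away. Below is a minimal sketch of that idea, written in Python for brevity. `BlobStore`, `MetaCatalog` and `compact` are hypothetical names for illustration only, not the API of Umesh's PoC or of HBase; an in-memory dict stands in for the block store, and a lock stands in for an atomic update of a region's file list in hbase:meta (an assumption about how the metadata update would be made atomic).

```python
import threading
import uuid


class BlobStore:
    """Hypothetical minimal FS tier: put/get/delete of immutable blobs.
    No directories and no rename."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex  # fresh unique key, so writes never overwrite
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def delete(self, key: str) -> None:
        self._blobs.pop(key, None)


class MetaCatalog:
    """Hypothetical stand-in for file metadata kept in hbase:meta:
    region name -> list of blob keys that make up the region's files."""

    def __init__(self):
        self._lock = threading.Lock()  # assumed atomic metadata update
        self._files = {}

    def files(self, region: str) -> list:
        with self._lock:
            return list(self._files.get(region, []))

    def add(self, region: str, key: str) -> None:
        with self._lock:
            self._files.setdefault(region, []).append(key)

    def swap(self, region: str, old_keys: list, new_keys: list) -> bool:
        """Replace old_keys with new_keys in one step; fail if the set changed."""
        with self._lock:
            if set(self._files.get(region, [])) != set(old_keys):
                return False
            self._files[region] = list(new_keys)
            return True


def compact(store: BlobStore, catalog: MetaCatalog, region: str) -> bool:
    """Merge a region's files without rename: write the output blob first,
    commit with a single metadata swap, then delete the inputs."""
    inputs = catalog.files(region)
    merged = b"".join(store.get(k) for k in inputs)
    out_key = store.put(merged)
    if not catalog.swap(region, inputs, [out_key]):
        store.delete(out_key)  # lost a race: output was never referenced
        return False
    for k in inputs:
        store.delete(k)  # inputs are unreferenced once the swap commits
    return True
```

The commit point is the single `swap` call: a reader listing the region sees either the old files or the merged one, never a half-renamed directory. A crash after `put` but before `swap` would still leave an unreferenced blob behind, so a real design would need an orphan-cleanup pass; that is the kind of failure case the notes ask the plan to cover.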