>> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase

and should stay in Hadoop stack (with HDFS and Yarn/MapReduce). The world (of
NoSQL) outside of Hadoop is scary (C* is probably the least scary of all).

I personally do not mind code refactoring and moving everything from Master
to a separate client tool. We already have hbck to repair HBase; we will have
a backup repair tool as well, to repair failed backup/restore sessions. We
will delegate all these fault-tolerance duties to a user.

-Vlad

On Sat, Sep 24, 2016 at 11:08 AM, Vladimir Rodionov <vladrodio...@gmail.com>
wrote:

> >> The key takeaway seems to be don't call out to an external framework we
> >> don't own from master (or regionserver) code.
>
> Should we ban HDFS as well?
>
> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase
>
> -Vlad
>
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com>
> wrote:
>
>> I was attempting to summarize Ted.
>>
>> A new maven module sounds like a good idea to me. Or we could move all
>> the tools that use MR out to one. Or...
>>
>> The key takeaway seems to be don't call out to an external framework we
>> don't own from master (or regionserver) code.
>>
>> > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >
>> > bq. Internally the tool can also use the procedure framework for state
>> > durability
>> >
>> > Isn't this the standalone service I proposed this morning ?
>> >
>> > bq. Move cross HBase and MR coordination to a separate tool
>> >
>> > Where should this tool live (hbase-backup module) ?
>> >
>> > Thanks
>> >
>> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
>> > wrote:
>> >
>> >> At branch merge voting time now more eyes are getting on the design issues
>> >> with dissenting opinion emerging. This is the branch merge process working
>> >> as our community has designed it. Because this is the first full project
>> >> review of the code and implementation I think we all have to be flexible.
>> >> I see the community as trying to narrow the technical objection at issue to
>> >> the smallest possible scope. It's simple: don't call out to an external
>> >> execution framework we don't own from core master (and by extension
>> >> regionserver) code. We had this objection before to a proposed external
>> >> compaction implementation for MOB so should not come as a surprise.
>> >> Please let me know if I have misstated this.
>> >>
>> >> This would seem to require a modest refactor of coordination to move
>> >> invocation of MR code out from any core code path. To restate what I think
>> >> is an emerging recommendation: Move cross HBase and MR coordination to a
>> >> separate tool. This tool can ask the master to invoke procedures on the
>> >> HBase side that do first mile export and last mile restore. (Internally the
>> >> tool can also use the procedure framework for state durability, perhaps,
>> >> just a thought.) Then the tool can further drive the things done with MR
>> >> like shipping data off cluster or moving remote data in place and preparing
>> >> it for import. These activities do not need procedure coordination and
>> >> involvement of the HBase master. Only the first and last mile of the
>> >> process needs atomicity within the HBase deploy. Please let me know if I
>> >> have misstated this.
>> >>
>> >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>
>> >>> bq. procedure gives you a retry mechanism on failure
>> >>>
>> >>> We do need this mechanism. Take a look at the multi-step
>> >>> in FullTableBackupProcedure, etc.
>> >>>
>> >>> bq. let the user export it later when he wants
>> >>>
>> >>> This would make supporting security more complex (user A shouldn't be
>> >>> exporting user B's backup).
>> >>> And it is not user friendly - at the time
>> >>> backup request is issued, the following is specified:
>> >>>
>> >>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
>> >>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
>> >>>
>> >>> Backup root is an integral part of backup manifest.
>> >>>
>> >>> Cheers
>> >>>
>> >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>
>> >>>>> Ideally the export should have one job running which does the retry (on
>> >>>>> failed partition) itself.
>> >>>>>
>> >>>>
>> >>>> procedure gives you a retry mechanism on failure. if you don't use that,
>> >>>> then you don't need procedure.
>> >>>> if you want you can start a procedure executor in a non master process (the
>> >>>> hbase-procedure is a separate package and does not depend on master). but
>> >>>> again, export seems a case where you don't need procedure.
>> >>>>
>> >>>> like snapshot, the logic may just be: ask the master to take a backup. and
>> >>>> let the user export it later when he wants. so you avoid having a MR job
>> >>>> started by the master since people do not seem to like it.
>> >>>>
>> >>>> for restore (I think that is where you use the MR splitter) you can
>> >>>> probably just have a backup ready (already split). there is already a
>> >>>> jira that should do that HBASE-14135. instead of doing the operation of
>> >>>> split/merge on restore. you consolidate the backup "offline" (mr job
>> >>>> started by the user) and then ask to restore the backup.
>> >>>>
>> >>>>>
>> >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> as far as I understand the code, you don't need procedure for the export
>> >>>>>> itself.
>> >>>>>> the export operation is already idempotent, since you are just copying
>> >>>>>> files.
>> >>>>>> if the file exists and is complete (check length, checksum, ...) you can
>> >>>>>> skip it,
>> >>>>>> otherwise you'll send it over again.
>> >>>>>>
>> >>>>>> you need the proc for taking the backup and restoring,
>> >>>>>> because you want to complete the operation and end up with a consistent
>> >>>>>> state
>> >>>>>> across the multiple components you are updating (meta, fs, ...)
>> >>>>>> but again, for export you can just run the tool over and over until the
>> >>>>>> operation succeeds, and that should be ok.
>> >>>>>>
>> >>>>>> Matteo
>> >>>>>>
>> >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Master is involved in this discussion because currently only Master
>> >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
>> >>>>>>> restore.
>> >>>>>>>
>> >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
>> >>>>>>> used for this purpose ?
>> >>>>>>> Would that have better chance of giving us middle ground so that we can
>> >>>>>>> move this forward ?
>> >>>>>>>
>> >>>>>>> Cheers
>> >>>>>>>
>> >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>
>> >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
>> >>>>>>>>
>> >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>>>> -1 on that backup be in core hbase
>> >>>>>>>>>
>> >>>>>>>>> Not sure I understand what it means.
>> >>>>>>>>>
>> >>>>>>>>> Sorry for the imprecision.
>> >>>>>>>>
>> >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
>> >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
>> >>>>>>>>
>> >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
>> >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
>> >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
>> >>>>>>>> MOB, why would we do it to support an optional backup/restore?
>> >>>>>>>>
>> >>>>>>>> I have opinions on the questions below -- i.e. that Master running
>> >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
>> >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
>> >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
>> >>>>>>>> only came out from under my shell to participate on the MR as dependency
>> >>>>>>>> chat.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> M
>> >>>>>>>>
>> >>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
>> >>>>>>>>>
>> >>>>>>>>> We have already brought up all advantages of using
>> >>>>>>>>> Master and distributed procedures for backup and restore.
>> >>>>>>>>>
>> >>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
>> >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, potentially
>> >>>>>>>>> affect cluster, such as disabling splits/merges, balancer.
>> >>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff?
>> >>>>>>>>> We are trying to make it atomic.
>> >>>>>>>>>
>> >>>>>>>>> Security is not clear.
>> >>>>>>>>>
>> >>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what
>> >>>>>>>>> does core mean anyway)?
>> >>>>>>>>>
>> >>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a
>> >>>>>>>>> system space? Only in user space? The table is global.
>> >>>>>>>>>
>> >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we have touched,
>> >>>>>>>>> of course some existing HBase code.
>> >>>>>>>>> 3. is not that critical, of course we can move backup system into user
>> >>>>>>>>> space.
>> >>>>>>>>>
>> >>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
>> >>>>>>>>>
>> >>>>>>>>> -Vlad
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>>>> + MR is dead
>> >>>>>>>>>>>
>> >>>>>>>>>>> Does MR know that? :)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we
>> >>>>>>>>>>> use for "bulk data move and transformation" instead of MR?
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed
>> >>>>>>>>>> shell -- just don't have HBase core depend on it, even optionally.
>> >>>>>>>>>>
>> >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some
>> >>>>>>>>>>> group members still not sure about that and some will give -1
>> >>>>>>>>>>> in any case. Just because ...
>> >>>>>>>>>>>
>> >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
>> >>>>>>>>>> all the API any such external tool might need to run).
>> >>>>>>>>>>
>> >>>>>>>>>> St.Ack
>> >>>>>>>>>>
>> >>>>>>>>>>> -Vlad
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> let me try to go back to my original topic.
>> >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future
>> >>>>>>>>>>>>> code.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>> >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
>> >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
>> >>>>>>>>>>>>> external/uncontrolled MR setup.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> +1
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
>> >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
>> >>>>>>>>>>>> ever being able to launch MR jobs.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server
>> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
>> >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy
>> >>>>>>>>>>>> working hard on moving it up on to a new foundation.
>> >>>>>>>>>>>> Lets not clutter task harder by piling on more moving parts.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> St.Ack
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Matteo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make
>> >>>>>>>>>>>>>> Master more stable.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the sequence of calls when
>> >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a regionserver so it extends
>> >>>>>>>>>>>>>>> HRegionServer, and the initialization of HRegionServer sometimes needs to
>> >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would cause probabilistic dead
>> >>>>>>>>>>>>>>> lock or some strange NPEs...
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add
>> >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more works for the start
>> >>>>>>>>>>>>>>> up processing...
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I read through HADOOP-13433
>> >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
>> >>>>>>>>>>>>>>>> condition is in jdk.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup /
>> >>>>>>>>>>>>>>>> restore mega patch ?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
>> >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want to
>> >>>>>>>>>>>>>>>>> block the development progress.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
>> >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
>> >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
>> >>>>>>>>>>>>>>>>> HMaster is really a problem...
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help
>> >>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean, ugly code... logout and
>> >>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is still being used, and
>> >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the behavior and the only way
>> >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we
>> >>>>>>>>>>>>>>>>>>>> can certainly consider that
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
>> >>>>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> -Vlad
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based
>> >>>>>>>>>>>>>>>>>>> compactions..
>> >>>>>>>>>>>>>>>>>>> But I was thinking more about the SpliceMachine approach of
>> >>>>>>>>>>>>>>>>>>> managing compactions in Spark where apparently they saw a lot of benefits.
>> >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat Andrew; I really didn't mean to
>> >>>>>>>>>>>>>>>>>>> :-)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
>> >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
>> >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
>> >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even
>> >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something when MR is already there,
>> >>>>>>>>>>>>>>>>>>> and being used by HBase already for some operations.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not
>> >>>>>>>>>>>>>>>>>>> being the least of them all. Security (kerberos authentication, another
>> >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
>> >>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master.
>> >>>>>>>>>>>>>>>>>>> I haven't seen any good reason
>> >>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
>> >>>>>>>>>>>>>>>>>>> agreed.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the
>> >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think Ted has summarized some of the
>> >>>>>>>>>>>>>>>>>>> issues that we need to take care of - basically, the master can keep track
>> >>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the backup master can continue keeping
>> >>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been recorded in the proc WAL). The
>> >>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed backup/restore processes.
>> >>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to run as 'hbase' since it owns
>> >>>>>>>>>>>>>>>>>>> the data. Having the master launch the job makes it get that privilege. In
>> >>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the above management.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall
>> >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review is still pending from Matteo).
>> >>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we
>> >>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't think we should block this patch
>> >>>>>>>>>>>>>>>>>>> from getting merged.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> ________________________________________
>> >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
>> >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
>> >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>> >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own
>> >>>>>>>>>>>>>>>>>>> procedure store in that service?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway.
>> >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / restore more robust.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to
>> >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the
>> >>>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why not drive this with a utility
>> >>>>>>>>>>>>>>>>>>>>> derived from Tool?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
>> >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
>> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
>> >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced another cost for maintain. I
>> >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case.
>> >>>>>>>>>>>>>>>>>>>>>> Many of our customers have the full
>> >>>>>>>>>>>>>>>>>>>>>> stack deployed and
>> >>>>>>>>>>>>>>>>>>>>>> want to see backup be a standard feature. Besides this, nothing will happen
>> >>>>>>>>>>>>>>>>>>>>>> in your cluster
>> >>>>>>>>>>>>>>>>>>>>>> if you won't be doing backups.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see M/R dependency) goes nowhere. We
>> >>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to suggest another framework (other than M/R)
>> >>>>>>>>>>>>>>>>>>>>>> for bulk data copy with *conversion*. Still waiting for suggestions.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> -Vlad
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions
>> >>>>>>>>>>>>>>>>>>>>>>> normally (post merge).
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
>> >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
>> >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
>> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
>> >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced another cost for maintain. I
>> >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if we think
>> >>>>>>>>>>>>>>>>>>>>>>>>> this is not a core feature of HBase, then we could make it depend on MR,
>> >>>>>>>>>>>>>>>>>>>>>>>>> and start a standalone BackupManager instance that submits MR jobs to do
>> >>>>>>>>>>>>>>>>>>>>>>>>> periodical maintenance job.
>> >>>>>>>>>>>>>>>>>>>>>>>>> And if we think this is a core feature that
>> >>>>>>>>>>>>>>>>>>>>>>>>> everyone should use, then we'd better implement it without MR
>> >>>>>>>>>>>>>>>>>>>>>>>>> dependency, like DLS.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some of our
>> >>>>>>>>>>>>>>>>>>>>>>>>>> features depend on MR but I think the bottom line is that we should launch
>> >>>>>>>>>>>>>>>>>>>>>>>>>> the jobs from outside manually or by other services.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> question.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR apps?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the >> >>>>>> AccessController >> >>>>>>>> to >> >>>>>>>>>>> decide >> >>>>>>>>>>>>> if >> >>>>>>>>>>>>>>>>> allowed? >> >>>>>>>>>>>>>>>>>>> But >> >>>>>>>>>>>>>>>>>>>>>>>> nothing >> >>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the user from running the >> >>>>> job >> >>>>>>>>>>>>>>>>> manually/independently, >> >>>>>>>>>>>>>>>>>>>> right? >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, >> >>>> Matteo >> >>>>>>>>> Bertozzi < >> >>>>>>>>>>>>>>>>>>>>>>>> theo.berto...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not >> >>>>> about >> >>>>>>>> tools >> >>>>>>>>>>> using >> >>>>>>>>>>>> MR >> >>>>>>>>>>>>>>>>>> (everyone i >> >>>>>>>>>>>>>>>>>>>>>>>> think >> >>>>>>>>>>>>>>>>>>>>>>>>>>> is >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ok with those). >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok >> >>>> with >> >>>>>>>> running >> >>>>>>>>>> MR >> >>>>>>>>>>>> jobs >> >>>>>>>>>>>>>>> from >> >>>>>>>>>>>>>>>>>> Master >> >>>>>>>>>>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>>>>>>>>>>> RSs >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> code?" since this will be the >> >>>> first >> >>>>>> time >> >>>>>>>> we >> >>>>>>>>> do >> >>>>>>>>>>>> this >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, >> >>>>>>> Devaraj >> >>>>>>>>> Das >> >>>>>>>>>> < >> >>>>>>>>>>>>>>>>>>>>>>> d...@hortonworks.com> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like >> >>>>>>>>>> ExportSnapshot >> >>>>>>>>>>> / >> >>>>>>>>>>>>>>> Backup / >> >>>>>>>>>>>>>>>>>>>> Restore, >> >>>>>>>>>>>>>>>>>>>>>>>> it's >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> fine to be dependent on MR. 
MR is >> >>>>> the >> >>>>>>>> right >> >>>>>>>>>>>>> framework >> >>>>>>>>>>>>>>> for >> >>>>>>>>>>>>>>>>>> such. >> >>>>>>>>>>>>>>>>>>>> We >> >>>>>>>>>>>>>>>>>>>>>>>>>>> should >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> also do compactions using MR >> >>>> (just >> >>>>>>> saying >> >>>>>>>>> :) >> >>>>>>>>>> ) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ______________________________ >> >>>>>>> __________ >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu < >> >>>> yuzhih...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, >> >>>> 2016 >> >>>>>> 2:00 >> >>>>>>>> PM >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs >> >>>>>>> started >> >>>>>>>>> by >> >>>>>>>>>>>> Master >> >>>>>>>>>>>>>> or >> >>>>>>>>>>>>>>> RS >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in >> >>>>> the >> >>>>>>> same >> >>>>>>>>>>>> category >> >>>>>>>>>>>>> as >> >>>>>>>>>>>>>>>>> import >> >>>>>>>>>>>>>>>>>> / >> >>>>>>>>>>>>>>>>>>>>>>>> export. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, >> >>>>>> Andrew >> >>>>>>>>>>> Purtell < >> >>>>>>>>>>>>>>>>>>>>>>>>>>> andrew.purt...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around >> >>>>> core >> >>>>>> in >> >>>>>>>> my >> >>>>>>>>>>>> opinion. >> >>>>>>>>>>>>>>> Like >> >>>>>>>>>>>>>>>>>> import >> >>>>>>>>>>>>>>>>>>>> or >> >>>>>>>>>>>>>>>>>>>>>>>>>>> export. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or the optional MOB tool. It's >> >>>>> fine. 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, >> >>>>> Matteo >> >>>>>>>>>> Bertozzi >> >>>>>>>>>>> < >> >>>>>>>>>>>>>>>>>>>>>>>> mberto...@apache.org> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion >> >>>> around >> >>>>>>>> running >> >>>>>>>>> MR >> >>>>>>>>>>>> jobs >> >>>>>>>>>>>>>> from >> >>>>>>>>>>>>>>>>> hbase >> >>>>>>>>>>>>>>>>>>>>>>>> (Master >> >>>>>>>>>>>>>>>>>>>>>>>>>>> or >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> RS)? >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that >> >>>> there >> >>>>>> was >> >>>>>>>>>>>> discussion >> >>>>>>>>>>>>>>> about >> >>>>>>>>>>>>>>>>> not >> >>>>>>>>>>>>>>>>>>>>>>> having >> >>>>>>>>>>>>>>>>>>>>>>>> MR >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> has >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> direct dependency of hbase. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of discussion >> >>>> where >> >>>>>>> around >> >>>>>>>>> MOB >> >>>>>>>>>>>> that >> >>>>>>>>>>>>>> had >> >>>>>>>>>>>>>>> a >> >>>>>>>>>>>>>>>> MR >> >>>>>>>>>>>>>>>>>> job >> >>>>>>>>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> compact, >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that later was transformed in a >> >>>>>>> non-MR >> >>>>>>>>> job >> >>>>>>>>>> to >> >>>>>>>>>>>> be >> >>>>>>>>>>>>>>>> merged, >> >>>>>>>>>>>>>>>>> I >> >>>>>>>>>>>>>>>>>>>> think >> >>>>>>>>>>>>>>>>>>>>>>>> we >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> had a >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar discussion for log >> >>>>>>>> split/replay. 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup >> >>>>>> feature >> >>>>>>>>>>>>> (HBASE-7912), >> >>>>>>>>>>>>>>> that >> >>>>>>>>>>>>>>>>>> runs >> >>>>>>>>>>>>>>>>>>> a >> >>>>>>>>>>>>>>>>>>>>>>> MR >> >>>>>>>>>>>>>>>>>>>>>>>> job >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the master to copy data or >> >>>>> restore >> >>>>>>>> data. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really >> >>>> core" >> >>>>>> as >> >>>>>>>> in.. >> >>>>>>>>>> if >> >>>>>>>>>>>> you >> >>>>>>>>>>>>>>> don't >> >>>>>>>>>>>>>>>>> use >> >>>>>>>>>>>>>>>>>>>>>>> backup >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> you'll >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not end up running MR jobs, but >> >>>>>> this >> >>>>>>>> was >> >>>>>>>>>>>> probably >> >>>>>>>>>>>>>>> true >> >>>>>>>>>>>>>>>>> for >> >>>>>>>>>>>>>>>>>>> MOB >> >>>>>>>>>>>>>>>>>>>>>>> as >> >>>>>>>>>>>>>>>>>>>>>>>> in >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> "if >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you don't enable MOB you don't >> >>>>> need >> >>>>>>>> MR") >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we a rule that >> >>>>>> says >> >>>>>>>> "we >> >>>>>>>>>>> don't >> >>>>>>>>>>>>> want >> >>>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>> have >> >>>>>>>>>>>>>>>>>>>>>>> hbase >> >>>>>>>>>>>>>>>>>>>>>>>> run >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs, only tool started >> >>>> manually >> >>>>> by >> >>>>>>> the >> >>>>>>>>>> user >> >>>>>>>>>>>> can >> >>>>>>>>>>>>> do >> >>>>>>>>>>>>>>>>> that". >> >>>>>>>>>>>>>>>>>> or >> >>>>>>>>>>>>>>>>>>>>>>> can >> >>>>>>>>>>>>>>>>>>>>>>>> we >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> start >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adding MR calls around without >> >>>>>>>> problems? 