Ok, we had an internal discussion and this is what we are suggesting now:

1. We will create a separate module (hbase-backup) and move the server-side code there.
2. Master and RS will be free of MR and backup code.
3. The code from Master will be moved into a standalone service (BackupService) for procedure orchestration, operation resume/abort and SECURITY. This means one additional process, similar to the REST/Thrift server, will be required to operate backup.
I would like to note that a separate process running as the hbase superuser is required to implement security properly in a multi-tenant environment; otherwise, only the hbase superuser will be allowed to operate backups.

Please let us know what you think, HBase people :?

-Vlad

On Sat, Sep 24, 2016 at 2:49 PM, Stack <st...@duboce.net> wrote:

On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> At branch merge voting time, more eyes are now getting on the design issues, with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation, I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB, so it should not come as a surprise. Please let me know if I have misstated this.

The above is my understanding also.

> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR, like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master.
> Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.

Above is my understanding of our recommendation.

St.Ack

On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:

bq. procedure gives you a retry mechanism on failure

We do need this mechanism. Take a look at the multi-step logic in FullTableBackupProcedure, etc.

bq. let the user export it later when he wants

This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly: at the time the backup request is issued, the following is specified:

    + + " BACKUP_ROOT The full root path to store the backup image,\n"
    + + " the prefix can be hdfs, webhdfs or gpfs\n"

Backup root is an integral part of the backup manifest.

Cheers

On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Ideally the export should have one job running which does the retry (on failed partition) itself.

procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure.
if you want, you can start a procedure executor in a non-master process (hbase-procedure is a separate package and does not depend on master). but again, export seems to be a case where you don't need procedure.

like snapshot, the logic may just be: ask the master to take a backup, and let the user export it later when he wants. that way you avoid having an MR job started by the master, since people do not seem to like it.
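To make the retry point concrete, here is a minimal, hypothetical Java sketch of a multi-step operation that records the index of its last completed step and, when run again after a failure, resumes from the first incomplete step. This is not HBase's ProcedureExecutor or FullTableBackupProcedure; `StateStore`, `InMemoryStateStore` and `ResumableOperation` are illustrative names, and the in-memory store is an assumption standing in for a durable one such as a procedure WAL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase's ProcedureExecutor.
// Record of progress; a real implementation would persist this durably
// (e.g. in a procedure WAL). Here it is kept in memory for illustration.
interface StateStore {
    int lastCompletedStep(); // -1 when no step has completed yet

    void markCompleted(int step);
}

class InMemoryStateStore implements StateStore {
    private int last = -1;

    @Override
    public int lastCompletedStep() {
        return last;
    }

    @Override
    public void markCompleted(int step) {
        last = step;
    }
}

class ResumableOperation {
    private final List<Runnable> steps;
    private final StateStore store;

    ResumableOperation(List<Runnable> steps, StateStore store) {
        this.steps = new ArrayList<>(steps);
        this.store = store;
    }

    /**
     * Runs the steps that have not completed yet, in order.
     * Returns true if every step has now completed, false if a step failed.
     * On failure the store still points at the last successful step,
     * so calling run() again resumes at the step that failed.
     */
    boolean run() {
        for (int i = store.lastCompletedStep() + 1; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (RuntimeException e) {
                return false;
            }
            store.markCompleted(i); // record progress only after the step succeeds
        }
        return true;
    }
}
```

A second `run()` skips steps that already finished. This is why the store has to outlive the process driving the operation: if progress lived only in a client process that crashed, nothing would be left to resume from, which is the fault-tolerance concern raised later in the thread about client-side tools.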
for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already split). there is already a jira that should do that: HBASE-14135. instead of doing the split/merge operation on restore, you consolidate the backup "offline" (MR job started by the user) and then ask to restore the backup.

On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

as far as I understand the code, you don't need procedure for the export itself.
the export operation is already idempotent, since you are just copying files. if the file exists and is complete (check length, checksum, ...) you can skip it; otherwise you'll send it over again.

you need the proc for taking the backup and restoring, because you want to complete the operation and end up with a consistent state across the multiple components you are updating (meta, fs, ...).
but again, for export you can just run the tool over and over until the operation succeeds, and that should be ok.

Matteo

On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

Master is involved in this discussion because currently only Master instantiates ProcedureExecutor, which runs the 3 Procedures for backup / restore.

What if an optional standalone service which hosts ProcedureExecutor is used for this purpose? Would that have a better chance of giving us middle ground so that we can move this forward?
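Matteo's point that export is idempotent can be sketched as a re-runnable copy. A destination file that already exists with the same length and checksum is skipped; anything missing or incomplete is sent again. This is a hypothetical illustration using only `java.nio.file` on a local filesystem, not the backup tool's actual export code (which copies between HDFS-style filesystems). `IdempotentCopy`, `copyIfNeeded` and `exportAll` are made-up names, and SHA-256 is an arbitrary choice of checksum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of an idempotent export on a local filesystem.
final class IdempotentCopy {
    private IdempotentCopy() {}

    /** Copies src to dst unless dst already has the same length and checksum. Returns true if copied. */
    static boolean copyIfNeeded(Path src, Path dst) throws IOException {
        if (Files.exists(dst)
                && Files.size(dst) == Files.size(src)
                && Arrays.equals(sha256(dst), sha256(src))) {
            return false; // already exported and complete: skip it
        }
        Path parent = dst.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Missing or incomplete: (re)send the whole file.
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    /** Exports every regular file under srcDir; safe to re-run until it succeeds. Returns files copied. */
    static int exportAll(Path srcDir, Path dstDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(srcDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int copied = 0;
        for (Path src : files) {
            if (copyIfNeeded(src, dstDir.resolve(srcDir.relativize(src)))) {
                copied++;
            }
        }
        return copied;
    }

    private static byte[] sha256(Path p) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
        try (InputStream in = Files.newInputStream(p)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }
}
```

After a partial failure, running `exportAll` again re-sends only the missing or truncated files. Under these assumptions the tool can simply be re-run until it succeeds, with no procedure-style state.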
Cheers

On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:

(Moved out of the "Master doing MR" DISCUSSION)

On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> >> -1 on that backup be in core hbase
>
> Not sure I understand what it means.

Sorry for the imprecision.

The -1 is NOT against backup/restore. I am -1 on MR as a dependency, and so -1 on the Master running backup/restore MR jobs, even if optional.

Master should not depend on MR. We've gone out of our way to avoid taking MR on as a dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?

I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user', so I'll keep them to myself until I do. I only came out from under my shell to participate in the MR-as-dependency chat.

Thanks,
M

> 1. We are not allowed to use Master to orchestrate the whole process? We have already brought up all the advantages of using Master and distributed procedures for backup and restore.
>
> The downside of moving this to a client tool is lack of fault tolerance:
> 1.1 The client won't be allowed to do any operations that can potentially affect the cluster, such as disabling splits/merges or the balancer.
> 1.2 In case of client failure, who will be doing the whole rollback stuff? We are trying to make it atomic.
>
> Security is not clear.
>
> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
>
> 3. We are not allowed to create the backup system table (hbase:backup) in a system space? Only in user space? The table is global.
>
> 2. is critical. Despite the fact that 95% of the code is new, we have, of course, touched some existing HBase code.
> 3. is not that critical; of course we can move the backup system into user space.
>
> And finally, will moving backup into an external tool give us a +1 from stack?
>
> -Vlad

On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:

On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> >> + MR is dead
>
> Does MR know that? :)
>
> Again, with all due respect, stack: still no suggestions for what we should use for "bulk data move and transformation" instead of MR?
Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.

> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members are still not sure about that and some will give -1 in any case. Just because ...

We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).

St.Ack

> -Vlad

On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:

On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

> let me try to go back to my original topic.
> this question was meant to be generic, and to provide some rule for future code.
>
> from what I can gather, a rule that may satisfy everyone can be:
> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want MR, or may have an external/uncontrolled MR setup.

+1

> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
-1 to hbase core depending on MR, or core -- whether behind a flag or not -- ever being able to launch MR jobs.

+ MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
+ Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.

St.Ack

> Matteo

On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:

I suggest you look at Matteo's work on AssignmentManager, which aims to make Master more stable.

Cheers

On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:

No, not your fault, at least not this time :)

Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster.
A simple change could cause a probabilistic deadlock or some strange NPEs...

That's why I'm very nervous when somebody wants to add new features or external dependencies to HMaster, especially adding more work to the startup process...

Thanks.

2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.

Suggest pinging the reviewer on JIRA to get it moving.

bq. But the ugly code in HMaster is really a problem...

Can you be specific as to which code is ugly? Is it in the backup / restore mega patch?

Cheers

On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:

If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it, as I do not want to block the development progress.
But I strongly suggest we revisit the design later and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...

And for security, I have an issue that has been pending for a long time. Can someone help by taking a quick look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while they are still being used, and it is declared as LimitedPrivate, so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...

https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:

> If in the future, we find better ways of doing this without using MR, we can certainly consider that.

Our framework for distributed operations is abstract and allows different implementations.
MR is just one implementation we provide.

-Vlad

On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:

Guys, first off, apologies for bringing in the topic of MR-based compactions... But I was thinking more about the SpliceMachine approach of managing compactions in Spark, where apparently they saw a lot of benefits. Apologies for giving you that sore throat, Andrew; I really didn't mean to :-)

So on this issue, we have these on the plate:
0. Somehow not use MR, but something like it
1. Run a standalone service other than master
2. Shell out from the master

I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there and already being used by HBase for some operations.

On (1), we have to deal with a myriad of issues, HA of the server not being the least of them.
Security (Kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead, let's substitute the HBase Master for the standalone service in (1). I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.

Now, before going to (2), let's see what the benefits are of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of. Basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of them (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue: the job needs to run as 'hbase' since it owns the data. Having the master launch the job gives it that privilege.
In the (2) approach, it's hard to do some of the above management.

Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO we shouldn't block this patch from getting merged.

________________________________________
From: 张铎 <palomino...@gmail.com>
Sent: Thursday, September 22, 2016 8:32 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

So what about a standalone service other than master? You could use your own procedure store in that service?

2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

An earlier implementation was client driven.

But with that approach, it is hard to resume if there is an error midway.
Using Procedure V2 makes the backup / restore more robust.

Another consideration is security. It is hard to enforce security (to be implemented) for client-driven actions.

Cheers

On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?

On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> In our production cluster, it is a common case that we just have HDFS and HBase deployed.
> If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.
So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup as a standard feature. Besides, nothing will happen in your cluster if you don't do backups.

This discussion ("we do not want to see an M/R dependency") is going nowhere. We have already asked, at least twice, for a suggestion of another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.

-Vlad

On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:

If the MR framework is not deployed in the cluster, hbase still functions normally (post merge).

In terms of build-time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
Cheers

On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:

In our production cluster, it is a common case that we just have HDFS and HBase deployed.
If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.

2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:

To be specific, take our nice Backup/Restore feature: if we think it is not a core feature of HBase, then we could make it depend on MR and start a standalone BackupManager instance that submits MR jobs to do periodic maintenance.
And if we think this is a core feature that everyone should use, then we'd better implement it without an MR dependency, like DLS.

Thanks.

2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:

I'm -1 on letting master or rs launch MR jobs. It is OK that some of our features depend on MR, but I think the bottom line is that we should launch the jobs from outside, manually or by other services.

2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:

Ok, got it. Well, "shelling out" is on the line, I think, so a fair question.

Can this be driven by a utility derived from Tool, like our other MR apps?
The issue is needing the AccessController to decide if it is allowed? But nothing prevents the user from running the job manually/independently, right?

On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:

just a remark. my query was not about tools using MR (everyone, I think, is ok with those).
the topic was: "are we ok with running MR jobs from Master and RS code?", since this will be the first time we do this.

Matteo

On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:

Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR.
MR is the right framework for such tools. We should also do compactions using MR (just saying :) )
________________________________
From: Ted Yu <yuzhih...@gmail.com>
Sent: Thursday, September 22, 2016 2:00 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

I agree - backup / restore is in the same category as import / export.

On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

Backup is extra tooling around core, in my opinion. Like import or export. Or the optional MOB tool. It's fine.
On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:

What's the latest opinion on running MR jobs from HBase (Master or RS)?

I remember that in the past there was discussion about not having MR as a direct dependency of HBase.

I think some of the discussion was around MOB, which had an MR job to compact that was later transformed into a non-MR job so that it could be merged. I think we had a similar discussion for log split/replay.
The latest is the new Backup feature (HBASE-7912), which runs an MR job from the master to copy or restore data. (Backup is also "not really core", as in: if you don't use backup, you won't end up running MR jobs; but this was probably also true for MOB, as in "if you don't enable MOB, you don't need MR".)

Any thoughts? Do we have a rule that says "we don't want to have HBase run MR jobs; only tools started manually by the user can do that"? Or can we start adding MR calls around without problems?