>> So the standalone service would run out of proc - in the same vein as REST or thrift server.
Ted, running a separate process/service to coordinate backups is not a good idea. We already have a lot of them.

On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. don't call out to an external framework we don't own from master (or regionserver) code
>
> So the standalone service would run out of proc - in the same vein as REST or thrift server.
>
> Cheers
>
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> > I was attempting to summarize Ted.
> >
> > A new maven module sounds like a good idea to me. Or we could move all the tools that use MR out to one. Or...
> >
> > The key takeaway seems to be don't call out to an external framework we don't own from master (or regionserver) code.
> >
> > > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > bq. Internally the tool can also use the procedure framework for state durability
> > >
> > > Isn't this the standalone service I proposed this morning ?
> > >
> > > bq. Move cross HBase and MR coordination to a separate tool
> > >
> > > Where should this tool live (hbase-backup module) ?
> > >
> > > Thanks
> > >
> > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >
> > >> At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB so should not come as a surprise. Please let me know if I have misstated this.
> > >>
> > >> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.
> > >>
> > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>> bq. procedure gives you a retry mechanism on failure
> > >>>
> > >>> We do need this mechanism. Take a look at the multi-step in FullTableBackupProcedure, etc.
> > >>>
> > >>> bq. let the user export it later when he wants
> > >>>
> > >>> This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time backup request is issued, the following is specified:
> > >>>
> > >>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
> > >>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
> > >>>
> > >>> Backup root is an integral part of backup manifest.
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > >>>
> > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>>
> > >>>>> Ideally the export should have one job running which does the retry (on failed partition) itself.
> > >>>>
> > >>>> procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure.
> > >>>> if you want you can start a procedure executor in a non master process (the hbase-procedure is a separate package and does not depend on master). but again, export seems a case where you don't need procedure.
> > >>>>
> > >>>> like snapshot, the logic may just be: ask the master to take a backup. and let the user export it later when he wants. so you avoid having a MR job started by the master since people do not seem to like it.
> > >>>>
> > >>>> for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already split). there is already a jira that should do that HBASE-14135. instead of doing the operation of split/merge on restore. you consolidate the backup "offline" (mr job started by the user) and then ask to restore the backup.
> > >>>>
> > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > >>>>>
> > >>>>>> as far as I understand the code, you don't need procedure for the export itself.
> > >>>>>> the export operation is already idempotent, since you are just copying files.
> > >>>>>> if the file exists and is complete (check length, checksum, ...) you can skip it, otherwise you'll send it over again.
> > >>>>>>
> > >>>>>> you need the proc for taking the backup and restoring, because you want to complete the operation and end up with a consistent state across the multiple components you are updating (meta, fs, ...)
> > >>>>>> but again, for export you can just run the tool over and over until the operation succeeds, and that should be ok.
> > >>>>>>
> > >>>>>> Matteo
> > >>>>>>
> > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Master is involved in this discussion because currently only Master instantiates ProcedureExecutor which runs the 3 Procedures for backup / restore.
> > >>>>>>>
> > >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is used for this purpose ?
> > >>>>>>> Would that have better chance of giving us middle ground so that we can move this forward ?
> > >>>>>>>
> > >>>>>>> Cheers
> > >>>>>>>
> > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>
> > >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> > >>>>>>>>
> > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>>>> -1 on that backup be in core hbase
> > >>>>>>>>>
> > >>>>>>>>> Not sure I understand what it means.
> > >>>>>>>>
> > >>>>>>>> Sorry for the imprecision.
> > >>>>>>>>
> > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so -1 on the Master running backup/restore MR jobs, even if optional.
> > >>>>>>>>
> > >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking MR on as dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?
> > >>>>>>>>
> > >>>>>>>> I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user' so I'll keep them to myself until I do. I only came out from under my shell to participate on the MR as dependency chat.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> M
> > >>>>>>>>
> > >>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
> > >>>>>>>>>
> > >>>>>>>>> We have already brought up all advantages of using Master and distributed procedures for backup and restore.
> > >>>>>>>>>
> > >>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
> > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, potentially affect cluster, such as disabling splits/merges, balancer.
> > >>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff? We are trying to make it atomic.
> > >>>>>>>>>
> > >>>>>>>>> Security is not clear.
> > >>>>>>>>>
> > >>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
> > >>>>>>>>>
> > >>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a system space? Only in user space? The table is global.
> > >>>>>>>>>
> > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we have touched, of course some existing HBase code.
> > >>>>>>>>> 3. is not that critical, of course we can move backup system into user space.
> > >>>>>>>>>
> > >>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
> > >>>>>>>>>
> > >>>>>>>>> -Vlad
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>>>> + MR is dead
> > >>>>>>>>>>>
> > >>>>>>>>>>> Does MR know that? :)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we use for "bulk data move and transformation" instead of MR?
> > >>>>>>>>>>
> > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.
> > >>>>>>>>>>
> > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members still not sure about that and some will give -1 in any case. Just because ...
> > >>>>>>>>>>
> > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).
> > >>>>>>>>>>
> > >>>>>>>>>> St.Ack
> > >>>>>>>>>>
> > >>>>>>>>>>> -Vlad
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> let me try to go back to my original topic.
> > >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some cluster may not want or may have an external/uncontrolled MR setup.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> +1
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server moving it out to be an optional module (Spark would be its peer).
> > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Lets not clutter task harder by piling on more moving parts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make Master more stable.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change would cause probabilistic dead lock or some strange NPEs...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially add more works for the start up processing...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in jdk.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup / restore mega patch ?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help taking a simple look at it? This is what I mean, ugly code... logout and destroy the credentials in a subject when it is still being used, and declared as LimitPrivacy so I can not change the behavior and the only way to fix it is to write another piece of ugly code...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we can certainly consider that
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-)
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of it (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job makes it get that privilege. In the (2) approach, it's hard to do some of the above management.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO don't think we should block this patch from getting merged.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> ________________________________________
> > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
> > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own procedure store in that service?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway. Using Procedure V2 makes the backup / restore more robust.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to be implemented) for client driven actions.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we have not used at all), it introduced another cost for maintain. I don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> So , you are not backup users in this case. Many our customers have full stack deployed and want see backup to be a standard feature. Besides this, nothing will happen in your cluster if you won't be doing backups.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want see M/R dependency) goes to nowhere. We asked already, at least twice, to suggest another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions normally (post merge).
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we have not used at all), it introduced another cost for maintain. I don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if we think this is not a core feature of HBase, then we could make it depend on MR, and start a standalone BackupManager instance that submits MR jobs to do periodical maintenance job. And if we think this is a core feature that everyone should use it, then we'd better implement it without MR dependency, like DLS.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on let master or rs launch MR jobs. It is OK that some of our features depend on MR but I think the bottom line is that we should launch the jobs from outside manually or by other services.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair question.
>
> Can this be driven by a utility derived from Tool like our other MR apps?
> The issue is needing the AccessController to decide if allowed? But nothing prevents the user from running the job manually/independently, right?
>
> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>
> just a remark. my query was not about tools using MR (everyone i think is ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs code?"
> since this will be the first time we do this
>
> Matteo
>
> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>
> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such. We should also do compactions using MR (just saying :) )
> ________________________________________
> From: Ted Yu <yuzhih...@gmail.com>
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> Backup is extra tooling around core in my opinion. Like import or export. Or the optional MOB tool. It's fine.
>
> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>
> I remember in the past that there was discussion about not having MR as a direct dependency of hbase.
>
> I think some of the discussion was around MOB, which had an MR job to compact that was later transformed into a non-MR job in order to be merged. I think we had a similar discussion for log split/replay.
>
> the latest is the new Backup feature (HBASE-7912), that runs an MR job from the master to copy data or restore data.
> (backup is also "not really core" as in.. if you don't use backup you'll not end up running MR jobs, but this was probably true for MOB as in "if you don't enable MOB you don't need MR")
>
> any thoughts?
> do we have a rule that says "we don't want to have hbase run MR jobs, only tools started manually by the user can do that"? or can we start adding MR calls around without problems?