I don't see you prevailing with this line of argument but you are welcome to try. Don't shoot the messenger please.
On Sep 24, 2016, at 11:08 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

>>> The key takeaway seems to be don't call out to an external framework we don't own from master (or regionserver) code.
>
> Should we ban HDFS as well?
>
> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase
>
> -Vlad
>
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
>> I was attempting to summarize Ted.
>>
>> A new maven module sounds like a good idea to me. Or we could move all the tools that use MR out to one. Or...
>>
>> The key takeaway seems to be don't call out to an external framework we don't own from master (or regionserver) code.
>>
>>> On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> bq. Internally the tool can also use the procedure framework for state durability
>>>
>>> Isn't this the standalone service I proposed this morning ?
>>>
>>> bq. Move cross HBase and MR coordination to a separate tool
>>>
>>> Where should this tool live (hbase-backup module) ?
>>>
>>> Thanks
>>>
>>> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>
>>>> At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB so should not come as a surprise. Please let me know if I have misstated this.
>>>>
>>>> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.
>>>>
>>>>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>> bq. procedure gives you a retry mechanism on failure
>>>>>
>>>>> We do need this mechanism. Take a look at the multi-step in FullTableBackupProcedure, etc.
>>>>>
>>>>> bq. let the user export it later when he wants
>>>>>
>>>>> This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time backup request is issued, the following is specified:
>>>>>
>>>>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
>>>>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
>>>>>
>>>>> Backup root is an integral part of backup manifest.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>
>>>>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>> Ideally the export should have one job running which does the retry (on failed partition) itself.
>>>>>>
>>>>>> procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure.
>>>>>> if you want you can start a procedure executor in a non master process (the hbase-procedure is a separate package and does not depend on master). but again, export seems a case where you don't need procedure.
>>>>>>
>>>>>> like snapshot, the logic may just be: ask the master to take a backup, and let the user export it later when he wants. so you avoid having a MR job started by the master since people do not seem to like it.
>>>>>>
>>>>>> for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already split). there is already a jira that should do that, HBASE-14135. instead of doing the operation of split/merge on restore, you consolidate the backup "offline" (mr job started by the user) and then ask to restore the backup.
>>>>>>
>>>>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>
>>>>>>>> as far as I understand the code, you don't need procedure for the export itself.
>>>>>>>> the export operation is already idempotent, since you are just copying files.
>>>>>>>> if the file exists and is complete (check length, checksum, ...) you can skip it, otherwise you'll send it over again.
>>>>>>>>
>>>>>>>> you need the proc for taking the backup and restoring, because you want to complete the operation and end up with a consistent state across the multiple components you are updating (meta, fs, ...)
>>>>>>>> but again, for export you can just run the tool over and over until the operation succeeds, and that should be ok.
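
A minimal sketch of the idempotent export described in the message above: skip a file when the destination copy exists and is complete (same length, same checksum), otherwise copy it again, so the whole export can simply be rerun until nothing is left to copy. This is only an illustration under stated assumptions: plain local files stand in for backup files, and `file_digest`, `is_complete`, and `export_once` are hypothetical names, not HBase or Hadoop APIs.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_complete(src: Path, dst: Path) -> bool:
    """Treat dst as complete only if it exists and matches src in length and checksum."""
    if not dst.is_file():
        return False
    if src.stat().st_size != dst.stat().st_size:
        return False  # cheap length check first, checksum only if lengths agree
    return file_digest(src) == file_digest(dst)


def export_once(src_dir: Path, dst_dir: Path) -> int:
    """Copy every missing or incomplete file; return how many files were copied."""
    copied = 0
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        if is_complete(src, dst):
            continue  # already exported on an earlier run, skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied += 1
    return copied
```

Because each run only redoes files that are missing or incomplete, a failed export needs no rollback state: rerunning `export_once` until it returns 0 is the retry mechanism, which is the reason given in the message for not needing a procedure here.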
>>>>>>>>
>>>>>>>> Matteo
>>>>>>>>
>>>>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Master is involved in this discussion because currently only Master instantiates ProcedureExecutor which runs the 3 Procedures for backup / restore.
>>>>>>>>>
>>>>>>>>> What if an optional standalone service which hosts ProcedureExecutor is used for this purpose ?
>>>>>>>>> Would that have a better chance of giving us middle ground so that we can move this forward ?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>>>
>>>>>>>>>> (Moved out of the Master doing MR DISCUSSION)
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>>> -1 on that backup be in core hbase
>>>>>>>>>>>
>>>>>>>>>>> Not sure I understand what it means.
>>>>>>>>>>
>>>>>>>>>> Sorry for the imprecision.
>>>>>>>>>>
>>>>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so -1 on the Master running backup/restore MR jobs, even if optional.
>>>>>>>>>>
>>>>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking MR on as dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?
>>>>>>>>>>
>>>>>>>>>> I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user' so I'll keep them to myself until I do. I only came out from under my shell to participate on the MR as dependency chat.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> M
>>>>>>>>>>
>>>>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
>>>>>>>>>>>
>>>>>>>>>>> We have already brought up all advantages of using Master and distributed procedures for backup and restore.
>>>>>>>>>>>
>>>>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
>>>>>>>>>>> 1.1 Client won't be allowed to do any operations that can potentially affect the cluster, such as disabling splits/merges, balancer.
>>>>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff? We are trying to make it atomic.
>>>>>>>>>>>
>>>>>>>>>>> Security is not clear.
>>>>>>>>>>>
>>>>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
>>>>>>>>>>>
>>>>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a system space? Only in user space? The table is global.
>>>>>>>>>>>
>>>>>>>>>>> 2. is critical. Despite the fact that 95% of code is new, we have, of course, touched some existing HBase code.
>>>>>>>>>>> 3. is not that critical, of course we can move backup system into user space.
>>>>>>>>>>>
>>>>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
>>>>>>>>>>>
>>>>>>>>>>> -Vlad
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>>> + MR is dead
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does MR know that? :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we use for "bulk data move and transformation" instead of MR?
>>>>>>>>>>>>
>>>>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.
>>>>>>>>>>>>
>>>>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members still not sure about that and some will give -1 in any case. Just because ...
>>>>>>>>>>>>
>>>>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).
>>>>>>>>>>>>
>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>
>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> let me try to go back to my original topic.
>>>>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>>>>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some cluster may not want or may have an external/uncontrolled MR setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make Master more stable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change would cause probabilistic deadlock or some strange NPEs...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially adding more work to the startup processing...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in jdk.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ?
>>>>>>>>>>>>>>>>>> Is it in the backup / restore mega patch ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But I strongly suggest that later we revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help by taking a quick look at it? This is what I mean, ugly code... logout and destroy the credentials in a subject when it is still being used, and declared as LimitPrivacy so I can not change the behavior and the only way to fix it is to write another piece of ugly code...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we can certainly consider that
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
>>>>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
>>>>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
>>>>>>>>>>>>>>>>>>>>> 2. Shell out from the master
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the backup/restore jobs from the master.
>>>>>>>>>>>>>>>>>>>>> I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of it (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job makes it get that privilege. In the (2) approach, it's hard to do some of the above management.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO don't think we should block this patch from getting merged.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own procedure store in that service?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway. Using Procedure V2 makes the backup / restore more robust.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to be implemented) for client driven actions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the master directly to run MR is a first.
>>>>>>>>>>>>>>>>>>>>>>> Why not drive this with a utility derived from Tool?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and HBase deployed.
>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we have not used at all), it introduced another cost for maintain. I don't think it is a good idea.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup as a standard feature. Besides this, nothing will happen in your cluster if you won't be doing backups.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) goes nowhere.
>>>>>>>>>>>>>>>>>>>>>>>> We asked already, at least twice, to suggest another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions normally (post merge).
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and HBase deployed.
>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if we think this is not a core feature of HBase, then we could make it depend on MR, and start a standalone BackupManager instance that submits MR jobs to do periodic maintenance jobs. And if we think this is a core feature that everyone should use, then we'd better implement it without MR dependency, like DLS.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It is OK that some of our features depend on MR but I think the bottom line is that we should launch the jobs from outside manually or by other services.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair question.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if allowed? But nothing prevents the user from running the job manually/independently, right?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark.
>>>>> my query was not about tools using MR (everyone i think is ok
>>>>> with those).
>>>>> the topic was about: "are we ok with running MR jobs from Master
>>>>> and RSs code?" since this will be the first time we do this
>>>>>
>>>>> Matteo
>>>>>
>>>>>
>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>>>>>>
>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
>>>>>> it's fine to be dependent on MR. MR is the right framework for such.
>>>>>> We should also do compactions using MR (just saying :) )
>>>>>> ________________________________________
>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>> To: dev@hbase.apache.org
>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>
>>>>>> I agree - backup / restore is in the same category as import /
>>>>>> export.
>>>>>>
>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>>>>
>>>>>>> Backup is extra tooling around core in my opinion. Like import or
>>>>>>> export. Or the optional MOB tool. It's fine.
>>>>>>>
>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>>>>>>>>
>>>>>>>> What's the latest opinion around running MR jobs from hbase
>>>>>>>> (Master or RS)?
>>>>>>>>
>>>>>>>> I remember in the past that there was discussion about not having
>>>>>>>> MR has direct dependency of hbase.
>>>>>>>>
>>>>>>>> I think some of discussion where around MOB that had a MR job to
>>>>>>>> compact, that later was transformed in a non-MR job to be merged,
>>>>>>>> I think we had a similar discussion for log split/replay.
>>>>>>>>
>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
>>>>>>>> job from the master to copy data or restore data.
>>>>>>>> (backup is also "not really core" as in.. if you don't use backup
>>>>>>>> you'll not end up running MR jobs, but this was probably true for
>>>>>>>> MOB as in "if you don't enable MOB you don't need MR")
>>>>>>>>
>>>>>>>> any thoughts? do we a rule that says "we don't want to have hbase
>>>>>>>> run MR jobs, only tool started manually by the user can do that".
>>>>>>>> or can we start adding MR calls around without problems?