So far, the standalone service seems to be the middle ground, with the following advantages:
1. utilization of existing proc V2 framework for fault tolerance
2. friendliness to security support to be implemented in the next phase - security is hard to enforce from client side
3. not introducing MR calls in master or region servers

Cheers

On Sat, Sep 24, 2016 at 11:26 AM, Vladimir Rodionov <[email protected]> wrote:

> >> So the standalone service would run out of proc - in the same vein as REST
> >> or thrift server.
>
> Ted, running separate process/service to coordinate backups is not a good
> idea. We have already a lot of them.
>
> On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <[email protected]> wrote:
>
> > bq. don't call out to an external framework we don't own from master (or
> > regionserver) code
> >
> > So the standalone service would run out of proc - in the same vein as REST
> > or thrift server.
> >
> > Cheers
> >
> > On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <[email protected]> wrote:
> >
> > > I was attempting to summarize Ted.
> > >
> > > A new maven module sounds like a good idea to me. Or we could move all the
> > > tools that use MR out to one. Or...
> > >
> > > The key takeaway seems to be don't call out to an external framework we
> > > don't own from master (or regionserver) code.
> > >
> > > > On Sep 24, 2016, at 10:15 AM, Ted Yu <[email protected]> wrote:
> > > >
> > > > bq. Internally the tool can also use the procedure framework for state
> > > > durability
> > > >
> > > > Isn't this the standalone service I proposed this morning ?
> > > >
> > > > bq. Move cross HBase and MR coordination to a separate tool
> > > >
> > > > Where should this tool live (hbase-backup module) ?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <[email protected]> wrote:
> > > >
> > > >> At branch merge voting time now more eyes are getting on the design issues
> > > >> with dissenting opinion emerging.
> > > >> This is the branch merge process working as our community has designed it.
> > > >> Because this is the first full project review of the code and
> > > >> implementation I think we all have to be flexible. I see the community as
> > > >> trying to narrow the technical objection at issue to the smallest possible
> > > >> scope. It's simple: don't call out to an external execution framework we
> > > >> don't own from core master (and by extension regionserver) code. We had
> > > >> this objection before to a proposed external compaction implementation for
> > > >> MOB so should not come as a surprise. Please let me know if I have
> > > >> misstated this.
> > > >>
> > > >> This would seem to require a modest refactor of coordination to move
> > > >> invocation of MR code out from any core code path. To restate what I think
> > > >> is an emerging recommendation: Move cross HBase and MR coordination to a
> > > >> separate tool. This tool can ask the master to invoke procedures on the
> > > >> HBase side that do first mile export and last mile restore. (Internally the
> > > >> tool can also use the procedure framework for state durability, perhaps,
> > > >> just a thought.) Then the tool can further drive the things done with MR
> > > >> like shipping data off cluster or moving remote data in place and preparing
> > > >> it for import. These activities do not need procedure coordination and
> > > >> involvement of the HBase master. Only the first and last mile of the
> > > >> process needs atomicity within the HBase deploy. Please let me know if I
> > > >> have misstated this.
> > > >>
> > > >>
> > > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <[email protected]> wrote:
> > > >>>
> > > >>> bq. procedure gives you a retry mechanism on failure
> > > >>>
> > > >>> We do need this mechanism. Take a look at the multi-step
> > > >>> in FullTableBackupProcedure, etc.
> > > >>>
> > > >>> bq.
let the user export it later when he wants > > > >>> > > > >>> This would make supporting security more complex (user A shouldn't > be > > > >>> exporting user B's backup). And it is not user friendly - at the > time > > > >>> backup request is issued, the following is specified: > > > >>> > > > >>> + + " BACKUP_ROOT The full root path to store the > backup > > > >>> image,\n" > > > >>> + + " the prefix can be hdfs, webhdfs or > > > gpfs\n" > > > >>> > > > >>> Backup root is an integral part of backup manifest. > > > >>> > > > >>> Cheers > > > >>> > > > >>> > > > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi < > > > >> [email protected]> > > > >>> wrote: > > > >>> > > > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <[email protected]> > > wrote: > > > >>>>> > > > >>>>> Ideally the export should have one job running which does the > retry > > > (on > > > >>>>> failed partition) itself. > > > >>>>> > > > >>>> > > > >>>> procedure gives you a retry mechanism on failure. if you don't use > > > that, > > > >>>> than you don't need procedure. > > > >>>> if you want you can start a procedure executor in a non master > > process > > > >> (the > > > >>>> hbase-procedure is a separate package and does not depend on > > master). > > > >> but > > > >>>> again, export seems a case where you don't need procedure. > > > >>>> > > > >>>> like snapshot, the logic may just be: ask the master to take a > > backup. > > > >> and > > > >>>> let the user export it later when he wants. so you avoid having a > MR > > > job > > > >>>> started by the master since people does not seems to like it. > > > >>>> > > > >>>> for restore (I think that is where you use the MR splitter) you > can > > > >>>> probably just have a backup ready (already splitted). there is > > > already a > > > >>>> jira that should do that HBASE-14135. instead of doing the > operation > > > of > > > >>>> split/merge on restore. 
you consolidate the backup "offline" (mr > job > > > >>>> started by the user) and then ask to restore the backup. > > > >>>> > > > >>>> > > > >>>>> > > > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi < > > > >>>> [email protected]> > > > >>>>> wrote: > > > >>>>> > > > >>>>>> as far as I understand the code, you don't need procedure for > the > > > >>>> export > > > >>>>>> itself. > > > >>>>>> the export operation is already idempotent, since you are just > > > copying > > > >>>>>> files. > > > >>>>>> if the file exist and is complete (check length, checksum, ...) > > you > > > >> can > > > >>>>>> skip it, > > > >>>>>> otherwise you'll send it over again. > > > >>>>>> > > > >>>>>> you need the proc for taking the backup and restoring, > > > >>>>>> because you want to complete the operation and end up with a > > > >> consistent > > > >>>>>> state > > > >>>>>> across the multiple components you are updating (meta, fs, ...) > > > >>>>>> but again, for export you can just run the tool over and over > > until > > > >> the > > > >>>>>> operation succeed, and that should be ok. > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> Matteo > > > >>>>>> > > > >>>>>> > > > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <[email protected]> > > > wrote: > > > >>>>>>> > > > >>>>>>> Master is involved in this discussion because currently only > > Master > > > >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for > > > >>>> backup / > > > >>>>>>> restore. > > > >>>>>>> > > > >>>>>>> What if an optional standalone service which hosts > > > ProcedureExecutor > > > >>>> is > > > >>>>>>> used for this purpose ? > > > >>>>>>> Would that have better chance of giving us middle ground so > that > > we > > > >>>> can > > > >>>>>>> move this forward ? 
> > > >>>>>>> > > > >>>>>>> Cheers > > > >>>>>>> > > > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <[email protected]> > > wrote: > > > >>>>>>>> > > > >>>>>>>> (Moved out of the Master doing MR DISCUSSION) > > > >>>>>>>> > > > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov < > > > >>>>>>>> [email protected]> > > > >>>>>>>> wrote: > > > >>>>>>>> > > > >>>>>>>>>>> -1 on that backup be in core hbase > > > >>>>>>>>> > > > >>>>>>>>> Not sure I understand what it means. > > > >>>>>>>>> > > > >>>>>>>>> Sorry for the imprecision. > > > >>>>>>>> > > > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a > > > dependency > > > >>>>> and > > > >>>>>>> so > > > >>>>>>>> -1 on the Master running backup/restore MR jobs, even if > > optional. > > > >>>>>>>> > > > >>>>>>>> Master should not depend on MR. We've gone out of our way to > > avoid > > > >>>>>> taking > > > >>>>>>>> MR on as dependency in the past. Seems late in the game for us > > to > > > >>>>>> change > > > >>>>>>>> our opinion on this. If we didn't do it for distributed log > > > >>>>> splitting, > > > >>>>>> or > > > >>>>>>>> MOB, why would we do it to support an optional backup/restore? > > > >>>>>>>> > > > >>>>>>>> I have opinions on the questions below -- i.e. that Master > > running > > > >>>>>>>> backup/restore is outside of the Master's charge -- but they > are > > > >>>> not > > > >>>>>>> worth > > > >>>>>>>> much since I've not done much by way of review or contrib to > > > >>>>>>> backup/restore > > > >>>>>>>> other than to try it as a 'user' so I'll keep them to myself > > until > > > >>>> I > > > >>>>>> do. > > > >>>>>>> I > > > >>>>>>>> only came out from under my shell to participate on the MR as > > > >>>>>> dependency > > > >>>>>>>> chat. > > > >>>>>>>> > > > >>>>>>>> Thanks, > > > >>>>>>>> M > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> 1. We are not allowed to use Master to orchestrate the whole > > > >>>> process? 
> > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> We > > > >>>>>>>>> have already brought up all advantages of using > > > >>>>>>>>> Master and distributed procedures for backup and restore. > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> Downside of moving this to client tool is lack of fault > > > >>>> tolerance: > > > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can, > > > >>>>>>> potentially > > > >>>>>>>>> affect > > > >>>>>>>>> cluster, such as disabling splits/merges, balancer. > > > >>>>>>>>> 1.2 In case of client failure who will be doing the whole > > > >>>> rollback > > > >>>>>>>> stuff? > > > >>>>>>>>> We are trying to make it atomic. > > > >>>>>>>>> > > > >>>>>>>>> Security is not clear. > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> 2. We are not allowed to modify code of existing HBase core > > > classes > > > >>>>>> (what > > > >>>>>>>>> does core mean anyway)? > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>> 3. We are not allowed to create backup system table > > > >>>> (hbase:backup) > > > >>>>>> in a > > > >>>>>>>>> system space? Only in user space? The table is global. > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we > > > >>>> have > > > >>>>>>>> touched, > > > >>>>>>>>> of course some existing HBase code. > > > >>>>>>>>> 3. is not that critical, of course we can move backup system > > into > > > >>>>>> user > > > >>>>>>>>> space. > > > >>>>>>>>> > > > >>>>>>>>> And finally, will moving backup into external tool give us +1 > > > >>>> from > > > >>>>>>> stack? 
> > > >>>>>>>>> > > > >>>>>>>>> -Vlad > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <[email protected]> > > > >>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov < > > > >>>>>>>>>> [email protected]> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>>>>> + MR is dead > > > >>>>>>>>>>> > > > >>>>>>>>>>> Does MR know that? :) > > > >>>>>>>>>>> > > > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions > > > >>>> what > > > >>>>>>> should > > > >>>>>>>>> we > > > >>>>>>>>>>> use for "bulk data move and transformation" instead of MR? > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, > Spark, > > > >>>>>>>>> distributed > > > >>>>>>>>>> shell -- just don't have HBase core depend on it, even > > > >>>>> optionally. > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In > my > > > >>>>>>>> opinion, > > > >>>>>>>>>> some > > > >>>>>>>>>>> group members still not sure about that and some will give > -1 > > > >>>>>>>>>>> in any case. Just because ... > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core > hbase > > > >>>> (+1 > > > >>>>>> on > > > >>>>>>>>> adding > > > >>>>>>>>>> all the API any such external tool might need to run). 
> > > >>>>>>>>>> > > > >>>>>>>>>> St.Ack > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>>> -Vlad > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <[email protected]> > > > >>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi < > > > >>>>>>>>>>> [email protected]> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> let me try to go back to my original topic. > > > >>>>>>>>>>>>> this question was meant to be generic, and provide some > > > >>>>> rule > > > >>>>>>> for > > > >>>>>>>>>> future > > > >>>>>>>>>>>>> code. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone > > > >>>>> can > > > >>>>>>> be: > > > >>>>>>>>>>>>> - we don't want any core feature (e.g. > > > >>>>>>> compaction/log-split/log- > > > >>>>>>>>>>> reply) > > > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an > > > >>>>>>>>>>>>> external/uncontrolled MR setup. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> +1 > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a > > > >>>>>> flag) > > > >>>>>>>> to > > > >>>>>>>>>> run > > > >>>>>>>>>>> MR > > > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR > > > >>>> is > > > >>>>>> not > > > >>>>>>>>>>> required. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind > > > >>>> a > > > >>>>>> flag > > > >>>>>>>> or > > > >>>>>>>>>> not > > > >>>>>>>>>>> -- > > > >>>>>>>>>>>> ever being able to launch MR jobs. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it > > > >>>> from > > > >>>>>>>>>> hbase-server > > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its > > > >>>>>> peer). 
> > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and > Appy > > > >>>>> are > > > >>>>>>>> busy > > > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets > > > >>>> not > > > >>>>>>>> clutter > > > >>>>>>>>>>> task > > > >>>>>>>>>>>> harder by piling on more moving parts. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> St.Ack > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> Matteo > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu < > > > >>>>> [email protected] > > > >>>>>>> > > > >>>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for > > > >>>> AssignmentManager > > > >>>>>>> which > > > >>>>>>>>> is > > > >>>>>>>>>> to > > > >>>>>>>>>>>>> make > > > >>>>>>>>>>>>>> Master more stable. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Cheers > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 < > > > >>>>> [email protected] > > > >>>>>>> > > > >>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> No, not your fault, at lease, not this time:) > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the > > > >>>>>>> sequence > > > >>>>>>>>> of > > > >>>>>>>>>>>> calls > > > >>>>>>>>>>>>>> when > > > >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a > > > >>>> regionserver > > > >>>>>> so > > > >>>>>>> it > > > >>>>>>>>>>> extends > > > >>>>>>>>>>>>>>> HRegionServer, and the initialization of > > > >>>> HRegionServer > > > >>>>>>>>> sometimes > > > >>>>>>>>>>>> needs > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would > > > >>>> cause > > > >>>>>>>>>>> probabilistic > > > >>>>>>>>>>>>> dead > > > >>>>>>>>>>>>>>> lock or some strange NPEs... 
> > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to > > > >>>> add > > > >>>>>> new > > > >>>>>>>>>> features > > > >>>>>>>>>>>> or > > > >>>>>>>>>>>>>> add > > > >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more > > > >>>>>> works > > > >>>>>>>> for > > > >>>>>>>>>> the > > > >>>>>>>>>>>>> start > > > >>>>>>>>>>>>>>> up processing... > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Thanks. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu < > > > >>>> [email protected] > > > >>>>>> : > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I read through HADOOP-13433 > > > >>>>>>>>>>>>>>>> <https://issues.apache.org/ > > > >>>> jira/browse/HADOOP-13433> > > > >>>>> - > > > >>>>>>> the > > > >>>>>>>>>> cited > > > >>>>>>>>>>>>> race > > > >>>>>>>>>>>>>>>> condition is in jdk. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it > > > >>>>> moving. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is readlly a > > > >>>>>> problem... > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is > > > >>>> it > > > >>>>> in > > > >>>>>>> the > > > >>>>>>>>>>> backup > > > >>>>>>>>>>>> / > > > >>>>>>>>>>>>>>>> restore mega patch ? > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Cheers > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 < > > > >>>>>>>> [email protected]> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature > > > >>>> in > > > >>>>>> the > > > >>>>>>>> MR > > > >>>>>>>>>> way > > > >>>>>>>>>>>> and > > > >>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on > > > >>>>> it > > > >>>>>>> as I > > > >>>>>>>>> do > > > >>>>>>>>>>> not > > > >>>>>>>>>>>>> want > > > >>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>> block the development progress. 
> > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit > > > >>>> the > > > >>>>>>>> design > > > >>>>>>>>>> and > > > >>>>>>>>>>>> see > > > >>>>>>>>>>>>> if > > > >>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>> can seperated the logic from HMaster as much as > > > >>>>>>> possible. > > > >>>>>>>>> HA > > > >>>>>>>>>> is > > > >>>>>>>>>>>>> not a > > > >>>>>>>>>>>>>>> big > > > >>>>>>>>>>>>>>>>> problem if you do not store any metada locally. > > > >>>> But > > > >>>>>> the > > > >>>>>>>>> ugly > > > >>>>>>>>>>> code > > > >>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>> HMaster is readlly a problem... > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> And for security, I have a issue pending for a > > > >>>> long > > > >>>>>>> time. > > > >>>>>>>>> Can > > > >>>>>>>>>>>>> someone > > > >>>>>>>>>>>>>>>> help > > > >>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean, > > > >>>>> ugly > > > >>>>>>>>> code... > > > >>>>>>>>>>>>> logout > > > >>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is > > > >>>>> still > > > >>>>>>>> being > > > >>>>>>>>>>> used, > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the > > > >>>>>>> behivor > > > >>>>>>>>> and > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>> only > > > >>>>>>>>>>>>>>> way > > > >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly > > > >>>> code... 
> > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> https://issues.apache.org/ > > > >>>> jira/browse/HADOOP-13433 > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov < > > > >>>>>>>>>>>>> [email protected] > > > >>>>>>>>>>>>>>> : > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of > > > >>>> doing > > > >>>>>>> this > > > >>>>>>>>>>> without > > > >>>>>>>>>>>>>> using > > > >>>>>>>>>>>>>>>> MR, > > > >>>>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>>> can certainly consider that > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is > > > >>>>>> abstract > > > >>>>>>>> and > > > >>>>>>>>>>> allows > > > >>>>>>>>>>>>>>>>>> different implementations. MR is just one > > > >>>>>>>> implementation > > > >>>>>>>>> we > > > >>>>>>>>>>>>>> provide. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> -Vlad > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das < > > > >>>>>>>>>>>>> [email protected] > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the > > > >>>>>> topic > > > >>>>>>>> of > > > >>>>>>>>>>>> MR-based > > > >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about > > > >>>> the > > > >>>>>>>>>>> SpliceMachine > > > >>>>>>>>>>>>>>>> approach > > > >>>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>>> managing compactions in Spark where > > > >>>> apparently > > > >>>>>> they > > > >>>>>>>>> saw a > > > >>>>>>>>>>> lot > > > >>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>> benefits. 
> > > >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat > > > >>>>>> Andrew; I > > > >>>>>>>>>> really > > > >>>>>>>>>>>>> didn't > > > >>>>>>>>>>>>>>>> mean > > > >>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>> :-) > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate: > > > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that > > > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master > > > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), > > > >>>>> and I > > > >>>>>>>> don't > > > >>>>>>>>>>> think > > > >>>>>>>>>>>>>> it's > > > >>>>>>>>>>>>>>>> even > > > >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something > > > >>>>>> when > > > >>>>>>> MR > > > >>>>>>>>> is > > > >>>>>>>>>>>>> already > > > >>>>>>>>>>>>>>>> there, > > > >>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>> being used by HBase already for some > > > >>>>> operations. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of > > > >>>>> issues - > > > >>>>>>> HA > > > >>>>>>>> of > > > >>>>>>>>>> the > > > >>>>>>>>>>>>>> server > > > >>>>>>>>>>>>>>>> not > > > >>>>>>>>>>>>>>>>>>> being the least of them all. Security > > > >>>> (kerberos > > > >>>>>>>>>>>> authentication, > > > >>>>>>>>>>>>>>>> another > > > >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that > > > >>>>>>> approach > > > >>>>>>>>> is > > > >>>>>>>>>>> DOA. > > > >>>>>>>>>>>>>>> Instead > > > >>>>>>>>>>>>>>>>>> let's > > > >>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I > > > >>>>>>> haven't > > > >>>>>>>>> seen > > > >>>>>>>>>>> any > > > >>>>>>>>>>>>>> good > > > >>>>>>>>>>>>>>>>> reason > > > >>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs > > > >>>>> if > > > >>>>>>>>> needed. 
> > > >>>>>>>>>>> It's > > > >>>>>>>>>>>>> not > > > >>>>>>>>>>>>>>>>> ideal; > > > >>>>>>>>>>>>>>>>>>> agreed. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are > > > >>>> the > > > >>>>>>>>> benefits > > > >>>>>>>>>> of > > > >>>>>>>>>>>>>> running > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think > > > >>>>> Ted > > > >>>>>>> has > > > >>>>>>>>>>>> summarized > > > >>>>>>>>>>>>>>> some > > > >>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> issues that we need to take care of - > > > >>>>> basically, > > > >>>>>>> the > > > >>>>>>>>>> master > > > >>>>>>>>>>>> can > > > >>>>>>>>>>>>>>> keep > > > >>>>>>>>>>>>>>>>>> track > > > >>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the > > > >>>> backup > > > >>>>>>>> master > > > >>>>>>>>>> can > > > >>>>>>>>>>>>>> continue > > > >>>>>>>>>>>>>>>>>> keeping > > > >>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been > > > >>>>>>> recorded > > > >>>>>>>>> in > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>> proc > > > >>>>>>>>>>>>>>>> WAL). > > > >>>>>>>>>>>>>>>>>> The > > > >>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed > > > >>>>>>>>> backup/restore > > > >>>>>>>>>>>>>>> processes. > > > >>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to > > > >>>>> run > > > >>>>>> as > > > >>>>>>>>>> 'hbase' > > > >>>>>>>>>>>>> since > > > >>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>> owns > > > >>>>>>>>>>>>>>>>>>> the data. Having the master launch the job > > > >>>>> makes > > > >>>>>> it > > > >>>>>>>> get > > > >>>>>>>>>>> that > > > >>>>>>>>>>>>>>>> privilege. > > > >>>>>>>>>>>>>>>>>> In > > > >>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the > > > >>>>>> above > > > >>>>>>>>>>>> management. 
> > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is > > > >>>>>> ready > > > >>>>>>>>> from > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>>> overall > > > >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review > > > >>>> is > > > >>>>>>> still > > > >>>>>>>>>>> pending > > > >>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>>> Matteo). > > > >>>>>>>>>>>>>>>>>>> If in the future, we find better ways of > > > >>>> doing > > > >>>>>> this > > > >>>>>>>>>> without > > > >>>>>>>>>>>>> using > > > >>>>>>>>>>>>>>> MR, > > > >>>>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't > > > >>>>> think > > > >>>>>> we > > > >>>>>>>>>> should > > > >>>>>>>>>>>>> block > > > >>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>> patch > > > >>>>>>>>>>>>>>>>>>> from getting merged. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> ________________________________________ > > > >>>>>>>>>>>>>>>>>>> From: 张铎 <[email protected]> > > > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM > > > >>>>>>>>>>>>>>>>>>> To: [email protected] > > > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by > > > >>>>>> Master > > > >>>>>>>> or > > > >>>>>>>>> RS > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than > > > >>>>>>> master? > > > >>>>>>>>> You > > > >>>>>>>>>>> can > > > >>>>>>>>>>>>> use > > > >>>>>>>>>>>>>>>> your > > > >>>>>>>>>>>>>>>>>> own > > > >>>>>>>>>>>>>>>>>>> procedure store in that service? > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu < > > > >>>>>>>> [email protected] > > > >>>>>>>>>> : > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client > > > >>>> driven. 
> > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to > > > >>>> resume > > > >>>>> if > > > >>>>>>>> there > > > >>>>>>>>>> is > > > >>>>>>>>>>>>> error > > > >>>>>>>>>>>>>>>>> midway. > > > >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / > > > >>>> restore > > > >>>>>>> more > > > >>>>>>>>>>> robust. > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It > > > >>>> is > > > >>>>>> hard > > > >>>>>>>> to > > > >>>>>>>>>>>> enforce > > > >>>>>>>>>>>>>>>> security > > > >>>>>>>>>>>>>>>>>> (to > > > >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions. > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Cheers > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew > > > >>>>> Purtell < > > > >>>>>>>>>>>>>>>>>> [email protected]> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, > > > >>>> which > > > >>>>>> is > > > >>>>>>>>>>> "shelling > > > >>>>>>>>>>>>> out" > > > >>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why > > > >>>> not > > > >>>>>>> drive > > > >>>>>>>>>> this > > > >>>>>>>>>>>>> with a > > > >>>>>>>>>>>>>>>>> utility > > > >>>>>>>>>>>>>>>>>>>> derived from Tool? > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir > > > >>>>>> Rodionov > > > >>>>>>> < > > > >>>>>>>>>>>>>>>>>> [email protected] > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a > > > >>>>> common > > > >>>>>>>> case > > > >>>>>>>>> we > > > >>>>>>>>>>>> just > > > >>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>>> HDFS > > > >>>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. 
> > > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR > > > >>>> framework > > > >>>>>>>>>> (especially > > > >>>>>>>>>>>> some > > > >>>>>>>>>>>>>>>>> features > > > >>>>>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced > > > >>>>>>> another > > > >>>>>>>>> cost > > > >>>>>>>>>>> for > > > >>>>>>>>>>>>>>>> maintain. > > > >>>>>>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> So , you are not backup users in this > > > >>>>> case. > > > >>>>>>> Many > > > >>>>>>>>> our > > > >>>>>>>>>>>>>> customers > > > >>>>>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>>>>> full > > > >>>>>>>>>>>>>>>>>>>>>> stack deployed and > > > >>>>>>>>>>>>>>>>>>>>>> want see backup to be a standard > > > >>>> feature. > > > >>>>>>>> Besides > > > >>>>>>>>>>> this, > > > >>>>>>>>>>>>>>> nothing > > > >>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>> happen > > > >>>>>>>>>>>>>>>>>>>>>> in your cluster > > > >>>>>>>>>>>>>>>>>>>>>> if you won't be doing backups. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want see M/R > > > >>>>>>>>> dependency) > > > >>>>>>>>>>> goes > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>> nowhere. > > > >>>>>>>>>>>>>>>>>>> We > > > >>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to > > > >>>> suggest > > > >>>>>>>> another > > > >>>>>>>>>>>>> framework > > > >>>>>>>>>>>>>>>> (other > > > >>>>>>>>>>>>>>>>>>> than > > > >>>>>>>>>>>>>>>>>>>> M/R) > > > >>>>>>>>>>>>>>>>>>>>>> for bulk data copy with *conversion*. > > > >>>>> Still > > > >>>>>>>>> waiting > > > >>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>> suggestions. 
> > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> -Vlad > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted > > > >>>> Yu < > > > >>>>>>>>>>>>>> [email protected] > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the > > > >>>>>>> cluster, > > > >>>>>>>>>> hbase > > > >>>>>>>>>>>>> still > > > >>>>>>>>>>>>>>>>>> functions > > > >>>>>>>>>>>>>>>>>>>>>>> normally (post merge). > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we > > > >>>>> have > > > >>>>>>> long > > > >>>>>>>>>> been > > > >>>>>>>>>>>>>>> depending > > > >>>>>>>>>>>>>>>> on > > > >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at > > > >>>> ExportSnapshot. > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng > > > >>>>> Chen > > > >>>>>> < > > > >>>>>>>>>>>>>>>>>> [email protected] > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a > > > >>>>> common > > > >>>>>>>> case > > > >>>>>>>>> we > > > >>>>>>>>>>>> just > > > >>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>>> HDFS > > > >>>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed. > > > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR > > > >>>> framework > > > >>>>>>>>>> (especially > > > >>>>>>>>>>>> some > > > >>>>>>>>>>>>>>>>> features > > > >>>>>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced > > > >>>>>>> another > > > >>>>>>>>> cost > > > >>>>>>>>>>> for > > > >>>>>>>>>>>>>>>> maintain. > > > >>>>>>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea. 
>
> 2016-09-23 10:28 GMT+08:00 张铎 <[email protected]>:
>
> To be specific, for example, our nice Backup/Restore feature, if we think
> this is not a core feature of HBase, then we could make it depend on MR,
> and start a standalone BackupManager instance that submits MR jobs to do
> periodical maintenance job. And if we think this is a core feature that
> everyone should use it, then we'd better implement it without MR
> dependency, like DLS.
>
> Thanks.
>
> 2016-09-23 10:11 GMT+08:00 张铎 <[email protected]>:
>
> I'm -1 on let master or rs launch MR jobs.
> It is OK that some of our features depend on MR but I think the bottom
> line is that we should launch the jobs from outside manually or by other
> services.
>
> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <[email protected]>:
>
> Ok, got it. Well "shelling out" is on the line I think, so a fair
> question.
>
> Can this be driven by a utility derived from Tool like our other MR apps?
> The issue is needing the AccessController to decide if allowed? But
> nothing prevents the user from running the job manually/independently,
> right?
>
> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <[email protected]> wrote:
>
> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
>
> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <[email protected]> wrote:
>
> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> fine to be dependent on MR. MR is the right framework for such.
> We should also do compactions using MR (just saying :) )
> ________________________________________
> From: Ted Yu <[email protected]>
> Sent: Thursday, September 22, 2016 2:00 PM
> To: [email protected]
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <[email protected]> wrote:
>
> Backup is extra tooling around core in my opinion. Like import or export.
> Or the optional MOB tool. It's fine.
>
> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <[email protected]> wrote:
>
> What's the latest opinion around running MR jobs from hbase (Master or
> RS)?
>
> I remember in the past that there was discussion about not having MR has
> direct dependency of hbase.
>
> I think some of discussion where around MOB that had a MR job to compact,
> that later was transformed in a non-MR job to be merged, I think we had a
> similar discussion for log split/replay.
>
> the latest is the new Backup feature (HBASE-7912), that runs a MR job
> from the master to copy data or restore data.
> (backup is also "not really core" as in.. if you don't use backup you'll
> not end up running MR jobs, but this was probably true for MOB as in "if
> you don't enable MOB you don't need MR")
>
> any thoughts? do we a rule that says "we don't want to have hbase run MR
> jobs, only tool started manually by the user can do that". or can we
> start adding MR calls around without problems?
