No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?
On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

>>> In our production cluster, it is common that we have only HDFS and
>>> HBase deployed.
>>> If our Master/RS depend on the MR framework (especially for features we
>>> do not use at all), it introduces another maintenance cost. I
>>> don't think it is a good idea.
>
> So, you are not backup users in this case. Many of our customers have the
> full stack deployed and want to see backup become a standard feature.
> Besides this, nothing will happen in your cluster if you are not doing
> backups.
>
> This discussion (we do not want to see an M/R dependency) is going
> nowhere. We have already asked, at least twice, for suggestions of another
> framework (other than M/R) for bulk data copy with *conversion*. Still
> waiting for suggestions.
>
> -Vlad
>
>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> If the MR framework is not deployed in the cluster, hbase still functions
>> normally (post merge).
>>
>> In terms of build-time dependency, we have long depended on
>> mapreduce. Take a look at ExportSnapshot.
>>
>> Cheers
>>
>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com>
>> wrote:
>>
>>> In our production cluster, it is common that we have only HDFS and
>>> HBase deployed.
>>> If our Master/RS depend on the MR framework (especially for features we
>>> do not use at all), it introduces another maintenance cost. I
>>> don't think it is a good idea.
>>>
>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>> To be specific, take our nice Backup/Restore feature as an example: if
>>>> we think this is not a core feature of HBase, then we could make it
>>>> depend on MR, and start a standalone BackupManager instance that
>>>> submits MR jobs to do periodic maintenance. And if we think this is a
>>>> core feature that everyone should use, then we'd better implement it
>>>> without an MR dependency, like DLS.
>>>>
>>>> Thanks.
>>>>
>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>>>>
>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that some of
>>>>> our features depend on MR, but I think the bottom line is that we should
>>>>> launch the jobs from outside, manually or by other services.
>>>>>
>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>>>>>
>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>>>>> question.
>>>>>>
>>>>>> Can this be driven by a utility derived from Tool like our other MR
>>>>>> apps? The issue is needing the AccessController to decide if allowed?
>>>>>> But nothing prevents the user from running the job
>>>>>> manually/independently, right?
>>>>>>
>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> just a remark. my query was not about tools using MR (everyone i think
>>>>>>> is ok with those).
>>>>>>> the topic was about: "are we ok with running MR jobs from Master and
>>>>>>> RSs code?" since this will be the first time we do this
>>>>>>>
>>>>>>> Matteo
>>>>>>>
>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>>>>>>>> fine to be dependent on MR. MR is the right framework for such. We
>>>>>>>> should also do compactions using MR (just saying :) )
>>>>>>>> ________________________________________
>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>> To: dev@hbase.apache.org
>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>
>>>>>>>> I agree - backup / restore is in the same category as import / export.
>>>>>>>>
>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
>>>>>>>>> export. Or the optional MOB tool. It's fine.
>>>>>>>>>
>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master or
>>>>>>>>>> RS)?
>>>>>>>>>>
>>>>>>>>>> I remember in the past that there was discussion about not having MR
>>>>>>>>>> as a direct dependency of hbase.
>>>>>>>>>>
>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job to
>>>>>>>>>> compact that was later transformed into a non-MR job in order to be
>>>>>>>>>> merged. I think we had a similar discussion for log split/replay.
>>>>>>>>>>
>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>>>>>>>>>> from the master to copy data or restore data.
>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup
>>>>>>>>>> you'll not end up running MR jobs, but this was probably true for MOB
>>>>>>>>>> as in "if you don't enable MOB you don't need MR")
>>>>>>>>>>
>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have hbase
>>>>>>>>>> run MR jobs, only tools started manually by the user can do that"? or
>>>>>>>>>> can we start adding MR calls around without problems?