>> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase

and should stay in the Hadoop stack (with HDFS and YARN/MapReduce). The world
(of NoSQL) outside of Hadoop is scary (C* is probably the least scary of all).
I personally do not mind refactoring the code and moving everything from the
Master to a separate client tool. We already have hbck to repair HBase; we will
have a backup repair tool as well, to repair failed backup/restore sessions.
We will delegate all these fault-tolerance duties to the user.

-Vlad

On Sat, Sep 24, 2016 at 11:08 AM, Vladimir Rodionov <vladrodio...@gmail.com>
wrote:

> >> The key takeaway seems to be don't call out to an external framework we
> >> don't own from master (or regionserver) code.
>
> Should we ban HDFS as well?
>
> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase
>
> -Vlad
>
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
>> I was attempting to summarize Ted.
>>
>> A new maven module sounds like a good idea to me. Or we could move all
>> the tools that use MR out to one. Or...
>>
>> The key takeaway seems to be don't call out to an external framework we
>> don't own from master (or regionserver) code.
>>
>> > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >
>> > bq. Internally the tool can also use the procedure framework for state
>> > durability
>> >
>> > Isn't this the standalone service I proposed this morning ?
>> >
>> > bq. Move cross HBase and MR coordination to a separate tool
>> >
>> > Where should this tool live (hbase-backup module) ?
>> >
>> > Thanks
>> >
>> >
>> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>> >
>> >> At branch merge voting time now more eyes are getting on the design issues
>> >> with dissenting opinion emerging. This is the branch merge process working
>> >> as our community has designed it. Because this is the first full project
>> >> review of the code and implementation I think we all have to be flexible. I
>> >> see the community as trying to narrow the technical objection at issue to
>> >> the smallest possible scope. It's simple: don't call out to an external
>> >> execution framework we don't own from core master (and by extension
>> >> regionserver) code. We had this objection before to a proposed external
>> >> compaction implementation for MOB, so this should not come as a surprise.
>> >> Please let me know if I have misstated this.
>> >>
>> >> This would seem to require a modest refactor of coordination to move
>> >> invocation of MR code out from any core code path. To restate what I think
>> >> is an emerging recommendation: Move cross HBase and MR coordination to a
>> >> separate tool. This tool can ask the master to invoke procedures on the
>> >> HBase side that do first mile export and last mile restore. (Internally the
>> >> tool can also use the procedure framework for state durability, perhaps,
>> >> just a thought.) Then the tool can further drive the things done with MR
>> >> like shipping data off cluster or moving remote data in place and preparing
>> >> it for import. These activities do not need procedure coordination and
>> >> involvement of the HBase master. Only the first and last mile of the
>> >> process need atomicity within the HBase deploy. Please let me know if I
>> >> have misstated this.
>> >>
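
[Editorial sketch] The split recommended above can be pictured roughly as
follows. This is only an illustration under assumptions: MasterSteps,
BulkMover, and every method name here are hypothetical and are not HBase APIs.
The point is the call order: the master runs only the atomic first and last
mile, and the external tool drives the bulk data movement in between.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical external coordinator; none of these types exist in HBase. */
public class BackupCoordinatorSketch {

    /** Assumed view of the master: only the atomic first/last-mile steps. */
    public interface MasterSteps {
        String exportFirstMile(String table);                   // returns a backup id
        void restoreLastMile(String table, String stagedPath);
    }

    /** Assumed bulk mover (MR, Spark, ...) that the tool drives, not the master. */
    public interface BulkMover {
        String ship(String backupId, String backupRoot);        // returns the remote path
        String stage(String remotePath);                        // returns a staged local path
    }

    private final MasterSteps master;
    private final BulkMover mover;

    public BackupCoordinatorSketch(MasterSteps master, BulkMover mover) {
        this.master = master;
        this.mover = mover;
    }

    /** First mile on the master, then ship off cluster from the tool. */
    public String backup(String table, String backupRoot) {
        String backupId = master.exportFirstMile(table);
        return mover.ship(backupId, backupRoot);
    }

    /** Stage remote data from the tool, then last mile on the master. */
    public void restore(String table, String remotePath) {
        String staged = mover.stage(remotePath);
        master.restoreLastMile(table, staged);
    }

    /** Runs one backup and one restore against recording fakes; returns the call log. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        MasterSteps master = new MasterSteps() {
            @Override public String exportFirstMile(String table) {
                log.add("master:export " + table);
                return "backup_1";
            }
            @Override public void restoreLastMile(String table, String stagedPath) {
                log.add("master:restore " + table + " from " + stagedPath);
            }
        };
        BulkMover mover = new BulkMover() {
            @Override public String ship(String backupId, String backupRoot) {
                log.add("tool:ship " + backupId);
                return backupRoot + "/" + backupId;
            }
            @Override public String stage(String remotePath) {
                log.add("tool:stage " + remotePath);
                return "/staging/" + remotePath.substring(remotePath.lastIndexOf('/') + 1);
            }
        };
        BackupCoordinatorSketch tool = new BackupCoordinatorSketch(master, mover);
        String remote = tool.backup("t1", "hdfs://backup");
        tool.restore("t1", remote);
        return log;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In this arrangement the master never sees the bulk mover at all, which is the
dependency boundary the thread is arguing about.
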
>> >>
>> >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>
>> >>> bq. procedure gives you a retry mechanism on failure
>> >>>
>> >>> We do need this mechanism. Take a look at the multi-step
>> >>> in FullTableBackupProcedure, etc.
>> >>>
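
[Editorial sketch] To make the retry point concrete: a multi-step operation can
resume after a failure if it durably records which step completed last. The
sketch below is an assumption-laden illustration, not the Procedure V2 API and
not FullTableBackupProcedure; the step names are invented, and an in-memory map
stands in for a durable procedure store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resumable multi-step runner; not HBase's procedure framework. */
public class ResumableStepsSketch {

    /** Invented step names; the real procedure's states are not listed in this thread. */
    public enum Step { PREPARE, SNAPSHOT, COPY, COMPLETE }

    /** Stand-in for durable state: id -> index of the next step to run. */
    private final Map<String, Integer> store = new HashMap<>();

    /** Record of every step that actually executed, in order. */
    private final List<String> executed = new ArrayList<>();

    /**
     * Runs the remaining steps for the given id. If failAt is non-null, reaching
     * that step throws to simulate a crash before the step does any work.
     */
    public void run(String id, Step failAt) {
        int next = store.getOrDefault(id, 0);
        Step[] steps = Step.values();
        for (int i = next; i < steps.length; i++) {
            if (steps[i] == failAt) {
                throw new IllegalStateException("simulated failure at " + steps[i]);
            }
            executed.add(steps[i].name());
            store.put(id, i + 1); // persist progress only after the step succeeded
        }
    }

    public List<String> executed() {
        return executed;
    }

    public static void main(String[] args) {
        ResumableStepsSketch p = new ResumableStepsSketch();
        try {
            p.run("backup_1", Step.COPY);   // fails when it reaches COPY
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        p.run("backup_1", null);            // retry resumes at COPY
        System.out.println(p.executed());   // prints [PREPARE, SNAPSHOT, COPY, COMPLETE]
    }
}
```

The retry does not repeat PREPARE or SNAPSHOT, which is the property a purely
client-driven flow without persisted state would lack.
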
>> >>> bq. let the user export it later when he wants
>> >>>
>> >>> This would make supporting security more complex (user A shouldn't be
>> >>> exporting user B's backup). And it is not user friendly - at the time
>> >>> backup request is issued, the following is specified:
>> >>>
>> >>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
>> >>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
>> >>>
>> >>> Backup root is an integral part of backup manifest.
>> >>>
>> >>> Cheers
>> >>>
>> >>>
>> >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>> >>>
>> >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>
>> >>>>> Ideally the export should have one job running which does the retry (on
>> >>>>> failed partition) itself.
>> >>>>>
>> >>>>
>> >>>> procedure gives you a retry mechanism on failure. if you don't use that,
>> >>>> then you don't need procedure.
>> >>>> if you want you can start a procedure executor in a non master process (the
>> >>>> hbase-procedure is a separate package and does not depend on master). but
>> >>>> again, export seems a case where you don't need procedure.
>> >>>>
>> >>>> like snapshot, the logic may just be: ask the master to take a backup. and
>> >>>> let the user export it later when he wants. so you avoid having a MR job
>> >>>> started by the master since people do not seem to like it.
>> >>>>
>> >>>> for restore (I think that is where you use the MR splitter) you can
>> >>>> probably just have a backup ready (already split). there is already a
>> >>>> jira that should do that, HBASE-14135. instead of doing the operation of
>> >>>> split/merge on restore, you consolidate the backup "offline" (mr job
>> >>>> started by the user) and then ask to restore the backup.
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>> >>>>>
>> >>>>>> as far as I understand the code, you don't need procedure for the export
>> >>>>>> itself.
>> >>>>>> the export operation is already idempotent, since you are just copying
>> >>>>>> files.
>> >>>>>> if the file exists and is complete (check length, checksum, ...) you can
>> >>>>>> skip it, otherwise you'll send it over again.
>> >>>>>>
>> >>>>>> you need the proc for taking the backup and restoring,
>> >>>>>> because you want to complete the operation and end up with a consistent
>> >>>>>> state across the multiple components you are updating (meta, fs, ...)
>> >>>>>> but again, for export you can just run the tool over and over until the
>> >>>>>> operation succeeds, and that should be ok.
>> >>>>>>
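
[Editorial sketch] The idempotent export described above can be illustrated
with a small in-memory stand-in. Everything here is an assumption made for
illustration: maps play the role of the source and destination filesystems,
CRC32 plays the role of the checksum, and the class and method names are
invented. It is not the backup patch's export code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical in-memory stand-in for a rerunnable file export; not HBase code. */
public class IdempotentExport {

    /** CRC32 of a file's bytes, used here as the completeness check. */
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** True when the destination copy exists and matches in length and checksum. */
    public static boolean isComplete(byte[] src, byte[] dst) {
        return dst != null
            && dst.length == src.length
            && checksum(dst) == checksum(src);
    }

    /**
     * Copies every source file that is missing or incomplete at the destination.
     * Returns the number of files copied; a rerun after success copies nothing.
     */
    public static int export(Map<String, byte[]> source, Map<String, byte[]> dest) {
        int copied = 0;
        for (Map.Entry<String, byte[]> e : source.entrySet()) {
            if (isComplete(e.getValue(), dest.get(e.getKey()))) {
                continue; // already exported and intact, skip it
            }
            dest.put(e.getKey(), e.getValue().clone());
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, byte[]> source = new HashMap<>();
        source.put("region1/hfile-a", new byte[] {1, 2, 3});
        source.put("region1/hfile-b", new byte[] {4, 5});

        Map<String, byte[]> dest = new HashMap<>();
        dest.put("region1/hfile-a", new byte[] {1, 2}); // truncated by a failed run

        System.out.println(export(source, dest)); // prints 2
        System.out.println(export(source, dest)); // prints 0
    }
}
```

Because each run only repairs what is missing or damaged, running the tool
"over and over" converges without any procedure state.
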
>> >>>>>>
>> >>>>>>
>> >>>>>> Matteo
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Master is involved in this discussion because currently only Master
>> >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
>> >>>>>>> restore.
>> >>>>>>>
>> >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
>> >>>>>>> used for this purpose ?
>> >>>>>>> Would that have better chance of giving us middle ground so that we can
>> >>>>>>> move this forward ?
>> >>>>>>>
>> >>>>>>> Cheers
>> >>>>>>>
>> >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>
>> >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
>> >>>>>>>>
>> >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>>>> -1 on that backup be in core hbase
>> >>>>>>>>>
>> >>>>>>>>> Not sure I understand what it means.
>> >>>>>>>>>
>> >>>>>>>> Sorry for the imprecision.
>> >>>>>>>>
>> >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
>> >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
>> >>>>>>>>
>> >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
>> >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
>> >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
>> >>>>>>>> MOB, why would we do it to support an optional backup/restore?
>> >>>>>>>>
>> >>>>>>>> I have opinions on the questions below -- i.e. that Master running
>> >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
>> >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
>> >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
>> >>>>>>>> only came out from under my shell to participate on the MR as dependency
>> >>>>>>>> chat.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> M
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process? We
>> >>>>>>>>> have already brought up all advantages of using Master and distributed
>> >>>>>>>>> procedures for backup and restore.
>> >>>>>>>>>
>> >>>>>>>>> Downside of moving this to a client tool is lack of fault tolerance:
>> >>>>>>>>> 1.1 Client won't be allowed to do any operations that can potentially
>> >>>>>>>>> affect the cluster, such as disabling splits/merges or the balancer.
>> >>>>>>>>> 1.2 In case of client failure, who will be doing the whole rollback
>> >>>>>>>>> stuff? We are trying to make it atomic.
>> >>>>>>>>>
>> >>>>>>>>> Security is not clear.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what
>> >>>>>>>>> does core mean anyway)?
>> >>>>>>>>>
>> >>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a
>> >>>>>>>>> system space? Only in user space? The table is global.
>> >>>>>>>>>
>> >>>>>>>>> 2. is critical. Despite the fact that 95% of the code is new, we have
>> >>>>>>>>> touched, of course, some existing HBase code.
>> >>>>>>>>> 3. is not that critical, of course we can move the backup system table
>> >>>>>>>>> into user space.
>> >>>>>>>>>
>> >>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
>> >>>>>>>>>
>> >>>>>>>>> -Vlad
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>>>> + MR is dead
>> >>>>>>>>>>>
>> >>>>>>>>>>> Does MR know that? :)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we
>> >>>>>>>>>>> use for "bulk data move and transformation" instead of MR?
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed
>> >>>>>>>>>> shell -- just don't have HBase core depend on it, even optionally.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some
>> >>>>>>>>>>> group members are still not sure about that and some will give -1
>> >>>>>>>>>>> in any case. Just because ...
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
>> >>>>>>>>>> all the API any such external tool might need to run).
>> >>>>>>>>>>
>> >>>>>>>>>> St.Ack
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> -Vlad
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <st...@duboce.net> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> let me try to go back to my original topic.
>> >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for
>> >>>>>>>>>>>>> future code.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>> >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
>> >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
>> >>>>>>>>>>>>> external/uncontrolled MR setup.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> +1
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
>> >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
>> >>>>>>>>>>>> ever being able to launch MR jobs.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server
>> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
>> >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy
>> >>>>>>>>>>>> working hard on moving it up on to a new foundation. Let's not make the
>> >>>>>>>>>>>> task harder by piling on more moving parts.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> St.Ack
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Matteo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to
>> >>>>>>>>>>>>>> make Master more stable.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> No, not your fault, at least, not this time :)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of
>> >>>>>>>>>>>>>>> calls when starting up the HMaster? HMaster is also a regionserver
>> >>>>>>>>>>>>>>> so it extends HRegionServer, and the initialization of HRegionServer
>> >>>>>>>>>>>>>>> sometimes needs to make rpc calls to HMaster. A simple change could
>> >>>>>>>>>>>>>>> cause a probabilistic deadlock or some strange NPEs...
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features
>> >>>>>>>>>>>>>>> or add external dependencies to HMaster, especially when it adds
>> >>>>>>>>>>>>>>> more work to the startup processing...
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I read through HADOOP-13433
>> >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
>> >>>>>>>>>>>>>>>> race condition is in jdk.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup /
>> >>>>>>>>>>>>>>>> restore mega patch ?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and
>> >>>>>>>>>>>>>>>>> the patch is ready for landing on master, I'm a -0 on it as I do
>> >>>>>>>>>>>>>>>>> not want to block the development progress.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> But I strongly suggest that later we revisit the design and see if
>> >>>>>>>>>>>>>>>>> we can separate the logic from HMaster as much as possible. HA is
>> >>>>>>>>>>>>>>>>> not a big problem if you do not store any metadata locally. But the
>> >>>>>>>>>>>>>>>>> ugly code in HMaster is really a problem...
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> And for security, I have had an issue pending for a long time. Can
>> >>>>>>>>>>>>>>>>> someone help by taking a quick look at it? This is what I mean by
>> >>>>>>>>>>>>>>>>> ugly code... it logs out and destroys the credentials in a subject
>> >>>>>>>>>>>>>>>>> while it is still being used, and it is declared as LimitedPrivate
>> >>>>>>>>>>>>>>>>> so I can not change the behavior and the only way to fix it is to
>> >>>>>>>>>>>>>>>>> write another piece of ugly code...
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using
>> >>>>>>>>>>>>>>>>>>>> MR, we can certainly consider that
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
>> >>>>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> -Vlad
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based
>> >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about the SpliceMachine
>> >>>>>>>>>>>>>>>>>>> approach of managing compactions in Spark where apparently they
>> >>>>>>>>>>>>>>>>>>> saw a lot of benefits. Apologies for giving you that sore throat
>> >>>>>>>>>>>>>>>>>>> Andrew; I really didn't mean to :-)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
>> >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
>> >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
>> >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's
>> >>>>>>>>>>>>>>>>>>> even worth the effort of trying to build something when MR is
>> >>>>>>>>>>>>>>>>>>> already there, and being used by HBase already for some operations.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server
>> >>>>>>>>>>>>>>>>>>> not being the least of them all. Security (kerberos authentication,
>> >>>>>>>>>>>>>>>>>>> another keytab to manage, etc. etc. etc.). IMO, that approach is
>> >>>>>>>>>>>>>>>>>>> DOA. Instead let's substitute that (1) with the HBase Master. I
>> >>>>>>>>>>>>>>>>>>> haven't seen any good reason why the HBase master shouldn't launch
>> >>>>>>>>>>>>>>>>>>> MR jobs if needed. It's not ideal; agreed.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running
>> >>>>>>>>>>>>>>>>>>> the backup/restore jobs from the master. I think Ted has summarized
>> >>>>>>>>>>>>>>>>>>> some of the issues that we need to take care of - basically, the
>> >>>>>>>>>>>>>>>>>>> master can keep track of running jobs, and should it fail, the
>> >>>>>>>>>>>>>>>>>>> backup master can continue keeping track of it (since the jobId
>> >>>>>>>>>>>>>>>>>>> would have been recorded in the proc WAL). The master can also do
>> >>>>>>>>>>>>>>>>>>> cleanup, etc. of failed backup/restore processes. Security is
>> >>>>>>>>>>>>>>>>>>> another issue - the job needs to run as 'hbase' since it owns the
>> >>>>>>>>>>>>>>>>>>> data. Having the master launch the job makes it get that privilege.
>> >>>>>>>>>>>>>>>>>>> In the (2) approach, it's hard to do some of the above management.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the
>> >>>>>>>>>>>>>>>>>>> overall design/arch point of view (maybe code review is still
>> >>>>>>>>>>>>>>>>>>> pending from Matteo). If in the future, we find better ways of
>> >>>>>>>>>>>>>>>>>>> doing this without using MR, we can certainly consider that. But
>> >>>>>>>>>>>>>>>>>>> IMO I don't think we should block this patch from getting merged.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> ________________________________________
>> >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino...@gmail.com>
>> >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
>> >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>> >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use
>> >>>>>>>>>>>>>>>>>>> your own procedure store in that service?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error
>> >>>>>>>>>>>>>>>>>>>> midway. Using Procedure V2 makes the backup / restore more robust.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce
>> >>>>>>>>>>>>>>>>>>>> security (to be implemented) for client driven actions.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from
>> >>>>>>>>>>>>>>>>>>>>> the master directly to run MR is a first. Why not drive this with
>> >>>>>>>>>>>>>>>>>>>>> a utility derived from Tool?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster,  it is a common case we just have HDFS
>> >>>>>>>>>>>>>>>>>>>>>>>> and HBase deployed.
>> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features
>> >>>>>>>>>>>>>>>>>>>>>>>> we have not used at all),  it introduced another cost for maintain.
>> >>>>>>>>>>>>>>>>>>>>>>>> I don't think it is a good idea.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers
>> >>>>>>>>>>>>>>>>>>>>>> have the full stack deployed and want to see backup as a standard
>> >>>>>>>>>>>>>>>>>>>>>> feature. Besides this, nothing will happen in your cluster if you
>> >>>>>>>>>>>>>>>>>>>>>> won't be doing backups.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) goes
>> >>>>>>>>>>>>>>>>>>>>>> nowhere. We have asked already, at least twice, for suggestions of
>> >>>>>>>>>>>>>>>>>>>>>> another framework (other than M/R) for bulk data copy with
>> >>>>>>>>>>>>>>>>>>>>>> *conversion*. Still waiting for suggestions.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> -Vlad
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still
>> >>>>>>>>>>>>>>>>>>>>>>> functions normally (post merge).
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
>> >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just have
>> >>>>>>>>>>>>>>>>>>>>>>>> HDFS and HBase deployed.
>> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some
>> >>>>>>>>>>>>>>>>>>>>>>>> features we have not used at all), it introduces another
>> >>>>>>>>>>>>>>>>>>>>>>>> maintenance cost. I don't think it is a good idea.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if
>> >>>>>>>>>>>>>>>>>>>>>>>>> we think this is not a core feature of HBase, then we could make
>> >>>>>>>>>>>>>>>>>>>>>>>>> it depend on MR, and start a standalone BackupManager instance
>> >>>>>>>>>>>>>>>>>>>>>>>>> that submits MR jobs to do periodic maintenance. And if we think
>> >>>>>>>>>>>>>>>>>>>>>>>>> this is a core feature that everyone should use, then we'd better
>> >>>>>>>>>>>>>>>>>>>>>>>>> implement it without MR dependency, like DLS.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some
>> >>>>>>>>>>>>>>>>>>>>>>>>>> of our features depend on MR but I think the bottom line is that
>> >>>>>>>>>>>>>>>>>>>>>>>>>> we should launch the jobs from outside manually or by other
>> >>>>>>>>>>>>>>>>>>>>>>>>>> services.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purt...@gmail.com>:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> think, so a fair question.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> our other MR apps? The issue is needing the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> AccessController to decide if allowed? But nothing
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the user from running the job
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> manually/independently, right?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> (everyone i think is ok with those).
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> from Master and RSs code?" since this will be the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> first time we do this
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <d...@hortonworks.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot /
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup / Restore, it's fine to be dependent on MR.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR is the right framework for such. We should also
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> do compactions using MR (just saying :) )
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhih...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> import / export.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Like import or export. Or the optional MOB tool.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's fine.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from hbase (Master or RS)?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about not having MR as a direct dependency of hbase.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> had an MR job to compact that was later transformed
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into a non-MR job to be merged. I think we had a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar discussion for log split/replay.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912),
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that runs an MR job from the master to copy data or
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restore data.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't use backup you'll not end up running MR jobs,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but this was probably true for MOB as in "if you
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't enable MOB you don't need MR")
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to have hbase run MR jobs, only tools started
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manually by the user can do that". or can we start
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adding MR calls around without problems?
