Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-10-05 Thread Ted Yu
I ran the backup test suite on Linux.

All tests passed; the run took 28 minutes.

> On Oct 5, 2016, at 3:18 PM, Devaraj Das  wrote:
> 
> If tests pass with the patch (which I believe they do), let's commit the
> patch. Follow it up with an updated mega patch for review...
> 
> 
> From: Ted Yu 
> Sent: Tuesday, October 04, 2016 6:28 PM
> To: dev@hbase.apache.org
> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started 
> by Master or RS)
> 
> Refactoring work over in HBASE-16727 is ready for review.
> 
> Kindly provide your feedback.
> 
> Thanks
> 
>> On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell  wrote:
>> 
>> This sounds good to me.
>> I'd be at least +0 as to merging the branch as long as we are not 'shelling
>> out' to MR from master.
>> 
>>> All or most of the Backup/Restore operations (especially the MR job
>>> spawns) should be moved to the client.
>> 
>> We have a home grown backup solution at Salesforce that to a first order of
>> approximation is this. I would like to see something like this merged.
>> 
>>> In the future, if someone needs to support self-service operations (any
>>> user can take a backup/restore his/her tables), we can discuss the "backup
>>> service" or something else.
>> 
>> I can't commit the time of the team here (smile), but we always strive to
>> minimize the amount of local code we need to manage HBase. For example, we
>> use VerifyReplication and other tools that ship with HBase, and we have
>> contributed minor operational improvements as we've developed them (like
>> the region mover and canary stuff). I suspect we will have some adoption of
>> this tooling and further refinement insofar as it fits into a backup workflow
>> at 30kft view using snapshots, replication (or file shipping), and WAL
>> replay.
>> 
>> 
>>> On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das  wrote:
>>> 
>>> Vlad, thinking about it a little more, since the master is not
>>> orchestrating the backup, let's make it dead simple as a first pass. I
>>> think we should do the following: All or most of the Backup/Restore
>>> operations (especially the MR job spawns) should be moved to the client.
>>> Ignore security for the moment - let's live with what we have as the
>>> current "limitation" for tools that need HDFS access - they need to run as
>>> hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
>>> be handled as well as much as possible - if the client fails after
>>> initiating the backup/restore, who restores consistency in the hbase:backup
>>> table, or cleans up the half copied data in the hdfs dirs, etc.
>>> In the future, if someone needs to support self-service operations (any
>>> user can take a backup/restore his/her tables), we can discuss the "backup
>>> service" or something else.
>>> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
>>> with the above. Would like to get over this merge-to-master hump obviously.
>>> 
>>> 
>>> From: Vladimir Rodionov 
>>> Sent: Monday, September 26, 2016 11:48 AM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
>>> started by Master or RS)
>>> 
>>> Ok, we had an internal discussion and this is what we are suggesting now:
>>>
>>> 1. We will create a separate module (hbase-backup) and move the server-side
>>> code there.
>>> 2. Master and RS will be MR and backup free.
>>> 3. The code from Master will be moved into a standalone service
>>> (BackupService) for procedure orchestration,
>>> operation resume/abort and SECURITY. This means one additional
>>> process, similar to the REST/Thrift server, will be required
>>> to operate backup.
>>>
>>> I would like to note that a separate process running as the hbase super user
>>> is required to implement security properly in a multi-tenant environment;
>>> otherwise, only the hbase super user will be allowed to operate backups.
>>>
>>> Please let us know what you think, HBase people.
>>> 
>>> -Vlad
>>> 
>>> 
>>> 
>>>> On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:
>>>> 
>>>> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <
>>> andrew.purt...@gmail.c

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-10-05 Thread Devaraj Das
If tests pass with the patch (which I believe they do), let's commit the
patch. Follow it up with an updated mega patch for review...


From: Ted Yu 
Sent: Tuesday, October 04, 2016 6:28 PM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by 
Master or RS)

Refactoring work over in HBASE-16727 is ready for review.

Kindly provide your feedback.

Thanks

On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell  wrote:

> This sounds good to me.
> I'd be at least +0 as to merging the branch as long as we are not 'shelling
> out' to MR from master.
>
> > All or most of the Backup/Restore operations (especially the MR job
> > spawns) should be moved to the client.
>
> We have a home grown backup solution at Salesforce that to a first order of
> approximation is this. I would like to see something like this merged.
>
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
>
> I can't commit the time of the team here (smile), but we always strive to
> minimize the amount of local code we need to manage HBase. For example, we
> use VerifyReplication and other tools that ship with HBase, and we have
> contributed minor operational improvements as we've developed them (like
> the region mover and canary stuff). I suspect we will have some adoption of
> this tooling and further refinement insofar as it fits into a backup workflow
> at 30kft view using snapshots, replication (or file shipping), and WAL
> replay.
>
>
> On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das  wrote:
>
> > Vlad, thinking about it a little more, since the master is not
> > orchestrating the backup, let's make it dead simple as a first pass. I
> > think we should do the following: All or most of the Backup/Restore
> > operations (especially the MR job spawns) should be moved to the client.
> > Ignore security for the moment - let's live with what we have as the
> > current "limitation" for tools that need HDFS access - they need to run as
> > hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
> > be handled as well as much as possible - if the client fails after
> > initiating the backup/restore, who restores consistency in the hbase:backup
> > table, or cleans up the half copied data in the hdfs dirs, etc.
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
> > Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> > with the above. Would like to get over this merge-to-master hump obviously.
> >
> > 
> > From: Vladimir Rodionov 
> > Sent: Monday, September 26, 2016 11:48 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> > started by Master or RS)
> >
> > Ok, we had an internal discussion and this is what we are suggesting now:
> >
> > 1. We will create a separate module (hbase-backup) and move the server-side
> > code there.
> > 2. Master and RS will be MR and backup free.
> > 3. The code from Master will be moved into a standalone service
> > (BackupService) for procedure orchestration,
> > operation resume/abort and SECURITY. This means one additional
> > process, similar to the REST/Thrift server, will be required
> > to operate backup.
> >
> > I would like to note that a separate process running as the hbase super user
> > is required to implement security properly in a multi-tenant environment;
> > otherwise, only the hbase super user will be allowed to operate backups.
> >
> > Please let us know what you think, HBase people.
> >
> > -Vlad
> >
> >
> >
> > On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:
> >
> > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > > wrote:
> > >
> > > > At branch merge voting time now more eyes are getting on the design issues
> > > > with dissenting opinion emerging. This is the branch merge process working
> > > > as our community has designed it. Because this is the first full project
> > > > review of the code and implementation I think we all have to be flexible. I
> > > > see the community
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-10-04 Thread Ted Yu
Refactoring work over in HBASE-16727 is ready for review.

Kindly provide your feedback.

Thanks

On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell  wrote:

> This sounds good to me.
> I'd be at least +0 as to merging the branch as long as we are not 'shelling
> out' to MR from master.
>
> > All or most of the Backup/Restore operations (especially the MR job
> > spawns) should be moved to the client.
>
> We have a home grown backup solution at Salesforce that to a first order of
> approximation is this. I would like to see something like this merged.
>
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
>
> I can't commit the time of the team here (smile), but we always strive to
> minimize the amount of local code we need to manage HBase. For example, we
> use VerifyReplication and other tools that ship with HBase, and we have
> contributed minor operational improvements as we've developed them (like
> the region mover and canary stuff). I suspect we will have some adoption of
> this tooling and further refinement insofar as it fits into a backup workflow
> at 30kft view using snapshots, replication (or file shipping), and WAL
> replay.
>
>
> On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das  wrote:
>
> > Vlad, thinking about it a little more, since the master is not
> > orchestrating the backup, let's make it dead simple as a first pass. I
> > think we should do the following: All or most of the Backup/Restore
> > operations (especially the MR job spawns) should be moved to the client.
> > Ignore security for the moment - let's live with what we have as the
> > current "limitation" for tools that need HDFS access - they need to run as
> > hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
> > be handled as well as much as possible - if the client fails after
> > initiating the backup/restore, who restores consistency in the hbase:backup
> > table, or cleans up the half copied data in the hdfs dirs, etc.
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
> > Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> > with the above. Would like to get over this merge-to-master hump obviously.
> >
> > 
> > From: Vladimir Rodionov 
> > Sent: Monday, September 26, 2016 11:48 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> > started by Master or RS)
> >
> > Ok, we had an internal discussion and this is what we are suggesting now:
> >
> > 1. We will create a separate module (hbase-backup) and move the server-side
> > code there.
> > 2. Master and RS will be MR and backup free.
> > 3. The code from Master will be moved into a standalone service
> > (BackupService) for procedure orchestration,
> > operation resume/abort and SECURITY. This means one additional
> > process, similar to the REST/Thrift server, will be required
> > to operate backup.
> >
> > I would like to note that a separate process running as the hbase super user
> > is required to implement security properly in a multi-tenant environment;
> > otherwise, only the hbase super user will be allowed to operate backups.
> >
> > Please let us know what you think, HBase people.
> >
> > -Vlad
> >
> >
> >
> > On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:
> >
> > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > > wrote:
> > >
> > > > At branch merge voting time now more eyes are getting on the design issues
> > > > with dissenting opinion emerging. This is the branch merge process working
> > > > as our community has designed it. Because this is the first full project
> > > > review of the code and implementation I think we all have to be flexible. I
> > > > see the community as trying to narrow the technical objection at issue to
> > > > the smallest possible scope. It's simple: don't call out to an external
> > > > execution framework we don't own from core master (and by extension
> > > > regionserver) code. We had this objection before to a proposed external

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-10-03 Thread Andrew Purtell
This sounds good to me.
I'd be at least +0 as to merging the branch as long as we are not 'shelling
out' to MR from master.

> All or most of the Backup/Restore operations (especially the MR job
> spawns) should be moved to the client.

We have a home grown backup solution at Salesforce that to a first order of
approximation is this. I would like to see something like this merged.

> In the future, if someone needs to support self-service operations (any
> user can take a backup/restore his/her tables), we can discuss the "backup
> service" or something else.

I can't commit the time of the team here (smile), but we always strive to
minimize the amount of local code we need to manage HBase. For example, we
use VerifyReplication and other tools that ship with HBase, and we have
contributed minor operational improvements as we've developed them (like
the region mover and canary stuff). I suspect we will have some adoption of
this tooling and further refinement insofar as it fits into a backup workflow
at 30kft view using snapshots, replication (or file shipping), and WAL
replay.


On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das  wrote:

> Vlad, thinking about it a little more, since the master is not
> orchestrating the backup, let's make it dead simple as a first pass. I
> think we should do the following: All or most of the Backup/Restore
> operations (especially the MR job spawns) should be moved to the client.
> Ignore security for the moment - let's live with what we have as the
> current "limitation" for tools that need HDFS access - they need to run as
> hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
> be handled as well as much as possible - if the client fails after
> initiating the backup/restore, who restores consistency in the hbase:backup
> table, or cleans up the half copied data in the hdfs dirs, etc.
> In the future, if someone needs to support self-service operations (any
> user can take a backup/restore his/her tables), we can discuss the "backup
> service" or something else.
> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> with the above. Would like to get over this merge-to-master hump obviously.
>
> 
> From: Vladimir Rodionov 
> Sent: Monday, September 26, 2016 11:48 AM
> To: dev@hbase.apache.org
> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> started by Master or RS)
>
> Ok, we had an internal discussion and this is what we are suggesting now:
>
> 1. We will create a separate module (hbase-backup) and move the server-side
> code there.
> 2. Master and RS will be MR and backup free.
> 3. The code from Master will be moved into a standalone service
> (BackupService) for procedure orchestration,
> operation resume/abort and SECURITY. This means one additional
> process, similar to the REST/Thrift server, will be required
> to operate backup.
>
> I would like to note that a separate process running as the hbase super user
> is required to implement security properly in a multi-tenant environment;
> otherwise, only the hbase super user will be allowed to operate backups.
>
> Please let us know what you think, HBase people.
>
> -Vlad
>
>
>
> On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:
>
> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > wrote:
> >
> > > At branch merge voting time now more eyes are getting on the design issues
> > > with dissenting opinion emerging. This is the branch merge process working
> > > as our community has designed it. Because this is the first full project
> > > review of the code and implementation I think we all have to be flexible. I
> > > see the community as trying to narrow the technical objection at issue to
> > > the smallest possible scope. It's simple: don't call out to an external
> > > execution framework we don't own from core master (and by extension
> > > regionserver) code. We had this objection before to a proposed external
> > > compaction implementation for MOB so should not come as a surprise. Please
> > > let me know if I have misstated this.
> > >
> > >
> > The above is my understanding also.
> >
> >
> > > This would seem to require a modest refactor of coordination to move
> > > invocation of MR code out from any core code path. To restate what I think
> > > is an emerging recommendation: Move cross HBase and MR coordination to a
> > > separate tool. This tool can ask the master to invoke procedures on the
> > > HBase

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-27 Thread Matteo Bertozzi
+1 for the simplified approach.
If most of the backup code is on the client side, it may be easy to move it
to a backup module in case people ask. But for now, I'd say stick with
hbase-server if that is easier.

Matteo


On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das  wrote:

> Vlad, thinking about it a little more, since the master is not
> orchestrating the backup, let's make it dead simple as a first pass. I
> think we should do the following: All or most of the Backup/Restore
> operations (especially the MR job spawns) should be moved to the client.
> Ignore security for the moment - let's live with what we have as the
> current "limitation" for tools that need HDFS access - they need to run as
> hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to
> be handled as well as much as possible - if the client fails after
> initiating the backup/restore, who restores consistency in the hbase:backup
> table, or cleans up the half copied data in the hdfs dirs, etc.
> In the future, if someone needs to support self-service operations (any
> user can take a backup/restore his/her tables), we can discuss the "backup
> service" or something else.
> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> with the above. Would like to get over this merge-to-master hump obviously.
>
> 
> From: Vladimir Rodionov 
> Sent: Monday, September 26, 2016 11:48 AM
> To: dev@hbase.apache.org
> Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> started by Master or RS)
>
> Ok, we had an internal discussion and this is what we are suggesting now:
>
> 1. We will create a separate module (hbase-backup) and move the server-side
> code there.
> 2. Master and RS will be MR and backup free.
> 3. The code from Master will be moved into a standalone service
> (BackupService) for procedure orchestration,
> operation resume/abort and SECURITY. This means one additional
> process, similar to the REST/Thrift server, will be required
> to operate backup.
>
> I would like to note that a separate process running as the hbase super user
> is required to implement security properly in a multi-tenant environment;
> otherwise, only the hbase super user will be allowed to operate backups.
>
> Please let us know what you think, HBase people.
>
> -Vlad
>
>
>
> On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:
>
> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > wrote:
> >
> > > At branch merge voting time now more eyes are getting on the design issues
> > > with dissenting opinion emerging. This is the branch merge process working
> > > as our community has designed it. Because this is the first full project
> > > review of the code and implementation I think we all have to be flexible. I
> > > see the community as trying to narrow the technical objection at issue to
> > > the smallest possible scope. It's simple: don't call out to an external
> > > execution framework we don't own from core master (and by extension
> > > regionserver) code. We had this objection before to a proposed external
> > > compaction implementation for MOB so should not come as a surprise. Please
> > > let me know if I have misstated this.
> > >
> > >
> > The above is my understanding also.
> >
> >
> > > This would seem to require a modest refactor of coordination to move
> > > invocation of MR code out from any core code path. To restate what I think
> > > is an emerging recommendation: Move cross HBase and MR coordination to a
> > > separate tool. This tool can ask the master to invoke procedures on the
> > > HBase side that do first mile export and last mile restore. (Internally the
> > > tool can also use the procedure framework for state durability, perhaps,
> > > just a thought.) Then the tool can further drive the things done with MR
> > > like shipping data off cluster or moving remote data in place and preparing
> > > it for import. These activities do not need procedure coordination and
> > > involvement of the HBase master. Only the first and last mile of the
> > > process needs atomicity within the HBase deploy. Please let me know if I
> > > have misstated this.
> > >
> > >
> > > Above is my understanding of our recommendation.
> >
> > St.Ack
> >
> >
> >
> > > > On Sep 24, 2016, at 8:17 AM, Ted Yu

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-26 Thread Devaraj Das
Vlad, thinking about it a little more, since the master is not orchestrating 
the backup, let's make it dead simple as a first pass. I think we should do the 
following: All or most of the Backup/Restore operations (especially the MR job 
spawns) should be moved to the client. Ignore security for the moment - let's 
live with what we have as the current "limitation" for tools that need HDFS 
access - they need to run as hbase (or whatever the hbase daemons run as).
Consistency/cleanup needs to be handled as well as much as possible - if the 
client fails after initiating the backup/restore, who restores consistency in 
the hbase:backup table, or cleans up the half copied data in the hdfs dirs, etc.
In the future, if someone needs to support self-service operations (any user 
can take a backup/restore his/her tables), we can discuss the "backup service" 
or something else.
Folks - Stack / Andrew / Matteo / others, please speak up if you disagree with 
the above. Would like to get over this merge-to-master hump obviously.
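
As an illustration of the consistency/cleanup question above, here is a minimal Java sketch of a client-side driver that undoes completed steps when a later step fails. Every name in it (ClientBackupSketch, Step, FakeStep, the step names) is hypothetical and not part of HBase, and the steps only simulate work. It covers only failures the client itself observes; a client that crashes outright would still need a separate repair pass over whatever state it had recorded, which is the open question raised above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical client-side backup driver; none of these names come from HBase. */
final class ClientBackupSketch {

  /** One unit of work plus the action that undoes it. */
  interface Step {
    String name();
    void run() throws Exception;
    void undo();
  }

  /** Simulated step: does no real work and optionally fails when run. */
  static final class FakeStep implements Step {
    private final String name;
    private final boolean fails;

    FakeStep(String name, boolean fails) {
      this.name = name;
      this.fails = fails;
    }

    @Override public String name() { return name; }

    @Override public void run() throws Exception {
      if (fails) {
        throw new Exception(name + " failed");
      }
    }

    @Override public void undo() {
      // A real tool would delete partial output or reset recorded state here.
    }
  }

  /** Runs steps in order; on failure, undoes the completed steps in reverse order. */
  static boolean runWithCleanup(List<Step> steps, List<String> log) {
    Deque<Step> done = new ArrayDeque<>();
    for (Step step : steps) {
      try {
        step.run();
        done.push(step);
        log.add("ok:" + step.name());
      } catch (Exception e) {
        log.add("failed:" + step.name());
        while (!done.isEmpty()) {
          Step completed = done.pop();
          completed.undo();
          log.add("undone:" + completed.name());
        }
        return false;
      }
    }
    return true;
  }

  /** Example run: the third (assumed) step fails, so the first two are undone. */
  static List<String> demo() {
    List<String> log = new ArrayList<>();
    List<Step> steps = Arrays.<Step>asList(
        new FakeStep("record-session", false),
        new FakeStep("snapshot", false),
        new FakeStep("copy", true));
    runWithCleanup(steps, log);
    return log;
  }

  public static void main(String[] args) {
    for (String line : demo()) {
      System.out.println(line);
    }
  }
}
```

Undoing in reverse order means the session record is removed last, so a partially cleaned-up run is still visible to whoever inspects the recorded state.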


From: Vladimir Rodionov 
Sent: Monday, September 26, 2016 11:48 AM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by 
Master or RS)

Ok, we had an internal discussion and this is what we are suggesting now:

1. We will create a separate module (hbase-backup) and move the server-side
code there.
2. Master and RS will be MR and backup free.
3. The code from Master will be moved into a standalone service
(BackupService) for procedure orchestration,
operation resume/abort and SECURITY. This means one additional
process, similar to the REST/Thrift server, will be required
to operate backup.

I would like to note that a separate process running as the hbase super user
is required to implement security properly in a multi-tenant environment;
otherwise, only the hbase super user will be allowed to operate backups.

Please let us know what you think, HBase people.

-Vlad



On Sat, Sep 24, 2016 at 2:49 PM, Stack  wrote:

> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell 
> wrote:
>
> > At branch merge voting time now more eyes are getting on the design issues
> > with dissenting opinion emerging. This is the branch merge process working
> > as our community has designed it. Because this is the first full project
> > review of the code and implementation I think we all have to be flexible. I
> > see the community as trying to narrow the technical objection at issue to
> > the smallest possible scope. It's simple: don't call out to an external
> > execution framework we don't own from core master (and by extension
> > regionserver) code. We had this objection before to a proposed external
> > compaction implementation for MOB so should not come as a surprise. Please
> > let me know if I have misstated this.
> >
> >
> The above is my understanding also.
>
>
> > This would seem to require a modest refactor of coordination to move
> > invocation of MR code out from any core code path. To restate what I think
> > is an emerging recommendation: Move cross HBase and MR coordination to a
> > separate tool. This tool can ask the master to invoke procedures on the
> > HBase side that do first mile export and last mile restore. (Internally the
> > tool can also use the procedure framework for state durability, perhaps,
> > just a thought.) Then the tool can further drive the things done with MR
> > like shipping data off cluster or moving remote data in place and preparing
> > it for import. These activities do not need procedure coordination and
> > involvement of the HBase master. Only the first and last mile of the
> > process needs atomicity within the HBase deploy. Please let me know if I
> > have misstated this.
> >
> >
> > Above is my understanding of our recommendation.
>
> St.Ack
>
>
>
> > > On Sep 24, 2016, at 8:17 AM, Ted Yu  wrote:
> > >
> > > bq. procedure gives you a retry mechanism on failure
> > >
> > > We do need this mechanism. Take a look at the multi-step
> > > in FullTableBackupProcedure, etc.
> > >
> > > bq. let the user export it later when he wants
> > >
> > > This would make supporting security more complex (user A shouldn't be
> > > exporting user B's backup). And it is not user friendly - at the time
> > > backup request is issued, the following is specified:
> > >
> > > +  + " BACKUP_ROOT The full root path to store the backup image,\n"
> > > +  + " the prefix can be hdfs, webhdfs or gpfs\n"

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-26 Thread Vladimir Rodionov
>>>>>>>>>>
> > >>>>>>>>>>> Matteo
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > >>> yuzhih...@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I suggest you look at Matteo's work for
> > >> AssignmentManager
> > >>>>> which
> > >>>>>>> is
> > >>>>>>>> to
> > >>>>>>>>>>> make
> > >>>>>>>>>>>> Master more stable.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > >>> palomino...@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> No, not your fault, at least, not this time :)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> > >>>>> sequence
> > >>>>>>> of
> > >>>>>>>>>> calls
> > >>>>>>>>>>>> when
> > >>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > >> regionserver
> > >>>> so
> > >>>>> it
> > >>>>>>>>> extends
> > >>>>>>>>>>>>> HRegionServer, and the initialization of
> > >> HRegionServer
> > >>>>>>> sometimes
> > >>>>>>>>>> needs
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> > >> cause
> > >>>>>>>>> probabilistic
> > >>>>>>>>>>> dead
> > >>>>>>>>>>>>> lock or some strange NPEs...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> > >> add
> > >>>> new
> > >>>>>>>> features
> > >>>>>>>>>> or
> > >>>>>>>>>>>> add
> > >>>>>>>>>>>>> external dependencies to HMaster, especially add more
> > >>>> works
> > >>>>>> for
> > >>>>>>>> the
> > >>>>>>>>>>> start
> > >>>>>>>>>>>>> up processing...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> > >> yuzhih...@gmail.com
> > >>>> :
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I read through HADOOP-13433
> > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> > >>>>>>>>>>>>>> condition is in jdk.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> > >>> moving.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
> > >>>> problem...
> > >>>>>>>>>>>>>>
> > >>>>>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Stack
>>>>>>>>> That's why I'm very nervous when somebody wants to
> >> add
> >>>> new
> >>>>>>>> features
> >>>>>>>>>> or
> >>>>>>>>>>>> add
> >>>>>>>>>>>>> external dependencies to HMaster, especially add more
> >>>> works
> >>>>>> for
> >>>>>>>> the
> >>>>>>>>>>> start
> >>>>>>>>>>>>> up processing...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> >> yuzhih...@gmail.com
> >>>> :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I read through HADOOP-13433
> >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> >>>>>>>>>>>>>> condition is in jdk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> >>> moving.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
> >>>> problem...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
> >> it
> >>> in
> >>>>> the
> >>>>>>>>> backup
> >>>>>>>>>> /
> >>>>>>>>>>>>>> restore mega patch ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> >>>>>> palomino...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you guys have already implemented the feature
> >> in
> >>>> the
> >>>>>> MR
> >>>>>>>> way
> >>>>>>>>>> and
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on
> >>> it
> >>>>> as I
> >>>>>>> do
> >>>>>>>>> not
> >>>>>>>>>>> want
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> block the development progress.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> But I strongly suggest later we need to revisit
> >> the
> >>>>>> design
> >>>>>>>> and
> >>>>>>>>>> see
> >>>>>>>>>>> if
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> can separate the logic from HMaster as much as
> >>>>> possible.
> >>>>>>> HA
> >>>>>>>> is
> >>>>>>>>>>> not a
> >>>>>>>>>>>>> big
> >>>>>>>>>>>>>>> problem if you do not store any metadata locally.
> >> But
> >>>> the
> >>>>>>> ugly
> >>>>>>>>> code
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>> HMaster is really a problem...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> And for security, I have an issue pending for a
> >> long
> >>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu
>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether
> > behind
> > > > > >>>> a
> > > > > >>>>>> flag
> > > > > >>>>>>>> or
> > > > > >>>>>>>>>> not
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>>> ever being able to launch MR jobs.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo
> it
> > > > > >>>> from
> > > > > >>>>>>>>>> hbase-server
> > > > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be
> > its
> > > > > >>>>>> peer).
> > > > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and
> > > Appy
> > > > > >>>>> are
> > > > > >>>>>>>> busy
> > > > > >>>>>>>>>>>> working hard on moving it up on to a new foundation.
> > Lets
> > > > > >>>> not
> > > > > >>>>>>>> clutter
> > > > > >>>>>>>>>>> task
> > > > > >>>>>>>>>>>> harder by piling on more moving parts.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> St.Ack
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Matteo
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > > > >>>>> yuzhih...@gmail.com
> > > > > >>>>>>>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> > > > > >>>> AssignmentManager
> > > > > >>>>>>> which
> > > > > >>>>>>>>> is
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>>> make
> > > > > >>>>>>>>>>>>>> Master more stable.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > > > > >>>>> palomino...@gmail.com
> > > > > >>>>>>>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me
> the
> > > > > >>>>>>> sequence
> > > > > >>>>>>>>> of
> > > > > >>>>>>>>>>>> calls
> > > > > >>>>>>>>>>>>>> when
> > > > > >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > > > > >>>> regionserver
> > > > > >>>>>> so
> > > > > >>&

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Vladimir Rodionov
> > > > >>>>>>>>>>>>> this question was meant to be generic, and provide some
> > > > >>>>> rule
> > > > >>>>>>> for
> > > > >>>>>>>>>> future
> > > > >>>>>>>>>>>>> code.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy
> everyone
> > > > >>>>> can
> > > > >>>>>>> be:
> > > > >>>>>>>>>>>>> - we don't want any core feature (e.g.
> > > > >>>>>>> compaction/log-split/log-
> > > > > >>>>>>>>>>> replay)
> > > > >>>>>>>>>>>>> over MR, because some cluster may not want or may have
> an
> > > > >>>>>>>>>>>>> external/uncontrolled MR setup.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> +1
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by
> a
> > > > >>>>>> flag)
> > > > >>>>>>>> to
> > > > >>>>>>>>>> run
> > > > >>>>>>>>>>> MR
> > > > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
> > > > >>>> is
> > > > >>>>>> not
> > > > >>>>>>>>>>> required.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether
> behind
> > > > >>>> a
> > > > >>>>>> flag
> > > > >>>>>>>> or
> > > > >>>>>>>>>> not
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>>> ever being able to launch MR jobs.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> > > > >>>> from
> > > > >>>>>>>>>> hbase-server
> > > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be
> its
> > > > >>>>>> peer).
> > > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and
> > Appy
> > > > >>>>> are
> > > > >>>>>>>> busy
> > > > >>>>>>>>>>>> working hard on moving it up on to a new foundation.
> Lets
> > > > >>>> not
> > > > >>>>>>>> clutter
> > > > >>>>>>>>>>> task
> > > > >>>>>>>>>>>> harder by piling on more moving parts.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> St.Ack
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Matteo
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > > >>>>> yuzhih...@gmail.com
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I suggest you look at Mat

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu
>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> > > >>>> from
> > > >>>>>>>>>> hbase-server
> > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
> > > >>>>>> peer).
> > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and
> Appy
> > > >>>>> are
> > > >>>>>>>> busy
> > > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
> > > >>>> not
> > > >>>>>>>> clutter
> > > >>>>>>>>>>> task
> > > >>>>>>>>>>>> harder by piling on more moving parts.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> St.Ack
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Matteo
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > >>>>> yuzhih...@gmail.com
> > > >>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> > > >>>> AssignmentManager
> > > >>>>>>> which
> > > >>>>>>>>> is
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>> Master more stable.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > > >>>>> palomino...@gmail.com
> > > >>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> > > >>>>>>> sequence
> > > >>>>>>>>> of
> > > >>>>>>>>>>>> calls
> > > >>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > > >>>> regionserver
> > > >>>>>> so
> > > >>>>>>> it
> > > >>>>>>>>>>> extends
> > > >>>>>>>>>>>>>>> HRegionServer, and the initialization of
> > > >>>> HRegionServer
> > > >>>>>>>>> sometimes
> > > >>>>>>>>>>>> needs
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> > > >>>> cause
> > > >>>>>>>>>>> probabilistic
> > > >>>>>>>>>>>>> dead
> > > >>>>>>>>>>>>>>> lock or some strange NPEs...
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> > > >>>> add
> > > >>>>>> new
> > > >>>>>>>>>> features
> > > >>>>>>>>>>>> or
> >

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Andrew Purtell
>>>>>>>>>>>>>
>>>>>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase
>>>>>>> (+1
>>>>>>>>> on
>>>>>>>>>>>> adding
>>>>>>>>>>>>> all the API any such external tool might need to run).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
>>>>>>>>>>>>>> theo.berto...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> let me try to go back to my original topic.
>>>>>>>>>>>>>>>> this question was meant to be generic, and provide some
>>>>>>>> rule
>>>>>>>>>> for
>>>>>>>>>>>>> future
>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone
>>>>>>>> can
>>>>>>>>>> be:
>>>>>>>>>>>>>>>> - we don't want any core feature (e.g.
>>>>>>>>>> compaction/log-split/log-
>>>>>>>>>>>>>> replay)
>>>>>>>>>>>>>>>> over MR, because some cluster may not want or may have an
>>>>>>>>>>>>>>>> external/uncontrolled MR setup.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
>>>>>>>>> flag)
>>>>>>>>>>> to
>>>>>>>>>>>>> run
>>>>>>>>>>>>>> MR
>>>>>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
>>>>>>> is
>>>>>>>>> not
>>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
>>>>>>> a
>>>>>>>>> flag
>>>>>>>>>>> or
>>>>>>>>>>>>> not
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> ever being able to launch MR jobs.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
>>>>>>> from
>>>>>>>>>>>>> hbase-server
>>>>>>>>>>>>>>> moving it out to be an optional module (Spark would be its
>>>>>>>>> peer).
>>>>>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
>>>>>>>> are
>>>>>>>>>>> busy
>>>>>>>>>>>>>>> wor

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Andrew Purtell
>>>>>>>>>> over MR, because some cluster may not want or may have an
>>>>>>>>>>>>>>> external/uncontrolled MR setup.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
>>>>>>>> flag)
>>>>>>>>>> to
>>>>>>>>>>>> run
>>>>>>>>>>>>> MR
>>>>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
>>>>>> is
>>>>>>>> not
>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
>>>>>> a
>>>>>>>> flag
>>>>>>>>>> or
>>>>>>>>>>>> not
>>>>>>>>>>>>> --
>>>>>>>>>>>>>> ever being able to launch MR jobs.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
>>>>>> from
>>>>>>>>>>>> hbase-server
>>>>>>>>>>>>>> moving it out to be an optional module (Spark would be its
>>>>>>>> peer).
>>>>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
>>>>>>> are
>>>>>>>>>> busy
>>>>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
>>>>>> not
>>>>>>>>>> clutter
>>>>>>>>>>>>> task
>>>>>>>>>>>>>> harder by piling on more moving parts.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
>>>>>>> yuzhih...@gmail.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I suggest you look at Matteo's work for
>>>>>> AssignmentManager
>>>>>>>>> which
>>>>>>>>>>> is
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>> Master more stable.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
>>>>>>> palomino...@gmail.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
>>>>>>>>> sequence
>>>>>>>>>>> of
>>>>>>>>>>>>>> calls
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
>>>>>> regionserver
>>>>>>>> so
>>>>>>>>> it
>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Vladimir Rodionov
>>>> rule
> > >>>>>>> for
> > >>>>>>>>>> future
> > >>>>>>>>>>>>> code.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone
> > >>>>> can
> > >>>>>>> be:
> > >>>>>>>>>>>>> - we don't want any core feature (e.g.
> > >>>>>>> compaction/log-split/log-
> > >>>>>>>>>>> replay)
> > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
> > >>>>>>>>>>>>> external/uncontrolled MR setup.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> +1
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
> > >>>>>> flag)
> > >>>>>>>> to
> > >>>>>>>>>> run
> > >>>>>>>>>>> MR
> > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
> > >>>> is
> > >>>>>> not
> > >>>>>>>>>>> required.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
> > >>>> a
> > >>>>>> flag
> > >>>>>>>> or
> > >>>>>>>>>> not
> > >>>>>>>>>>> --
> > >>>>>>>>>>>> ever being able to launch MR jobs.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> > >>>> from
> > >>>>>>>>>> hbase-server
> > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
> > >>>>>> peer).
> > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
> > >>>>> are
> > >>>>>>>> busy
> > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
> > >>>> not
> > >>>>>>>> clutter
> > >>>>>>>>>>> task
> > >>>>>>>>>>>> harder by piling on more moving parts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > >>>>> yuzhih...@gmail.com
> > >>>>>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> > >>>> AssignmentManager
> > >>>>>>> which
> > >>>>>>>>> is
> > >>>>>>>>>> to
> > >>>>>>>>>>>>> make
> > >>>>>>>>>>>>>> Master more stable.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > >>>>> palomino...@gmail.com
> > >>>>>>>
> > >>>>>>>>> wr

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Vladimir Rodionov
e generic, and provide some
>> >>>>> rule
>> >>>>>>> for
>> >>>>>>>>>> future
>> >>>>>>>>>>>>> code.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone
>> >>>>> can
>> >>>>>>> be:
>> >>>>>>>>>>>>> - we don't want any core feature (e.g.
>> >>>>>>> compaction/log-split/log-
>> >>>>>>>>>>> replay)
>> >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
>> >>>>>>>>>>>>> external/uncontrolled MR setup.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> +1
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
>> >>>>>> flag)
>> >>>>>>>> to
>> >>>>>>>>>> run
>> >>>>>>>>>>> MR
>> >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
>> >>>> is
>> >>>>>> not
>> >>>>>>>>>>> required.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
>> >>>> a
>> >>>>>> flag
>> >>>>>>>> or
>> >>>>>>>>>> not
>> >>>>>>>>>>> --
>> >>>>>>>>>>>> ever being able to launch MR jobs.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
>> >>>> from
>> >>>>>>>>>> hbase-server
>> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
>> >>>>>> peer).
>> >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
>> >>>>> are
>> >>>>>>>> busy
>> >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
>> >>>> not
>> >>>>>>>> clutter
>> >>>>>>>>>>> task
>> >>>>>>>>>>>> harder by piling on more moving parts.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> St.Ack
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Matteo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
>> >>>>> yuzhih...@gmail.com
>> >>>>>>>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I suggest you look at Matteo's work for
>> >>>> AssignmentManager
>> >>>>>>> which
>> >>>>>>>>> is
>> >>>>>>>>>> to
>> >>>>>>>>>>>>> make
>> >>>>>>>>>>>>>> Master more stable.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
>> >>>>> palomino...@gmail.com
>> >>>>>>>
>> >>>>>>>>> wrot

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu
>>>>>>>>>>> ever being able to launch MR jobs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> >>>> from
> >>>>>>>>>> hbase-server
> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
> >>>>>> peer).
> >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
> >>>>> are
> >>>>>>>> busy
> >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
> >>>> not
> >>>>>>>> clutter
> >>>>>>>>>>> task
> >>>>>>>>>>>> harder by piling on more moving parts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> >>>>> yuzhih...@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> >>>> AssignmentManager
> >>>>>>> which
> >>>>>>>>> is
> >>>>>>>>>> to
> >>>>>>>>>>>>> make
> >>>>>>>>>>>>>> Master more stable.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> >>>>> palomino...@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> >>>>>>> sequence
> >>>>>>>>> of
> >>>>>>>>>>>> calls
> >>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> >>>> regionserver
> >>>>>> so
> >>>>>>> it
> >>>>>>>>>>> extends
> >>>>>>>>>>>>>>> HRegionServer, and the initialization of
> >>>> HRegionServer
> >>>>>>>>> sometimes
> >>>>>>>>>>>> needs
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> >>>> cause
> >>>>>>>>>>> probabilistic
> >>>>>>>>>>>>> dead
> >>>>>>>>>>>>>>> lock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> >>>> add
> >>>>>> new
> >>>>>>>>>> features
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>> add
> >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more
> >>>>>> works
> >>>>>>>> for
> >>>>>>>>>> the
> >>>>>>>>>>>>> start
> >>>>>>>>>>>>>>> up processing...
> >>>>>>>>>>>>>>>
> >>>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Vladimir Rodionov
>>>>>>>>> --
> >>>>>>>>>>>> ever being able to launch MR jobs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> >>>> from
> >>>>>>>>>> hbase-server
> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
> >>>>>> peer).
> >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
> >>>>> are
> >>>>>>>> busy
> >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
> >>>> not
> >>>>>>>> clutter
> >>>>>>>>>>> task
> >>>>>>>>>>>> harder by piling on more moving parts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> >>>>> yuzhih...@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> >>>> AssignmentManager
> >>>>>>> which
> >>>>>>>>> is
> >>>>>>>>>> to
> >>>>>>>>>>>>> make
> >>>>>>>>>>>>>> Master more stable.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> >>>>> palomino...@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> >>>>>>> sequence
> >>>>>>>>> of
> >>>>>>>>>>>> calls
> >>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> >>>> regionserver
> >>>>>> so
> >>>>>>> it
> >>>>>>>>>>> extends
> >>>>>>>>>>>>>>> HRegionServer, and the initialization of
> >>>> HRegionServer
> >>>>>>>>> sometimes
> >>>>>>>>>>>> needs
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> >>>> cause
> >>>>>>>>>>> probabilistic
> >>>>>>>>>>>>> dead
> >>>>>>>>>>>>>>> lock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> >>>> add
> >>>>>> new
> >>>>>>>>>> features
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>> add
> >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more
> >>>>>> works
> >>>>>>>> for
> >>>>>>>>>> the
> >>>>>>>>>>>>> start
> >>>>>>>>>>>>>>> up processing...
> >>>>>>>>>>>>>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Andrew Purtell
>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
>>>>> palomino...@gmail.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
>>>>>>> sequence
>>>>>>>>> of
>>>>>>>>>>>> calls
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
>>>> regionserver
>>>>>> so
>>>>>>> it
>>>>>>>>>>> extends
>>>>>>>>>>>>>>> HRegionServer, and the initialization of
>>>> HRegionServer
>>>>>>>>> sometimes
>>>>>>>>>>>> needs
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
>>>> cause
>>>>>>>>>>> probabilistic
>>>>>>>>>>>>> dead
>>>>>>>>>>>>>>> lock or some strange NPEs...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
>>>> add
>>>>>> new
>>>>>>>>>> features
>>>>>>>>>>>> or
>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>> external dependencies to HMaster, especially add more
>>>>>> works
>>>>>>>> for
>>>>>>>>>> the
>>>>>>>>>>>>> start
>>>>>>>>>>>>>>> up processing...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
>>>> yuzhih...@gmail.com
>>>>>> :
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I read through HADOOP-13433
>>>>>>>>>>>>>>>> <https://issues.apache.org/
>>>> jira/browse/HADOOP-13433>
>>>>> -
>>>>>>> the
>>>>>>>>>> cited
>>>>>>>>>>>>> race
>>>>>>>>>>>>>>>> condition is in jdk.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
>>>>> moving.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
>>>>>> problem...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
>>>> it
>>>>> in
>>>>>>> the
>>>>>>>>>>> backup
>>>>>>>>>>>> /
>>>>>>>>>>>>>>>> restore mega patch ?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
>>>>>>>> palomino...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you guys have already implemented the feature
>>>> in
>>>>>> the
>>>>>>>> MR
>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu
> >>>>>>>>>>>>> lock or some strange NPEs...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> >> add
> >>>> new
> >>>>>>>> features
> >>>>>>>>>> or
> >>>>>>>>>>>> add
> >>>>>>>>>>>>> external dependencies to HMaster, especially add more
> >>>> works
> >>>>>> for
> >>>>>>>> the
> >>>>>>>>>>> start
> >>>>>>>>>>>>> up processing...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> >> yuzhih...@gmail.com
> >>>> :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I read through HADOOP-13433
> >>>>>>>>>>>>>> <https://issues.apache.org/
> >> jira/browse/HADOOP-13433>
> >>> -
> >>>>> the
> >>>>>>>> cited
> >>>>>>>>>>> race
> >>>>>>>>>>>>>> condition is in jdk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> >>> moving.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
> >>>> problem...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
> >> it
> >>> in
> >>>>> the
> >>>>>>>>> backup
> >>>>>>>>>> /
> >>>>>>>>>>>>>> restore mega patch ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> >>>>>> palomino...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you guys have already implemented the feature
> >> in
> >>>> the
> >>>>>> MR
> >>>>>>>> way
> >>>>>>>>>> and
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on
> >>> it
> >>>>> as I
> >>>>>>> do
> >>>>>>>>> not
> >>>>>>>>>>> want
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> block the development progress.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> But I strongly suggest later we need to revisit
> >> the
> >>>>>> design
> >>>>>>>> and
> >>>>>>>>>> see
> >>>>>>>>>>> if
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> can separate the logic from HMaster as much as
> >>>>> possible.
> >>>>>>> HA
> >>>>>>>> is
> >>>>>>>>>>> not a
> >>>>>>>>>>>>> big
> >>>>>>>>>>>>>>> problem if you do not store any metadata locally.
> >> But
> >>>> the
> >>>>>>> ugly
> >>>>>>>>> code
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>> HMaster is really a problem...
> >>>>>>>>>>>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Andrew Purtell
>>>> moving it out to be an optional module (Spark would be its
>>>> peer).
>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy
>>> are
>>>>>> busy
>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
>> not
>>>>>> clutter
>>>>>>>>> task
>>>>>>>>>> harder by piling on more moving parts.
>>>>>>>>>> 
>>>>>>>>>> St.Ack
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Matteo
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
>>> yuzhih...@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I suggest you look at Matteo's work for
>> AssignmentManager
>>>>> which
>>>>>>> is
>>>>>>>> to
>>>>>>>>>>> make
>>>>>>>>>>>> Master more stable.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
>>> palomino...@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> No, not your fault, at least, not this time:)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
>>>>> sequence
>>>>>>> of
>>>>>>>>>> calls
>>>>>>>>>>>> when
>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
>> regionserver
>>>> so
>>>>> it
>>>>>>>>> extends
>>>>>>>>>>>>> HRegionServer, and the initialization of
>> HRegionServer
>>>>>>> sometimes
>>>>>>>>>> needs
>>>>>>>>>>> to
>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
>> cause
>>>>>>>>> probabilistic
>>>>>>>>>>> dead
>>>>>>>>>>>>> lock or some strange NPEs...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
>> add
>>>> new
>>>>>>>> features
>>>>>>>>>> or
>>>>>>>>>>>> add
>>>>>>>>>>>>> external dependencies to HMaster, especially add more
>>>> works
>>>>>> for
>>>>>>>> the
>>>>>>>>>>> start
>>>>>>>>>>>>> up processing...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
>> yuzhih...@gmail.com
>>>> :
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I read through HADOOP-13433
>>>>>>>>>>>>>> <https://issues.apache.org/
>> jira/browse/HADOOP-13433>
>>> -
>>>>> the
>>>>>>>> cited
>>>>>>>>>>> race
>>>>>>>>>>>>>> condition is in jdk.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
>>> moving.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
>>>> problem...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
>> it
>>> in
>>>>

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu
uggest later we need to revisit
> the
> > > > > design
> > > > > > > and
> > > > > > > > > see
> > > > > > > > > > if
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > > can separate the logic from HMaster as much as
> > > > possible.
> > > > > > HA
> > > > > > > is
> > > > > > > > > > not a
> > > > > > > > > > > > big
> > > > > > > > > > > > > > > > problem if you do not store any metadata locally.
> But
> > > the
> > > > > > ugly
> > > > > > > > code
> > > > > > > > > > in
> > > > > > > > > > > > > > > > HMaster is really a problem...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > And for security, I have an issue pending for a
> long
> > > > time.
> > > > > > Can
> > > > > > > > > > someone
> > > > > > > > > > > > > help
> > > > > > > > > > > > > > taking a simple look at it? This is what I mean,
> > ugly
> > > > > > code...
> > > > > > > > > > logout
> > > > > > > > > > > > and
> > > > > > > > > > > > > > destroy the credentials in a subject when it is
> > still
> > > > > being
> > > > > > > > used,
> > > > > > > > > > and
> > > > > > > > > > > > > > declared as LimitPrivacy so I can not change the
> > > > > behavior
> > > > > > and
> > > > > > > > the
> > > > > > > > > > only
> > > > > > > > > > > > way
> > > > > > > > > > > > > > to fix it is to write another piece of ugly
> code...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://issues.apache.org/
> jira/browse/HADOOP-13433
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > > > > > > > > > vladrodio...@gmail.com
> > > > > > > > > > > >:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >> If in the future, we find better ways of
> doing
> > > > this
> > > > > > > > without
> > > > > > > > > > > using
> > > > > > > > > > > > > MR,
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > can certainly consider that
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Our framework for distributed operations is
> > > abstract
> > > > > and
> > > > > > > > allows
> > > > > > > > > > > > > > > different implementations. MR is just one
> > > > > implementation
> > > > > > we
> > > > > > > > > > > provide.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -Vlad
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> > > > > > > > > > d...@hortonworks.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Guys, first off apologies for bringing in the
> > > topic
> > > > > of
> > > > > > > > > MR-based
> > > > > > > > > > > > > > > > comp

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Matteo Bertozzi
> > > > > <https://issues.apache.org/jira/browse/HADOOP-13433>
> -
> > > the
> > > > > > cited
> > > > > > > > > race
> > > > > > > > > > > > condition is in jdk.
> > > > > > > > > > > >
> > > > > > > > > > > > Suggest pinging the reviewer on JIRA to get it
> moving.
> > > > > > > > > > > >
> > > > > > > > > > > > > bq. But the ugly code in HMaster is really a
> > problem...
> > > > > > > > > > > >
> > > > > > > > > > > > Can you be specific as to which code is ugly ? Is it
> in
> > > the
> > > > > > > backup
> > > > > > > > /
> > > > > > > > > > > > restore mega patch ?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> > > > palomino...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > If you guys have already implemented the feature in
> > the
> > > > MR
> > > > > > way
> > > > > > > > and
> > > > > > > > > > the
> > > > > > > > > > > > > patch is ready for landing on master, I'm a -0 on
> it
> > > as I
> > > > > do
> > > > > > > not
> > > > > > > > > want
> > > > > > > > > > > to
> > > > > > > > > > > > > block the development progress.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But I strongly suggest later we need to revisit the
> > > > design
> > > > > > and
> > > > > > > > see
> > > > > > > > > if
> > > > > > > > > > > we
> > > > > > > > > > > > > > can separate the logic from HMaster as much as
> > > possible.
> > > > > HA
> > > > > > is
> > > > > > > > > not a
> > > > > > > > > > > big
> > > > > > > > > > > > > > problem if you do not store any metadata locally. But
> > the
> > > > > ugly
> > > > > > > code
> > > > > > > > > in
> > > > > > > > > > > > > > HMaster is really a problem...
> > > > > > > > > > > > >
> > > > > > > > > > > > > > And for security, I have an issue pending for a long
> > > time.
> > > > > Can
> > > > > > > > > someone
> > > > > > > > > > > > help
> > > > > > > > > > > > > taking a simple look at it? This is what I mean,
> ugly
> > > > > code...
> > > > > > > > > logout
> > > > > > > > > > > and
> > > > > > > > > > > > > destroy the credentials in a subject when it is
> still
> > > > being
> > > > > > > used,
> > > > > > > > > and
> > > > > > > > > > > > > declared as LimitPrivacy so I can not change the
> > > behivor
> > > > > and
> > > > > > > the
> > > > > > > > > only
> > > > > > > > > > > way
> > > > > > > > > > > > > to fix it is to write another piece of ugly code...
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > > > > > > > > vladrodio...@gmail.com
> > > > > > > > > > >:
> > > > > >

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Matteo Bertozzi

Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-24 Thread Ted Yu

Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

2016-09-23 Thread Stack

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Andrew Purtell

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Vladimir Rodionov

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Stack
> > > > > > > > want to block the development progress.
> > > > > > > >
> > > > > > > > But I strongly suggest later we need to revisit the design and see if we
> > > > > > > > can separate the logic from HMaster as much as possible. HA is not a big
> > > > > > > > problem if you do not store any metadata locally. But the ugly code in
> > > > > > > > HMaster is really a problem...
> > > > > > > >
> > > > > > > > And for security, I have an issue pending for a long time. Can someone help
> > > > > > > > take a simple look at it? This is what I mean, ugly code... logout and
> > > > > > > > destroy the credentials in a subject when it is still being used, and
> > > > > > > > declared as LimitedPrivate so I can not change the behavior and the only way
> > > > > > > > to fix it is to write another piece of ugly code...
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > >
> > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > > > >
> > > > > > > > > >> If in the future, we find better ways of doing this without using MR, we
> > > > > > > > > can certainly consider that
> > > > > > > > >
> > > > > > > > > Our framework for distributed operations is abstract and allows
> > > > > > > > > different implementations. MR is just one implementation we provide.
> > > > > > > > >
> > > > > > > > > -Vlad
> > > > > > > > >
> > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > > > > >
> > > > > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > > > > compactions.. But I was thinking more about the SpliceMachine approach of
> > > > > > > > > > managing compactions in Spark where apparently they saw a lot of benefits.
> > > > > > > > > > Apologies for giving you that sore throat Andrew; I really didn't mean to
> > > > > > > > > > :-)
> > > > > > > > > >
> > > > > > > > > > So on this issue, we have these on the plate:
> > > > > > > > > > 0. Somehow not use MR but something like that
> > > > > > > > > > 1. Run a standalone service other than master
> > > > > > > > > > 2. Shell out from the master
> > > > > > > > > >
> > > > > > > > > > I don't think we have a good answer to (0), and I don't think it's even
> > > > > > > > > > worth the effort of trying to build something when MR is already there, and
> > > > > > > > > > being used by HBase already for some operations.
> > > > > > > > > >
> > > > > > > > > > On (1), we have to deal with a myriad of issues - HA of the server not
> > > > > > > > > > being the least of them all. Security (kerberos authentication, another
> > > > > > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
> > > > > > > > > > substitute that (1) with the HBase Master. I haven't seen any good

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Vladimir Rodionov

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Stack
> > > > > >
> > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> d...@hortonworks.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > compactions.. But I was thinking more about the SpliceMachine
> > > > approach
> > > > > of
> > > > > > > managing compactions in Spark where apparently they saw a lot
> of
> > > > > > benefits.
> > > > > > > Apologies for giving you that sore throat Andrew; I really
> didn't
> > > > mean
> > > > > to
> > > > > > > :-)
> > > > > > >
> > > > > > > So on this issue, we have these on the plate:
> > > > > > > 0. Somehow not use MR but something like that
> > > > > > > 1. Run a standalone service other than master
> > > > > > > 2. Shell out from the master
> > > > > > >
> > > > > > > I don't think we have a good answer to (0), and I don't think
> > it's
> > > > even
> > > > > > > worth the effort of trying to build something when MR is
> already
> > > > there,
> > > > > > and
> > > > > > > being used by HBase already for some operations.
> > > > > > >
> > > > > > > On (1), we have to deal with a myriad of issues - HA of the
> > server
> > > > not
> > > > > > > being the least of them all. Security (kerberos authentication,
> > > > another
> > > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > > Instead
> > > > > > let's
> > > > > > > substitute that (1) with the HBase Master. I haven't seen any
> > good
> > > > > reason
> > > > > > > why the HBase master shouldn't launch MR jobs if needed. It's
> not
> > > > > ideal;
> > > > > > > agreed.
> > > > > > >
> > > > > > > Now before going to (2), let's see what are the benefits of
> > running
> > > > the
> > > > > > > backup/restore jobs from the master. I think Ted has summarized
> > > some
> > > > of
> > > > > > the
> > > > > > > issues that we need to take care of - basically, the master can
> > > keep
> > > > > > track
> > > > > > > of running jobs, and should it fail, the backup master can
> > continue
> > > > > > keeping
> > > > > > > track of it (since the jobId would have been recorded in the
> proc
> > > > WAL).
> > > > > > The
> > > > > > > master can also do cleanup, etc. of failed backup/restore
> > > processes.
> > > > > > > Security is another issue - the job needs to run as 'hbase'
> since
> > > it
> > > > > owns
> > > > > > > the data. Having the master launch the job makes it get that
> > > > privilege.
> > > > > > In
> > > > > > > the (2) approach, it's hard to do some of the above management.
> > > > > > >
> > > > > > > Guys, just to reiterate, the patch as such is ready from the
> > > overall
> > > > > > > design/arch point of view (maybe code review is still pending
> > from
> > > > > > Matteo).
> > > > > > > If in the future, we find better ways of doing this without
> using
> > > MR,
> > > > > we
> > > > > > > can certainly consider that. But IMO don't think we should
> block
> > > this
> > > > > > patch
> > > > > > > from getting merged.
> > > > > > >
> > > > > > > 
> > > > > > > From: 张铎 
> > > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > > To: dev@hbase.apache.org
> > > > > > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > > >
> > > > > > > So what about a standalone service other than master? You can
> use
> > > > your
> >

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Devaraj Das
Great points. I am getting the feeling that we have beaten this topic to death
and have spent more than enough time on it, and it's time to move on. My
takeaway is that it's fine to do MR for backup/restore - backup/restore is an
optional feature - unless you use it, you don't need MR.
Fingers crossed,
Devaraj.

P.S. To Nick's point, if it makes sense to do MR/Spark even for core features 
like Compactions, we should be open to it. But that's a topic for another 
[Friday] DISCUSS thread.


From: Jerry He 
Sent: Friday, September 23, 2016 8:40 AM
To: dev
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

That is the point, Matteo.

Put another way, there is nothing that stops a user from deploying a
custom procedure or custom coprocessor that calls out to an MR job.
The optional feature should satisfy some basic requirements, e.g. no
impact if not deployed or used, and limited impact if used.
It can be made with isolated dynamic loading of extra configuration (Yarn),
without blocking or occupying the server handlers, or with a separate handler.
The impact would mostly be on the overall cluster resources. In this sense,
there is no difference between using another standalone server and a command tool.
ExportSnapshot can then be moved to the server as well.
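
[Editorial illustration, not part of the original message.] The "no impact if
not deployed or used" requirement above can be sketched as a lazily loaded,
flag-gated feature. This is a language-neutral sketch in Python, not HBase
code: the class name OptionalFeature and the use of the standard json module
as a stand-in for the optional dependency are assumptions made purely for
illustration.

```python
import importlib


class OptionalFeature:
    """Loads a feature's dependency only if the feature is enabled and used.

    Hypothetical helper for illustration; nothing here is an HBase API.
    """

    def __init__(self, enabled, module_name):
        self.enabled = enabled
        self.module_name = module_name
        self._module = None  # nothing is imported at construction time

    @property
    def loaded(self):
        """True once the optional dependency has actually been imported."""
        return self._module is not None

    def load(self):
        """Import the dependency on first use; refuse if the feature is off."""
        if not self.enabled:
            raise RuntimeError("feature is disabled")
        if self._module is None:
            self._module = importlib.import_module(self.module_name)
        return self._module


# A disabled or unused feature never triggers the import.
backup = OptionalFeature(enabled=True, module_name="json")
```

The point of the sketch is only that the cost of the dependency is paid at
first use, so a deployment that never enables the feature never touches it.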

Also, thinking about it at a higher level: it is probably beneficial if you
allow HBase to call out to an external framework to do computation. It can be
thought of as a UDF, a distributed UDF.
The execution of this UDF is totally in separate address spaces, and you
only need to poll the status. This would be like a dream in a traditional
database.
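
[Editorial illustration, not part of the original message.] The "you only need
to poll the status" idea can be sketched as a small polling loop. This is a
minimal Python sketch under stated assumptions: the function name
poll_until_done, the status strings, and the get_status callback are all
hypothetical and do not correspond to any HBase, YARN, or MapReduce API.

```python
import time


def poll_until_done(get_status, interval_seconds=1.0, max_polls=100):
    """Poll get_status() until it reports a terminal state.

    get_status is a caller-supplied function (hypothetical here) that asks
    the external framework for the job's state and returns a string.
    """
    terminal_states = {"SUCCEEDED", "FAILED"}
    for _ in range(max_polls):
        status = get_status()
        if status in terminal_states:
            return status
        time.sleep(interval_seconds)  # the caller does no other work for the job
    return "TIMED_OUT"


# Example with a fake job that finishes on the third poll.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
final_state = poll_until_done(lambda: next(fake_states), interval_seconds=0)
```

The caller holds only a job handle and a status callback; all computation
happens elsewhere, which is the separation the paragraph above describes.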

My 2 cents.

Jerry


On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi 
wrote:

> let me try to go back to my original topic.
> this question was meant to be generic, and provide some rule for future
> code.
>
> from what I can gather, a rule that may satisfy everyone can be:
>  - we don't want any core feature (e.g. compaction/log-split/log-replay)
> over MR, because some cluster may not want or may have an
> external/uncontrolled MR setup.
>  - we allow non-core features (e.g. features enabled by a flag) to run MR
> jobs from hbase, because unless you use the feature, MR is not required.
>
> Matteo
>
>
> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu  wrote:
>
> > I suggest you look at Matteo's work for AssignmentManager which is to make
> > Master more stable.
> >
> > Cheers
> >
> > On Fri, Sep 23, 2016 at 5:32 AM, 张铎  wrote:
> >
> > > > No, not your fault, at least, not this time:)
> > > >
> > > > Why do I call the code ugly? Can you simply tell me the sequence of calls when
> > > > starting up the HMaster? HMaster is also a regionserver so it extends
> > > > HRegionServer, and the initialization of HRegionServer sometimes needs to
> > > > make rpc calls to HMaster. A simple change would cause probabilistic deadlock
> > > > or some strange NPEs...
> > > >
> > > > That's why I'm very nervous when somebody wants to add new features or add
> > > > external dependencies to HMaster, especially adding more work to the startup
> > > > processing...
> > >
> > > Thanks.
> > >
> > > 2016-09-23 20:02 GMT+08:00 Ted Yu :
> > >
> > > > I read through HADOOP-13433
> > > > <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
> race
> > > > condition is in jdk.
> > > >
> > > > Suggest pinging the reviewer on JIRA to get it moving.
> > > >
> > > > > bq. But the ugly code in HMaster is really a problem...
> > > >
> > > > Can you be specific as to which code is ugly ? Is it in the backup /
> > > > restore mega patch ?
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎  wrote:
> > > >
> > > > > If you guys have already implemented the feature in the MR way and
> > the
> > > > > patch is ready for landing on master, I'm a -0 on it as I do not
> want
> > > to
> > > > > block the development progress.
> > > > >
> > > > > > But I strongly suggest that we later revisit the design and see
> > > > > > if we can separate the logic from HMaster as much as possible. HA
> > > > > > is not a big problem if you do not store any metadata locally. But
> > > > > > the ugly code in HMaster is really a problem...
> > > > > >
> > > > > > And for security, I have an issue pending for a long time. Can
> someone

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Jerry He
> > > > >
> > > > > > >> If in the future, we find better ways of doing this without
> > using
> > > > MR,
> > > > > we
> > > > > > can certainly consider that
> > > > > >
> > > > > > Our framework for distributed operations is abstract and allows
> > > > > > different implementations. MR is just one implementation we
> > provide.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> d...@hortonworks.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > compactions.. But I was thinking more about the SpliceMachine
> > > > approach
> > > > > of
> > > > > > > managing compactions in Spark where apparently they saw a lot
> of
> > > > > > benefits.
> > > > > > > Apologies for giving you that sore throat Andrew; I really
> didn't
> > > > mean
> > > > > to
> > > > > > > :-)
> > > > > > >
> > > > > > > So on this issue, we have these on the plate:
> > > > > > > 0. Somehow not use MR but something like that
> > > > > > > 1. Run a standalone service other than master
> > > > > > > 2. Shell out from the master
> > > > > > >
> > > > > > > I don't think we have a good answer to (0), and I don't think
> > it's
> > > > even
> > > > > > > worth the effort of trying to build something when MR is
> already
> > > > there,
> > > > > > and
> > > > > > > being used by HBase already for some operations.
> > > > > > >
> > > > > > > On (1), we have to deal with a myriad of issues - HA of the
> > server
> > > > not
> > > > > > > being the least of them all. Security (kerberos authentication,
> > > > another
> > > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > > Instead
> > > > > > let's
> > > > > > > substitute that (1) with the HBase Master. I haven't seen any
> > good
> > > > > reason
> > > > > > > why the HBase master shouldn't launch MR jobs if needed. It's
> not
> > > > > ideal;
> > > > > > > agreed.
> > > > > > >
> > > > > > > Now before going to (2), let's see what are the benefits of
> > running
> > > > the
> > > > > > > backup/restore jobs from the master. I think Ted has summarized
> > > some
> > > > of
> > > > > > the
> > > > > > > issues that we need to take care of - basically, the master can
> > > keep
> > > > > > track
> > > > > > > of running jobs, and should it fail, the backup master can
> > continue
> > > > > > keeping
> > > > > > > track of it (since the jobId would have been recorded in the
> proc
> > > > WAL).
> > > > > > The
> > > > > > > master can also do cleanup, etc. of failed backup/restore
> > > processes.
> > > > > > > Security is another issue - the job needs to run as 'hbase'
> since
> > > it
> > > > > owns
> > > > > > > the data. Having the master launch the job makes it get that
> > > > privilege.
> > > > > > In
> > > > > > > the (2) approach, it's hard to do some of the above management.
> > > > > > >
> > > > > > > Guys, just to reiterate, the patch as such is ready from the
> > > overall
> > > > > > > design/arch point of view (maybe code review is still pending
> > from
> > > > > > Matteo).
> > > > > > > If in the future, we find better ways of doing this without
> using
> > > MR,
> > > > > we
> > > > > > > can certainly consider that. But IMO don't think we should
> block
> > > this
> > > > > > patch
> > > > > > > from getting merged.

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Nick Dimiduk
> Our framework for distributed operations is abstract and allows
> > > > > > different implementations. MR is just one implementation we
> > provide.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> d...@hortonworks.com 
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > compactions.. But I was thinking more about the SpliceMachine
> > > > approach
> > > > > of
> > > > > > > managing compactions in Spark where apparently they saw a lot
> of
> > > > > > benefits.
> > > > > > > Apologies for giving you that sore throat Andrew; I really
> didn't
> > > > mean
> > > > > to
> > > > > > > :-)
> > > > > > >
> > > > > > > So on this issue, we have these on the plate:
> > > > > > > 0. Somehow not use MR but something like that
> > > > > > > 1. Run a standalone service other than master
> > > > > > > 2. Shell out from the master
> > > > > > >
> > > > > > > I don't think we have a good answer to (0), and I don't think
> > it's
> > > > even
> > > > > > > worth the effort of trying to build something when MR is
> already
> > > > there,
> > > > > > and
> > > > > > > being used by HBase already for some operations.
> > > > > > >
> > > > > > > On (1), we have to deal with a myriad of issues - HA of the
> > server
> > > > not
> > > > > > > being the least of them all. Security (kerberos authentication,
> > > > another
> > > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > > Instead
> > > > > > let's
> > > > > > > substitute that (1) with the HBase Master. I haven't seen any
> > good
> > > > > reason
> > > > > > > why the HBase master shouldn't launch MR jobs if needed. It's
> not
> > > > > ideal;
> > > > > > > agreed.
> > > > > > >
> > > > > > > Now before going to (2), let's see what are the benefits of
> > running
> > > > the
> > > > > > > backup/restore jobs from the master. I think Ted has summarized
> > > some
> > > > of
> > > > > > the
> > > > > > > issues that we need to take care of - basically, the master can
> > > keep
> > > > > > track
> > > > > > > of running jobs, and should it fail, the backup master can
> > continue
> > > > > > keeping
> > > > > > > track of it (since the jobId would have been recorded in the
> proc
> > > > WAL).
> > > > > > The
> > > > > > > master can also do cleanup, etc. of failed backup/restore
> > > processes.
> > > > > > > Security is another issue - the job needs to run as 'hbase'
> since
> > > it
> > > > > owns
> > > > > > > the data. Having the master launch the job makes it get that
> > > > privilege.
> > > > > > In
> > > > > > > the (2) approach, it's hard to do some of the above management.
> > > > > > >
> > > > > > > Guys, just to reiterate, the patch as such is ready from the
> > > overall
> > > > > > > design/arch point of view (maybe code review is still pending
> > from
> > > > > > Matteo).
> > > > > > > If in the future, we find better ways of doing this without
> using
> > > MR,
> > > > > we
> > > > > > > can certainly consider that. But IMO don't think we should
> block
> > > this
> > > > > > patch
> > > > > > > from getting merged.
> > > > > > >
> > > > > > > 
> > > > > > > > From: 张铎
> > > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > > To: dev@hbase.apac

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Matteo Bertozzi
> > > > > > > I don't think we have a good answer to (0), and I don't think
> it's
> > > even
> > > > > > worth the effort of trying to build something when MR is already
> > > there,
> > > > > and
> > > > > > being used by HBase already for some operations.
> > > > > >
> > > > > > On (1), we have to deal with a myriad of issues - HA of the
> server
> > > not
> > > > > > being the least of them all. Security (kerberos authentication,
> > > another
> > > > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA.
> > Instead
> > > > > let's
> > > > > > substitute that (1) with the HBase Master. I haven't seen any
> good
> > > > reason
> > > > > > why the HBase master shouldn't launch MR jobs if needed. It's not
> > > > ideal;
> > > > > > agreed.
> > > > > >
> > > > > > Now before going to (2), let's see what are the benefits of
> running
> > > the
> > > > > > backup/restore jobs from the master. I think Ted has summarized
> > some
> > > of
> > > > > the
> > > > > > issues that we need to take care of - basically, the master can
> > keep
> > > > > track
> > > > > > of running jobs, and should it fail, the backup master can
> continue
> > > > > keeping
> > > > > > track of it (since the jobId would have been recorded in the proc
> > > WAL).
> > > > > The
> > > > > > master can also do cleanup, etc. of failed backup/restore
> > processes.
> > > > > > Security is another issue - the job needs to run as 'hbase' since
> > it
> > > > owns
> > > > > > the data. Having the master launch the job makes it get that
> > > privilege.
> > > > > In
> > > > > > the (2) approach, it's hard to do some of the above management.
> > > > > >
> > > > > > Guys, just to reiterate, the patch as such is ready from the
> > overall
> > > > > > design/arch point of view (maybe code review is still pending
> from
> > > > > Matteo).
> > > > > > If in the future, we find better ways of doing this without using
> > MR,
> > > > we
> > > > > > can certainly consider that. But IMO don't think we should block
> > this
> > > > > patch
> > > > > > from getting merged.
> > > > > >
> > > > > > 
> > > > > > From: 张铎 
> > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > To: dev@hbase.apache.org
> > > > > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > >
> > > > > > So what about a standalone service other than master? You can use
> > > your
> > > > > own
> > > > > > procedure store in that service?
> > > > > >
> > > > > > 2016-09-23 11:28 GMT+08:00 Ted Yu :
> > > > > >
> > > > > > > An earlier implementation was client driven.
> > > > > > >
> > > > > > > But with that approach, it is hard to resume if there is error
> > > > midway.
> > > > > > > Using Procedure V2 makes the backup / restore more robust.
> > > > > > >
> > > > > > > Another consideration is for security. It is hard to enforce
> > > security
> > > > > (to
> > > > > > > be implemented) for client driven actions.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <
> > > > > andrew.purt...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > No, this misses Matteo's finer point, which is "shelling out"
> > > from
> > > > > the
> > > > > > > master directly to run MR is a first. Why not drive this with a
> > > > utility
> > > > > > > derived from Tool?
> > > > > > > >
> > > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Ted Yu
> the
> > > > > backup/restore jobs from the master. I think Ted has summarized
> some
> > of
> > > > the
> > > > > issues that we need to take care of - basically, the master can
> keep
> > > > track
> > > > > of running jobs, and should it fail, the backup master can continue
> > > > keeping
> > > > > track of it (since the jobId would have been recorded in the proc
> > WAL).
> > > > The
> > > > > master can also do cleanup, etc. of failed backup/restore
> processes.
> > > > > Security is another issue - the job needs to run as 'hbase' since
> it
> > > owns
> > > > > the data. Having the master launch the job makes it get that
> > privilege.
> > > > In
> > > > > the (2) approach, it's hard to do some of the above management.
> > > > >
> > > > > Guys, just to reiterate, the patch as such is ready from the
> overall
> > > > > design/arch point of view (maybe code review is still pending from
> > > > Matteo).
> > > > > If in the future, we find better ways of doing this without using
> MR,
> > > we
> > > > > can certainly consider that. But IMO don't think we should block
> this
> > > > patch
> > > > > from getting merged.
> > > > >
> > > > > 
> > > > > From: 张铎 
> > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > To: dev@hbase.apache.org
> > > > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > >
> > > > > So what about a standalone service other than master? You can use
> > your
> > > > own
> > > > > procedure store in that service?
> > > > >
> > > > > 2016-09-23 11:28 GMT+08:00 Ted Yu :
> > > > >
> > > > > > An earlier implementation was client driven.
> > > > > >
> > > > > > But with that approach, it is hard to resume if there is error
> > > midway.
> > > > > > Using Procedure V2 makes the backup / restore more robust.
> > > > > >
> > > > > > Another consideration is for security. It is hard to enforce
> > security
> > > > (to
> > > > > > be implemented) for client driven actions.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <
> > > > andrew.purt...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > No, this misses Matteo's finer point, which is "shelling out"
> > from
> > > > the
> > > > > > master directly to run MR is a first. Why not drive this with a
> > > utility
> > > > > > derived from Tool?
> > > > > > >
> > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <
> > > > vladrodio...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > >>>> In our production cluster, it is common that we just have
> > > > > > > >>>> HDFS and HBase deployed.
> > > > > > > >>>> If our Master/RS depend on the MR framework (especially for
> > > > > > > >>>> some features we do not use at all), it introduces another
> > > > > > > >>>> maintenance cost. I don't think it is a good idea.
> > > > > > > >>
> > > > > > > >> So, you are not backup users in this case. Many of our
> > > > > > > >> customers have the full stack deployed and want to see backup
> > > > > > > >> as a standard feature. Besides this, nothing will happen in
> > > > > > > >> your cluster if you won't be doing backups.
> > > > > > > >>
> > > > > > > >> This discussion (we do not want to see an M/R dependency)
> > > > > > > >> goes nowhere.
> > > > > We
> > > > > > >> asked already, at le

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread 张铎
No, not your fault, at least, not this time :)

Why do I call the code ugly? Can you simply tell me the sequence of calls when
starting up the HMaster? HMaster is also a regionserver so it extends
HRegionServer, and the initialization of HRegionServer sometimes needs to
make RPC calls to HMaster. A simple change could cause a probabilistic
deadlock or some strange NPEs...

That's why I'm very nervous when somebody wants to add new features or add
external dependencies to HMaster, especially when that adds more work to the
startup process...

Thanks.

2016-09-23 20:02 GMT+08:00 Ted Yu :

> I read through HADOOP-13433
> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> condition is in jdk.
>
> Suggest pinging the reviewer on JIRA to get it moving.
>
> bq. But the ugly code in HMaster is really a problem...
>
> Can you be specific as to which code is ugly ? Is it in the backup /
> restore mega patch ?
>
> Cheers
>
> On Thu, Sep 22, 2016 at 10:44 PM, 张铎  wrote:
>
> > If you guys have already implemented the feature in the MR way and the
> > patch is ready for landing on master, I'm a -0 on it as I do not want to
> > block the development progress.
> >
> > But I strongly suggest that we later revisit the design and see if we
> > can separate the logic from HMaster as much as possible. HA is not a big
> > problem if you do not store any metadata locally. But the ugly code in
> > HMaster is really a problem...
> >
> > And for security, I have an issue pending for a long time. Can someone
> > help by taking a quick look at it? This is what I mean by ugly code: it
> > logs out and destroys the credentials in a Subject while it is still being
> > used, and it is declared as LimitedPrivate so I cannot change the behavior
> > and the only way to fix it is to write another piece of ugly code...
> >
> > https://issues.apache.org/jira/browse/HADOOP-13433
> >
> > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov :
> >
> > > >> If in the future, we find better ways of doing this without using
> MR,
> > we
> > > can certainly consider that
> > >
> > > Our framework for distributed operations is abstract and allows
> > > different implementations. MR is just one implementation we provide.
> > >
> > > -Vlad
> > >
> > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das 
> > wrote:
> > >
> > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > compactions.. But I was thinking more about the SpliceMachine
> approach
> > of
> > > > managing compactions in Spark where apparently they saw a lot of
> > > benefits.
> > > > Apologies for giving you that sore throat Andrew; I really didn't
> mean
> > to
> > > > :-)
> > > >
> > > > So on this issue, we have these on the plate:
> > > > 0. Somehow not use MR but something like that
> > > > 1. Run a standalone service other than master
> > > > 2. Shell out from the master
> > > >
> > > > I don't think we have a good answer to (0), and I don't think it's
> even
> > > > worth the effort of trying to build something when MR is already
> there,
> > > and
> > > > being used by HBase already for some operations.
> > > >
> > > > On (1), we have to deal with a myriad of issues - HA of the server
> not
> > > > being the least of them all. Security (kerberos authentication,
> another
> > > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> > > let's
> > > > substitute that (1) with the HBase Master. I haven't seen any good
> > reason
> > > > why the HBase master shouldn't launch MR jobs if needed. It's not
> > ideal;
> > > > agreed.
> > > >
> > > > Now before going to (2), let's see what are the benefits of running
> the
> > > > backup/restore jobs from the master. I think Ted has summarized some
> of
> > > the
> > > > issues that we need to take care of - basically, the master can keep
> > > track
> > > > of running jobs, and should it fail, the backup master can continue
> > > keeping
> > > > track of it (since the jobId would have been recorded in the proc
> WAL).
> > > The
> > > > master can also do cleanup, etc. of failed backup/restore processes.
> > > > Security is another issue - the job needs to run as 'hbase' since it
> > owns
> > > > the

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-23 Thread Ted Yu
I read through HADOOP-13433
<https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
condition is in jdk.

Suggest pinging the reviewer on JIRA to get it moving.

bq. But the ugly code in HMaster is really a problem...

Can you be specific as to which code is ugly ? Is it in the backup /
restore mega patch ?

Cheers

On Thu, Sep 22, 2016 at 10:44 PM, 张铎  wrote:

> If you guys have already implemented the feature in the MR way and the
> patch is ready for landing on master, I'm a -0 on it as I do not want to
> block the development progress.
>
> But I strongly suggest that we later revisit the design and see if we
> can separate the logic from HMaster as much as possible. HA is not a big
> problem if you do not store any metadata locally. But the ugly code in
> HMaster is really a problem...
>
> And for security, I have an issue pending for a long time. Can someone help
> by taking a quick look at it? This is what I mean by ugly code: it logs out
> and destroys the credentials in a Subject while it is still being used, and
> it is declared as LimitedPrivate so I cannot change the behavior and the only
> way to fix it is to write another piece of ugly code...
>
> https://issues.apache.org/jira/browse/HADOOP-13433
>
> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov :
>
> > >> If in the future, we find better ways of doing this without using MR,
> we
> > can certainly consider that
> >
> > Our framework for distributed operations is abstract and allows
> > different implementations. MR is just one implementation we provide.
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das 
> wrote:
> >
> > > Guys, first off apologies for bringing in the topic of MR-based
> > > compactions.. But I was thinking more about the SpliceMachine approach
> of
> > > managing compactions in Spark where apparently they saw a lot of
> > benefits.
> > > Apologies for giving you that sore throat Andrew; I really didn't mean
> to
> > > :-)
> > >
> > > So on this issue, we have these on the plate:
> > > 0. Somehow not use MR but something like that
> > > 1. Run a standalone service other than master
> > > 2. Shell out from the master
> > >
> > > I don't think we have a good answer to (0), and I don't think it's even
> > > worth the effort of trying to build something when MR is already there,
> > and
> > > being used by HBase already for some operations.
> > >
> > > On (1), we have to deal with a myriad of issues - HA of the server not
> > > being the least of them all. Security (kerberos authentication, another
> > > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> > let's
> > > substitute that (1) with the HBase Master. I haven't seen any good
> reason
> > > why the HBase master shouldn't launch MR jobs if needed. It's not
> ideal;
> > > agreed.
> > >
> > > Now before going to (2), let's see what are the benefits of running the
> > > backup/restore jobs from the master. I think Ted has summarized some of
> > the
> > > issues that we need to take care of - basically, the master can keep
> > track
> > > of running jobs, and should it fail, the backup master can continue
> > keeping
> > > track of it (since the jobId would have been recorded in the proc WAL).
> > The
> > > master can also do cleanup, etc. of failed backup/restore processes.
> > > Security is another issue - the job needs to run as 'hbase' since it
> owns
> > > the data. Having the master launch the job makes it get that privilege.
> > In
> > > the (2) approach, it's hard to do some of the above management.
> > >
> > > Guys, just to reiterate, the patch as such is ready from the overall
> > > design/arch point of view (maybe code review is still pending from
> > Matteo).
> > > If in the future, we find better ways of doing this without using MR,
> we
> > > can certainly consider that. But IMO don't think we should block this
> > patch
> > > from getting merged.
> > >
> > > 
> > > From: 张铎 
> > > Sent: Thursday, September 22, 2016 8:32 PM
> > > To: dev@hbase.apache.org
> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >
> > > So what about a standalone service other than master? You can use your
> > own
> > > procedure store in that service?
> > >
> > > 2016-09-2

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
If you guys have already implemented the feature in the MR way and the
patch is ready for landing on master, I'm a -0 on it as I do not want to
block the development progress.

But I strongly suggest that we later revisit the design and see if we
can separate the logic from HMaster as much as possible. HA is not a big
problem if you do not store any metadata locally. But the ugly code in
HMaster is really a problem...

And for security, I have an issue pending for a long time. Can someone help
by taking a quick look at it? This is what I mean by ugly code: it logs out
and destroys the credentials in a Subject while it is still being used, and
it is declared as LimitedPrivate so I cannot change the behavior and the only
way to fix it is to write another piece of ugly code...

https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov :

> >> If in the future, we find better ways of doing this without using MR, we
> can certainly consider that
>
> Our framework for distributed operations is abstract and allows
> different implementations. MR is just one implementation we provide.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das  wrote:
>
> > Guys, first off apologies for bringing in the topic of MR-based
> > compactions.. But I was thinking more about the SpliceMachine approach of
> > managing compactions in Spark where apparently they saw a lot of
> benefits.
> > Apologies for giving you that sore throat Andrew; I really didn't mean to
> > :-)
> >
> > So on this issue, we have these on the plate:
> > 0. Somehow not use MR but something like that
> > 1. Run a standalone service other than master
> > 2. Shell out from the master
> >
> > I don't think we have a good answer to (0), and I don't think it's even
> > worth the effort of trying to build something when MR is already there,
> and
> > being used by HBase already for some operations.
> >
> > On (1), we have to deal with a myriad of issues - HA of the server not
> > being the least of them all. Security (kerberos authentication, another
> > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> let's
> > substitute that (1) with the HBase Master. I haven't seen any good reason
> > why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
> > agreed.
> >
> > Now before going to (2), let's see what are the benefits of running the
> > backup/restore jobs from the master. I think Ted has summarized some of
> the
> > issues that we need to take care of - basically, the master can keep
> track
> > of running jobs, and should it fail, the backup master can continue
> keeping
> > track of it (since the jobId would have been recorded in the proc WAL).
> The
> > master can also do cleanup, etc. of failed backup/restore processes.
> > Security is another issue - the job needs to run as 'hbase' since it owns
> > the data. Having the master launch the job makes it get that privilege.
> In
> > the (2) approach, it's hard to do some of the above management.
> >
> > Guys, just to reiterate, the patch as such is ready from the overall
> > design/arch point of view (maybe code review is still pending from
> Matteo).
> > If in the future, we find better ways of doing this without using MR, we
> > can certainly consider that. But IMO don't think we should block this
> patch
> > from getting merged.
> >
> > 
> > From: 张铎 
> > Sent: Thursday, September 22, 2016 8:32 PM
> > To: dev@hbase.apache.org
> > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >
> > So what about a standalone service other than master? You can use your
> own
> > procedure store in that service?
> >
> > 2016-09-23 11:28 GMT+08:00 Ted Yu :
> >
> > > An earlier implementation was client driven.
> > >
> > > But with that approach, it is hard to resume if there is error midway.
> > > Using Procedure V2 makes the backup / restore more robust.
> > >
> > > Another consideration is for security. It is hard to enforce security
> (to
> > > be implemented) for client driven actions.
> > >
> > > Cheers
> > >
> > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <
> andrew.purt...@gmail.com>
> > > wrote:
> > > >
> > > > No, this misses Matteo's finer point, which is "shelling out" from
> the
> > > master directly to run MR is a first. Why not drive this with a utility
> > > derived from Tool?
>

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
2016-09-23 12:38 GMT+08:00 Devaraj Das :

> Guys, first off apologies for bringing in the topic of MR-based
> compactions.. But I was thinking more about the SpliceMachine approach of
> managing compactions in Spark where apparently they saw a lot of benefits.
> Apologies for giving you that sore throat Andrew; I really didn't mean to
> :-)
>
> So on this issue, we have these on the plate:
> 0. Somehow not use MR but something like that
> 1. Run a standalone service other than master
> 2. Shell out from the master
>
> I don't think we have a good answer to (0), and I don't think it's even
> worth the effort of trying to build something when MR is already there, and
> being used by HBase already for some operations.
>
> On (1), we have to deal with a myriad of issues - HA of the server not
> being the least of them all. Security (kerberos authentication, another
> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
> substitute that (1) with the HBase Master. I haven't seen any good reason
> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
> agreed.
>
We have already put lots of stuff in HMaster, especially on startup; we
even need to run the initialization in a background thread to let the
master come up quickly, which causes lots of races. I do not think it is a
good idea to keep messing up the code there.
For MR, as I said above, the problem is configuration. I do not want to
restart HMaster if I just want to tune the config of a backup MR job. Yes,
you could introduce new shell commands to do that (change the job config,
change the YARN cluster) and persist the change somewhere, maybe ZooKeeper?
Oh no...


> Now before going to (2), let's see what are the benefits of running the
> backup/restore jobs from the master. I think Ted has summarized some of the
> issues that we need to take care of - basically, the master can keep track
> of running jobs, and should it fail, the backup master can continue keeping
> track of it (since the jobId would have been recorded in the proc WAL). The
> master can also do cleanup, etc. of failed backup/restore processes.
> Security is another issue - the job needs to run as 'hbase' since it owns
> the data. Having the master launch the job makes it get that privilege. In
> the (2) approach, it's hard to do some of the above management.
>
> Guys, just to reiterate, the patch as such is ready from the overall
> design/arch point of view (maybe code review is still pending from Matteo).
> If in the future, we find better ways of doing this without using MR, we
> can certainly consider that. But IMO don't think we should block this patch
> from getting merged.
>
> 
> From: 张铎 
> Sent: Thursday, September 22, 2016 8:32 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> So what about a standalone service other than master? You can use your own
> procedure store in that service?
>
> 2016-09-23 11:28 GMT+08:00 Ted Yu :
>
> > An earlier implementation was client driven.
> >
> > But with that approach, it is hard to resume if there is error midway.
> > Using Procedure V2 makes the backup / restore more robust.
> >
> > Another consideration is for security. It is hard to enforce security (to
> > be implemented) for client driven actions.
> >
> > Cheers
> >
> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell 
> > wrote:
> > >
> > > No, this misses Matteo's finer point, which is "shelling out" from the
> > master directly to run MR is a first. Why not drive this with a utility
> > derived from Tool?
> > >
> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov  >
> > wrote:
> > >
> > >>>> In our production cluster, it is common that we just have HDFS and
> > >>>> HBase deployed.
> > >>>> If our Master/RS depend on the MR framework (especially for some
> > >>>> features we do not use at all), it introduces another maintenance
> > >>>> cost. I don't think it is a good idea.
> > >>
> > >> So, you are not backup users in this case. Many of our customers have
> > >> the full stack deployed and want to see backup as a standard feature.
> > >> Besides this, nothing will happen in your cluster if you won't be
> > >> doing backups.
> > >>
> > >> This discussion (we do not want to see an M/R dependency) goes nowhere.
> We
> > >> asked already, at least twice, to suggest another framework (other

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
Just wanted to add one argument for doing this the Master way:

Client-based backup/restore is very hard (if possible at all) to make fully
fault tolerant. If the client fails abruptly halfway, some system data will be
left broken and the cluster will never return to its original state. For
example, we disable splits/merges and the balancer during full backup and
restore. A failed client will leave the cluster in that state (splits/merges
disabled).
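
To make the failure mode concrete, here is a minimal sketch of one way a driver could record the original switch values durably before changing them, so that a different process can restore them after an abrupt failure. It is an illustration only, not the backup code: FakeCluster, the switch names, and the journal file are all hypothetical, and a local file stands in for whatever durable store a real coordinator would use.

```python
import json
import os
import tempfile


class FakeCluster:
    """Hypothetical stand-in for a cluster; not an HBase API."""

    def __init__(self):
        self.switches = {"split_merge": True, "balancer": True}


def begin_backup(cluster, journal_path):
    # Persist the original switch values BEFORE changing anything, so that
    # a different process can restore them if this one dies abruptly.
    with open(journal_path, "w") as f:
        json.dump(cluster.switches, f)
    for name in cluster.switches:
        cluster.switches[name] = False


def finish_backup(cluster, journal_path):
    # Normal completion and crash recovery share this path: restore the
    # recorded values, then delete the journal.
    with open(journal_path) as f:
        cluster.switches.update(json.load(f))
    os.remove(journal_path)


def recover_if_needed(cluster, journal_path):
    # Run by whoever takes over after a failure. Returns True if a
    # leftover journal was found and the switches were restored.
    if os.path.exists(journal_path):
        finish_backup(cluster, journal_path)
        return True
    return False


if __name__ == "__main__":
    cluster = FakeCluster()
    journal = os.path.join(tempfile.mkdtemp(), "backup.journal")
    begin_backup(cluster, journal)
    # The driver "dies" here; no finally block gets a chance to run.
    print(recover_if_needed(cluster, journal), cluster.switches)
```

A try/finally in the client does not cover a killed process or a lost machine, which is why the sketch relies on a journal that outlives the driver.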

-Vlad

On Thu, Sep 22, 2016 at 9:53 PM, Vladimir Rodionov 
wrote:

> >> If in the future, we find better ways of doing this without using MR,
> we can certainly consider that
>
> Our framework for distributed operations is abstract and allows
> different implementations. MR is just one implementation we provide.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das  wrote:
>
>> Guys, first off apologies for bringing in the topic of MR-based
>> compactions.. But I was thinking more about the SpliceMachine approach of
>> managing compactions in Spark where apparently they saw a lot of benefits.
>> Apologies for giving you that sore throat Andrew; I really didn't mean to
>> :-)
>>
>> So on this issue, we have these on the plate:
>> 0. Somehow not use MR but something like that
>> 1. Run a standalone service other than master
>> 2. Shell out from the master
>>
>> I don't think we have a good answer to (0), and I don't think it's even
>> worth the effort of trying to build something when MR is already there, and
>> being used by HBase already for some operations.
>>
>> On (1), we have to deal with a myriad of issues - HA of the server not
>> being the least of them all. Security (kerberos authentication, another
>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
>> substitute that (1) with the HBase Master. I haven't seen any good reason
>> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
>> agreed.
>>
>> Now before going to (2), let's see what are the benefits of running the
>> backup/restore jobs from the master. I think Ted has summarized some of the
>> issues that we need to take care of - basically, the master can keep track
>> of running jobs, and should it fail, the backup master can continue keeping
>> track of it (since the jobId would have been recorded in the proc WAL). The
>> master can also do cleanup, etc. of failed backup/restore processes.
>> Security is another issue - the job needs to run as 'hbase' since it owns
>> the data. Having the master launch the job makes it get that privilege. In
>> the (2) approach, it's hard to do some of the above management.
>>
>> Guys, just to reiterate, the patch as such is ready from the overall
>> design/arch point of view (maybe code review is still pending from Matteo).
>> If in the future, we find better ways of doing this without using MR, we
>> can certainly consider that. But IMO don't think we should block this patch
>> from getting merged.
>>
>> 
>> From: 张铎 
>> Sent: Thursday, September 22, 2016 8:32 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>
>> So what about a standalone service other than master? You can use your own
>> procedure store in that service?
>>
>> 2016-09-23 11:28 GMT+08:00 Ted Yu :
>>
>> > An earlier implementation was client driven.
>> >
>> > But with that approach, it is hard to resume if there is error midway.
>> > Using Procedure V2 makes the backup / restore more robust.
>> >
>> > Another consideration is for security. It is hard to enforce security
>> (to
>> > be implemented) for client driven actions.
>> >
>> > Cheers
>> >
>> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > >
>> > wrote:
>> > >
>> > > No, this misses Matteo's finer point, which is "shelling out" from the
>> > master directly to run MR is a first. Why not drive this with a utility
>> > derived from Tool?
>> > >
>> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <
>> vladrodio...@gmail.com>
>> > wrote:
>> > >
>> > >>>> In our production cluster,  it is a common case we just have HDFS
>> and
>> > >>>> HBase deployed.
>> > >>>> If our Master/RS depend on MR framework (especially some features
>> we
>> > >>>> have not used at all),  it introduced another cost for maintain.  I

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
All the better, Vlad!





On Thu, Sep 22, 2016 at 9:53 PM -0700, "Vladimir Rodionov" wrote:

>> If in the future, we find better ways of doing this without using MR, we
can certainly consider that

Our framework for distributed operations is abstract and allows
different implementations. MR is just one implementation we provide.

-Vlad

On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das  wrote:

> Guys, first off apologies for bringing in the topic of MR-based
> compactions.. But I was thinking more about the SpliceMachine approach of
> managing compactions in Spark where apparently they saw a lot of benefits.
> Apologies for giving you that sore throat Andrew; I really didn't mean to
> :-)
>
> So on this issue, we have these on the plate:
> 0. Somehow not use MR but something like that
> 1. Run a standalone service other than master
> 2. Shell out from the master
>
> I don't think we have a good answer to (0), and I don't think it's even
> worth the effort of trying to build something when MR is already there, and
> being used by HBase already for some operations.
>
> On (1), we have to deal with a myriad of issues - HA of the server not
> being the least of them all. Security (kerberos authentication, another
> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
> substitute that (1) with the HBase Master. I haven't seen any good reason
> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
> agreed.
>
> Now before going to (2), let's see what are the benefits of running the
> backup/restore jobs from the master. I think Ted has summarized some of the
> issues that we need to take care of - basically, the master can keep track
> of running jobs, and should it fail, the backup master can continue keeping
> track of it (since the jobId would have been recorded in the proc WAL). The
> master can also do cleanup, etc. of failed backup/restore processes.
> Security is another issue - the job needs to run as 'hbase' since it owns
> the data. Having the master launch the job makes it get that privilege. In
> the (2) approach, it's hard to do some of the above management.
>
> Guys, just to reiterate, the patch as such is ready from the overall
> design/arch point of view (maybe code review is still pending from Matteo).
> If in the future, we find better ways of doing this without using MR, we
> can certainly consider that. But IMO don't think we should block this patch
> from getting merged.
>
> 
> From: 张铎 
> Sent: Thursday, September 22, 2016 8:32 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> So what about a standalone service other than master? You can use your own
> procedure store in that service?
>
> 2016-09-23 11:28 GMT+08:00 Ted Yu :
>
> > An earlier implementation was client driven.
> >
> > But with that approach, it is hard to resume if there is error midway.
> > Using Procedure V2 makes the backup / restore more robust.
> >
> > Another consideration is for security. It is hard to enforce security (to
> > be implemented) for client driven actions.
> >
> > Cheers
> >
> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell 
> > wrote:
> > >
> > > No, this misses Matteo's finer point, which is "shelling out" from the
> > master directly to run MR is a first. Why not drive this with a utility
> > derived from Tool?
> > >
> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov  >
> > wrote:
> > >
> > >>>> In our production cluster,  it is a common case we just have HDFS
> and
> > >>>> HBase deployed.
> > >>>> If our Master/RS depend on MR framework (especially some features we
> > >>>> have not used at all),  it introduced another cost for maintain.  I
> > >>>> don't think it is a good idea.
> > >>
> > >> So , you are not backup users in this case. Many our customers have
> full
> > >> stack deployed and
> > >> want see backup to be a standard feature. Besides this, nothing will
> > happen
> > >> in your cluster
> > >> if you won't be doing backups.
> > >>
> > >> This discussion (we do not want see M/R dependency) goes to nowhere.
> We
> > >> asked already, at least twice, to suggest another framework (other
> than
> > M/R)
> > >> for bulk data copy with *conversion*. Still waiting for suggestions.
> > >>
> > >>

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> If in the future, we find better ways of doing this without using MR, we
can certainly consider that

Our framework for distributed operations is abstract and allows
different implementations. MR is just one implementation we provide.
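
As a rough illustration of what "abstract with pluggable implementations" could look like, here is a minimal sketch. The names CopyService, LocalCopyService, and run_backup are hypothetical and are not taken from the patch; the local-file implementation only stands in for an MR-backed one.

```python
from abc import ABC, abstractmethod
import os
import shutil
import tempfile


class CopyService(ABC):
    """Hypothetical interface; the real backup code may differ."""

    @abstractmethod
    def copy(self, sources, target_dir):
        """Copy every path in sources into target_dir; return the count."""


class LocalCopyService(CopyService):
    # Stand-in implementation that copies files in-process. An MR-backed
    # implementation would submit a job here instead.
    def copy(self, sources, target_dir):
        os.makedirs(target_dir, exist_ok=True)
        for src in sources:
            shutil.copy(src, target_dir)
        return len(sources)


def run_backup(service, sources, target_dir):
    # The driver depends only on the interface, never on MapReduce itself.
    return service.copy(sources, target_dir)
```

With this shape, swapping MR for another engine means adding one more CopyService subclass; the driver stays unchanged.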

-Vlad

On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das  wrote:

> Guys, first off apologies for bringing in the topic of MR-based
> compactions.. But I was thinking more about the SpliceMachine approach of
> managing compactions in Spark where apparently they saw a lot of benefits.
> Apologies for giving you that sore throat Andrew; I really didn't mean to
> :-)
>
> So on this issue, we have these on the plate:
> 0. Somehow not use MR but something like that
> 1. Run a standalone service other than master
> 2. Shell out from the master
>
> I don't think we have a good answer to (0), and I don't think it's even
> worth the effort of trying to build something when MR is already there, and
> being used by HBase already for some operations.
>
> On (1), we have to deal with a myriad of issues - HA of the server not
> being the least of them all. Security (kerberos authentication, another
> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
> substitute that (1) with the HBase Master. I haven't seen any good reason
> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
> agreed.
>
> Now before going to (2), let's see what are the benefits of running the
> backup/restore jobs from the master. I think Ted has summarized some of the
> issues that we need to take care of - basically, the master can keep track
> of running jobs, and should it fail, the backup master can continue keeping
> track of it (since the jobId would have been recorded in the proc WAL). The
> master can also do cleanup, etc. of failed backup/restore processes.
> Security is another issue - the job needs to run as 'hbase' since it owns
> the data. Having the master launch the job makes it get that privilege. In
> the (2) approach, it's hard to do some of the above management.
>
> Guys, just to reiterate, the patch as such is ready from the overall
> design/arch point of view (maybe code review is still pending from Matteo).
> If in the future, we find better ways of doing this without using MR, we
> can certainly consider that. But IMO don't think we should block this patch
> from getting merged.
>
> 
> From: 张铎 
> Sent: Thursday, September 22, 2016 8:32 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> So what about a standalone service other than master? You can use your own
> procedure store in that service?
>
> 2016-09-23 11:28 GMT+08:00 Ted Yu :
>
> > An earlier implementation was client driven.
> >
> > But with that approach, it is hard to resume if there is error midway.
> > Using Procedure V2 makes the backup / restore more robust.
> >
> > Another consideration is for security. It is hard to enforce security (to
> > be implemented) for client driven actions.
> >
> > Cheers
> >
> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell 
> > wrote:
> > >
> > > No, this misses Matteo's finer point, which is "shelling out" from the
> > master directly to run MR is a first. Why not drive this with a utility
> > derived from Tool?
> > >
> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov  >
> > wrote:
> > >
> > >>>> In our production cluster,  it is a common case we just have HDFS
> and
> > >>>> HBase deployed.
> > >>>> If our Master/RS depend on MR framework (especially some features we
> > >>>> have not used at all),  it introduced another cost for maintain.  I
> > >>>> don't think it is a good idea.
> > >>
> > >> So , you are not backup users in this case. Many our customers have
> full
> > >> stack deployed and
> > >> want see backup to be a standard feature. Besides this, nothing will
> > happen
> > >> in your cluster
> > >> if you won't be doing backups.
> > >>
> > >> This discussion (we do not want see M/R dependency) goes to nowhere.
> We
> > >> asked already, at least twice, to suggest another framework (other
> than
> > M/R)
> > >> for bulk data copy with *conversion*. Still waiting for suggestions.
> > >>
> > >> -Vlad
> > >>
> > >>
> > >>
> > >>
> > >>> On Thu, Sep 22, 2016 at 7:49 PM, Te

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Guys, first off, apologies for bringing in the topic of MR-based compactions.
But I was thinking more about the SpliceMachine approach of managing
compactions in Spark, where apparently they saw a lot of benefits. Apologies for
giving you that sore throat, Andrew; I really didn't mean to :-)

So on this issue, we have these on the plate:
0. Somehow not use MR but something like that
1. Run a standalone service other than master
2. Shell out from the master

I don't think we have a good answer to (0), and I don't think it's even worth 
the effort of trying to build something when MR is already there, and being 
used by HBase already for some operations.

On (1), we have to deal with a myriad of issues - HA of the server not being 
the least of them all. Security (kerberos authentication, another keytab to 
manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute 
that (1) with the HBase Master. I haven't seen any good reason why the HBase 
master shouldn't launch MR jobs if needed. It's not ideal; agreed.

Now before going to (2), let's look at the benefits of running the
backup/restore jobs from the master. I think Ted has summarized some of the 
issues that we need to take care of - basically, the master can keep track of 
running jobs, and should it fail, the backup master can continue keeping track 
of it (since the jobId would have been recorded in the proc WAL). The master 
can also do cleanup, etc. of failed backup/restore processes. Security is 
another issue - the job needs to run as 'hbase' since it owns the data. Having 
the master launch the job makes it get that privilege. In the (2) approach, 
it's hard to do some of the above management.
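
A minimal sketch of the job-tracking idea described above, assuming an append-only log that both the active and the backup master can read. ProcLog, the record fields, and the job ids are hypothetical; this is not the procedure WAL format, and a local file stands in for shared storage.

```python
import json
import os
import tempfile


class ProcLog:
    """Hypothetical append-only log standing in for the procedure WAL."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Force each record to disk before returning, so a later reader
        # sees every event that was acknowledged.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]


def pending_jobs(log):
    # A job is pending if it was submitted but no completion was logged.
    submitted, finished = [], set()
    for rec in log.replay():
        if rec["event"] == "submitted":
            submitted.append(rec["job_id"])
        elif rec["event"] == "finished":
            finished.add(rec["job_id"])
    return [j for j in submitted if j not in finished]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "proc.log")
    active = ProcLog(path)
    active.append({"event": "submitted", "job_id": "job_1"})
    active.append({"event": "finished", "job_id": "job_1"})
    active.append({"event": "submitted", "job_id": "job_2"})
    # The active master fails here; the backup master replays the same log.
    print(pending_jobs(ProcLog(path)))
```

The point of the sketch is only that the job id must be made durable before the submitter can fail; what the new master then does with a pending job (poll it, cancel it, clean up) is a separate decision.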

Guys, just to reiterate, the patch as such is ready from the overall 
design/arch point of view (maybe code review is still pending from Matteo). If 
in the future, we find better ways of doing this without using MR, we can 
certainly consider that. But IMO I don't think we should block this patch from
getting merged.


From: 张铎 
Sent: Thursday, September 22, 2016 8:32 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

So what about a standalone service other than master? You can use your own
procedure store in that service?

2016-09-23 11:28 GMT+08:00 Ted Yu :

> An earlier implementation was client driven.
>
> But with that approach, it is hard to resume if there is error midway.
> Using Procedure V2 makes the backup / restore more robust.
>
> Another consideration is for security. It is hard to enforce security (to
> be implemented) for client driven actions.
>
> Cheers
>
> > On Sep 22, 2016, at 8:15 PM, Andrew Purtell 
> wrote:
> >
> > No, this misses Matteo's finer point, which is "shelling out" from the
> master directly to run MR is a first. Why not drive this with a utility
> derived from Tool?
> >
> > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov 
> wrote:
> >
> >>>> In our production cluster,  it is a common case we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on MR framework (especially some features we
> >>>> have not used at all),  it introduced another cost for maintain.  I
> >>>> don't think it is a good idea.
> >>
> >> So , you are not backup users in this case. Many our customers have full
> >> stack deployed and
> >> want see backup to be a standard feature. Besides this, nothing will
> happen
> >> in your cluster
> >> if you won't be doing backups.
> >>
> >> This discussion (we do not want see M/R dependency) goes to nowhere. We
> >> asked already, at least twice, to suggest another framework (other than
> M/R)
> >> for bulk data copy with *conversion*. Still waiting for suggestions.
> >>
> >> -Vlad
> >>
> >>
> >>
> >>
> >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu  wrote:
> >>>
> >>> If MR framework is not deployed in the cluster, hbase still functions
> >>> normally (post merge).
> >>>
> >>> In terms of build time dependency, we have long been depending on
> >>> mapreduce. Take a look at ExportSnapshot.
> >>>
> >>> Cheers
> >>>
> >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
> >>> wrote:
> >>>
> >>>> In our production cluster,  it is a common case we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on MR framework (especially some features we
> >>>> have not used at all),  it introduced

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
if
> we
> > > >>> think
> > > >>>>> this is not a core feature of HBase, then we could make it depend
> > on
> > > >>> MR,
> > > >>>>> and start a standalone BackupManager instance that submits MR
> jobs
> > to
> > > >>> do
> > > >>>>> periodical maintenance job. And if we think this is a core
> feature
> > > that
> > > >>>>> everyone should use it, then we'd better implement it without MR
> > > >>>>> dependency, like DLS.
> > > >>>>>
> > > >>>>> Thanks.
> > > >>>>>
> > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 :
> > > >>>>>
> > > >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of
> > our
> > > >>>>>> features depend on MR but I think the bottom line is that we
> > should
> > > >>>> launch
> > > >>>>>> the jobs from outside manually or by other services.
> > > >>>>>>
> > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <
> > andrew.purt...@gmail.com
> > > >:
> > > >>>>>>
> > > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a
> fair
> > > >>>>>>> question.
> > > >>>>>>>
> > > >>>>>>> Can this be driven by a utility derived from Tool like our
> other
> > MR
> > > >>>> apps?
> > > >>>>>>> The issue is needing the AccessController to decide if allowed?
> > But
> > > >>>> nothing
> > > >>>>>>> prevents the user from running the job manually/independently,
> > > right?
> > > >>>>>>>
> > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> > > >>>> theo.berto...@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> just a remark. my query was not about tools using MR
> (everyone i
> > > >>>> think
> > > >>>>>>> is
> > > >>>>>>>> ok with those).
> > > >>>>>>>> the topic was about: "are we ok with running MR jobs from
> Master
> > > >>> and
> > > >>>> RSs
> > > >>>>>>>> code?" since this will be the first time we do this
> > > >>>>>>>>
> > > >>>>>>>> Matteo
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
> > > >>> d...@hortonworks.com>
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
> > > Restore,
> > > >>>> it's
> > > >>>>>>>>> fine to be dependent on MR. MR is the right framework for
> such.
> > > We
> > > >>>>>>> should
> > > >>>>>>>>> also do compactions using MR (just saying :) )
> > > >>>>>>>>> 
> > > >>>>>>>>> From: Ted Yu 
> > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > >>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > >>>>>>>>>
> > > >>>>>>>>> I agree - backup / restore is in the same category as import
> /
> > > >>>> export.
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > > >>>>>>> andrew.purt...@gmail.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like
> import
> > > or
> > > >>>>>>> export.
> > > >>>>>>>>>> Or the optional MOB tool. It's fine.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> > > >>>> mberto...@apache.org>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > > >>>> (Master
> > > >>>>>>> or
> > > >>>>>>>>>> RS)?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I remember in the past that there was discussion about not
> > > >>> having
> > > >>>> MR
> > > >>>>>>>>> has
> > > >>>>>>>>>>> direct dependency of hbase.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I think some of discussion where around MOB that had a MR
> job
> > > to
> > > >>>>>>>>> compact,
> > > >>>>>>>>>>> that later was transformed in a non-MR job to be merged, I
> > > think
> > > >>>> we
> > > >>>>>>>>> had a
> > > >>>>>>>>>>> similar discussion for log split/replay.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that
> runs
> > a
> > > >>> MR
> > > >>>> job
> > > >>>>>>>>>> from
> > > >>>>>>>>>>> the master to copy data or restore data.
> > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
> > > >>> backup
> > > >>>>>>>>> you'll
> > > >>>>>>>>>>> not end up running MR jobs, but this was probably true for
> > MOB
> > > >>> as
> > > >>>> in
> > > >>>>>>>>> "if
> > > >>>>>>>>>>> you don't enable MOB you don't need MR")
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have
> > > >>> hbase
> > > >>>> run
> > > >>>>>>>>> MR
> > > >>>>>>>>>>> jobs, only tool started manually by the user can do that".
> or
> > > >>> can
> > > >>>> we
> > > >>>>>>>>>> start
> > > >>>>>>>>>>> adding MR calls around without problems?
> > > >>>
> > >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
> > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > >>>>>>> question.
> > >>>>>>>
> > >>>>>>> Can this be driven by a utility derived from Tool like our other
> MR
> > >>>> apps?
> > >>>>>>> The issue is needing the AccessController to decide if allowed?
> But
> > >>>> nothing
> > >>>>>>> prevents the user from running the job manually/independently,
> > right?
> > >>>>>>>
> > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> > >>>> theo.berto...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> just a remark. my query was not about tools using MR (everyone i
> > >>>> think
> > >>>>>>> is
> > >>>>>>>> ok with those).
> > >>>>>>>> the topic was about: "are we ok with running MR jobs from Master
> > >>> and
> > >>>> RSs
> > >>>>>>>> code?" since this will be the first time we do this
> > >>>>>>>>
> > >>>>>>>> Matteo
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
> > >>> d...@hortonworks.com>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
> > Restore,
> > >>>> it's
> > >>>>>>>>> fine to be dependent on MR. MR is the right framework for such.
> > We
> > >>>>>>> should
> > >>>>>>>>> also do compactions using MR (just saying :) )
> > >>>>>>>>> 
> > >>>>>>>>> From: Ted Yu 
> > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > >>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>
> > >>>>>>>>> I agree - backup / restore is in the same category as import /
> > >>>> export.
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > >>>>>>> andrew.purt...@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import
> > or
> > >>>>>>> export.
> > >>>>>>>>>> Or the optional MOB tool. It's fine.
> > >>>>>>>>>>
> > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> > >>>> mberto...@apache.org>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > >>>> (Master
> > >>>>>>> or
> > >>>>>>>>>> RS)?
> > >>>>>>>>>>>
> > >>>>>>>>>>> I remember in the past that there was discussion about not
> > >>> having
> > >>>> MR
> > >>>>>>>>> has
> > >>>>>>>>>>> direct dependency of hbase.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think some of discussion where around MOB that had a MR job
> > to
> > >>>>>>>>> compact,
> > >>>>>>>>>>> that later was transformed in a non-MR job to be merged, I
> > think
> > >>>> we
> > >>>>>>>>> had a
> > >>>>>>>>>>> similar discussion for log split/replay.
> > >>>>>>>>>>>
> > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs
> a
> > >>> MR
> > >>>> job
> > >>>>>>>>>> from
> > >>>>>>>>>>> the master to copy data or restore data.
> > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
> > >>> backup
> > >>>>>>>>> you'll
> > >>>>>>>>>>> not end up running MR jobs, but this was probably true for
> MOB
> > >>> as
> > >>>> in
> > >>>>>>>>> "if
> > >>>>>>>>>>> you don't enable MOB you don't need MR")
> > >>>>>>>>>>>
> > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have
> > >>> hbase
> > >>>> run
> > >>>>>>>>> MR
> > >>>>>>>>>>> jobs, only tool started manually by the user can do that". or
> > >>> can
> > >>>> we
> > >>>>>>>>>> start
> > >>>>>>>>>>> adding MR calls around without problems?
> > >>>
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
So what about a standalone service other than master? You can use your own
procedure store in that service?

2016-09-23 11:28 GMT+08:00 Ted Yu :

> An earlier implementation was client driven.
>
> But with that approach, it is hard to resume if there is error midway.
> Using Procedure V2 makes the backup / restore more robust.
>
> Another consideration is for security. It is hard to enforce security (to
> be implemented) for client driven actions.
>
> Cheers
>
> > On Sep 22, 2016, at 8:15 PM, Andrew Purtell 
> wrote:
> >
> > No, this misses Matteo's finer point, which is "shelling out" from the
> master directly to run MR is a first. Why not drive this with a utility
> derived from Tool?
> >
> > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov 
> wrote:
> >
> >>>> In our production cluster,  it is a common case we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on MR framework (especially some features we
> >>>> have not used at all),  it introduced another cost for maintain.  I
> >>>> don't think it is a good idea.
> >>
> >> So , you are not backup users in this case. Many our customers have full
> >> stack deployed and
> >> want see backup to be a standard feature. Besides this, nothing will
> happen
> >> in your cluster
> >> if you won't be doing backups.
> >>
> >> This discussion (we do not want see M/R dependency) goes to nowhere. We
> >> asked already, at least twice, to suggest another framework (other than
> M/R)
> >> for bulk data copy with *conversion*. Still waiting for suggestions.
> >>
> >> -Vlad
> >>
> >>
> >>
> >>
> >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu  wrote:
> >>>
> >>> If MR framework is not deployed in the cluster, hbase still functions
> >>> normally (post merge).
> >>>
> >>> In terms of build time dependency, we have long been depending on
> >>> mapreduce. Take a look at ExportSnapshot.
> >>>
> >>> Cheers
> >>>
> >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
> >>> wrote:
> >>>
> >>>> In our production cluster,  it is a common case we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on MR framework (especially some features we
> >>>> have not used at all),  it introduced another cost for maintain.  I
> >>>> don't think it is a good idea.
> >>>>
> >>>> 2016-09-23 10:28 GMT+08:00 张铎 :
> >>>>> To be specific, for example, our nice Backup/Restore feature, if we
> >>> think
> >>>>> this is not a core feature of HBase, then we could make it depend on
> >>> MR,
> >>>>> and start a standalone BackupManager instance that submits MR jobs to
> >>> do
> >>>>> periodical maintenance job. And if we think this is a core feature
> that
> >>>>> everyone should use it, then we'd better implement it without MR
> >>>>> dependency, like DLS.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> 2016-09-23 10:11 GMT+08:00 张铎 :
> >>>>>
> >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
> >>>>>> features depend on MR but I think the bottom line is that we should
> >>>> launch
> >>>>>> the jobs from outside manually or by other services.
> >>>>>>
> >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell  >:
> >>>>>>
> >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> >>>>>>> question.
> >>>>>>>
> >>>>>>> Can this be driven by a utility derived from Tool like our other MR
> >>>> apps?
> >>>>>>> The issue is needing the AccessController to decide if allowed? But
> >>>> nothing
> >>>>>>> prevents the user from running the job manually/independently,
> right?
> >>>>>>>
> >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> >>>> theo.berto...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> just a remark. my query was not about to

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
An earlier implementation was client driven. 

But with that approach, it is hard to resume if there is an error midway.
Using Procedure V2 makes the backup / restore more robust. 

Another consideration is security. It is hard to enforce security (to be
implemented) for client driven actions. 
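
To illustrate why persisting progress makes resumption straightforward, here is a minimal sketch of a step-by-step runner that records each completed step and skips it on the next attempt. It is not Procedure V2 itself: the step names, the state file, and the run function are all hypothetical.

```python
import json
import os
import tempfile

STEPS = ["snapshot", "copy", "verify"]  # hypothetical step names


def load_done(state_path):
    # Return the list of steps recorded as complete by earlier attempts.
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f)
    return []


def run(state_path, actions):
    """Run the remaining steps, persisting progress after each one."""
    done = load_done(state_path)
    for step in STEPS:
        if step in done:
            continue  # finished in an earlier attempt; skip on resume
        actions[step]()  # may raise; the state file is then left as it was
        done.append(step)
        with open(state_path, "w") as f:
            json.dump(done, f)
    return done
```

Because a crash can land between running a step and recording it, each step in a scheme like this has to be safe to run twice.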

Cheers

> On Sep 22, 2016, at 8:15 PM, Andrew Purtell  wrote:
> 
> No, this misses Matteo's finer point, which is "shelling out" from the master 
> directly to run MR is a first. Why not drive this with a utility derived from 
> Tool?
> 
> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov  wrote:
> 
>>>> In our production cluster,  it is a common case we just have HDFS and
>>>> HBase deployed.
>>>> If our Master/RS depend on MR framework (especially some features we
>>>> have not used at all),  it introduced another cost for maintain.  I
>>>> don't think it is a good idea.
>> 
>> So , you are not backup users in this case. Many our customers have full
>> stack deployed and
>> want see backup to be a standard feature. Besides this, nothing will happen
>> in your cluster
>> if you won't be doing backups.
>> 
>> This discussion (we do not want see M/R dependency) goes to nowhere. We
>> asked already, at least twice, to suggest another framework (other than M/R)
>> for bulk data copy with *conversion*. Still waiting for suggestions.
>> 
>> -Vlad
>> 
>> 
>> 
>> 
>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu  wrote:
>>> 
>>> If MR framework is not deployed in the cluster, hbase still functions
>>> normally (post merge).
>>> 
>>> In terms of build time dependency, we have long been depending on
>>> mapreduce. Take a look at ExportSnapshot.
>>> 
>>> Cheers
>>> 
>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
>>> wrote:
>>> 
>>>> In our production cluster,  it is a common case we just have HDFS and
>>>> HBase deployed.
>>>> If our Master/RS depend on MR framework (especially some features we
>>>> have not used at all),  it introduced another cost for maintain.  I
>>>> don't think it is a good idea.
>>>> 
>>>> 2016-09-23 10:28 GMT+08:00 张铎 :
>>>>> To be specific, for example, our nice Backup/Restore feature, if we
>>> think
>>>>> this is not a core feature of HBase, then we could make it depend on
>>> MR,
>>>>> and start a standalone BackupManager instance that submits MR jobs to
>>> do
>>>>> periodical maintenance job. And if we think this is a core feature that
>>>>> everyone should use it, then we'd better implement it without MR
>>>>> dependency, like DLS.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 2016-09-23 10:11 GMT+08:00 张铎 :
>>>>> 
>>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>>>>>> features depend on MR but I think the bottom line is that we should
>>>> launch
>>>>>> the jobs from outside manually or by other services.
>>>>>> 
>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>>>>> 
>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>>>>>> question.
>>>>>>> 
>>>>>>> Can this be driven by a utility derived from Tool like our other MR
>>>> apps?
>>>>>>> The issue is needing the AccessController to decide if allowed? But
>>>> nothing
>>>>>>> prevents the user from running the job manually/independently, right?
>>>>>>> 
>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>>>> theo.berto...@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> just a remark. my query was not about tools using MR (everyone i
>>>> think
>>>>>>> is
>>>>>>>> ok with those).
>>>>>>>> the topic was about: "are we ok with running MR jobs from Master
>>> and
>>>> RSs
>>>>>>>> code?" since this will be the first time we do this
>>>>>>>> 
>>>>>>>> Matteo
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
>

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Agreed, this would be interesting to contemplate. 

On Sep 22, 2016, at 8:03 PM, Vladimir Rodionov  wrote:

>>> No, never.
> 
> No need for M/R here, just a simple compaction-server colocated with RS on
> a same node.
> You save a lot on GC in RS. Ideally, it can be IO "nice" in Linux (by
> setting IO priority). But offtopic, of course :)
> 
> -Vlad
> 
> On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov 
> wrote:
> 
>>>> And if MR not deployed,  Backup/Restore feature could not be used,
>> right?
>> 
>> Yes.
>> 
>> On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen 
>> wrote:
>> 
>>> {quote}
>>> If MR framework is not deployed in the cluster, hbase still functions
>>> normally (post merge).
>>> {quote}
>>> 
>>> If MR is not strong dependency for Master/RS,  it is OK for me.
>>> And if MR not deployed,  Backup/Restore feature could not be used, right?
>>> 
>>> 2016-09-23 10:49 GMT+08:00 Ted Yu :
>>>> If MR framework is not deployed in the cluster, hbase still functions
>>>> normally (post merge).
>>>> 
>>>> In terms of build time dependency, we have long been depending on
>>>> mapreduce. Take a look at ExportSnapshot.
>>>> 
>>>> Cheers
>>>> 
>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
>>> wrote:
>>>> 
>>>>> In our production cluster,  it is a common case we just have HDFS and
>>>>> HBase deployed.
>>>>> If our Master/RS depend on MR framework (especially some features we
>>>>> have not used at all),  it introduced another cost for maintain.  I
>>>>> don't think it is a good idea.
>>>>> 
>>>>> 2016-09-23 10:28 GMT+08:00 张铎 :
>>>>>> To be specific, for example, our nice Backup/Restore feature, if we
>>> think
>>>>>> this is not a core feature of HBase, then we could make it depend on
>>> MR,
>>>>>> and start a standalone BackupManager instance that submits MR jobs
>>> to do
>>>>>> periodical maintenance job. And if we think this is a core feature
>>> that
>>>>>> everyone should use it, then we'd better implement it without MR
>>>>>> dependency, like DLS.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 :
>>>>>> 
>>>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>>>>>>> features depend on MR but I think the bottom line is that we should
>>>>> launch
>>>>>>> the jobs from outside manually or by other services.
>>>>>>> 
>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>>>>>> 
>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>>>>>>> question.
>>>>>>>> 
>>>>>>>> Can this be driven by a utility derived from Tool like our other MR
>>>>> apps?
>>>>>>>> The issue is needing the AccessController to decide if allowed? But
>>>>> nothing
>>>>>>>> prevents the user from running the job manually/independently,
>>> right?
>>>>>>>> 
>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>>>>> theo.berto...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> just a remark. my query was not about tools using MR (everyone i
>>>>> think
>>>>>>>> is
>>>>>>>>> ok with those).
>>>>>>>>> the topic was about: "are we ok with running MR jobs from Master
>>> and
>>>>> RSs
>>>>>>>>> code?" since this will be the first time we do this
>>>>>>>>> 
>>>>>>>>> Matteo
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
>>> d...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
>>> Restore,
>>>>> it's

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
No, this misses Matteo's finer point, which is that "shelling out" from the
master directly to run MR is a first. Why not drive this with a utility
derived from Tool?
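
For illustration, a minimal sketch of what "a utility derived from Tool"
could look like when the job is launched from a client process rather than
from the Master. Everything named here is an assumption made for the
example: BackupDriver, JobSubmitter, and the argument layout are
hypothetical, and the local Tool interface is only a stand-in for
org.apache.hadoop.util.Tool so that the sketch compiles without Hadoop on
the classpath. It is not the code from HBASE-7912.

```java
import java.util.ArrayList;
import java.util.List;

public class BackupDriverSketch {

    /** Local stand-in for org.apache.hadoop.util.Tool (assumption: same run() shape). */
    interface Tool {
        int run(String[] args) throws Exception;
    }

    /** Hypothetical seam for job submission; a real driver would submit an MR job here. */
    interface JobSubmitter {
        boolean submit(String backupRoot, List<String> tables) throws Exception;
    }

    /** Hypothetical client-side driver: parses arguments, then submits the job itself. */
    static class BackupDriver implements Tool {
        private final JobSubmitter submitter;

        BackupDriver(JobSubmitter submitter) {
            this.submitter = submitter;
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length < 2) {
                System.err.println("Usage: BackupDriver <backup-root> <table> [<table> ...]");
                return 2; // usage error
            }
            String backupRoot = args[0];
            List<String> tables = new ArrayList<>();
            for (int i = 1; i < args.length; i++) {
                tables.add(args[i]);
            }
            // The job is launched from this process, not from the Master or a RegionServer.
            return submitter.submit(backupRoot, tables) ? 0 : 1;
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder submitter that only logs what it would do and reports success.
        JobSubmitter logOnly = (root, tables) -> {
            System.out.println("Would copy " + tables + " to " + root);
            return true;
        };
        // Fall back to made-up example arguments when none are given.
        String[] effective = args.length > 0 ? args : new String[] {"hdfs:///backups", "table1"};
        int exitCode = new BackupDriver(logOnly).run(effective);
        System.out.println("exit code: " + exitCode);
    }
}
```

With Hadoop available, the same class would implement the real Tool
interface and be started through ToolRunner, which is how the existing MR
utilities are typically launched. The point of the shape is that the
process submitting the job is the user's client, so the Master never has
to start MR on anyone's behalf.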

On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov  wrote:

>>> In our production cluster,  it is a common case we just have HDFS and
>>> HBase deployed.
>>> If our Master/RS depend on MR framework (especially some features we
>>> have not used at all),  it introduced another cost for maintain.  I
>>> don't think it is a good idea.
> 
> So , you are not backup users in this case. Many our customers have full
> stack deployed and
> want see backup to be a standard feature. Besides this, nothing will happen
> in your cluster
> if you won't be doing backups.
> 
> This discussion (we do not want see M/R dependency) goes to nowhere. We
> asked already, at least twice, to suggest another framework (other than M/R)
> for bulk data copy with *conversion*. Still waiting for suggestions.
> 
> -Vlad
> 
> 
> 
> 
>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu  wrote:
>> 
>> If MR framework is not deployed in the cluster, hbase still functions
>> normally (post merge).
>> 
>> In terms of build time dependency, we have long been depending on
>> mapreduce. Take a look at ExportSnapshot.
>> 
>> Cheers
>> 
>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
>> wrote:
>> 
>>> In our production cluster,  it is a common case we just have HDFS and
>>> HBase deployed.
>>> If our Master/RS depend on MR framework (especially some features we
>>> have not used at all),  it introduced another cost for maintain.  I
>>> don't think it is a good idea.
>>> 
>>> 2016-09-23 10:28 GMT+08:00 张铎 :
>>>> To be specific, for example, our nice Backup/Restore feature, if we
>> think
>>>> this is not a core feature of HBase, then we could make it depend on
>> MR,
>>>> and start a standalone BackupManager instance that submits MR jobs to
>> do
>>>> periodical maintenance job. And if we think this is a core feature that
>>>> everyone should use it, then we'd better implement it without MR
>>>> dependency, like DLS.
>>>> 
>>>> Thanks.
>>>> 
>>>> 2016-09-23 10:11 GMT+08:00 张铎 :
>>>> 
>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>>>>> features depend on MR but I think the bottom line is that we should
>>> launch
>>>>> the jobs from outside manually or by other services.
>>>>> 
>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>>>> 
>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>>>>> question.
>>>>>> 
>>>>>> Can this be driven by a utility derived from Tool like our other MR
>>> apps?
>>>>>> The issue is needing the AccessController to decide if allowed? But
>>> nothing
>>>>>> prevents the user from running the job manually/independently, right?
>>>>>> 
>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>>> theo.berto...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> just a remark. my query was not about tools using MR (everyone i
>>> think
>>>>>> is
>>>>>>> ok with those).
>>>>>>> the topic was about: "are we ok with running MR jobs from Master
>> and
>>> RSs
>>>>>>> code?" since this will be the first time we do this
>>>>>>> 
>>>>>>> Matteo
>>>>>>> 
>>>>>>> 
>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
>> d...@hortonworks.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
>>> it's
>>>>>>>> fine to be dependent on MR. MR is the right framework for such. We
>>>>>> should
>>>>>>>> also do compactions using MR (just saying :) )
>>>>>>>> 
>>>>>>>> From: Ted Yu 
>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>> To: dev@hbase.apache.org
>>>>>>>> Subject: Re: [DISCUSSION] MR jobs 

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> If MR is not strong dependency for Master/RS,  it is OK for me.

There is no strong MR dependency for Master/RS. They will function as
usual. If you try a backup without MR, the backup will fail but the
Master won't.

-Vlad
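
For illustration, a minimal sketch of the "backup fails but the Master
won't" behaviour described above. It assumes the optional dependency is
detected by probing the classpath, which is only one possible approach and
not necessarily what the backup patch does. requestBackup and
isClassAvailable are hypothetical names. Note that a class being loadable
does not prove an MR cluster is actually deployed.

```java
public class OptionalMapReduceSketch {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only check that the class is loadable, skip static initializers.
            Class.forName(className, false, OptionalMapReduceSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Hypothetical backup entry point: rejects this one request when the probe class is missing. */
    static String requestBackup(String table, String mrProbeClass) {
        if (!isClassAvailable(mrProbeClass)) {
            return "FAILED: MapReduce is not available; cannot back up " + table;
        }
        return "ACCEPTED: backup of " + table;
    }

    public static void main(String[] args) {
        // org.apache.hadoop.mapreduce.Job is used here only as a classpath probe.
        System.out.println(requestBackup("t1", "org.apache.hadoop.mapreduce.Job"));
        // The failure is confined to the request; the surrounding process keeps running.
        System.out.println("Server process keeps running either way.");
    }
}
```

The design choice being illustrated is that the missing dependency is
reported as the result of the one operation that needs it, instead of being
a startup requirement for the server.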

On Thu, Sep 22, 2016 at 8:03 PM, Vladimir Rodionov 
wrote:

> >> No, never.
>
> No need for M/R here, just a simple compaction-server colocated with RS on
> a same node.
> You save a lot on GC in RS. Ideally, it can be IO "nice" in Linux (by
> setting IO priority). But offtopic, of course :)
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov wrote:
>
>> >> And if MR not deployed,  Backup/Restore feature could not be used,
>> right?
>>
>> Yes.
>>
>> On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen 
>> wrote:
>>
>>> {quote}
>>> If MR framework is not deployed in the cluster, hbase still functions
>>> normally (post merge).
>>> {quote}
>>>
>>> If MR is not strong dependency for Master/RS,  it is OK for me.
>>> And if MR not deployed,  Backup/Restore feature could not be used, right?
>>>
>>> 2016-09-23 10:49 GMT+08:00 Ted Yu :
>>> > If MR framework is not deployed in the cluster, hbase still functions
>>> > normally (post merge).
>>> >
>>> > In terms of build time dependency, we have long been depending on
>>> > mapreduce. Take a look at ExportSnapshot.
>>> >
>>> > Cheers
>>> >
>>> > On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
>>> wrote:
>>> >
>>> >> In our production cluster,  it is a common case we just have HDFS and
>>> >> HBase deployed.
>>> >> If our Master/RS depend on MR framework (especially some features we
>>> >> have not used at all),  it introduced another cost for maintain.  I
>>> >> don't think it is a good idea.
>>> >>
>>> >> 2016-09-23 10:28 GMT+08:00 张铎 :
>>> >> > To be specific, for example, our nice Backup/Restore feature, if we
>>> think
>>> >> > this is not a core feature of HBase, then we could make it depend
>>> on MR,
>>> >> > and start a standalone BackupManager instance that submits MR jobs
>>> to do
>>> >> > periodical maintenance job. And if we think this is a core feature
>>> that
>>> >> > everyone should use it, then we'd better implement it without MR
>>> >> > dependency, like DLS.
>>> >> >
>>> >> > Thanks.
>>> >> >
>>> >> > 2016-09-23 10:11 GMT+08:00 张铎 :
>>> >> >
>>> >> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of
>>> our
>>> >> >> features depend on MR but I think the bottom line is that we should
>>> >> launch
>>> >> >> the jobs from outside manually or by other services.
>>> >> >>
>>> >> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>> >> >>
>>> >> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>> >> >>> question.
>>> >> >>>
>>> >> >>> Can this be driven by a utility derived from Tool like our other
>>> MR
>>> >> apps?
>>> >> >>> The issue is needing the AccessController to decide if allowed?
>>> But
>>> >> nothing
>>> >> >>> prevents the user from running the job manually/independently,
>>> right?
>>> >> >>>
>>> >> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>>> >> theo.berto...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> >
>>> >> >>> > just a remark. my query was not about tools using MR (everyone i
>>> >> think
>>> >> >>> is
>>> >> >>> > ok with those).
>>> >> >>> > the topic was about: "are we ok with running MR jobs from
>>> Master and
>>> >> RSs
>>> >> >>> > code?" since this will be the first time we do this
>>> >> >>> >
>>> >> >>> > Matteo
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> No, never.

No need for M/R here, just a simple compaction server colocated with the RS
on the same node.
You save a lot on GC in the RS. Ideally, it can be IO "nice" in Linux (by
setting IO priority). But off-topic, of course :)

-Vlad

On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov 
wrote:

> >> And if MR not deployed,  Backup/Restore feature could not be used,
> right?
>
> Yes.
>
> On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen 
> wrote:
>
>> {quote}
>> If MR framework is not deployed in the cluster, hbase still functions
>> normally (post merge).
>> {quote}
>>
>> If MR is not strong dependency for Master/RS,  it is OK for me.
>> And if MR not deployed,  Backup/Restore feature could not be used, right?
>>
>> 2016-09-23 10:49 GMT+08:00 Ted Yu :
>> > If MR framework is not deployed in the cluster, hbase still functions
>> > normally (post merge).
>> >
>> > In terms of build time dependency, we have long been depending on
>> > mapreduce. Take a look at ExportSnapshot.
>> >
>> > Cheers
>> >
>> > On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
>> wrote:
>> >
>> >> In our production cluster,  it is a common case we just have HDFS and
>> >> HBase deployed.
>> >> If our Master/RS depend on MR framework (especially some features we
>> >> have not used at all),  it introduced another cost for maintain.  I
>> >> don't think it is a good idea.
>> >>
>> >> 2016-09-23 10:28 GMT+08:00 张铎 :
>> >> > To be specific, for example, our nice Backup/Restore feature, if we
>> think
>> >> > this is not a core feature of HBase, then we could make it depend on
>> MR,
>> >> > and start a standalone BackupManager instance that submits MR jobs
>> to do
>> >> > periodical maintenance job. And if we think this is a core feature
>> that
>> >> > everyone should use it, then we'd better implement it without MR
>> >> > dependency, like DLS.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > 2016-09-23 10:11 GMT+08:00 张铎 :
>> >> >
>> >> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>> >> >> features depend on MR but I think the bottom line is that we should
>> >> launch
>> >> >> the jobs from outside manually or by other services.
>> >> >>
>> >> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>> >> >>
>> >> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> >> >>> question.
>> >> >>>
>> >> >>> Can this be driven by a utility derived from Tool like our other MR
>> >> apps?
>> >> >>> The issue is needing the AccessController to decide if allowed? But
>> >> nothing
>> >> >>> prevents the user from running the job manually/independently,
>> right?
>> >> >>>
>> >> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>> >> theo.berto...@gmail.com>
>> >> >>> wrote:
>> >> >>> >
>> >> >>> > just a remark. my query was not about tools using MR (everyone i
>> >> think
>> >> >>> is
>> >> >>> > ok with those).
>> >> >>> > the topic was about: "are we ok with running MR jobs from Master
>> and
>> >> RSs
>> >> >>> > code?" since this will be the first time we do this
>> >> >>> >
>> >> >>> > Matteo
>> >> >>> >
>> >> >>> >
>> >> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
>> d...@hortonworks.com>
>> >> >>> wrote:
>> >> >>> >>
>> >> >>> >> Very much agree; for tools like ExportSnapshot / Backup /
>> Restore,
>> >> it's
>> >> >>> >> fine to be dependent on MR. MR is the right framework for such.
>> We
>> >> >>> should
>> >> >>> >> also do compactions using MR (just saying :) )
>> >> >>> >> 
>> >> >>> >> From: Ted Yu 
>> >> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
>

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> And if MR not deployed,  Backup/Restore feature could not be used, right?

Yes.

On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen  wrote:

> {quote}
> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
> {quote}
>
> If MR is not strong dependency for Master/RS,  it is OK for me.
> And if MR not deployed,  Backup/Restore feature could not be used, right?
>
> 2016-09-23 10:49 GMT+08:00 Ted Yu :
> > If MR framework is not deployed in the cluster, hbase still functions
> > normally (post merge).
> >
> > In terms of build time dependency, we have long been depending on
> > mapreduce. Take a look at ExportSnapshot.
> >
> > Cheers
> >
> > On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
> wrote:
> >
> >> In our production cluster,  it is a common case we just have HDFS and
> >> HBase deployed.
> >> If our Master/RS depend on MR framework (especially some features we
> >> have not used at all),  it introduced another cost for maintain.  I
> >> don't think it is a good idea.
> >>
> >> 2016-09-23 10:28 GMT+08:00 张铎 :
> >> > To be specific, for example, our nice Backup/Restore feature, if we
> think
> >> > this is not a core feature of HBase, then we could make it depend on
> MR,
> >> > and start a standalone BackupManager instance that submits MR jobs to
> do
> >> > periodical maintenance job. And if we think this is a core feature
> that
> >> > everyone should use it, then we'd better implement it without MR
> >> > dependency, like DLS.
> >> >
> >> > Thanks.
> >> >
> >> > 2016-09-23 10:11 GMT+08:00 张铎 :
> >> >
> >> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
> >> >> features depend on MR but I think the bottom line is that we should
> >> launch
> >> >> the jobs from outside manually or by other services.
> >> >>
> >> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
> >> >>
> >> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> >> >>> question.
> >> >>>
> >> >>> Can this be driven by a utility derived from Tool like our other MR
> >> apps?
> >> >>> The issue is needing the AccessController to decide if allowed? But
> >> nothing
> >> >>> prevents the user from running the job manually/independently,
> right?
> >> >>>
> >> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> >> theo.berto...@gmail.com>
> >> >>> wrote:
> >> >>> >
> >> >>> > just a remark. my query was not about tools using MR (everyone i
> >> think
> >> >>> is
> >> >>> > ok with those).
> >> >>> > the topic was about: "are we ok with running MR jobs from Master
> and
> >> RSs
> >> >>> > code?" since this will be the first time we do this
> >> >>> >
> >> >>> > Matteo
> >> >>> >
> >> >>> >
> >> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
> d...@hortonworks.com>
> >> >>> wrote:
> >> >>> >>
> >> >>> >> Very much agree; for tools like ExportSnapshot / Backup /
> Restore,
> >> it's
> >> >>> >> fine to be dependent on MR. MR is the right framework for such.
> We
> >> >>> should
> >> >>> >> also do compactions using MR (just saying :) )
> >> >>> >> 
> >> >>> >> From: Ted Yu 
> >> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
> >> >>> >> To: dev@hbase.apache.org
> >> >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >> >>> >>
> >> >>> >> I agree - backup / restore is in the same category as import /
> >> export.
> >> >>> >>
> >> >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> >> >>> andrew.purt...@gmail.com>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >>> Backup is extra tooling around core in my opinion. Like import
> or
> >> >>> 

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>>  In our production cluster,  it is a common case we just have HDFS and
>> HBase deployed.
>> If our Master/RS depend on MR framework (especially some features we
>> have not used at all),  it introduced another cost for maintain.  I
>> don't think it is a good idea.

So, you are not backup users in this case. Many of our customers have the
full stack deployed and want to see backup as a standard feature. Besides,
nothing will happen in your cluster if you won't be doing backups.

This discussion (we do not want to see an M/R dependency) is going nowhere.
We have already asked, at least twice, for suggestions of another framework
(other than M/R) for bulk data copy with *conversion*. Still waiting for
suggestions.

-Vlad




On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu  wrote:

> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
>
> In terms of build time dependency, we have long been depending on
> mapreduce. Take a look at ExportSnapshot.
>
> Cheers
>
> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen 
> wrote:
>
> > In our production cluster,  it is a common case we just have HDFS and
> > HBase deployed.
> > If our Master/RS depend on MR framework (especially some features we
> > have not used at all),  it introduced another cost for maintain.  I
> > don't think it is a good idea.
> >
> > 2016-09-23 10:28 GMT+08:00 张铎 :
> > > To be specific, for example, our nice Backup/Restore feature, if we
> think
> > > this is not a core feature of HBase, then we could make it depend on
> MR,
> > > and start a standalone BackupManager instance that submits MR jobs to
> do
> > > periodical maintenance job. And if we think this is a core feature that
> > > everyone should use it, then we'd better implement it without MR
> > > dependency, like DLS.
> > >
> > > Thanks.
> > >
> > > 2016-09-23 10:11 GMT+08:00 张铎 :
> > >
> > >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
> > >> features depend on MR but I think the bottom line is that we should
> > launch
> > >> the jobs from outside manually or by other services.
> > >>
> > >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
> > >>
> > >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > >>> question.
> > >>>
> > >>> Can this be driven by a utility derived from Tool like our other MR
> > apps?
> > >>> The issue is needing the AccessController to decide if allowed? But
> > nothing
> > >>> prevents the user from running the job manually/independently, right?
> > >>>
> > >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> > theo.berto...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> > just a remark. my query was not about tools using MR (everyone i
> > think
> > >>> is
> > >>> > ok with those).
> > >>> > the topic was about: "are we ok with running MR jobs from Master
> and
> > RSs
> > >>> > code?" since this will be the first time we do this
> > >>> >
> > >>> > Matteo
> > >>> >
> > >>> >
> > >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <
> d...@hortonworks.com>
> > >>> wrote:
> > >>> >>
> > >>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > it's
> > >>> >> fine to be dependent on MR. MR is the right framework for such. We
> > >>> should
> > >>> >> also do compactions using MR (just saying :) )
> > >>> >> 
> > >>> >> From: Ted Yu 
> > >>> >> Sent: Thursday, September 22, 2016 2:00 PM
> > >>> >> To: dev@hbase.apache.org
> > >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>> >>
> > >>> >> I agree - backup / restore is in the same category as import /
> > export.
> > >>> >>
> > >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > >>> andrew.purt...@gmail.com>
> > >>> >> wrote:
> > >>> >>
> > >>> >>> Backup is extra tooling around core in my opinion. Like import or
> > >>> export.
> 

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Heng Chen
{quote}
If MR framework is not deployed in the cluster, hbase still functions
normally (post merge).
{quote}

If MR is not a strong dependency for Master/RS, that is OK with me.
And if MR is not deployed, the Backup/Restore feature cannot be used, right?

2016-09-23 10:49 GMT+08:00 Ted Yu :
> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
>
> In terms of build time dependency, we have long been depending on
> mapreduce. Take a look at ExportSnapshot.
>
> Cheers
>
> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen  wrote:
>
>> In our production cluster,  it is a common case we just have HDFS and
>> HBase deployed.
>> If our Master/RS depend on MR framework (especially some features we
>> have not used at all),  it introduced another cost for maintain.  I
>> don't think it is a good idea.
>>
>> 2016-09-23 10:28 GMT+08:00 张铎 :
>> > To be specific, for example, our nice Backup/Restore feature, if we think
>> > this is not a core feature of HBase, then we could make it depend on MR,
>> > and start a standalone BackupManager instance that submits MR jobs to do
>> > periodical maintenance job. And if we think this is a core feature that
>> > everyone should use it, then we'd better implement it without MR
>> > dependency, like DLS.
>> >
>> > Thanks.
>> >
>> > 2016-09-23 10:11 GMT+08:00 张铎 :
>> >
>> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>> >> features depend on MR but I think the bottom line is that we should
>> launch
>> >> the jobs from outside manually or by other services.
>> >>
>> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>> >>
>> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> >>> question.
>> >>>
>> >>> Can this be driven by a utility derived from Tool like our other MR
>> apps?
>> >>> The issue is needing the AccessController to decide if allowed? But
>> nothing
>> >>> prevents the user from running the job manually/independently, right?
>> >>>
>> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>> theo.berto...@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> > just a remark. my query was not about tools using MR (everyone i
>> think
>> >>> is
>> >>> > ok with those).
>> >>> > the topic was about: "are we ok with running MR jobs from Master and
>> RSs
>> >>> > code?" since this will be the first time we do this
>> >>> >
>> >>> > Matteo
>> >>> >
>> >>> >
>> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>> >>> wrote:
>> >>> >>
>> >>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore,
>> it's
>> >>> >> fine to be dependent on MR. MR is the right framework for such. We
>> >>> should
>> >>> >> also do compactions using MR (just saying :) )
>> >>> >> 
>> >>> >> From: Ted Yu 
>> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
>> >>> >> To: dev@hbase.apache.org
>> >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>> >>
>> >>> >> I agree - backup / restore is in the same category as import /
>> export.
>> >>> >>
>> >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>> >>> andrew.purt...@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Backup is extra tooling around core in my opinion. Like import or
>> >>> export.
>> >>> >>> Or the optional MOB tool. It's fine.
>> >>> >>>
>> >>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
>> mberto...@apache.org>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> What's the latest opinion around running MR jobs from hbase
>> (Master
>> >>> or
>> >>> >>> RS)?
>> >>> >>>>
>> >>> >>>> I remember in the past that there was discussion about not having
>> MR
>> >>> >> has
>> >>> >>>> direct dependency of hbase.
>> >>> >>>>
>> >>> >>>> I think some of discussion where around MOB that had a MR job to
>> >>> >> compact,
>> >>> >>>> that later was transformed in a non-MR job to be merged, I think
>> we
>> >>> >> had a
>> >>> >>>> similar discussion for log split/replay.
>> >>> >>>>
>> >>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
>> job
>> >>> >>> from
>> >>> >>>> the master to copy data or restore data.
>> >>> >>>> (backup is also "not really core" as in.. if you don't use backup
>> >>> >> you'll
>> >>> >>>> not end up running MR jobs, but this was probably true for MOB as
>> in
>> >>> >> "if
>> >>> >>>> you don't enable MOB you don't need MR")
>> >>> >>>>
>> >>> >>>> any thoughts? do we a rule that says "we don't want to have hbase
>> run
>> >>> >> MR
>> >>> >>>> jobs, only tool started manually by the user can do that". or can
>> we
>> >>> >>> start
>> >>> >>>> adding MR calls around without problems?
>> >>> >>>
>> >>> >>
>> >>>
>> >>
>> >>
>>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
If the MR framework is not deployed in the cluster, HBase still functions
normally (post merge).

In terms of build-time dependency, we have long depended on
mapreduce. Take a look at ExportSnapshot.

Cheers

On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen  wrote:

> In our production cluster,  it is a common case we just have HDFS and
> HBase deployed.
> If our Master/RS depend on MR framework (especially some features we
> have not used at all),  it introduced another cost for maintain.  I
> don't think it is a good idea.
>
> 2016-09-23 10:28 GMT+08:00 张铎 :
> > To be specific, for example, our nice Backup/Restore feature, if we think
> > this is not a core feature of HBase, then we could make it depend on MR,
> > and start a standalone BackupManager instance that submits MR jobs to do
> > periodical maintenance job. And if we think this is a core feature that
> > everyone should use it, then we'd better implement it without MR
> > dependency, like DLS.
> >
> > Thanks.
> >
> > 2016-09-23 10:11 GMT+08:00 张铎 :
> >
> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
> >> features depend on MR but I think the bottom line is that we should
> launch
> >> the jobs from outside manually or by other services.
> >>
> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
> >>
> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> >>> question.
> >>>
> >>> Can this be driven by a utility derived from Tool like our other MR
> apps?
> >>> The issue is needing the AccessController to decide if allowed? But
> nothing
> >>> prevents the user from running the job manually/independently, right?
> >>>
> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
> theo.berto...@gmail.com>
> >>> wrote:
> >>> >
> >>> > just a remark. my query was not about tools using MR (everyone i
> think
> >>> is
> >>> > ok with those).
> >>> > the topic was about: "are we ok with running MR jobs from Master and
> RSs
> >>> > code?" since this will be the first time we do this
> >>> >
> >>> > Matteo
> >>> >
> >>> >
> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> >>> wrote:
> >>> >>
> >>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> it's
> >>> >> fine to be dependent on MR. MR is the right framework for such. We
> >>> should
> >>> >> also do compactions using MR (just saying :) )
> >>> >> 
> >>> >> From: Ted Yu 
> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
> >>> >> To: dev@hbase.apache.org
> >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>> >>
> >>> >> I agree - backup / restore is in the same category as import /
> export.
> >>> >>
> >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> >>> andrew.purt...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Backup is extra tooling around core in my opinion. Like import or
> >>> export.
> >>> >>> Or the optional MOB tool. It's fine.
> >>> >>>
> >>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> mberto...@apache.org>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> What's the latest opinion around running MR jobs from hbase
> (Master
> >>> or
> >>> >>> RS)?
> >>> >>>>
> >>> >>>> I remember in the past that there was discussion about not having
> MR
> >>> >> has
> >>> >>>> direct dependency of hbase.
> >>> >>>>
> >>> >>>> I think some of discussion where around MOB that had a MR job to
> >>> >> compact,
> >>> >>>> that later was transformed in a non-MR job to be merged, I think
> we
> >>> >> had a
> >>> >>>> similar discussion for log split/replay.
> >>> >>>>
> >>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
> job
> >>> >>> from
> >>> >>>> the master to copy data or restore data.
> >>> >>>> (backup is also "not really core" as in.. if you don't use backup
> >>> >> you'll
> >>> >>>> not end up running MR jobs, but this was probably true for MOB as
> in
> >>> >> "if
> >>> >>>> you don't enable MOB you don't need MR")
> >>> >>>>
> >>> >>>> any thoughts? do we a rule that says "we don't want to have hbase
> run
> >>> >> MR
> >>> >>>> jobs, only tool started manually by the user can do that". or can
> we
> >>> >>> start
> >>> >>>> adding MR calls around without problems?
> >>> >>>
> >>> >>
> >>>
> >>
> >>
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Heng Chen
In our production clusters, it is common to have only HDFS and HBase
deployed.
If our Master/RS depend on the MR framework (especially for features we
do not use at all), it introduces another maintenance cost. I don't
think that is a good idea.

2016-09-23 10:28 GMT+08:00 张铎 :
> To be specific, for example, our nice Backup/Restore feature, if we think
> this is not a core feature of HBase, then we could make it depend on MR,
> and start a standalone BackupManager instance that submits MR jobs to do
> periodical maintenance job. And if we think this is a core feature that
> everyone should use it, then we'd better implement it without MR
> dependency, like DLS.
>
> Thanks.
>
> 2016-09-23 10:11 GMT+08:00 张铎 :
>
>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>> features depend on MR but I think the bottom line is that we should launch
>> the jobs from outside manually or by other services.
>>
>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>
>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>> question.
>>>
>>> Can this be driven by a utility derived from Tool like our other MR apps?
>>> The issue is needing the AccessController to decide if allowed? But nothing
>>> prevents the user from running the job manually/independently, right?
>>>
>>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi 
>>> wrote:
>>> >
>>> > just a remark. my query was not about tools using MR (everyone i think
>>> is
>>> > ok with those).
>>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>>> > code?" since this will be the first time we do this
>>> >
>>> > Matteo
>>> >
>>> >
>>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>>> wrote:
>>> >>
>>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>>> >> fine to be dependent on MR. MR is the right framework for such. We
>>> should
>>> >> also do compactions using MR (just saying :) )
>>> >> 
>>> >> From: Ted Yu 
>>> >> Sent: Thursday, September 22, 2016 2:00 PM
>>> >> To: dev@hbase.apache.org
>>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>> >>
>>> >> I agree - backup / restore is in the same category as import / export.
>>> >>
>>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>>> andrew.purt...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Backup is extra tooling around core in my opinion. Like import or
>>> export.
>>> >>> Or the optional MOB tool. It's fine.
>>> >>>
>>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>>> >>> wrote:
>>> >>>>
>>> >>>> What's the latest opinion around running MR jobs from hbase (Master
>>> or
>>> >>> RS)?
>>> >>>>
>>> >>>> I remember in the past that there was discussion about not having MR
>>> >> has
>>> >>>> direct dependency of hbase.
>>> >>>>
>>> >>>> I think some of discussion where around MOB that had a MR job to
>>> >> compact,
>>> >>>> that later was transformed in a non-MR job to be merged, I think we
>>> >> had a
>>> >>>> similar discussion for log split/replay.
>>> >>>>
>>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>>> >>> from
>>> >>>> the master to copy data or restore data.
>>> >>>> (backup is also "not really core" as in.. if you don't use backup
>>> >> you'll
>>> >>>> not end up running MR jobs, but this was probably true for MOB as in
>>> >> "if
>>> >>>> you don't enable MOB you don't need MR")
>>> >>>>
>>> >>>> any thoughts? do we a rule that says "we don't want to have hbase run
>>> >> MR
>>> >>>> jobs, only tool started manually by the user can do that". or can we
>>> >>> start
>>> >>>> adding MR calls around without problems?
>>> >>>
>>> >>
>>>
>>
>>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
To be specific, take our nice Backup/Restore feature as an example. If we
think it is not a core feature of HBase, then we could make it depend on MR
and start a standalone BackupManager instance that submits MR jobs to do
periodic maintenance work. If we think it is a core feature that everyone
should use, then we had better implement it without an MR dependency, like
DLS (distributed log splitting).

Thanks.

2016-09-23 10:11 GMT+08:00 张铎 :

> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
> features depend on MR but I think the bottom line is that we should launch
> the jobs from outside manually or by other services.
>
> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>
>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> question.
>>
>> Can this be driven by a utility derived from Tool like our other MR apps?
>> The issue is needing the AccessController to decide if allowed? But nothing
>> prevents the user from running the job manually/independently, right?
>>
>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi 
>> wrote:
>> >
>> > just a remark. my query was not about tools using MR (everyone i think
>> is
>> > ok with those).
>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>> > code?" since this will be the first time we do this
>> >
>> > Matteo
>> >
>> >
>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>> wrote:
>> >>
>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>> >> fine to be dependent on MR. MR is the right framework for such. We
>> should
>> >> also do compactions using MR (just saying :) )
>> >> 
>> >> From: Ted Yu 
>> >> Sent: Thursday, September 22, 2016 2:00 PM
>> >> To: dev@hbase.apache.org
>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>
>> >> I agree - backup / restore is in the same category as import / export.
>> >>
>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>> andrew.purt...@gmail.com>
>> >> wrote:
>> >>
>> >>> Backup is extra tooling around core in my opinion. Like import or
>> export.
>> >>> Or the optional MOB tool. It's fine.
>> >>>
>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>> >>> wrote:
>> >>>>
>> >>>> What's the latest opinion around running MR jobs from hbase (Master
>> or
>> >>> RS)?
>> >>>>
>> >>>> I remember in the past that there was discussion about not having MR
>> >> has
>> >>>> direct dependency of hbase.
>> >>>>
>> >>>> I think some of discussion where around MOB that had a MR job to
>> >> compact,
>> >>>> that later was transformed in a non-MR job to be merged, I think we
>> >> had a
>> >>>> similar discussion for log split/replay.
>> >>>>
>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>> >>> from
>> >>>> the master to copy data or restore data.
>> >>>> (backup is also "not really core" as in.. if you don't use backup
>> >> you'll
>> >>>> not end up running MR jobs, but this was probably true for MOB as in
>> >> "if
>> >>>> you don't enable MOB you don't need MR")
>> >>>>
>> >>>> any thoughts? do we a rule that says "we don't want to have hbase run
>> >> MR
>> >>>> jobs, only tool started manually by the user can do that". or can we
>> >>> start
>> >>>> adding MR calls around without problems?
>> >>>
>> >>
>>
>
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
I'm -1 on letting the master or RS launch MR jobs. It is OK that some of our
features depend on MR, but I think the bottom line is that we should launch
the jobs from outside, either manually or through other services.

2016-09-23 9:47 GMT+08:00 Andrew Purtell :

> Ok, got it. Well "shelling out" is on the line I think, so a fair question.
>
> Can this be driven by a utility derived from Tool like our other MR apps?
> The issue is needing the AccessController to decide if allowed? But nothing
> prevents the user from running the job manually/independently, right?
>
> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi 
> wrote:
> >
> > just a remark. my query was not about tools using MR (everyone i think is
> > ok with those).
> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > code?" since this will be the first time we do this
> >
> > Matteo
> >
> >
> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> wrote:
> >>
> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> >> fine to be dependent on MR. MR is the right framework for such. We
> should
> >> also do compactions using MR (just saying :) )
> >> ________________________
> >> From: Ted Yu 
> >> Sent: Thursday, September 22, 2016 2:00 PM
> >> To: dev@hbase.apache.org
> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>
> >> I agree - backup / restore is in the same category as import / export.
> >>
> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> andrew.purt...@gmail.com>
> >> wrote:
> >>
> >>> Backup is extra tooling around core in my opinion. Like import or
> export.
> >>> Or the optional MOB tool. It's fine.
> >>>
> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> >>> wrote:
> >>>>
> >>>> What's the latest opinion around running MR jobs from hbase (Master or
> >>> RS)?
> >>>>
> >>>> I remember in the past that there was discussion about not having MR
> >> has
> >>>> direct dependency of hbase.
> >>>>
> >>>> I think some of discussion where around MOB that had a MR job to
> >> compact,
> >>>> that later was transformed in a non-MR job to be merged, I think we
> >> had a
> >>>> similar discussion for log split/replay.
> >>>>
> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
> >>> from
> >>>> the master to copy data or restore data.
> >>>> (backup is also "not really core" as in.. if you don't use backup
> >> you'll
> >>>> not end up running MR jobs, but this was probably true for MOB as in
> >> "if
> >>>> you don't enable MOB you don't need MR")
> >>>>
> >>>> any thoughts? do we a rule that says "we don't want to have hbase run
> >> MR
> >>>> jobs, only tool started manually by the user can do that". or can we
> >>> start
> >>>> adding MR calls around without problems?
> >>>
> >>
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
(Back with a sore throat.)

Also, for what it is worth: it may well be that the attempt to bolt
containers-as-executors onto YARN is too little, too late, and that coordination
of container-based services and applications (such as distributed map-reduce
workflows or, more likely, Spark) will be handled by the native management
tooling we are all using to build container-based infrastructure. It's unclear
to me how long HBase will live into this new regime, but I'd optimistically
wager long enough that even having YARN around is a suspect assumption.

> On Sep 22, 2016, at 6:58 PM, Andrew Purtell  wrote:
> 
> > We should also do compactions using MR (just saying :) 
> 
> No, never. It's not a good idea to wed any of our core function to something 
> that independently evolves, that some of us don't have commit rights on (and 
> never will), and has varying degrees of utility depending on deploy. Like JM 
> says in some places not having the MR runtime around is virtuous. 
> 
> (Runs away screaming.)
> 
> On Sep 22, 2016, at 2:49 PM, Devaraj Das  wrote:
> 
>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine 
>> to be dependent on MR. MR is the right framework for such. We should also do 
>> compactions using MR (just saying :) )
>> 
>> From: Ted Yu 
>> Sent: Thursday, September 22, 2016 2:00 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> 
>> I agree - backup / restore is in the same category as import / export.
>> 
>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
>> wrote:
>> 
>>> Backup is extra tooling around core in my opinion. Like import or export.
>>> Or the optional MOB tool. It's fine.
>>> 
>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>>> wrote:
>>>> 
>>>> What's the latest opinion around running MR jobs from hbase (Master or
>>> RS)?
>>>> 
>>>> I remember in the past that there was discussion about not having MR has
>>>> direct dependency of hbase.
>>>> 
>>>> I think some of discussion where around MOB that had a MR job to compact,
>>>> that later was transformed in a non-MR job to be merged, I think we had a
>>>> similar discussion for log split/replay.
>>>> 
>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>>> from
>>>> the master to copy data or restore data.
>>>> (backup is also "not really core" as in.. if you don't use backup you'll
>>>> not end up running MR jobs, but this was probably true for MOB as in "if
>>>> you don't enable MOB you don't need MR")
>>>> 
>>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>>>> jobs, only tool started manually by the user can do that". or can we
>>> start
>>>> adding MR calls around without problems?
>>> 


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
> We should also do compactions using MR (just saying :) 

No, never. It's not a good idea to wed any of our core functions to something
that evolves independently, that some of us don't have commit rights on (and
never will), and that has varying degrees of utility depending on the deployment.
As JM says, in some places not having the MR runtime around is a virtue.

(Runs away screaming.)

> On Sep 22, 2016, at 2:49 PM, Devaraj Das  wrote:
> 
> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine 
> to be dependent on MR. MR is the right framework for such. We should also do 
> compactions using MR (just saying :) )
> 
> From: Ted Yu 
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> 
> I agree - backup / restore is in the same category as import / export.
> 
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
> wrote:
> 
>> Backup is extra tooling around core in my opinion. Like import or export.
>> Or the optional MOB tool. It's fine.
>> 
>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>> wrote:
>>> 
>>> What's the latest opinion around running MR jobs from hbase (Master or
>> RS)?
>>> 
>>> I remember in the past that there was discussion about not having MR has
>>> direct dependency of hbase.
>>> 
>>> I think some of discussion where around MOB that had a MR job to compact,
>>> that later was transformed in a non-MR job to be merged, I think we had a
>>> similar discussion for log split/replay.
>>> 
>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>> from
>>> the master to copy data or restore data.
>>> (backup is also "not really core" as in.. if you don't use backup you'll
>>> not end up running MR jobs, but this was probably true for MOB as in "if
>>> you don't enable MOB you don't need MR")
>>> 
>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>>> jobs, only tool started manually by the user can do that". or can we
>> start
>>> adding MR calls around without problems?
>> 


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Ok, got it. Well, "shelling out" is on the line I think, so it's a fair question.

Can this be driven by a utility derived from Tool like our other MR apps? Is the
issue that the AccessController is needed to decide whether it is allowed? But
nothing prevents the user from running the job manually/independently, right?

> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi  wrote:
> 
> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
> 
> Matteo
> 
> 
>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das  wrote:
>> 
>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>> fine to be dependent on MR. MR is the right framework for such. We should
>> also do compactions using MR (just saying :) )
>> 
>> From: Ted Yu 
>> Sent: Thursday, September 22, 2016 2:00 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> 
>> I agree - backup / restore is in the same category as import / export.
>> 
>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
>> wrote:
>> 
>>> Backup is extra tooling around core in my opinion. Like import or export.
>>> Or the optional MOB tool. It's fine.
>>> 
>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>>> wrote:
>>>> 
>>>> What's the latest opinion around running MR jobs from hbase (Master or
>>> RS)?
>>>> 
>>>> I remember in the past that there was discussion about not having MR
>> has
>>>> direct dependency of hbase.
>>>> 
>>>> I think some of discussion where around MOB that had a MR job to
>> compact,
>>>> that later was transformed in a non-MR job to be merged, I think we
>> had a
>>>> similar discussion for log split/replay.
>>>> 
>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job
>>> from
>>>> the master to copy data or restore data.
>>>> (backup is also "not really core" as in.. if you don't use backup
>> you'll
>>>> not end up running MR jobs, but this was probably true for MOB as in
>> "if
>>>> you don't enable MOB you don't need MR")
>>>> 
>>>> any thoughts? do we a rule that says "we don't want to have hbase run
>> MR
>>>> jobs, only tool started manually by the user can do that". or can we
>>> start
>>>> adding MR calls around without problems?
>>> 
>> 


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Enis Söztutar
Once you are in the game of coordinating large-scale tasks with
distribution, fault tolerance, etc., short of implementing a similar
framework inside HBase, MR is the way to go. Things like exporting
snapshots, distcp, or backups (which use these) must use such a
framework.

The issue of the master launching MR jobs came up in review around that
time, and we concluded that it was fine since backups by definition require
such a framework.

Enis

On Thu, Sep 22, 2016 at 4:32 PM, Devaraj Das  wrote:

> Not practical to do those tools without MR, JM. We should be using the
> right framework for the use cases in hand. MR fits this really well.
> JM, when you say "if we can do without MR, then, why not?", do you have a
> framework in mind that performs/scale as well as MR? Curious.
> 
> From: Jean-Marc Spaggiari 
> Sent: Thursday, September 22, 2016 4:29 PM
> To: dev
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> Well, I'm just not using those features ;) But I was hoping for the MOBs ;)
> My point is, if we can do it without MR, then, why not? )
>
> 2016-09-22 19:25 GMT-04:00 Vladimir Rodionov :
>
> > Forgot WALPlayer :)
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov <
> vladrodio...@gmail.com
> > >
> > wrote:
> >
> > > >> and
> > > >> backups too, but don't want to bother having to install and
> configure
> > > YARN
> > > >> just for that, as well as removing resources from HBase to give it
> to
> > >
> > > Any suggestions on how to do bulk data move with transformation from/to
> > > HBase cluster w/o MapReduce?
> > >
> > > Opposition to M/R does not make sense imo, as since we have a lot of
> > tools
> > > in HBase which depend on MapReduce:
> > >
> > > CountRows
> > > CountCells
> > > Import
> > > Export
> > > ImportTsv
> > > ExportTsv
> > > CopyTable
> > > VerifyReplication
> > > ExportSnapshot
> > >
> > > and new backup create/restore of course.
> > >
> > >
> > > -Vlad
> > >
> > >
> > >
> > >
> > > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > >> My 2¢: I have a strong preference for NOT having a dependency on MR
> > >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I
> > >> like
> > >> all the features that we built. Would love to be able to use MOBs and
> > >> backups too, but don't want to bother having to install and configure
> > YARN
> > >> just for that, as well as removing resources from HBase to give it to
> > >> yarn
> > >>
> > >> JMS
> > >>
> > >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :
> > >>
> > >> > just a remark. my query was not about tools using MR (everyone i
> think
> > >> is
> > >> > ok with those).
> > >> > the topic was about: "are we ok with running MR jobs from Master and
> > RSs
> > >> > code?" since this will be the first time we do this
> > >> >
> > >> > Matteo
> > >> >
> > >> >
> > >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> > >> wrote:
> > >> >
> > >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > >> it's
> > >> > > fine to be dependent on MR. MR is the right framework for such. We
> > >> should
> > >> > > also do compactions using MR (just saying :) )
> > >> > > 
> > >> > > From: Ted Yu 
> > >> > > Sent: Thursday, September 22, 2016 2:00 PM
> > >> > > To: dev@hbase.apache.org
> > >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >> > >
> > >> > > I agree - backup / restore is in the same category as import /
> > export.
> > >> > >
> > >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > >> > andrew.purt...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Backup is extra tooling around core in my opinion. Like import
> or
> > >> > export.
> > >> > > > Or the

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
It's not practical to build those tools without MR, JM. We should be using the
right framework for the use cases at hand, and MR fits this really well.
JM, when you say "if we can do without MR, then, why not?", do you have a
framework in mind that performs and scales as well as MR? Curious.

From: Jean-Marc Spaggiari 
Sent: Thursday, September 22, 2016 4:29 PM
To: dev
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

Well, I'm just not using those features ;) But I was hoping for the MOBs ;)
My point is, if we can do it without MR, then, why not? )

2016-09-22 19:25 GMT-04:00 Vladimir Rodionov :

> Forgot WALPlayer :)
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov  >
> wrote:
>
> > >> and
> > >> backups too, but don't want to bother having to install and configure
> > YARN
> > >> just for that, as well as removing resources from HBase to give it to
> >
> > Any suggestions on how to do bulk data move with transformation from/to
> > HBase cluster w/o MapReduce?
> >
> > Opposition to M/R does not make sense imo, as since we have a lot of
> tools
> > in HBase which depend on MapReduce:
> >
> > CountRows
> > CountCells
> > Import
> > Export
> > ImportTsv
> > ExportTsv
> > CopyTable
> > VerifyReplication
> > ExportSnapshot
> >
> > and new backup create/restore of course.
> >
> >
> > -Vlad
> >
> >
> >
> >
> > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> >> My 2¢: I have a strong preference for NOT having a dependency on MR
> >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I
> >> like
> >> all the features that we built. Would love to be able to use MOBs and
> >> backups too, but don't want to bother having to install and configure
> YARN
> >> just for that, as well as removing resources from HBase to give it to
> >> yarn
> >>
> >> JMS
> >>
> >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :
> >>
> >> > just a remark. my query was not about tools using MR (everyone i think
> >> is
> >> > ok with those).
> >> > the topic was about: "are we ok with running MR jobs from Master and
> RSs
> >> > code?" since this will be the first time we do this
> >> >
> >> > Matteo
> >> >
> >> >
> >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> >> wrote:
> >> >
> >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore,
> >> it's
> >> > > fine to be dependent on MR. MR is the right framework for such. We
> >> should
> >> > > also do compactions using MR (just saying :) )
> >> > > 
> >> > > From: Ted Yu 
> >> > > Sent: Thursday, September 22, 2016 2:00 PM
> >> > > To: dev@hbase.apache.org
> >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >> > >
> >> > > I agree - backup / restore is in the same category as import /
> export.
> >> > >
> >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> >> > andrew.purt...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Backup is extra tooling around core in my opinion. Like import or
> >> > export.
> >> > > > Or the optional MOB tool. It's fine.
> >> > > >
> >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> >> mberto...@apache.org>
> >> > > > wrote:
> >> > > > >
> >> > > > > What's the latest opinion around running MR jobs from hbase
> >> (Master
> >> > or
> >> > > > RS)?
> >> > > > >
> >> > > > > I remember in the past that there was discussion about not
> having
> >> MR
> >> > > has
> >> > > > > direct dependency of hbase.
> >> > > > >
> >> > > > > I think some of discussion where around MOB that had a MR job to
> >> > > compact,
> >> > > > > that later was transformed in a non-MR job to be merged, I think
> >> we
> >> > > had a
> >> > > > > similar discussion for log split/replay.
> >> > > > >
> >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a
> MR
> >> job
> >> > > > from
> >> > > > > the master to copy data or restore data.
> >> > > > > (backup is also "not really core" as in.. if you don't use
> backup
> >> > > you'll
> >> > > > > not end up running MR jobs, but this was probably true for MOB
> as
> >> in
> >> > > "if
> >> > > > > you don't enable MOB you don't need MR")
> >> > > > >
> >> > > > > any thoughts? do we a rule that says "we don't want to have
> hbase
> >> run
> >> > > MR
> >> > > > > jobs, only tool started manually by the user can do that". or
> can
> >> we
> >> > > > start
> >> > > > > adding MR calls around without problems?
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Matteo, the Master won't spawn the job unless someone actually wants to use the 
backup/restore. So I'd argue we still don't have a 'hard' dependency - it's 
still much like the other tools that you consider as being outside the core.

From: Matteo Bertozzi 
Sent: Thursday, September 22, 2016 3:44 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

just a remark. my query was not about tools using MR (everyone i think is
ok with those).
the topic was about: "are we ok with running MR jobs from Master and RSs
code?" since this will be the first time we do this

Matteo


On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das  wrote:

> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> fine to be dependent on MR. MR is the right framework for such. We should
> also do compactions using MR (just saying :) )
> 
> From: Ted Yu 
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
> wrote:
>
> > Backup is extra tooling around core in my opinion. Like import or export.
> > Or the optional MOB tool. It's fine.
> >
> > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> > wrote:
> > >
> > > What's the latest opinion around running MR jobs from hbase (Master or
> > RS)?
> > >
> > > I remember in the past that there was discussion about not having MR
> has
> > > direct dependency of hbase.
> > >
> > > I think some of discussion where around MOB that had a MR job to
> compact,
> > > that later was transformed in a non-MR job to be merged, I think we
> had a
> > > similar discussion for log split/replay.
> > >
> > > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> > from
> > > the master to copy data or restore data.
> > > (backup is also "not really core" as in.. if you don't use backup
> you'll
> > > not end up running MR jobs, but this was probably true for MOB as in
> "if
> > > you don't enable MOB you don't need MR")
> > >
> > > any thoughts? do we a rule that says "we don't want to have hbase run
> MR
> > > jobs, only tool started manually by the user can do that". or can we
> > start
> > > adding MR calls around without problems?
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Jean-Marc Spaggiari
Well, I'm just not using those features ;) But I was hoping for the MOBs ;)
My point is, if we can do it without MR, then, why not? )

2016-09-22 19:25 GMT-04:00 Vladimir Rodionov :

> Forgot WALPlayer :)
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov  >
> wrote:
>
> > >> and
> > >> backups too, but don't want to bother having to install and configure
> > YARN
> > >> just for that, as well as removing resources from HBase to give it to
> >
> > Any suggestions on how to do bulk data move with transformation from/to
> > HBase cluster w/o MapReduce?
> >
> > Opposition to M/R does not make sense imo, as since we have a lot of
> tools
> > in HBase which depend on MapReduce:
> >
> > CountRows
> > CountCells
> > Import
> > Export
> > ImportTsv
> > ExportTsv
> > CopyTable
> > VerifyReplication
> > ExportSnapshot
> >
> > and new backup create/restore of course.
> >
> >
> > -Vlad
> >
> >
> >
> >
> > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> >> My 2¢: I have a strong preference for NOT having a dependency on MR
> >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I
> >> like
> >> all the features that we built. Would love to be able to use MOBs and
> >> backups too, but don't want to bother having to install and configure
> YARN
> >> just for that, as well as removing resources from HBase to give it to
> >> yarn
> >>
> >> JMS
> >>
> >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :
> >>
> >> > just a remark. my query was not about tools using MR (everyone i think
> >> is
> >> > ok with those).
> >> > the topic was about: "are we ok with running MR jobs from Master and
> RSs
> >> > code?" since this will be the first time we do this
> >> >
> >> > Matteo
> >> >
> >> >
> >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> >> wrote:
> >> >
> >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore,
> >> it's
> >> > > fine to be dependent on MR. MR is the right framework for such. We
> >> should
> >> > > also do compactions using MR (just saying :) )
> >> > > 
> >> > > From: Ted Yu 
> >> > > Sent: Thursday, September 22, 2016 2:00 PM
> >> > > To: dev@hbase.apache.org
> >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >> > >
> >> > > I agree - backup / restore is in the same category as import /
> export.
> >> > >
> >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> >> > andrew.purt...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Backup is extra tooling around core in my opinion. Like import or
> >> > export.
> >> > > > Or the optional MOB tool. It's fine.
> >> > > >
> >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
> >> mberto...@apache.org>
> >> > > > wrote:
> >> > > > >
> >> > > > > What's the latest opinion around running MR jobs from hbase
> >> (Master
> >> > or
> >> > > > RS)?
> >> > > > >
> >> > > > > I remember in the past that there was discussion about not
> having
> >> MR
> >> > > has
> >> > > > > direct dependency of hbase.
> >> > > > >
> >> > > > > I think some of discussion where around MOB that had a MR job to
> >> > > compact,
> >> > > > > that later was transformed in a non-MR job to be merged, I think
> >> we
> >> > > had a
> >> > > > > similar discussion for log split/replay.
> >> > > > >
> >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a
> MR
> >> job
> >> > > > from
> >> > > > > the master to copy data or restore data.
> >> > > > > (backup is also "not really core" as in.. if you don't use
> backup
> >> > > you'll
> >> > > > > not end up running MR jobs, but this was probably true for MOB
> as
> >> in
> >> > > "if
> >> > > > > you don't enable MOB you don't need MR")
> >> > > > >
> >> > > > > any thoughts? do we a rule that says "we don't want to have
> hbase
> >> run
> >> > > MR
> >> > > > > jobs, only tool started manually by the user can do that". or
> can
> >> we
> >> > > > start
> >> > > > > adding MR calls around without problems?
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
Forgot WALPlayer :)

-Vlad

On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov 
wrote:

> >> and
> >> backups too, but don't want to bother having to install and configure YARN
> >> just for that, as well as removing resources from HBase to give it to
>
> Any suggestions on how to do bulk data move with transformation from/to
> HBase cluster w/o MapReduce?
>
> Opposition to M/R does not make sense imo, since we have a lot of tools
> in HBase which depend on MapReduce:
>
> CountRows
> CountCells
> Import
> Export
> ImportTsv
> ExportTsv
> CopyTable
> VerifyReplication
> ExportSnapshot
>
> and new backup create/restore of course.
>
>
> -Vlad
>
>
>
>
> On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> My 2¢: I have a strong preference for NOT having a dependency on MR
>> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I
>> like
>> all the features that we built. Would love to be able to use MOBs and
>> backups too, but don't want to bother having to install and configure YARN
>> just for that, as well as removing resources from HBase to give it to
>> yarn
>>
>> JMS
>>
>> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :
>>
>> > just a remark. my query was not about tools using MR (everyone i think
>> is
>> > ok with those).
>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>> > code?" since this will be the first time we do this
>> >
>> > Matteo
>> >
>> >
>> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>> wrote:
>> >
>> > > Very much agree; for tools like ExportSnapshot / Backup / Restore,
>> it's
>> > > fine to be dependent on MR. MR is the right framework for such. We
>> should
>> > > also do compactions using MR (just saying :) )
>> > > 
>> > > From: Ted Yu 
>> > > Sent: Thursday, September 22, 2016 2:00 PM
>> > > To: dev@hbase.apache.org
>> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> > >
>> > > I agree - backup / restore is in the same category as import / export.
>> > >
>> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>> > andrew.purt...@gmail.com>
>> > > wrote:
>> > >
>> > > > Backup is extra tooling around core in my opinion. Like import or
>> > export.
>> > > > Or the optional MOB tool. It's fine.
>> > > >
>> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
>> mberto...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > What's the latest opinion around running MR jobs from hbase
>> (Master
>> > or
>> > > > RS)?
>> > > > >
>> > > > > I remember in the past that there was discussion about not having
>> MR
>> > > has
>> > > > > direct dependency of hbase.
>> > > > >
>> > > > > I think some of discussion where around MOB that had a MR job to
>> > > compact,
>> > > > > that later was transformed in a non-MR job to be merged, I think
>> we
>> > > had a
>> > > > > similar discussion for log split/replay.
>> > > > >
>> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR
>> job
>> > > > from
>> > > > > the master to copy data or restore data.
>> > > > > (backup is also "not really core" as in.. if you don't use backup
>> > > you'll
>> > > > > not end up running MR jobs, but this was probably true for MOB as
>> in
>> > > "if
>> > > > > you don't enable MOB you don't need MR")
>> > > > >
>> > > > > any thoughts? do we a rule that says "we don't want to have hbase
>> run
>> > > MR
>> > > > > jobs, only tool started manually by the user can do that". or can
>> we
>> > > > start
>> > > > > adding MR calls around without problems?
>> > > >
>> > >
>> >
>>
>
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> and
>> backups too, but don't want to bother having to install and configure YARN
>> just for that, as well as removing resources from HBase to give it to

Any suggestions on how to do bulk data move with transformation from/to
HBase cluster w/o MapReduce?

Opposition to M/R does not make sense imo, since we have a lot of tools
in HBase which depend on MapReduce:

CountRows
CountCells
Import
Export
ImportTsv
ExportTsv
CopyTable
VerifyReplication
ExportSnapshot

and new backup create/restore of course.


-Vlad




On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> My 2¢: I have a strong preference for NOT having a dependency on MR
> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
> all the features that we built. Would love to be able to use MOBs and
> backups too, but don't want to bother having to install and configure YARN
> just for that, as well as removing resources from HBase to give it to
> yarn
>
> JMS
>
> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :
>
> > just a remark. my query was not about tools using MR (everyone i think is
> > ok with those).
> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > code?" since this will be the first time we do this
> >
> > Matteo
> >
> >
> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
> wrote:
> >
> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > > fine to be dependent on MR. MR is the right framework for such. We
> should
> > > also do compactions using MR (just saying :) )
> > > ____________
> > > From: Ted Yu 
> > > Sent: Thursday, September 22, 2016 2:00 PM
> > > To: dev@hbase.apache.org
> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >
> > > I agree - backup / restore is in the same category as import / export.
> > >
> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> > andrew.purt...@gmail.com>
> > > wrote:
> > >
> > > > Backup is extra tooling around core in my opinion. Like import or
> > export.
> > > > Or the optional MOB tool. It's fine.
> > > >
> > > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi
> > > > > wrote:
> > > > >
> > > > > What's the latest opinion around running MR jobs from hbase (Master
> > or
> > > > RS)?
> > > > >
> > > > > I remember in the past that there was discussion about not having
> MR
> > > has
> > > > > direct dependency of hbase.
> > > > >
> > > > > I think some of discussion where around MOB that had a MR job to
> > > compact,
> > > > > that later was transformed in a non-MR job to be merged, I think we
> > > had a
> > > > > similar discussion for log split/replay.
> > > > >
> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR
> job
> > > > from
> > > > > the master to copy data or restore data.
> > > > > (backup is also "not really core" as in.. if you don't use backup
> > > you'll
> > > > > not end up running MR jobs, but this was probably true for MOB as
> in
> > > "if
> > > > > you don't enable MOB you don't need MR")
> > > > >
> > > > > any thoughts? do we a rule that says "we don't want to have hbase
> run
> > > MR
> > > > > jobs, only tool started manually by the user can do that". or can
> we
> > > > start
> > > > > adding MR calls around without problems?
> > > >
> > >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
I think the rationale behind the decision to move the code to the Master was
security/access control. Enis will correct me if I am wrong.
The Master does not have a direct dependency on MapReduce - only on backup.

-Vlad

On Thu, Sep 22, 2016 at 3:44 PM, Matteo Bertozzi 
wrote:

> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
>
>
> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das  wrote:
>
> > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > fine to be dependent on MR. MR is the right framework for such. We should
> > also do compactions using MR (just saying :) )
> > 
> > From: Ted Yu 
> > Sent: Thursday, September 22, 2016 2:00 PM
> > To: dev@hbase.apache.org
> > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >
> > I agree - backup / restore is in the same category as import / export.
> >
> > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> andrew.purt...@gmail.com>
> > wrote:
> >
> > > Backup is extra tooling around core in my opinion. Like import or
> export.
> > > Or the optional MOB tool. It's fine.
> > >
> > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> > > wrote:
> > > >
> > > > What's the latest opinion around running MR jobs from hbase (Master
> or
> > > RS)?
> > > >
> > > > I remember in the past that there was discussion about not having MR
> > has
> > > > direct dependency of hbase.
> > > >
> > > > I think some of discussion where around MOB that had a MR job to
> > compact,
> > > > that later was transformed in a non-MR job to be merged, I think we
> > had a
> > > > similar discussion for log split/replay.
> > > >
> > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> > > from
> > > > the master to copy data or restore data.
> > > > (backup is also "not really core" as in.. if you don't use backup
> > you'll
> > > > not end up running MR jobs, but this was probably true for MOB as in
> > "if
> > > > you don't enable MOB you don't need MR")
> > > >
> > > > any thoughts? do we a rule that says "we don't want to have hbase run
> > MR
> > > > jobs, only tool started manually by the user can do that". or can we
> > > start
> > > > adding MR calls around without problems?
> > >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Jean-Marc Spaggiari
My 2¢: I have a strong preference for NOT having a dependency on MR
anywhere :( I run my HBase cluster without YARN. Just HBase and HDFS. I like
all the features that we built. Would love to be able to use MOBs and
backups too, but don't want to bother having to install and configure YARN
just for that, as well as removing resources from HBase to give them to
YARN.

JMS

2016-09-22 18:44 GMT-04:00 Matteo Bertozzi :

> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
>
>
> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das  wrote:
>
> > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > fine to be dependent on MR. MR is the right framework for such. We should
> > also do compactions using MR (just saying :) )
> > 
> > From: Ted Yu 
> > Sent: Thursday, September 22, 2016 2:00 PM
> > To: dev@hbase.apache.org
> > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >
> > I agree - backup / restore is in the same category as import / export.
> >
> > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
> andrew.purt...@gmail.com>
> > wrote:
> >
> > > Backup is extra tooling around core in my opinion. Like import or
> export.
> > > Or the optional MOB tool. It's fine.
> > >
> > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> > > wrote:
> > > >
> > > > What's the latest opinion around running MR jobs from hbase (Master
> or
> > > RS)?
> > > >
> > > > I remember in the past that there was discussion about not having MR
> > has
> > > > direct dependency of hbase.
> > > >
> > > > I think some of discussion where around MOB that had a MR job to
> > compact,
> > > > that later was transformed in a non-MR job to be merged, I think we
> > had a
> > > > similar discussion for log split/replay.
> > > >
> > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> > > from
> > > > the master to copy data or restore data.
> > > > (backup is also "not really core" as in.. if you don't use backup
> > you'll
> > > > not end up running MR jobs, but this was probably true for MOB as in
> > "if
> > > > you don't enable MOB you don't need MR")
> > > >
> > > > any thoughts? do we a rule that says "we don't want to have hbase run
> > MR
> > > > jobs, only tool started manually by the user can do that". or can we
> > > start
> > > > adding MR calls around without problems?
> > >
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Matteo Bertozzi
Just a remark: my query was not about tools using MR (everyone, I think, is
OK with those).
The topic was: "are we OK with running MR jobs from Master and RS
code?", since this will be the first time we do this.

Matteo


On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das  wrote:

> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> fine to be dependent on MR. MR is the right framework for such. We should
> also do compactions using MR (just saying :) )
> 
> From: Ted Yu 
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
> wrote:
>
> > Backup is extra tooling around core in my opinion. Like import or export.
> > Or the optional MOB tool. It's fine.
> >
> > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> > wrote:
> > >
> > > What's the latest opinion around running MR jobs from hbase (Master or
> > RS)?
> > >
> > > I remember in the past that there was discussion about not having MR
> has
> > > direct dependency of hbase.
> > >
> > > I think some of discussion where around MOB that had a MR job to
> compact,
> > > that later was transformed in a non-MR job to be merged, I think we
> had a
> > > similar discussion for log split/replay.
> > >
> > > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> > from
> > > the master to copy data or restore data.
> > > (backup is also "not really core" as in.. if you don't use backup
> you'll
> > > not end up running MR jobs, but this was probably true for MOB as in
> "if
> > > you don't enable MOB you don't need MR")
> > >
> > > any thoughts? do we a rule that says "we don't want to have hbase run
> MR
> > > jobs, only tool started manually by the user can do that". or can we
> > start
> > > adding MR calls around without problems?
> >
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to
be dependent on MR. MR is the right framework for such tools. We should also do
compactions using MR (just saying :) )

From: Ted Yu 
Sent: Thursday, September 22, 2016 2:00 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

I agree - backup / restore is in the same category as import / export.

On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
wrote:

> Backup is extra tooling around core in my opinion. Like import or export.
> Or the optional MOB tool. It's fine.
>
> > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> wrote:
> >
> > What's the latest opinion around running MR jobs from hbase (Master or
> RS)?
> >
> > I remember in the past that there was discussion about not having MR has
> > direct dependency of hbase.
> >
> > I think some of discussion where around MOB that had a MR job to compact,
> > that later was transformed in a non-MR job to be merged, I think we had a
> > similar discussion for log split/replay.
> >
> > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> from
> > the master to copy data or restore data.
> > (backup is also "not really core" as in.. if you don't use backup you'll
> > not end up running MR jobs, but this was probably true for MOB as in "if
> > you don't enable MOB you don't need MR")
> >
> > any thoughts? do we a rule that says "we don't want to have hbase run MR
> > jobs, only tool started manually by the user can do that". or can we
> start
> > adding MR calls around without problems?
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
I agree - backup / restore is in the same category as import / export.

On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell 
wrote:

> Backup is extra tooling around core in my opinion. Like import or export.
> Or the optional MOB tool. It's fine.
>
> > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
> wrote:
> >
> > What's the latest opinion around running MR jobs from hbase (Master or
> RS)?
> >
> > I remember in the past that there was discussion about not having MR has
> > direct dependency of hbase.
> >
> > I think some of discussion where around MOB that had a MR job to compact,
> > that later was transformed in a non-MR job to be merged, I think we had a
> > similar discussion for log split/replay.
> >
> > the latest is the new Backup feature (HBASE-7912), that runs a MR job
> from
> > the master to copy data or restore data.
> > (backup is also "not really core" as in.. if you don't use backup you'll
> > not end up running MR jobs, but this was probably true for MOB as in "if
> > you don't enable MOB you don't need MR")
> >
> > any thoughts? do we a rule that says "we don't want to have hbase run MR
> > jobs, only tool started manually by the user can do that". or can we
> start
> > adding MR calls around without problems?
>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Backup is extra tooling around core in my opinion. Like import or export. Or 
the optional MOB tool. It's fine. 

> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi  wrote:
> 
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
> 
> I remember in the past that there was discussion about not having MR as a
> direct dependency of hbase.
> 
> I think some of the discussion was around MOB, which had an MR job to compact
> that was later transformed into a non-MR job to be merged. I think we had a
> similar discussion for log split/replay.
> 
> The latest is the new Backup feature (HBASE-7912), which runs an MR job from
> the master to copy data or restore data.
> (Backup is also "not really core", as in: if you don't use backup you'll
> not end up running MR jobs. But this was probably true for MOB too, as in "if
> you don't enable MOB you don't need MR".)
> 
> Any thoughts? Do we have a rule that says "we don't want to have hbase run MR
> jobs; only tools started manually by the user can do that"? Or can we start
> adding MR calls around without problems?


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
I would be -1 on a requirement for MR for something core to HBase.

> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi  wrote:
> 
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
> 
> I remember in the past that there was discussion about not having MR as a
> direct dependency of hbase.
> 
> I think some of the discussion was around MOB, which had an MR job to compact
> that was later transformed into a non-MR job to be merged. I think we had a
> similar discussion for log split/replay.
> 
> The latest is the new Backup feature (HBASE-7912), which runs an MR job from
> the master to copy data or restore data.
> (Backup is also "not really core", as in: if you don't use backup you'll
> not end up running MR jobs. But this was probably true for MOB too, as in "if
> you don't enable MOB you don't need MR".)
> 
> Any thoughts? Do we have a rule that says "we don't want to have hbase run MR
> jobs; only tools started manually by the user can do that"? Or can we start
> adding MR calls around without problems?


[DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Matteo Bertozzi
What's the latest opinion around running MR jobs from hbase (Master or RS)?

I remember in the past that there was discussion about not having MR as a
direct dependency of hbase.

I think some of the discussion was around MOB, which had an MR job to compact
that was later transformed into a non-MR job to be merged. I think we had a
similar discussion for log split/replay.

The latest is the new Backup feature (HBASE-7912), which runs an MR job from
the master to copy data or restore data.
(Backup is also "not really core", as in: if you don't use backup you'll
not end up running MR jobs. But this was probably true for MOB too, as in "if
you don't enable MOB you don't need MR".)

Any thoughts? Do we have a rule that says "we don't want to have hbase run MR
jobs; only tools started manually by the user can do that"? Or can we start
adding MR calls around without problems?