Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
I ran the backup test suite on Linux. The tests passed; the run took 28 minutes.

> On Oct 5, 2016, at 3:18 PM, Devaraj Das wrote:
>
> If tests pass with the patch (which I believe they do), let's commit the patch. Follow it up with an updated mega patch for review...
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
If tests pass with the patch (which I believe they do), let's commit the patch. Follow it up with an updated mega patch for review...

From: Ted Yu
Sent: Tuesday, October 04, 2016 6:28 PM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

Refactoring work over in HBASE-16727 is ready for review.

Kindly provide your feedback.

Thanks
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Refactoring work over in HBASE-16727 is ready for review.

Kindly provide your feedback.

Thanks

On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell wrote:

> This sounds good to me.
> I'd be at least +0 as to merging the branch as long as we are not 'shelling out' to MR from master.
>
> > All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client.
>
> We have a home grown backup solution at Salesforce that to a first order of approximation is this. I would like to see something like this merged.
>
> > In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.
>
> I can't commit the time of the team here (smile), but we always strive to minimize the amount of local code we need to manage HBase. For example, we use VerifyReplication and other tools that ship with HBase, and we have contributed minor operational improvements as we've developed them (like the region mover and canary stuff). I suspect we will have some adoption of this tooling and further refinement insofar as it fits into a backup workflow at a 30kft view using snapshots, replication (or file shipping), and WAL replay.
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
This sounds good to me.
I'd be at least +0 as to merging the branch as long as we are not 'shelling out' to MR from master.

> All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client.

We have a home grown backup solution at Salesforce that to a first order of approximation is this. I would like to see something like this merged.

> In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.

I can't commit the time of the team here (smile), but we always strive to minimize the amount of local code we need to manage HBase. For example, we use VerifyReplication and other tools that ship with HBase, and we have contributed minor operational improvements as we've developed them (like the region mover and canary stuff). I suspect we will have some adoption of this tooling and further refinement insofar as it fits into a backup workflow at a 30kft view using snapshots, replication (or file shipping), and WAL replay.

On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das wrote:

> Vlad, thinking about it a little more, since the master is not orchestrating the backup, let's make it dead simple as a first pass. I think we should do the following: All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client. Ignore security for the moment - let's live with what we have as the current "limitation" for tools that need HDFS access - they need to run as hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to be handled as well, as much as possible - if the client fails after initiating the backup/restore, who restores consistency in the hbase:backup table, or cleans up the half-copied data in the HDFS dirs, etc.
> In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.
> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree with the above. Would like to get over this merge-to-master hump obviously.
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
+1 for the simplified approach.
If most of the backup code is on the client side, it may be easy to move that to a backup module in case people ask. But for now, I'd say stick with hbase-server if that is easier.

Matteo

On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das wrote:

> Vlad, thinking about it a little more, since the master is not orchestrating the backup, let's make it dead simple as a first pass. I think we should do the following: All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client. Ignore security for the moment - let's live with what we have as the current "limitation" for tools that need HDFS access - they need to run as hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to be handled as well, as much as possible - if the client fails after initiating the backup/restore, who restores consistency in the hbase:backup table, or cleans up the half-copied data in the HDFS dirs, etc.
> In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.
> Folks - Stack / Andrew / Matteo / others, please speak up if you disagree with the above. Would like to get over this merge-to-master hump obviously.
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Vlad, thinking about it a little more, since the master is not orchestrating the backup, let's make it dead simple as a first pass. I think we should do the following:

All or most of the Backup/Restore operations (especially the MR job spawns) should be moved to the client. Ignore security for the moment - let's live with what we have as the current "limitation" for tools that need HDFS access - they need to run as hbase (or whatever the hbase daemons run as). Consistency/cleanup needs to be handled as well, as much as possible - if the client fails after initiating the backup/restore, who restores consistency in the hbase:backup table, or cleans up the half-copied data in the HDFS dirs, etc.

In the future, if someone needs to support self-service operations (any user can take a backup/restore his/her tables), we can discuss the "backup service" or something else.

Folks - Stack / Andrew / Matteo / others, please speak up if you disagree with the above. Would like to get over this merge-to-master hump obviously.

From: Vladimir Rodionov
Sent: Monday, September 26, 2016 11:48 AM
To: dev@hbase.apache.org
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)

Ok, we had an internal discussion and this is what we are suggesting now:

1. We will create a separate module (hbase-backup) and move server-side code there.
2. Master and RS will be MR and backup free.
3. The code from Master will be moved into a standalone service (BackupService) for procedure orchestration, operation resume/abort and SECURITY. It means one additional process, similar to the REST/Thrift server, will be required to operate backup.

I would like to note that a separate process running under the hbase super user is required to implement security properly in a multi-tenant environment; otherwise, only the hbase super user will be allowed to operate backups.

Please let us know what you think, HBase people :?

-Vlad

On Sat, Sep 24, 2016 at 2:49 PM, Stack wrote:

> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell wrote:
>
> > At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code. We had this objection before to a proposed external compaction implementation for MOB so should not come as a surprise. Please let me know if I have misstated this.
>
> The above is my understanding also.
>
> > This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.
>
> Above is my understanding of our recommendation.
>
> St.Ack
>
> > > On Sep 24, 2016, at 8:17 AM, Ted Yu wrote:
> > >
> > > bq. procedure gives you a retry mechanism on failure
> > >
> > > We do need this mechanism. Take a look at the multi-step in FullTableBackupProcedure, etc.
> > >
> > > bq. let the user export it later when he wants
> > >
> > > This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time backup request is issued, the following is specified:
> > >
> > > + + " BACKUP_ROOT The full root path to store the backup image,\n"
> > > + + " the prefix can be hdfs, webhdfs or gpfs\n"
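The consistency/cleanup question raised at the top of this message (what happens if the client dies after initiating a backup) can be pictured with a small sketch. This is only an illustration under stated assumptions, not code from the patch: the dict stands in for the hbase:backup table, and `BackupRecord`, `run_backup`, and `repair_incomplete` are hypothetical names. The idea is that the client records its intent before doing any work, so a later invocation can find runs that never finished and clean them up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A step is a (name, action) pair; the action does the real work.
Step = Tuple[str, Callable[[], None]]


@dataclass
class BackupRecord:
    """Hypothetical bookkeeping row for one backup attempt."""
    backup_id: str
    state: str = "PENDING"  # PENDING -> RUNNING -> COMPLETE | FAILED
    completed_steps: List[str] = field(default_factory=list)


def run_backup(store: Dict[str, BackupRecord], backup_id: str,
               steps: List[Step]) -> BackupRecord:
    """Drive the backup steps from the client.

    The record is written as RUNNING before any step executes, so if the
    client fails midway the record stays RUNNING and can be found later.
    """
    record = BackupRecord(backup_id, state="RUNNING")
    store[backup_id] = record
    for name, action in steps:
        action()  # may raise; the exception propagates to the caller
        record.completed_steps.append(name)
    record.state = "COMPLETE"
    return record


def repair_incomplete(store: Dict[str, BackupRecord],
                      cleanup: Callable[[BackupRecord], None]) -> List[str]:
    """Clean up every run left in RUNNING state and mark it FAILED."""
    repaired = []
    for record in store.values():
        if record.state == "RUNNING":
            cleanup(record)  # e.g. delete half-copied data for this run
            record.state = "FAILED"
            repaired.append(record.backup_id)
    return repaired
```

In this sketch the repair pass could run at the start of the next backup or restore command, which keeps the master out of the picture. It does not address concurrent clients or security, both of which the thread leaves open.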
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> Matteo
>
> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>>
>> Cheers
>>
>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>
>>> No, not your fault, at least, not this time :)
>>>
>>> Why I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change would cause probabilistic deadlock or some strange NPEs...
>>>
>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially adding more work to the startup processing...
>>>
>>> Thanks.
>>>
>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>
>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in jdk.
>>>>
>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>
>>>> bq. But the ugly code in HMaster is really a problem...
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially adding more work to the startup processing...
>
> Thanks.
>
> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in jdk.
>>
>> Suggest pinging the reviewer on JIRA to get it moving.
>>
>> bq. But the ugly code in HMaster is really a problem...
>>
>> Can you be specific as to which code is ugly? Is it in the backup / restore mega patch?
>>
>> Cheers
>>
>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
>>
>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
>>>
>>> But I strongly suggest later we need to revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
>>>
>>> And for security, I have an issue pending for a long
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>
> + MR is dead. We should be busy working hard to undo it from hbase-server moving it out to be an optional module (Spark would be its peer).
> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not clutter task harder by piling on more moving parts.
>
> St.Ack
>
>> Matteo
>>
>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>>>
>>> Cheers
>>>
>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>>
>>>> No, not your fault, at least, not this time :)
>>>>
>>>> Why I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > > >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > > > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want MR or may have an external/uncontrolled MR setup.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> +1
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
> > > > >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> St.Ack
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Matteo
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I suggest you look at Mat
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
> > > >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> St.Ack
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Matteo
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> No, not your fault, at least not this time :)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
>>>>>>>>>>>>> We could run a vote, sure. -1 on backup being in core hbase (+1 on adding all the API any such external tool might need to run).
>>>>>>>>>>>>>
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> let me try to go back to my original topic.
>>>>>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>>>>>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want MR or may have an external/uncontrolled MR setup.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy wor
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
>>>>>>>>>>>>>>> over MR, because some clusters may not want MR or may have an external/uncontrolled MR setup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, not your fault, at least not this time :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > >>>>>>>>>>>>> rule for future code.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want MR or may have an external/uncontrolled MR setup.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> +1
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
> > >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wr
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
>> >>>>>>>>>>>>> e generic, and provide some rule for future code.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>> >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want MR or may have an external/uncontrolled MR setup.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> +1
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
>> >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> St.Ack
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Matteo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrot
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> >>>>>>>>>>>> ever being able to launch MR jobs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
> >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least not this time :)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially when it adds more work to the startup processing...
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> >>>>>>>>>>>> -- ever being able to launch MR jobs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
> >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least not this time :)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially when it adds more work to the startup processing...
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, not your fault, at least not this time :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially when it adds more work to the startup processing...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly? Is it in the backup/restore mega patch?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> >>>>>>>>>>>>> lock or some strange NPEs...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially when it adds more work to the startup processing...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you be specific as to which code is ugly? Is it in the backup/restore mega patch?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> But I strongly suggest that we later revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>>> Matteo
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> No, not your fault, at least not this time :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change could cause a probabilistic deadlock or some strange NPEs...
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially when it adds more work to the startup processing...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you be specific as to which code is ugly? Is it in
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > > > > > > > > > > > > uggest later we need to revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And for security, I have an issue that has been pending for a long time. Can someone help by taking a quick look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> If in the future, we find better ways of doing this without using MR, we can certainly consider that
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -Vlad
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Guys, first off, apologies for bringing in the topic of MR-based comp
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > > > > > > > > > > <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.
> > > > > > > > > > > >
> > > > > > > > > > > > Suggest pinging the reviewer on JIRA to get it moving.
> > > > > > > > > > > >
> > > > > > > > > > > > bq. But the ugly code in HMaster is really a problem...
> > > > > > > > > > > >
> > > > > > > > > > > > Can you be specific as to which code is ugly? Is it in the backup/restore mega patch?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But I strongly suggest that we later revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
> > > > > > > > > > > > >
> > > > > > > > > > > > > And for security, I have an issue that has been pending for a long time. Can someone help by taking a quick look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > > > > > > > > > > sit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
> > > > > > > > > > > >
> > > > > > > > > > > > And for security, I have an issue that has been pending for a long time. Can someone help by taking a quick look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...
> > > > > > > > > > > >
> > > > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > > > >
> > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > > > > > > > >
> > > > > > > > > > > > >> If in the future, we find better ways of doing this without using MR, we can certainly consider that
> > > > > > > > > > > > >
> > > > > > > > > > > > > Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Vlad
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Guys, first off, apologies for bringing in the topic of MR-based compactions. But I was thinking more about the SpliceMachine approach of managing compactions in Spark, where apparently they saw a lot of benefits. Apologies for giving you that sore throat, Andrew; I really didn't
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > vladrodio...@gmail.com > > > > > > > > >: > > > > > > > > > > > > > > > > > > > > > > > >> If in the future, we find better ways of doing > this > > > > > without > > > > > > > > using > > > > > > > > > > MR, > > > > > > > > > > > we > > > > > > > > > > > > can certainly consider that > > > > > > > > > > > > > > > > > > > > > > > > Our framework for distributed operations is abstract > > and > > > > > allows > > > > > > > > > > > > different implementations. MR is just one > > implementation > > > we > > > > > > > > provide. > > > > > > > > > > > > > > > > > > > > > > > > -Vlad > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das < > > > > > > > d...@hortonworks.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Guys, first off apologies for bringing in the topic > > of > > > > > > MR-based > > > > > > > > > > > > > compactions.. But I was thinking more about the > > > > > SpliceMachine > > > > > > > > > > approach > > > > > > > > > > > of > > > > > > > > > > > > > managing compactions in Spark where apparently they > > > saw a > > > > > lot > > > > > > > of > > > > > > > > > > > > benefits. > > > > > > > > > > > > > Apologies for giving you that sore throat Andrew; I > > > > really > > > > > > > didn't > > > > > > > > > > mean > > > > > > > > > > > to > > > > > > > > > > > > > :-) > > > > > > > > > > > > > > > > > > > > > > > > > > So on this issue, we have these on the plate: > > > > > > > > > > > > > 0. Somehow not use MR but something like that > > > > > > > > > > > > > 1. Run a standalone service other than master > > > > > > > > > > > > > 2. 
Shell out from the master > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think we have a good answer to (0), and I > > don't > > > > > think > > > > > > > > it's > > > > > > > > > > even > > > > > > > > > > > > > worth the effort of trying to build something when > MR > > > is > > > > > > > already > > > > > > > > > > there, > > > > > > > > > > > > and > > > > > > > > > > > > > being used by HBase already for some operations. > > > > > > > > > > > > > > > > > > > > > > > > > > On (1), we have to deal with a myriad of issues - > HA > > of > > > > the > > > > > > > > server > > > > > > > > > > not > > > > > > > > > > > > > being the least of them all. Security (kerberos > > > > > > authentication, > > > > > > > > > > another > > > > > > > > > > > > > keytab to manage, etc. etc. etc.). IMO, that > approach > > > is > > > > > DOA. > > > > > > > > > Instead > > > > > > > > > > > > let's > > > > > > > > > > > > > substitute that (1) with the HBase Master. I > haven't > > > seen > > > > > any > > > > > > > > good > > > > > > > > > > > reason > > > >
Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
> > > > > > > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > If you guys have already implemented the feature in the MR
> > > > > > > > > > way and the patch is ready for landing on master, I'm a -0
> > > > > > > > > > on it as I do not want to block the development progress.
> > > > > > > > > >
> > > > > > > > > > But I strongly suggest that later we revisit the design and
> > > > > > > > > > see if we can separate the logic from HMaster as much as
> > > > > > > > > > possible. HA is not a big problem if you do not store any
> > > > > > > > > > metadata locally. But the ugly code in HMaster is really a
> > > > > > > > > > problem...
> > > > > > > > > >
> > > > > > > > > > And for security, I have an issue pending for a long time.
> > > > > > > > > > Can someone help by taking a quick look at it? This is what
> > > > > > > > > > I mean by ugly code... logout and destroy the credentials in
> > > > > > > > > > a subject when it is still being used, and declared as
> > > > > > > > > > LimitPrivacy so I can not change the behavior and the only
> > > > > > > > > > way to fix it is to write another piece of ugly code...
> > > > > > > > > >
> > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > >
> > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > >> If in the future, we find better ways of doing this
> > > > > > > > > > > >> without using MR, we can certainly consider that
> > > > > > > > > > >
> > > > > > > > > > > Our framework for distributed operations is abstract and
> > > > > > > > > > > allows different implementations. MR is just one
> > > > > > > > > > > implementation we provide.
> > > > > > > > > > >
> > > > > > > > > > > -Vlad
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Guys, first off apologies for bringing in the topic of
> > > > > > > > > > > > MR-based compactions.. But I was thinking more about the
> > > > > > > > > > > > SpliceMachine approach of managing compactions in Spark
> > > > > > > > > > > > where apparently they saw a lot of benefits.
> > > > > > > > > > > > Apologies for giving you that sore throat Andrew; I
> > > > > > > > > > > > really didn't mean
Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <d...@hortonworks.com> wrote:
> > > > > >
> > > > > > > Guys, first off apologies for bringing in the topic of MR-based
> > > > > > > compactions.. But I was thinking more about the SpliceMachine
> > > > > > > approach of managing compactions in Spark where apparently they
> > > > > > > saw a lot of benefits.
> > > > > > > Apologies for giving you that sore throat Andrew; I really
> > > > > > > didn't mean to :-)
> > > > > > >
> > > > > > > So on this issue, we have these on the plate:
> > > > > > > 0. Somehow not use MR but something like that
> > > > > > > 1. Run a standalone service other than master
> > > > > > > 2. Shell out from the master
> > > > > > >
> > > > > > > I don't think we have a good answer to (0), and I don't think
> > > > > > > it's even worth the effort of trying to build something when MR
> > > > > > > is already there, and being used by HBase already for some
> > > > > > > operations.
> > > > > > >
> > > > > > > On (1), we have to deal with a myriad of issues - HA of the
> > > > > > > server not being the least of them all. Security (kerberos
> > > > > > > authentication, another keytab to manage, etc. etc. etc.). IMO,
> > > > > > > that approach is DOA. Instead let's substitute that (1) with the
> > > > > > > HBase Master. I haven't seen any good reason why the HBase
> > > > > > > master shouldn't launch MR jobs if needed. It's not ideal;
> > > > > > > agreed.
> > > > > > >
> > > > > > > Now before going to (2), let's see what are the benefits of
> > > > > > > running the backup/restore jobs from the master. I think Ted has
> > > > > > > summarized some of the issues that we need to take care of -
> > > > > > > basically, the master can keep track of running jobs, and should
> > > > > > > it fail, the backup master can continue keeping track of it
> > > > > > > (since the jobId would have been recorded in the proc WAL). The
> > > > > > > master can also do cleanup, etc. of failed backup/restore
> > > > > > > processes.
> > > > > > > Security is another issue - the job needs to run as 'hbase'
> > > > > > > since it owns the data. Having the master launch the job makes
> > > > > > > it get that privilege. In the (2) approach, it's hard to do some
> > > > > > > of the above management.
> > > > > > >
> > > > > > > Guys, just to reiterate, the patch as such is ready from the
> > > > > > > overall design/arch point of view (maybe code review is still
> > > > > > > pending from Matteo).
> > > > > > > If in the future, we find better ways of doing this without
> > > > > > > using MR, we can certainly consider that. But IMO don't think we
> > > > > > > should block this patch from getting merged.
> > > > > > >
> > > > > > >
> > > > > > > From: 张铎
> > > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > > To: dev@hbase.apache.org
> > > > > > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > > > >
> > > > > > > So what about a standalone service other than master? You can
> > > > > > > use your
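The job-tracking argument in the quoted message (record the jobId durably so that a backup master can keep tracking the job after a failover, and clean up when it ends) can be illustrated with a toy model. This is only a sketch under loud assumptions: it is Python rather than HBase code, a plain list stands in for the proc WAL, and `ProcedureLog`, `FakeCluster`, and `Master` are hypothetical names that do not exist in HBase.

```python
from dataclasses import dataclass, field


@dataclass
class ProcedureLog:
    """Hypothetical stand-in for a durable log shared by active and backup masters."""
    entries: list = field(default_factory=list)

    def record(self, job_id):
        self.entries.append(job_id)

    def complete(self, job_id):
        self.entries.remove(job_id)

    def pending(self):
        return list(self.entries)


class FakeCluster:
    """Hypothetical stand-in for the MR framework: issues job IDs, reports status."""

    def __init__(self):
        self._status = {}
        self._next = 0

    def submit(self, name):
        self._next += 1
        job_id = f"job_{self._next:04d}"
        self._status[job_id] = "RUNNING"
        return job_id

    def finish(self, job_id, ok=True):
        self._status[job_id] = "SUCCEEDED" if ok else "FAILED"

    def status(self, job_id):
        return self._status[job_id]


class Master:
    """Toy master: launches jobs and tracks them only through the shared log."""

    def __init__(self, log, cluster):
        self.log = log
        self.cluster = cluster

    def start_backup(self, table):
        job_id = self.cluster.submit(f"backup-{table}")
        self.log.record(job_id)  # persisted so another master can pick it up
        return job_id

    def poll(self):
        """Check every recorded job; return outcomes of finished ones and drop them."""
        finished = {}
        for job_id in self.log.pending():
            state = self.cluster.status(job_id)
            if state != "RUNNING":
                finished[job_id] = state
                self.log.complete(job_id)
        return finished
```

Because a `Master` keeps no job state of its own, a second `Master` built on the same `ProcedureLog` can call `poll()` and see the job that the first one started, which is the failover property the message relies on.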
Re: [DISCUSSION] MR jobs started by Master or RS
Great points. I am getting a feeling that we have beaten this topic to
death and have spent enough and more time on it, and it's time to move on.
My takeaway is that it's fine to do MR for backup/restore: backup/restore
is an optional feature, so unless you use it, you don't need MR.

Fingers crossed,
Devaraj.

P.S. To Nick's point, if it makes sense to do MR/Spark even for core
features like compactions, we should be open to it. But that's a topic for
another [Friday] DISCUSS thread.

From: Jerry He
Sent: Friday, September 23, 2016 8:40 AM
To: dev
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

That is the point, Matteo.
Put another way, there is nothing that stops a user from deploying a
custom procedure or custom coprocessor that calls out to an MR job.

The optional feature should satisfy some basic requirements, e.g.:
no impact if not deployed or used; limited impact if used.
This can be achieved with isolated dynamic loading of extra configuration
(YARN), non-blocking handling that does not occupy the server handlers, or
a separate handler.
The impact would mostly be on the overall cluster resources. In this
sense, there is no difference between using another standalone server and
a command-line tool.
ExportSnapshot can then be moved to the server as well.

Also, thinking about it at a higher level: it is probably beneficial to
allow HBase to call out to an external framework to do computation. It can
be thought of as a UDF, a distributed UDF. The execution of this UDF is
entirely in separate address spaces, and you only need to poll the status.
This would be like a dream in a traditional database.

My 2 cents.

Jerry

On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi wrote:

> let me try to go back to my original topic.
> this question was meant to be generic, and provide some rule for future
> code.
>
> from what I can gather, a rule that may satisfy everyone can be:
> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> over MR, because some clusters may not want it or may have an
> external/uncontrolled MR setup.
> - we allow non-core features (e.g. features enabled by a flag) to run MR
> jobs from hbase, because unless you use the feature, MR is not required.
>
> Matteo
>
>
> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu wrote:
>
> > I suggest you look at Matteo's work for AssignmentManager which is to
> > make Master more stable.
> >
> > Cheers
> >
> > On Fri, Sep 23, 2016 at 5:32 AM, 张铎 wrote:
> >
> > > No, not your fault, at least, not this time :)
> > >
> > > Why do I call the code ugly? Can you simply tell me the sequence of
> > > calls when starting up the HMaster? HMaster is also a regionserver so
> > > it extends HRegionServer, and the initialization of HRegionServer
> > > sometimes needs to make RPC calls to HMaster. A simple change would
> > > cause a probabilistic deadlock or some strange NPEs...
> > >
> > > That's why I'm very nervous when somebody wants to add new features or
> > > add external dependencies to HMaster, especially ones that add more
> > > work to the startup process...
> > >
> > > Thanks.
> > >
> > > 2016-09-23 20:02 GMT+08:00 Ted Yu :
> > >
> > > > I read through HADOOP-13433
> > > > <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
> > > > race condition is in jdk.
> > > >
> > > > Suggest pinging the reviewer on JIRA to get it moving.
> > > >
> > > > bq. But the ugly code in HMaster is really a problem...
> > > >
> > > > Can you be specific as to which code is ugly? Is it in the backup /
> > > > restore mega patch?
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 wrote:
> > > >
> > > > > If you guys have already implemented the feature in the MR way and
> > > > > the patch is ready for landing on master, I'm a -0 on it as I do
> > > > > not want to block the development progress.
> > > > >
> > > > > But I strongly suggest that later we revisit the design and see if
> > > > > we can separate the logic from HMaster as much as possible. HA is
> > > > > not a big problem if you do not store any metadata locally. But the
> > > > > ugly code in HMaster is really a problem...
> > > > >
> > > > > And for security, I have an issue pending for a long time. Can
> > > > > someone
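The rule Matteo proposes earlier in this message (no MR for core features; MR permitted for optional features that are switched on by a flag) amounts to a small guard in front of any job launch. The sketch below is an illustration only, written in Python rather than taken from HBase: `CORE_FEATURES`, `may_launch_mr`, `launch_mr_job`, and the `submit` callback are all hypothetical names, and the core-feature set simply copies the examples given in the rule.

```python
# Hypothetical list of core features, taken from the examples in the rule.
CORE_FEATURES = frozenset({"compaction", "log-split", "log-replay"})


class MRNotAllowed(Exception):
    """Raised when a feature is not permitted to launch an MR job."""


def may_launch_mr(feature, enabled_flags):
    """Return True only for a non-core feature that has been explicitly enabled."""
    if feature in CORE_FEATURES:
        return False  # core features must never depend on MR
    return feature in enabled_flags


def launch_mr_job(feature, enabled_flags, submit):
    """Call submit(feature) if the rule allows it, otherwise raise MRNotAllowed."""
    if not may_launch_mr(feature, enabled_flags):
        raise MRNotAllowed(f"{feature} may not launch MR jobs")
    return submit(feature)
```

With this shape, a cluster that never enables the optional feature never reaches `submit`, which matches the "no impact if not deployed or used" requirement Jerry lists.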
Re: [DISCUSSION] MR jobs started by Master or RS
No, not your fault, at least not this time :)

Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver, so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make RPC calls to HMaster. A simple change can cause a probabilistic deadlock or some strange NPEs...

That's why I'm very nervous when somebody wants to add new features or external dependencies to HMaster, and especially when the change adds more work to the startup process...

Thanks.

2016-09-23 20:02 GMT+08:00 Ted Yu :

> I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.
>
> Suggest pinging the reviewer on JIRA to get it moving.
>
> bq. But the ugly code in HMaster is really a problem...
>
> Can you be specific as to which code is ugly? Is it in the backup / restore mega patch?
>
> Cheers
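The startup hazard described in this message can be illustrated in miniature. The following is a minimal sketch, not HBase code: `BaseServer` and `CoordinatorServer` are hypothetical stand-ins for a regionserver-style base class and a master-style subclass. It shows one general way such NPE-prone states can arise in Java (base-class initialization calling into subclass code before the subclass fields are assigned); it is not claimed to be the actual cause of any HBase bug.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class, standing in for a regionserver-style server.
class BaseServer {
    BaseServer() {
        // The base constructor calls a hook that subclasses override.
        register();
    }

    void register() {
        // Nothing to do in the base class.
    }
}

// Hypothetical subclass, standing in for a master that is also a server.
class CoordinatorServer extends BaseServer {
    // Assigned only after BaseServer's constructor has returned.
    private final List<String> peers = new ArrayList<>();

    // No initializer, so the value set during super() is preserved.
    boolean sawUninitializedState;

    @Override
    void register() {
        // Runs during super(), when 'peers' is still null.
        if (peers == null) {
            // Calling peers.add(...) here would throw a NullPointerException.
            sawUninitializedState = true;
        } else {
            peers.add("self");
        }
    }

    int peerCount() {
        return peers.size();
    }
}
```

Constructing a `CoordinatorServer` leaves `sawUninitializedState` set to `true`, because the overridden hook ran before the subclass field initializers. The more work a base class does during construction, the easier it is to hit this kind of ordering problem.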
Re: [DISCUSSION] MR jobs started by Master or RS
I read through HADOOP-13433 <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race condition is in the JDK.

Suggest pinging the reviewer on JIRA to get it moving.

bq. But the ugly code in HMaster is really a problem...

Can you be specific as to which code is ugly? Is it in the backup / restore mega patch?

Cheers

On Thu, Sep 22, 2016 at 10:44 PM, 张铎 wrote:

> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
>
> But I strongly suggest that later we revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
>
> And for security, I have an issue that has been pending for a long time. Can someone help take a look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...
>
> https://issues.apache.org/jira/browse/HADOOP-13433
Re: [DISCUSSION] MR jobs started by Master or RS
If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.

But I strongly suggest that later we revisit the design and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...

And for security, I have an issue that has been pending for a long time. Can someone help take a look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while it is still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...

https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov :

> >> If in the future, we find better ways of doing this without using MR, we can certainly consider that
>
> Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
>
> -Vlad
Re: [DISCUSSION] MR jobs started by Master or RS
2016-09-23 12:38 GMT+08:00 Devaraj Das :

> Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-)
>
> So on this issue, we have these on the plate:
> 0. Somehow not use MR but something like that
> 1. Run a standalone service other than master
> 2. Shell out from the master
>
> I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.
>
> On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.

We have already put lots of stuff in HMaster, especially on startup; we even need to run the initialization in a background thread to let the master come up quickly, and that causes lots of races. I do not think it is a good idea to keep messing up the code there.

For MR, as I said above, the problem is configuration. I do not want to restart HMaster if I just want to tune the config of a backup MR job. Yes, you could introduce new shell commands to do that: change the job config, change the YARN cluster, and persist the change somewhere, maybe ZooKeeper? Oh no...

> Now before going to (2), let's see what are the benefits of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of it (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job makes it get that privilege. In the (2) approach, it's hard to do some of the above management.
>
> Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO I don't think we should block this patch from getting merged.
>
> From: 张铎
> Sent: Thursday, September 22, 2016 8:32 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> So what about a standalone service other than master? You can use your own procedure store in that service?
>
> 2016-09-23 11:28 GMT+08:00 Ted Yu :
>
> > An earlier implementation was client driven.
> >
> > But with that approach, it is hard to resume if there is an error midway. Using Procedure V2 makes the backup / restore more robust.
> >
> > Another consideration is security. It is hard to enforce security (to be implemented) for client driven actions.
> >
> > Cheers
> >
> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell wrote:
> > >
> > > No, this misses Matteo's finer point, which is "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?
> > >
> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov wrote:
> > >
> > >>>> In our production cluster, it is common that we have only HDFS and HBase deployed.
> > >>>> If our Master/RS depends on the MR framework (especially for features we do not use at all), it introduces another maintenance cost. I don't think it is a good idea.
> > >>
> > >> So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup as a standard feature. Besides this, nothing will happen in your cluster if you are not doing backups.
> > >>
> > >> This discussion (we do not want to see an M/R dependency) is going nowhere. We asked already, at least twice, to suggest another framework (other
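The quoted argument about recording the jobId in the proc WAL can be sketched as follows. This is a minimal illustration under stated assumptions, not the Procedure V2 API: `ProcedureJournal` and `BackupCoordinator` are hypothetical names, and an in-memory list stands in for a durable log that a standby master could read after the active one fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only journal; a stand-in for a durable procedure log.
final class ProcedureJournal {
    private final List<String> entries = new ArrayList<>();

    void append(String entry) {
        entries.add(entry);
    }

    List<String> entries() {
        // Return a copy so readers cannot modify the journal.
        return new ArrayList<>(entries);
    }
}

// Hypothetical coordinator that tracks backup jobs through the journal.
final class BackupCoordinator {
    private final ProcedureJournal journal;

    BackupCoordinator(ProcedureJournal journal) {
        this.journal = journal;
    }

    // Record the job id as soon as the job has been submitted.
    void jobSubmitted(String jobId) {
        journal.append("SUBMITTED " + jobId);
    }

    void jobFinished(String jobId) {
        journal.append("FINISHED " + jobId);
    }

    // A standby replays the journal to find jobs that still need tracking.
    List<String> unfinishedJobs() {
        List<String> open = new ArrayList<>();
        for (String entry : journal.entries()) {
            String[] parts = entry.split(" ", 2);
            if (parts[0].equals("SUBMITTED")) {
                open.add(parts[1]);
            } else if (parts[0].equals("FINISHED")) {
                open.remove(parts[1]);
            }
        }
        return open;
    }
}
```

A second `BackupCoordinator` built over the same journal sees exactly the jobs that were submitted but never marked finished, which is the property the master-driven design relies on. Note that this does not address the configuration concern raised in the reply above.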
Re: [DISCUSSION] MR jobs started by Master or RS
Just wanted to add one argument of doing this in a Master way : Client - based backups/restore are very hard (if possible) to make fully fault tolerant. If client fails abruptly half way, some system data will be broken, cluster will never return into original state. We disable, for example splits/merges, balancer during full backup and restore. Failed client will leave cluster in that state (disabled splits/merges) -Vlad On Thu, Sep 22, 2016 at 9:53 PM, Vladimir Rodionov wrote: > >> If in the future, we find better ways of doing this without using MR, > we can certainly consider that > > Our framework for distributed operations is abstract and allows > different implementations. MR is just one implementation we provide. > > -Vlad > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das wrote: > >> Guys, first off apologies for bringing in the topic of MR-based >> compactions.. But I was thinking more about the SpliceMachine approach of >> managing compactions in Spark where apparently they saw a lot of benefits. >> Apologies for giving you that sore throat Andrew; I really didn't mean to >> :-) >> >> So on this issue, we have these on the plate: >> 0. Somehow not use MR but something like that >> 1. Run a standalone service other than master >> 2. Shell out from the master >> >> I don't think we have a good answer to (0), and I don't think it's even >> worth the effort of trying to build something when MR is already there, and >> being used by HBase already for some operations. >> >> On (1), we have to deal with a myriad of issues - HA of the server not >> being the least of them all. Security (kerberos authentication, another >> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's >> substitute that (1) with the HBase Master. I haven't seen any good reason >> why the HBase master shouldn't launch MR jobs if needed. It's not ideal; >> agreed. 
>> >> Now before going to (2), let's see what are the benefits of running the >> backup/restore jobs from the master. I think Ted has summarized some of the >> issues that we need to take care of - basically, the master can keep track >> of running jobs, and should it fail, the backup master can continue keeping >> track of it (since the jobId would have been recorded in the proc WAL). The >> master can also do cleanup, etc. of failed backup/restore processes. >> Security is another issue - the job needs to run as 'hbase' since it owns >> the data. Having the master launch the job makes it get that privilege. In >> the (2) approach, it's hard to do some of the above management. >> >> Guys, just to reiterate, the patch as such is ready from the overall >> design/arch point of view (maybe code review is still pending from Matteo). >> If in the future, we find better ways of doing this without using MR, we >> can certainly consider that. But IMO don't think we should block this patch >> from getting merged. >> >> >> From: 张铎 >> Sent: Thursday, September 22, 2016 8:32 PM >> To: dev@hbase.apache.org >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS >> >> So what about a standalone service other than master? You can use your own >> procedure store in that service? >> >> 2016-09-23 11:28 GMT+08:00 Ted Yu : >> >> > An earlier implementation was client driven. >> > >> > But with that approach, it is hard to resume if there is error midway. >> > Using Procedure V2 makes the backup / restore more robust. >> > >> > Another consideration is for security. It is hard to enforce security >> (to >> > be implemented) for client driven actions. >> > >> > Cheers >> > >> > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > > >> > wrote: >> > > >> > > No, this misses Matteo's finer point, which is "shelling out" from the >> > master directly to run MR is a first. Why not drive this with a utility >> > derived from Tool? 
>> > > >> > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov < >> vladrodio...@gmail.com> >> > wrote: >> > > >> > >>>> In our production cluster, it is a common case we just have HDFS >> and >> > >>>> HBase deployed. >> > >>>> If our Master/RS depend on MR framework (especially some features >> we >> > >>>> have not used at all), it introduced another cost for maintain. I >> &g
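The fault-tolerance argument at the top of this message (a driver that disables splits/merges and the balancer, then dies, leaves the cluster in that state) can be sketched as a toy model. This is only an illustration under stated assumptions: `ClusterSwitches`, `IntentJournal`, `run_backup` and `recover` are hypothetical names, not HBase APIs, and the "crash" is simulated with an exception. It contrasts a driver with no durable record against one that journals the previous settings before changing them, so that a surviving process can undo the change.

```python
# Toy model only: every name here is hypothetical and none of it is HBase API.
from dataclasses import dataclass, field


@dataclass
class ClusterSwitches:
    """Stands in for cluster-wide settings that a full backup turns off."""
    splits_merges_enabled: bool = True
    balancer_enabled: bool = True


@dataclass
class IntentJournal:
    """Stands in for a durable record that outlives the driving process."""
    entries: list = field(default_factory=list)

    def record(self, backup_id, saved):
        self.entries.append({"backup_id": backup_id, "saved": saved, "done": False})

    def mark_done(self, backup_id):
        for entry in self.entries:
            if entry["backup_id"] == backup_id:
                entry["done"] = True

    def unfinished(self):
        return [e for e in self.entries if not e["done"]]


class DriverCrash(Exception):
    """Simulates the driving process dying abruptly."""


def restore(switches, saved):
    switches.splits_merges_enabled, switches.balancer_enabled = saved


def run_backup(switches, journal, backup_id, crash_midway=False):
    saved = (switches.splits_merges_enabled, switches.balancer_enabled)
    if journal is not None:
        # Journal the previous settings BEFORE changing anything.
        journal.record(backup_id, saved)
    switches.splits_merges_enabled = False
    switches.balancer_enabled = False
    if crash_midway:
        raise DriverCrash(backup_id)  # nothing below this line runs
    # ... the data copy would happen here ...
    restore(switches, saved)
    if journal is not None:
        journal.mark_done(backup_id)


def recover(switches, journal):
    """Run by a surviving process: undo every backup that never finished."""
    for entry in journal.unfinished():
        restore(switches, entry["saved"])
        journal.mark_done(entry["backup_id"])


def simulate(journaled):
    switches = ClusterSwitches()
    journal = IntentJournal() if journaled else None
    try:
        run_backup(switches, journal, "backup_1", crash_midway=True)
    except DriverCrash:
        pass  # the driver is gone; it cannot clean up after itself
    if journal is not None:
        recover(switches, journal)
    return switches
```

Calling `simulate(False)` leaves both switches off, which is the failure described above; `simulate(True)` ends with both switches back on. Where the journal lives and who runs `recover` is exactly what the thread is debating, so the sketch takes no position on that.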
Re: [DISCUSSION] MR jobs started by Master or RS
All the better, Vlad!

On Thu, Sep 22, 2016 at 9:53 PM -0700, "Vladimir Rodionov" <vladrodio...@gmail.com> wrote: >> If in the future, we find better ways of doing this without using MR, we can certainly consider that Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide. -Vlad On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das wrote: > Guys, first off apologies for bringing in the topic of MR-based > compactions.. But I was thinking more about the SpliceMachine approach of > managing compactions in Spark where apparently they saw a lot of benefits. > Apologies for giving you that sore throat Andrew; I really didn't mean to > :-) > > So on this issue, we have these on the plate: > 0. Somehow not use MR but something like that > 1. Run a standalone service other than master > 2. Shell out from the master > > I don't think we have a good answer to (0), and I don't think it's even > worth the effort of trying to build something when MR is already there, and > being used by HBase already for some operations. > > On (1), we have to deal with a myriad of issues - HA of the server not > being the least of them all. Security (kerberos authentication, another > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's > substitute that (1) with the HBase Master. I haven't seen any good reason > why the HBase master shouldn't launch MR jobs if needed. It's not ideal; > agreed. > > Now before going to (2), let's see what are the benefits of running the > backup/restore jobs from the master. I think Ted has summarized some of the > issues that we need to take care of - basically, the master can keep track > of running jobs, and should it fail, the backup master can continue keeping > track of it (since the jobId would have been recorded in the proc WAL). The > master can also do cleanup, etc. of failed backup/restore processes. 
> Security is another issue - the job needs to run as 'hbase' since it owns > the data. Having the master launch the job makes it get that privilege. In > the (2) approach, it's hard to do some of the above management. > > Guys, just to reiterate, the patch as such is ready from the overall > design/arch point of view (maybe code review is still pending from Matteo). > If in the future, we find better ways of doing this without using MR, we > can certainly consider that. But IMO don't think we should block this patch > from getting merged. > > > From: 张铎 > Sent: Thursday, September 22, 2016 8:32 PM > To: dev@hbase.apache.org > Subject: Re: [DISCUSSION] MR jobs started by Master or RS > > So what about a standalone service other than master? You can use your own > procedure store in that service? > > 2016-09-23 11:28 GMT+08:00 Ted Yu : > > > An earlier implementation was client driven. > > > > But with that approach, it is hard to resume if there is error midway. > > Using Procedure V2 makes the backup / restore more robust. > > > > Another consideration is for security. It is hard to enforce security (to > > be implemented) for client driven actions. > > > > Cheers > > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > > wrote: > > > > > > No, this misses Matteo's finer point, which is "shelling out" from the > > master directly to run MR is a first. Why not drive this with a utility > > derived from Tool? > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov > > > wrote: > > > > > >>>> In our production cluster, it is a common case we just have HDFS > and > > >>>> HBase deployed. > > >>>> If our Master/RS depend on MR framework (especially some features we > > >>>> have not used at all), it introduced another cost for maintain. I > > >>>> don't think it is a good idea. > > >> > > >> So , you are not backup users in this case. Many our customers have > full > > >> stack deployed and > > >> want see backup to be a standard feature. 
Besides this, nothing will > > happen > > >> in your cluster > > >> if you won't be doing backups. > > >> > > >> This discussion (we do not want see M/R dependency) goes to nowhere. > We > > >> asked already, at least twice, to suggest another framework (other > than > > M/R) > > >> for bulk data copy with *conversion*. Still waiting for suggestions. > > >> > > >>
Re: [DISCUSSION] MR jobs started by Master or RS
>> If in the future, we find better ways of doing this without using MR, we can certainly consider that Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide. -Vlad On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das wrote: > Guys, first off apologies for bringing in the topic of MR-based > compactions.. But I was thinking more about the SpliceMachine approach of > managing compactions in Spark where apparently they saw a lot of benefits. > Apologies for giving you that sore throat Andrew; I really didn't mean to > :-) > > So on this issue, we have these on the plate: > 0. Somehow not use MR but something like that > 1. Run a standalone service other than master > 2. Shell out from the master > > I don't think we have a good answer to (0), and I don't think it's even > worth the effort of trying to build something when MR is already there, and > being used by HBase already for some operations. > > On (1), we have to deal with a myriad of issues - HA of the server not > being the least of them all. Security (kerberos authentication, another > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's > substitute that (1) with the HBase Master. I haven't seen any good reason > why the HBase master shouldn't launch MR jobs if needed. It's not ideal; > agreed. > > Now before going to (2), let's see what are the benefits of running the > backup/restore jobs from the master. I think Ted has summarized some of the > issues that we need to take care of - basically, the master can keep track > of running jobs, and should it fail, the backup master can continue keeping > track of it (since the jobId would have been recorded in the proc WAL). The > master can also do cleanup, etc. of failed backup/restore processes. > Security is another issue - the job needs to run as 'hbase' since it owns > the data. Having the master launch the job makes it get that privilege. 
In > the (2) approach, it's hard to do some of the above management. > > Guys, just to reiterate, the patch as such is ready from the overall > design/arch point of view (maybe code review is still pending from Matteo). > If in the future, we find better ways of doing this without using MR, we > can certainly consider that. But IMO don't think we should block this patch > from getting merged. > > > From: 张铎 > Sent: Thursday, September 22, 2016 8:32 PM > To: dev@hbase.apache.org > Subject: Re: [DISCUSSION] MR jobs started by Master or RS > > So what about a standalone service other than master? You can use your own > procedure store in that service? > > 2016-09-23 11:28 GMT+08:00 Ted Yu : > > > An earlier implementation was client driven. > > > > But with that approach, it is hard to resume if there is error midway. > > Using Procedure V2 makes the backup / restore more robust. > > > > Another consideration is for security. It is hard to enforce security (to > > be implemented) for client driven actions. > > > > Cheers > > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > > wrote: > > > > > > No, this misses Matteo's finer point, which is "shelling out" from the > > master directly to run MR is a first. Why not drive this with a utility > > derived from Tool? > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov > > > wrote: > > > > > >>>> In our production cluster, it is a common case we just have HDFS > and > > >>>> HBase deployed. > > >>>> If our Master/RS depend on MR framework (especially some features we > > >>>> have not used at all), it introduced another cost for maintain. I > > >>>> don't think it is a good idea. > > >> > > >> So , you are not backup users in this case. Many our customers have > full > > >> stack deployed and > > >> want see backup to be a standard feature. Besides this, nothing will > > happen > > >> in your cluster > > >> if you won't be doing backups. 
> > >> > > >> This discussion (we do not want see M/R dependency) goes to nowhere. > We > > >> asked already, at least twice, to suggest another framework (other > than > > M/R) > > >> for bulk data copy with *conversion*. Still waiting for suggestions. > > >> > > >> -Vlad > > >> > > >> > > >> > > >> > > >>> On Thu, Sep 22, 2016 at 7:49 PM, Te
Re: [DISCUSSION] MR jobs started by Master or RS
Guys, first off, apologies for bringing in the topic of MR-based compactions. But I was thinking more about the SpliceMachine approach of managing compactions in Spark, where apparently they saw a lot of benefits. Apologies for giving you that sore throat, Andrew; I really didn't mean to :-)

So on this issue, we have these on the plate:
0. Somehow not use MR but something like that
1. Run a standalone service other than master
2. Shell out from the master

I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there and already being used by HBase for some operations.

On (1), we have to deal with a myriad of issues, HA of the server not being the least of them. Security too (kerberos authentication, another keytab to manage, etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.

Now before going to (2), let's see what the benefits are of running the backup/restore jobs from the master. I think Ted has summarized some of the issues that we need to take care of. Basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of them (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue: the job needs to run as 'hbase' since it owns the data. Having the master launch the job gives it that privilege. In the (2) approach, it's hard to do some of the above management.

Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future we find better ways of doing this without using MR, we can certainly consider that. But IMO I don't think we should block this patch from getting merged.

From: 张铎
Sent: Thursday, September 22, 2016 8:32 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

So what about a standalone service other than master? You can use your own procedure store in that service? 2016-09-23 11:28 GMT+08:00 Ted Yu : > An earlier implementation was client driven. > > But with that approach, it is hard to resume if there is error midway. > Using Procedure V2 makes the backup / restore more robust. > > Another consideration is for security. It is hard to enforce security (to > be implemented) for client driven actions. > > Cheers > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > wrote: > > > > No, this misses Matteo's finer point, which is "shelling out" from the > master directly to run MR is a first. Why not drive this with a utility > derived from Tool? > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov > wrote: > > > >>>> In our production cluster, it is a common case we just have HDFS and > >>>> HBase deployed. > >>>> If our Master/RS depend on MR framework (especially some features we > >>>> have not used at all), it introduced another cost for maintain. I > >>>> don't think it is a good idea. > >> > >> So , you are not backup users in this case. Many our customers have full > >> stack deployed and > >> want see backup to be a standard feature. Besides this, nothing will > happen > >> in your cluster > >> if you won't be doing backups. > >> > >> This discussion (we do not want see M/R dependency) goes to nowhere. We > >> asked already, at least twice, to suggest another framework (other than > M/R) > >> for bulk data copy with *conversion*. Still waiting for suggestions. > >> > >> -Vlad > >> > >> > >> > >> > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu wrote: > >>> > >>> If MR framework is not deployed in the cluster, hbase still functions > >>> normally (post merge). 
> >>> > >>> In terms of build time dependency, we have long been depending on > >>> mapreduce. Take a look at ExportSnapshot. > >>> > >>> Cheers > >>> > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen > >>> wrote: > >>> > >>>> In our production cluster, it is a common case we just have HDFS and > >>>> HBase deployed. > >>>> If our Master/RS depend on MR framework (especially some features we > >>>> have not used at all), it introduced
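The point in this message about the jobId being recorded in the proc WAL, so that a backup master can keep tracking the job, can be sketched as a small step-based procedure that persists its progress after every step. This is a minimal sketch under stated assumptions and not Procedure V2 itself: `ProcStore`, `FakeJobService`, `execute` and the step names are all hypothetical, and the "master death" is simulated with an exception.

```python
# Toy model only: every name here is hypothetical and none of it is HBase API.
STEPS = ["snapshot", "submit_copy_job", "wait_for_job", "cleanup"]


class ProcStore:
    """Stands in for a durable procedure log visible to a standby master."""

    def __init__(self):
        self.state = {}  # proc_id -> {"next_step": int, "job_id": str or None}

    def save(self, proc_id, next_step, job_id):
        self.state[proc_id] = {"next_step": next_step, "job_id": job_id}

    def load(self, proc_id):
        return self.state.get(proc_id, {"next_step": 0, "job_id": None})


class FakeJobService:
    """Stands in for the job framework; remembers every submission."""

    def __init__(self):
        self.submitted = []

    def submit(self):
        job_id = f"job_{len(self.submitted) + 1}"
        self.submitted.append(job_id)
        return job_id


class MasterDied(Exception):
    """Simulates the active master going away mid-procedure."""


def execute(store, jobs, proc_id, die_before=None):
    """Run the remaining steps, persisting progress after each one."""
    saved = store.load(proc_id)
    step, job_id = saved["next_step"], saved["job_id"]
    while step < len(STEPS):
        name = STEPS[step]
        if name == die_before:
            raise MasterDied(name)
        if name == "submit_copy_job":
            job_id = jobs.submit()
        step += 1
        store.save(proc_id, step, job_id)  # the job id survives a failover
    return job_id


def failover_demo():
    store, jobs = ProcStore(), FakeJobService()
    try:
        execute(store, jobs, "backup_proc_1", die_before="wait_for_job")
    except MasterDied:
        pass  # the active master is gone after submitting the job
    # A standby picks up from the persisted state instead of starting over.
    job_id = execute(store, jobs, "backup_proc_1")
    return job_id, jobs.submitted, store.load("backup_proc_1")
```

In `failover_demo()` the second `execute` call resumes at `wait_for_job` with the job id that the first call persisted, so the copy job is submitted only once. The sketch ignores the window between submitting the job and saving its id; a real design would have to close that gap.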
Re: [DISCUSSION] MR jobs started by Master or RS
if > we > > > >>> think > > > >>>>> this is not a core feature of HBase, then we could make it depend > > on > > > >>> MR, > > > >>>>> and start a standalone BackupManager instance that submits MR > jobs > > to > > > >>> do > > > >>>>> periodical maintenance job. And if we think this is a core > feature > > > that > > > >>>>> everyone should use it, then we'd better implement it without MR > > > >>>>> dependency, like DLS. > > > >>>>> > > > >>>>> Thanks. > > > >>>>> > > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 : > > > >>>>> > > > >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of > > our > > > >>>>>> features depend on MR but I think the bottom line is that we > > should > > > >>>> launch > > > >>>>>> the jobs from outside manually or by other services. > > > >>>>>> > > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell < > > andrew.purt...@gmail.com > > > >: > > > >>>>>> > > > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a > fair > > > >>>>>>> question. > > > >>>>>>> > > > >>>>>>> Can this be driven by a utility derived from Tool like our > other > > MR > > > >>>> apps? > > > >>>>>>> The issue is needing the AccessController to decide if allowed? > > But > > > >>>> nothing > > > >>>>>>> prevents the user from running the job manually/independently, > > > right? > > > >>>>>>> > > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi < > > > >>>> theo.berto...@gmail.com> > > > >>>>>>> wrote: > > > >>>>>>>> > > > >>>>>>>> just a remark. my query was not about tools using MR > (everyone i > > > >>>> think > > > >>>>>>> is > > > >>>>>>>> ok with those). > > > >>>>>>>> the topic was about: "are we ok with running MR jobs from > Master > > > >>> and > > > >>>> RSs > > > >>>>>>>> code?" 
since this will be the first time we do this > > > >>>>>>>> > > > >>>>>>>> Matteo > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das < > > > >>> d...@hortonworks.com> > > > >>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / > > > Restore, > > > >>>> it's > > > >>>>>>>>> fine to be dependent on MR. MR is the right framework for > such. > > > We > > > >>>>>>> should > > > >>>>>>>>> also do compactions using MR (just saying :) ) > > > >>>>>>>>> > > > >>>>>>>>> From: Ted Yu > > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM > > > >>>>>>>>> To: dev@hbase.apache.org > > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS > > > >>>>>>>>> > > > >>>>>>>>> I agree - backup / restore is in the same category as import > / > > > >>>> export. > > > >>>>>>>>> > > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell < > > > >>>>>>> andrew.purt...@gmail.com> > > > >>>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like > import > > > or > > > >>>>>>> export. > > > >>>>>>>>>> Or the optional MOB tool. It's fine. > > > >>>>>>>>>> > > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi < > > > >>>> mberto...@apache.org> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase > > > >>>> (Master > > > >>>>>>> or > > > >>>>>>>>>> RS)? > > > >>>>>>>>>>> > > > >>>>>>>>>>> I remember in the past that there was discussion about not > > > >>> having > > > >>>> MR > > > >>>>>>>>> has > > > >>>>>>>>>>> direct dependency of hbase. > > > >>>>>>>>>>> > > > >>>>>>>>>>> I think some of discussion where around MOB that had a MR > job > > > to > > > >>>>>>>>> compact, > > > >>>>>>>>>>> that later was transformed in a non-MR job to be merged, I > > > think > > > >>>> we > > > >>>>>>>>> had a > > > >>>>>>>>>>> similar discussion for log split/replay. 
> > > >>>>>>>>>>> > > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that > runs > > a > > > >>> MR > > > >>>> job > > > >>>>>>>>>> from > > > >>>>>>>>>>> the master to copy data or restore data. > > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use > > > >>> backup > > > >>>>>>>>> you'll > > > >>>>>>>>>>> not end up running MR jobs, but this was probably true for > > MOB > > > >>> as > > > >>>> in > > > >>>>>>>>> "if > > > >>>>>>>>>>> you don't enable MOB you don't need MR") > > > >>>>>>>>>>> > > > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have > > > >>> hbase > > > >>>> run > > > >>>>>>>>> MR > > > >>>>>>>>>>> jobs, only tool started manually by the user can do that". > or > > > >>> can > > > >>>> we > > > >>>>>>>>>> start > > > >>>>>>>>>>> adding MR calls around without problems? > > > >>> > > > > > >
Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>> > > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair > > >>>>>>> question. > > >>>>>>> > > >>>>>>> Can this be driven by a utility derived from Tool like our other > MR > > >>>> apps? > > >>>>>>> The issue is needing the AccessController to decide if allowed? > But > > >>>> nothing > > >>>>>>> prevents the user from running the job manually/independently, > > right? > > >>>>>>> > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi < > > >>>> theo.berto...@gmail.com> > > >>>>>>> wrote: > > >>>>>>>> > > >>>>>>>> just a remark. my query was not about tools using MR (everyone i > > >>>> think > > >>>>>>> is > > >>>>>>>> ok with those). > > >>>>>>>> the topic was about: "are we ok with running MR jobs from Master > > >>> and > > >>>> RSs > > >>>>>>>> code?" since this will be the first time we do this > > >>>>>>>> > > >>>>>>>> Matteo > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das < > > >>> d...@hortonworks.com> > > >>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / > > Restore, > > >>>> it's > > >>>>>>>>> fine to be dependent on MR. MR is the right framework for such. > > We > > >>>>>>> should > > >>>>>>>>> also do compactions using MR (just saying :) ) > > >>>>>>>>> > > >>>>>>>>> From: Ted Yu > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM > > >>>>>>>>> To: dev@hbase.apache.org > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS > > >>>>>>>>> > > >>>>>>>>> I agree - backup / restore is in the same category as import / > > >>>> export. > > >>>>>>>>> > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell < > > >>>>>>> andrew.purt...@gmail.com> > > >>>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import > > or > > >>>>>>> export. > > >>>>>>>>>> Or the optional MOB tool. It's fine. 
> > >>>>>>>>>> > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi < > > >>>> mberto...@apache.org> > > >>>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase > > >>>> (Master > > >>>>>>> or > > >>>>>>>>>> RS)? > > >>>>>>>>>>> > > >>>>>>>>>>> I remember in the past that there was discussion about not > > >>> having > > >>>> MR > > >>>>>>>>> has > > >>>>>>>>>>> direct dependency of hbase. > > >>>>>>>>>>> > > >>>>>>>>>>> I think some of discussion where around MOB that had a MR job > > to > > >>>>>>>>> compact, > > >>>>>>>>>>> that later was transformed in a non-MR job to be merged, I > > think > > >>>> we > > >>>>>>>>> had a > > >>>>>>>>>>> similar discussion for log split/replay. > > >>>>>>>>>>> > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs > a > > >>> MR > > >>>> job > > >>>>>>>>>> from > > >>>>>>>>>>> the master to copy data or restore data. > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use > > >>> backup > > >>>>>>>>> you'll > > >>>>>>>>>>> not end up running MR jobs, but this was probably true for > MOB > > >>> as > > >>>> in > > >>>>>>>>> "if > > >>>>>>>>>>> you don't enable MOB you don't need MR") > > >>>>>>>>>>> > > >>>>>>>>>>> any thoughts? do we a rule that says "we don't want to have > > >>> hbase > > >>>> run > > >>>>>>>>> MR > > >>>>>>>>>>> jobs, only tool started manually by the user can do that". or > > >>> can > > >>>> we > > >>>>>>>>>> start > > >>>>>>>>>>> adding MR calls around without problems? > > >>> > > >
Re: [DISCUSSION] MR jobs started by Master or RS
So what about a standalone service other than master? You can use your own procedure store in that service? 2016-09-23 11:28 GMT+08:00 Ted Yu : > An earlier implementation was client driven. > > But with that approach, it is hard to resume if there is error midway. > Using Procedure V2 makes the backup / restore more robust. > > Another consideration is for security. It is hard to enforce security (to > be implemented) for client driven actions. > > Cheers > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell > wrote: > > > > No, this misses Matteo's finer point, which is "shelling out" from the > master directly to run MR is a first. Why not drive this with a utility > derived from Tool? > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov > wrote: > > > >>>> In our production cluster, it is a common case we just have HDFS and > >>>> HBase deployed. > >>>> If our Master/RS depend on MR framework (especially some features we > >>>> have not used at all), it introduced another cost for maintain. I > >>>> don't think it is a good idea. > >> > >> So , you are not backup users in this case. Many our customers have full > >> stack deployed and > >> want see backup to be a standard feature. Besides this, nothing will > happen > >> in your cluster > >> if you won't be doing backups. > >> > >> This discussion (we do not want see M/R dependency) goes to nowhere. We > >> asked already, at least twice, to suggest another framework (other than > M/R) > >> for bulk data copy with *conversion*. Still waiting for suggestions. > >> > >> -Vlad > >> > >> > >> > >> > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu wrote: > >>> > >>> If MR framework is not deployed in the cluster, hbase still functions > >>> normally (post merge). > >>> > >>> In terms of build time dependency, we have long been depending on > >>> mapreduce. Take a look at ExportSnapshot. 
> >>> > >>> Cheers > >>> > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen > >>> wrote: > >>> > >>>> In our production cluster, it is a common case we just have HDFS and > >>>> HBase deployed. > >>>> If our Master/RS depend on MR framework (especially some features we > >>>> have not used at all), it introduced another cost for maintain. I > >>>> don't think it is a good idea. > >>>> > >>>> 2016-09-23 10:28 GMT+08:00 张铎 : > >>>>> To be specific, for example, our nice Backup/Restore feature, if we > >>> think > >>>>> this is not a core feature of HBase, then we could make it depend on > >>> MR, > >>>>> and start a standalone BackupManager instance that submits MR jobs to > >>> do > >>>>> periodical maintenance job. And if we think this is a core feature > that > >>>>> everyone should use it, then we'd better implement it without MR > >>>>> dependency, like DLS. > >>>>> > >>>>> Thanks. > >>>>> > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 : > >>>>> > >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our > >>>>>> features depend on MR but I think the bottom line is that we should > >>>> launch > >>>>>> the jobs from outside manually or by other services. > >>>>>> > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell >: > >>>>>> > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair > >>>>>>> question. > >>>>>>> > >>>>>>> Can this be driven by a utility derived from Tool like our other MR > >>>> apps? > >>>>>>> The issue is needing the AccessController to decide if allowed? But > >>>> nothing > >>>>>>> prevents the user from running the job manually/independently, > right? > >>>>>>> > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi < > >>>> theo.berto...@gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> just a remark. my query was not about to
Re: [DISCUSSION] MR jobs started by Master or RS
An earlier implementation was client driven.

But with that approach, it is hard to resume if there is an error midway. Using Procedure V2 makes the backup / restore more robust.

Another consideration is security. It is hard to enforce security (to be implemented) for client-driven actions.

Cheers

> On Sep 22, 2016, at 8:15 PM, Andrew Purtell wrote: > > No, this misses Matteo's finer point, which is "shelling out" from the master > directly to run MR is a first. Why not drive this with a utility derived from > Tool? > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov wrote: > >>>> In our production cluster, it is a common case we just have HDFS and >>>> HBase deployed. >>>> If our Master/RS depend on MR framework (especially some features we >>>> have not used at all), it introduced another cost for maintain. I >>>> don't think it is a good idea. >> >> So , you are not backup users in this case. Many our customers have full >> stack deployed and >> want see backup to be a standard feature. Besides this, nothing will happen >> in your cluster >> if you won't be doing backups. >> >> This discussion (we do not want see M/R dependency) goes to nowhere. We >> asked already, at least twice, to suggest another framework (other than M/R) >> for bulk data copy with *conversion*. Still waiting for suggestions. >> >> -Vlad >> >> >> >> >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu wrote: >>> >>> If MR framework is not deployed in the cluster, hbase still functions >>> normally (post merge). >>> >>> In terms of build time dependency, we have long been depending on >>> mapreduce. Take a look at ExportSnapshot. >>> >>> Cheers >>> >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen >>> wrote: >>> >>>> In our production cluster, it is a common case we just have HDFS and >>>> HBase deployed. >>>> If our Master/RS depend on MR framework (especially some features we >>>> have not used at all), it introduced another cost for maintain. I >>>> don't think it is a good idea. 
>>>> >>>> 2016-09-23 10:28 GMT+08:00 张铎 : >>>>> To be specific, for example, our nice Backup/Restore feature, if we >>> think >>>>> this is not a core feature of HBase, then we could make it depend on >>> MR, >>>>> and start a standalone BackupManager instance that submits MR jobs to >>> do >>>>> periodical maintenance job. And if we think this is a core feature that >>>>> everyone should use it, then we'd better implement it without MR >>>>> dependency, like DLS. >>>>> >>>>> Thanks. >>>>> >>>>> 2016-09-23 10:11 GMT+08:00 张铎 : >>>>> >>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our >>>>>> features depend on MR but I think the bottom line is that we should >>>> launch >>>>>> the jobs from outside manually or by other services. >>>>>> >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell : >>>>>> >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair >>>>>>> question. >>>>>>> >>>>>>> Can this be driven by a utility derived from Tool like our other MR >>>> apps? >>>>>>> The issue is needing the AccessController to decide if allowed? But >>>> nothing >>>>>>> prevents the user from running the job manually/independently, right? >>>>>>> >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi < >>>> theo.berto...@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> just a remark. my query was not about tools using MR (everyone i >>>> think >>>>>>> is >>>>>>>> ok with those). >>>>>>>> the topic was about: "are we ok with running MR jobs from Master >>> and >>>> RSs >>>>>>>> code?" since this will be the first time we do this >>>>>>>> >>>>>>>> Matteo >>>>>>>> >>>>>>>> >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das < >
Re: [DISCUSSION] MR jobs started by Master or RS
Agreed, this would be interesting to contemplate. On Sep 22, 2016, at 8:03 PM, Vladimir Rodionov wrote: >>> No, never. > > No need for M/R here, just a simple compaction-server colocated with RS on > a same node. > You save a lot on GC in RS. Ideally, it can be IO "nice" in Linux (by > setting IO priority). But offtopic, of course :) > > -Vlad > > On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov > wrote: > >>>> And if MR not deployed, Backup/Restore feature could not be used, >> right? >> >> Yes. >> >> On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen >> wrote: >> >>> {quote} >>> If MR framework is not deployed in the cluster, hbase still functions >>> normally (post merge). >>> {quote} >>> >>> If MR is not strong dependency for Master/RS, it is OK for me. >>> And if MR not deployed, Backup/Restore feature could not be used, right? >>> >>> 2016-09-23 10:49 GMT+08:00 Ted Yu : >>>> If MR framework is not deployed in the cluster, hbase still functions >>>> normally (post merge). >>>> >>>> In terms of build time dependency, we have long been depending on >>>> mapreduce. Take a look at ExportSnapshot. >>>> >>>> Cheers >>>> >>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen >>> wrote: >>>> >>>>> In our production cluster, it is a common case we just have HDFS and >>>>> HBase deployed. >>>>> If our Master/RS depend on MR framework (especially some features we >>>>> have not used at all), it introduced another cost for maintain. I >>>>> don't think it is a good idea. >>>>> >>>>> 2016-09-23 10:28 GMT+08:00 张铎 : >>>>>> To be specific, for example, our nice Backup/Restore feature, if we >>> think >>>>>> this is not a core feature of HBase, then we could make it depend on >>> MR, >>>>>> and start a standalone BackupManager instance that submits MR jobs >>> to do >>>>>> periodical maintenance job. And if we think this is a core feature >>> that >>>>>> everyone should use it, then we'd better implement it without MR >>>>>> dependency, like DLS. >>>>>> >>>>>> Thanks. 
>>>>>> >>>>>> 2016-09-23 10:11 GMT+08:00 张铎 : >>>>>> >>>>>>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our >>>>>>> features depend on MR but I think the bottom line is that we should >>>>> launch >>>>>>> the jobs from outside manually or by other services. >>>>>>> >>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell >>> : >>>>>>> >>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair >>>>>>>> question. >>>>>>>> >>>>>>>> Can this be driven by a utility derived from Tool like our other MR >>>>> apps? >>>>>>>> The issue is needing the AccessController to decide if allowed? But >>>>> nothing >>>>>>>> prevents the user from running the job manually/independently, >>> right? >>>>>>>> >>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi < >>>>> theo.berto...@gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> just a remark. my query was not about tools using MR (everyone i >>>>> think >>>>>>>> is >>>>>>>>> ok with those). >>>>>>>>> the topic was about: "are we ok with running MR jobs from Master >>> and >>>>> RSs >>>>>>>>> code?" since this will be the first time we do this >>>>>>>>> >>>>>>>>> Matteo >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das < >>> d...@hortonworks.com> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / >>> Restore, >>>>> it's >>>>>>>&g
Re: [DISCUSSION] MR jobs started by Master or RS
No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool?

On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov wrote:

>>> In our production cluster, it is a common case we just have HDFS and
>>> HBase deployed.
>>> If our Master/RS depend on MR framework (especially some features we
>>> have not used at all), it introduced another cost for maintain. I
>>> don't think it is a good idea.
>
> So , you are not backup users in this case. Many our customers have full
> stack deployed and want see backup to be a standard feature. Besides this,
> nothing will happen in your cluster if you won't be doing backups.
>
> This discussion (we do not want see M/R dependency) goes to nowhere. We
> asked already, at least twice, to suggest another framework (other than M/R)
> for bulk data copy with *conversion*. Still waiting for suggestions.
>
> -Vlad
Re: [DISCUSSION] MR jobs started by Master or RS
>> If MR is not strong dependency for Master/RS, it is OK for me.

There is no strong MR dependency for Master/RS. They will function as usual until you try a backup; the backup will fail, but the Master won't.

-Vlad

On Thu, Sep 22, 2016 at 8:03 PM, Vladimir Rodionov wrote:

> >> No, never.
>
> No need for M/R here, just a simple compaction-server colocated with RS on
> a same node.
> You save a lot on GC in RS. Ideally, it can be IO "nice" in Linux (by
> setting IO priority). But offtopic, of course :)
>
> -Vlad
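Vlad's point above, that Master and RS keep running without MR and only the backup request itself fails, can be sketched roughly as follows. This is a hypothetical illustration, not code from the HBase backup patch: the class `OptionalMrCheck`, the methods `isClassPresent` and `runBackup`, and the idea of probing the classpath for `org.apache.hadoop.mapreduce.Job` are all assumptions made for the example.

```java
public class OptionalMrCheck {
    // Hadoop's MapReduce job class, used here only as a classpath probe (assumption).
    static final String MR_JOB_CLASS = "org.apache.hadoop.mapreduce.Job";

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, OptionalMrCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Hypothetical backup entry point: the request fails, the server does not. */
    static String runBackup(String probeClass) {
        if (!isClassPresent(probeClass)) {
            return "FAILED: MapReduce not available";
        }
        return "SUBMITTED";
    }

    public static void main(String[] args) {
        // The result depends on whether Hadoop MR jars are on the classpath.
        System.out.println(runBackup(MR_JOB_CLASS));
    }
}
```

The check is made only when a backup is requested, so a cluster that never runs backups never touches the MR classes.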
Re: [DISCUSSION] MR jobs started by Master or RS
>> No, never.

No need for M/R here, just a simple compaction server colocated with the RS on the same node. You save a lot on GC in the RS. Ideally, it can be IO "nice" in Linux (by setting IO priority). But offtopic, of course :)

-Vlad

On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov wrote:

> >> And if MR not deployed, Backup/Restore feature could not be used, right?
>
> Yes.
Re: [DISCUSSION] MR jobs started by Master or RS
>> And if MR not deployed, Backup/Restore feature could not be used, right?

Yes.

On Thu, Sep 22, 2016 at 7:53 PM, Heng Chen wrote:

> {quote}
> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
> {quote}
>
> If MR is not strong dependency for Master/RS, it is OK for me.
> And if MR not deployed, Backup/Restore feature could not be used, right?
Re: [DISCUSSION] MR jobs started by Master or RS
>> In our production cluster, it is a common case we just have HDFS and
>> HBase deployed.
>> If our Master/RS depend on MR framework (especially some features we
>> have not used at all), it introduced another cost for maintain. I
>> don't think it is a good idea.

So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup as a standard feature. Besides this, nothing will happen in your cluster if you won't be doing backups.

This discussion ("we do not want to see an M/R dependency") is going nowhere. We have already asked, at least twice, for suggestions of another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.

-Vlad

On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu wrote:

> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
>
> In terms of build time dependency, we have long been depending on
> mapreduce. Take a look at ExportSnapshot.
>
> Cheers
Re: [DISCUSSION] MR jobs started by Master or RS
{quote}
If MR framework is not deployed in the cluster, hbase still functions normally (post merge).
{quote}

If MR is not a strong dependency for Master/RS, it is OK for me.
And if MR is not deployed, the Backup/Restore feature could not be used, right?

2016-09-23 10:49 GMT+08:00 Ted Yu:

> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
>
> In terms of build time dependency, we have long been depending on
> mapreduce. Take a look at ExportSnapshot.
>
> Cheers
>
> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen wrote:
>
>> In our production cluster, it is a common case we just have HDFS and
>> HBase deployed.
>> If our Master/RS depend on MR framework (especially some features we
>> have not used at all), it introduced another cost for maintain. I
>> don't think it is a good idea.
>>
>> 2016-09-23 10:28 GMT+08:00 张铎:
>> > To be specific, for example, our nice Backup/Restore feature, if we think
>> > this is not a core feature of HBase, then we could make it depend on MR,
>> > and start a standalone BackupManager instance that submits MR jobs to do
>> > periodical maintenance job. And if we think this is a core feature that
>> > everyone should use it, then we'd better implement it without MR
>> > dependency, like DLS.
>> >
>> > Thanks.
>> >
>> > 2016-09-23 10:11 GMT+08:00 张铎:
>> >
>> >> I'm -1 on let master or rs launch MR jobs. It is OK that some of our
>> >> features depend on MR but I think the bottom line is that we should launch
>> >> the jobs from outside manually or by other services.
>> >>
>> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell:
>> >>
>> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> >>> question.
>> >>>
>> >>> Can this be driven by a utility derived from Tool like our other MR apps?
>> >>> The issue is needing the AccessController to decide if allowed? But nothing
>> >>> prevents the user from running the job manually/independently, right?
>> >>>
>> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.berto...@gmail.com> wrote:
>> >>> >
>> >>> > just a remark. my query was not about tools using MR (everyone i think is
>> >>> > ok with those).
>> >>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>> >>> > code?" since this will be the first time we do this
>> >>> >
>> >>> > Matteo
>> >>> >
>> >>> >
>> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
>> >>> >>
>> >>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>> >>> >> fine to be dependent on MR. MR is the right framework for such. We should
>> >>> >> also do compactions using MR (just saying :) )
>> >>> >>
>> >>> >> From: Ted Yu
>> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
>> >>> >> To: dev@hbase.apache.org
>> >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>> >>
>> >>> >> I agree - backup / restore is in the same category as import / export.
>> >>> >>
>> >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>> >>> >>
>> >>> >>> Backup is extra tooling around core in my opinion. Like import or export.
>> >>> >>> Or the optional MOB tool. It's fine.
>> >>> >>>
>> >>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>> >>> >>>>
>> >>> >>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>> >>> >>>>
>> >>> >>>> I remember in the past that there was discussion about not having MR has
>> >>> >>>> direct dependency of hbase.
>> >>> >>>>
>> >>> >>>> I think some of discussion where around MOB that had a MR job to compact,
>> >>> >>>> that later was transformed in a non-MR job to be merged, I think we had a
>> >>> >>>> similar discussion for log split/replay.
>> >>> >>>>
>> >>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
>> >>> >>>> the master to copy data or restore data.
>> >>> >>>> (backup is also "not really core" as in.. if you don't use backup you'll
>> >>> >>>> not end up running MR jobs, but this was probably true for MOB as in "if
>> >>> >>>> you don't enable MOB you don't need MR")
>> >>> >>>>
>> >>> >>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>> >>> >>>> jobs, only tool started manually by the user can do that". or can we start
>> >>> >>>> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
If the MR framework is not deployed in the cluster, hbase still functions normally (post merge).

In terms of build-time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.

Cheers

On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen wrote:

> In our production cluster, it is a common case we just have HDFS and
> HBase deployed.
> If our Master/RS depend on MR framework (especially some features we
> have not used at all), it introduced another cost for maintain. I
> don't think it is a good idea.
Re: [DISCUSSION] MR jobs started by Master or RS
In our production cluster, it is common to have just HDFS and HBase deployed.
If our Master/RS depend on the MR framework (especially for features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea.

2016-09-23 10:28 GMT+08:00 张铎:

> To be specific, for example, our nice Backup/Restore feature, if we think
> this is not a core feature of HBase, then we could make it depend on MR,
> and start a standalone BackupManager instance that submits MR jobs to do
> periodical maintenance job. And if we think this is a core feature that
> everyone should use it, then we'd better implement it without MR
> dependency, like DLS.
>
> Thanks.
Re: [DISCUSSION] MR jobs started by Master or RS
To be specific, take our nice Backup/Restore feature as an example. If we think this is not a core feature of HBase, then we could make it depend on MR and start a standalone BackupManager instance that submits MR jobs to do the periodic maintenance work. If we think this is a core feature that everyone should use, then we'd better implement it without an MR dependency, like DLS.

Thanks.

2016-09-23 10:11 GMT+08:00 张铎:

> I'm -1 on let master or rs launch MR jobs. It is OK that some of our
> features depend on MR but I think the bottom line is that we should launch
> the jobs from outside manually or by other services.
Re: [DISCUSSION] MR jobs started by Master or RS
I'm -1 on letting the master or RS launch MR jobs. It is OK that some of our features depend on MR, but I think the bottom line is that we should launch the jobs from outside, manually or by other services.

2016-09-23 9:47 GMT+08:00 Andrew Purtell:

> Ok, got it. Well "shelling out" is on the line I think, so a fair question.
>
> Can this be driven by a utility derived from Tool like our other MR apps?
> The issue is needing the AccessController to decide if allowed? But nothing
> prevents the user from running the job manually/independently, right?
>
> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi wrote:
> >
> > just a remark. my query was not about tools using MR (everyone i think is
> > ok with those).
> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > code?" since this will be the first time we do this
> >
> > Matteo
> >
> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
> >>
> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> >> fine to be dependent on MR. MR is the right framework for such. We should
> >> also do compactions using MR (just saying :) )
> >> ________________________
> >> From: Ted Yu
> >> Sent: Thursday, September 22, 2016 2:00 PM
> >> To: dev@hbase.apache.org
> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>
> >> I agree - backup / restore is in the same category as import / export.
> >>
> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >>
> >>> Backup is extra tooling around core in my opinion. Like import or export.
> >>> Or the optional MOB tool. It's fine.
> >>>
> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
> >>>>
> >>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
> >>>>
> >>>> I remember in the past that there was discussion about not having MR has
> >>>> direct dependency of hbase.
> >>>>
> >>>> I think some of discussion where around MOB that had a MR job to compact,
> >>>> that later was transformed in a non-MR job to be merged, I think we had a
> >>>> similar discussion for log split/replay.
> >>>>
> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> >>>> the master to copy data or restore data.
> >>>> (backup is also "not really core" as in.. if you don't use backup you'll
> >>>> not end up running MR jobs, but this was probably true for MOB as in "if
> >>>> you don't enable MOB you don't need MR")
> >>>>
> >>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
> >>>> jobs, only tool started manually by the user can do that". or can we start
> >>>> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
(Back with a sore throat.)

Also for what it is worth - it may well be that the attempt to bolt containers-as-executors to YARN is too little too late and coordination of container based services and applications (such as distributed map-reduce workflows or more likely Spark) will be handled by the native management tooling we are all using to build container based infrastructure. It's unclear to me how long HBase will live into this new regime but I'd optimistically wager long enough so even having YARN around is a suspect assumption.

> On Sep 22, 2016, at 6:58 PM, Andrew Purtell wrote:
>
> > We should also do compactions using MR (just saying :)
>
> No, never. It's not a good idea to wed any of our core function to something
> that independently evolves, that some of us don't have commit rights on (and
> never will), and has varying degrees of utility depending on deploy. Like JM
> says in some places not having the MR runtime around is virtuous.
>
> (Runs away screaming.)
>
> On Sep 22, 2016, at 2:49 PM, Devaraj Das wrote:
>
>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine
>> to be dependent on MR. MR is the right framework for such. We should also do
>> compactions using MR (just saying :) )
>>
>> From: Ted Yu
>> Sent: Thursday, September 22, 2016 2:00 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>
>> I agree - backup / restore is in the same category as import / export.
>>
>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote:
>>
>>> Backup is extra tooling around core in my opinion. Like import or export.
>>> Or the optional MOB tool. It's fine.
>>>
>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
>>>>
>>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>>>>
>>>> I remember in the past that there was discussion about not having MR has
>>>> direct dependency of hbase.
>>>>
>>>> I think some of discussion where around MOB that had a MR job to compact,
>>>> that later was transformed in a non-MR job to be merged, I think we had a
>>>> similar discussion for log split/replay.
>>>>
>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
>>>> the master to copy data or restore data.
>>>> (backup is also "not really core" as in.. if you don't use backup you'll
>>>> not end up running MR jobs, but this was probably true for MOB as in "if
>>>> you don't enable MOB you don't need MR")
>>>>
>>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>>>> jobs, only tool started manually by the user can do that". or can we start
>>>> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
> We should also do compactions using MR (just saying :)

No, never. It's not a good idea to wed any of our core function to something that independently evolves, that some of us don't have commit rights on (and never will), and has varying degrees of utility depending on deploy. Like JM says in some places not having the MR runtime around is virtuous.

(Runs away screaming.)

> On Sep 22, 2016, at 2:49 PM, Devaraj Das wrote:
>
> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine
> to be dependent on MR. MR is the right framework for such. We should also do
> compactions using MR (just saying :) )
>
> From: Ted Yu
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote:
>
>> Backup is extra tooling around core in my opinion. Like import or export.
>> Or the optional MOB tool. It's fine.
>>
>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
>>>
>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>>>
>>> I remember in the past that there was discussion about not having MR has
>>> direct dependency of hbase.
>>>
>>> I think some of discussion where around MOB that had a MR job to compact,
>>> that later was transformed in a non-MR job to be merged, I think we had a
>>> similar discussion for log split/replay.
>>>
>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
>>> the master to copy data or restore data.
>>> (backup is also "not really core" as in.. if you don't use backup you'll
>>> not end up running MR jobs, but this was probably true for MOB as in "if
>>> you don't enable MOB you don't need MR")
>>>
>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>>> jobs, only tool started manually by the user can do that". or can we start
>>> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
Ok, got it. Well "shelling out" is on the line I think, so a fair question.

Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if allowed? But nothing prevents the user from running the job manually/independently, right?

> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi wrote:
>
> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
>
>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
>>
>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>> fine to be dependent on MR. MR is the right framework for such. We should
>> also do compactions using MR (just saying :) )
>>
>> From: Ted Yu
>> Sent: Thursday, September 22, 2016 2:00 PM
>> To: dev@hbase.apache.org
>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>
>> I agree - backup / restore is in the same category as import / export.
>>
>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote:
>>
>>> Backup is extra tooling around core in my opinion. Like import or export.
>>> Or the optional MOB tool. It's fine.
>>>
>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
>>>>
>>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>>>>
>>>> I remember in the past that there was discussion about not having MR has
>>>> direct dependency of hbase.
>>>>
>>>> I think some of discussion where around MOB that had a MR job to compact,
>>>> that later was transformed in a non-MR job to be merged, I think we had a
>>>> similar discussion for log split/replay.
>>>>
>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
>>>> the master to copy data or restore data.
>>>> (backup is also "not really core" as in.. if you don't use backup you'll
>>>> not end up running MR jobs, but this was probably true for MOB as in "if
>>>> you don't enable MOB you don't need MR")
>>>>
>>>> any thoughts? do we a rule that says "we don't want to have hbase run MR
>>>> jobs, only tool started manually by the user can do that". or can we start
>>>> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
Once you are in the game of coordinating large scale tasks with distribution, fault tolerance, etc., other than implementing a similar framework inside HBase, MR will be the way to go. Things like exporting snapshots, DistCp, or backups (which use these) must use such a framework.

The issue about master launching MR jobs came in the review around that time, and we concluded that it was fine since backups by definition require such a framework.

Enis

On Thu, Sep 22, 2016 at 4:32 PM, Devaraj Das wrote:

> Not practical to do those tools without MR, JM. We should be using the
> right framework for the use cases in hand. MR fits this really well.
> JM, when you say "if we can do without MR, then, why not?", do you have a
> framework in mind that performs/scale as well as MR? Curious.
>
> From: Jean-Marc Spaggiari
> Sent: Thursday, September 22, 2016 4:29 PM
> To: dev
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> Well, I'm just not using those features ;) But was hopping for the MOBs ;)
> My point is, if we can do it without MR, then, why not? )
>
> 2016-09-22 19:25 GMT-04:00 Vladimir Rodionov:
>
> > Forgot WALPlayer :)
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >
> > > >> and
> > > >> backups too, but don't want to bother having to install and configure YARN
> > > >> just for that, as well as removing resources from HBase to give it to
> > >
> > > Any suggestions on how to do bulk data move with transformation from/to
> > > HBase cluster w/o MapReduce?
> > >
> > > Opposition to M/R does not make sense imo, as since we have a lot of tools
> > > in HBase which depend on MapReduce:
> > >
> > > CountRows
> > > CountCells
> > > Import
> > > Export
> > > ImportTsv
> > > ExportTsv
> > > CopyTable
> > > VerifyReplication
> > > ExportSnapshot
> > >
> > > and new backup create/restore of course.
> > >
> > > -Vlad
> > >
> > > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> > >
> > >> My 2¢: I have a strong preference for NOT having a dependency on MR
> > >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
> > >> all the features that we built. Would love to be able to use MOBs and
> > >> backups too, but don't want to bother having to install and configure YARN
> > >> just for that, as well as removing resources from HBase to give it to yarn
> > >>
> > >> JMS
> > >>
> > >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:
> > >>
> > >> > just a remark. my query was not about tools using MR (everyone i think is
> > >> > ok with those).
> > >> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > >> > code?" since this will be the first time we do this
> > >> >
> > >> > Matteo
> > >> >
> > >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
> > >> >
> > >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > >> > > fine to be dependent on MR. MR is the right framework for such. We should
> > >> > > also do compactions using MR (just saying :) )
> > >> > >
> > >> > > From: Ted Yu
> > >> > > Sent: Thursday, September 22, 2016 2:00 PM
> > >> > > To: dev@hbase.apache.org
> > >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >> > >
> > >> > > I agree - backup / restore is in the same category as import / export.
> > >> > >
> > >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >> > >
> > >> > > > Backup is extra tooling around core in my opinion. Like import or export.
> > >> > > > Or the
Re: [DISCUSSION] MR jobs started by Master or RS
Not practical to do those tools without MR, JM. We should be using the right framework for the use cases in hand. MR fits this really well.

JM, when you say "if we can do without MR, then, why not?", do you have a framework in mind that performs/scales as well as MR? Curious.

From: Jean-Marc Spaggiari
Sent: Thursday, September 22, 2016 4:29 PM
To: dev
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

Well, I'm just not using those features ;) But was hopping for the MOBs ;)
My point is, if we can do it without MR, then, why not? )

2016-09-22 19:25 GMT-04:00 Vladimir Rodionov:

> Forgot WALPlayer :)
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov wrote:
>
> > >> and
> > >> backups too, but don't want to bother having to install and configure YARN
> > >> just for that, as well as removing resources from HBase to give it to
> >
> > Any suggestions on how to do bulk data move with transformation from/to
> > HBase cluster w/o MapReduce?
> >
> > Opposition to M/R does not make sense imo, as since we have a lot of tools
> > in HBase which depend on MapReduce:
> >
> > CountRows
> > CountCells
> > Import
> > Export
> > ImportTsv
> > ExportTsv
> > CopyTable
> > VerifyReplication
> > ExportSnapshot
> >
> > and new backup create/restore of course.
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> >
> >> My 2¢: I have a strong preference for NOT having a dependency on MR
> >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
> >> all the features that we built. Would love to be able to use MOBs and
> >> backups too, but don't want to bother having to install and configure YARN
> >> just for that, as well as removing resources from HBase to give it to yarn
> >>
> >> JMS
> >>
> >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:
> >>
> >> > just a remark. my query was not about tools using MR (everyone i think is
> >> > ok with those).
> >> > the topic was about: "are we ok with running MR jobs from Master and RSs
> >> > code?" since this will be the first time we do this
> >> >
> >> > Matteo
> >> >
> >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
> >> >
> >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> >> > > fine to be dependent on MR. MR is the right framework for such. We should
> >> > > also do compactions using MR (just saying :) )
> >> > >
> >> > > From: Ted Yu
> >> > > Sent: Thursday, September 22, 2016 2:00 PM
> >> > > To: dev@hbase.apache.org
> >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >> > >
> >> > > I agree - backup / restore is in the same category as import / export.
> >> > >
> >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >> > >
> >> > > > Backup is extra tooling around core in my opinion. Like import or export.
> >> > > > Or the optional MOB tool. It's fine.
> >> > > >
> >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
> >> > > > >
> >> > > > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
> >> > > > >
> >> > > > > I remember in the past that there was discussion about not having MR has
> >> > > > > direct dependency of hbase.
> >> > > > >
> >> > > > > I think some of discussion where around MOB that had a MR job to compact,
> >> > > > > that later was transformed in a non-MR job to be merged, I think we had a
> >> > > > > similar discussion for log split/replay.
> >> > > > >
> >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> >> > > > > the master to copy data or restore data.
> >> > > > > (backup is also "not really core" as in.. if you don't use backup you'll
> >> > > > > not end up running MR jobs, but this was probably true for MOB as in "if
> >> > > > > you don't enable MOB you don't need MR")
> >> > > > >
> >> > > > > any thoughts? do we a rule that says "we don't want to have hbase run MR
> >> > > > > jobs, only tool started manually by the user can do that". or can we start
> >> > > > > adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
Matteo, the Master won't spawn the job unless someone actually wants to use the backup/restore. So I'd argue we still don't have a 'hard' dependency - it's still much like the other tools that you consider as being outside the core.

From: Matteo Bertozzi
Sent: Thursday, September 22, 2016 3:44 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

just a remark. my query was not about tools using MR (everyone i think is ok with those).
the topic was about: "are we ok with running MR jobs from Master and RSs code?" since this will be the first time we do this

Matteo

On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:

> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> fine to be dependent on MR. MR is the right framework for such. We should
> also do compactions using MR (just saying :) )
>
> From: Ted Yu
> Sent: Thursday, September 22, 2016 2:00 PM
> To: dev@hbase.apache.org
> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>
> I agree - backup / restore is in the same category as import / export.
>
> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote:
>
> > Backup is extra tooling around core in my opinion. Like import or export.
> > Or the optional MOB tool. It's fine.
> >
> > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
> > >
> > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
> > >
> > > I remember in the past that there was discussion about not having MR has
> > > direct dependency of hbase.
> > >
> > > I think some of discussion where around MOB that had a MR job to compact,
> > > that later was transformed in a non-MR job to be merged, I think we had a
> > > similar discussion for log split/replay.
> > >
> > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> > > the master to copy data or restore data.
> > > (backup is also "not really core" as in.. if you don't use backup you'll
> > > not end up running MR jobs, but this was probably true for MOB as in "if
> > > you don't enable MOB you don't need MR")
> > >
> > > any thoughts? do we a rule that says "we don't want to have hbase run MR
> > > jobs, only tool started manually by the user can do that". or can we start
> > > adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
Well, I'm just not using those features ;) But was hoping for the MOBs ;)
My point is, if we can do it without MR, then, why not? )

2016-09-22 19:25 GMT-04:00 Vladimir Rodionov:

> Forgot WALPlayer :)
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov wrote:
>
> > >> and
> > >> backups too, but don't want to bother having to install and configure YARN
> > >> just for that, as well as removing resources from HBase to give it to
> >
> > Any suggestions on how to do bulk data move with transformation from/to
> > HBase cluster w/o MapReduce?
> >
> > Opposition to M/R does not make sense imo, as since we have a lot of tools
> > in HBase which depend on MapReduce:
> >
> > CountRows
> > CountCells
> > Import
> > Export
> > ImportTsv
> > ExportTsv
> > CopyTable
> > VerifyReplication
> > ExportSnapshot
> >
> > and new backup create/restore of course.
> >
> > -Vlad
> >
> > On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> >
> >> My 2¢: I have a strong preference for NOT having a dependency on MR
> >> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
> >> all the features that we built. Would love to be able to use MOBs and
> >> backups too, but don't want to bother having to install and configure YARN
> >> just for that, as well as removing resources from HBase to give it to yarn
> >>
> >> JMS
> >>
> >> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:
> >>
> >> > just a remark. my query was not about tools using MR (everyone i think is
> >> > ok with those).
> >> > the topic was about: "are we ok with running MR jobs from Master and RSs
> >> > code?" since this will be the first time we do this
> >> >
> >> > Matteo
> >> >
> >> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
> >> >
> >> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> >> > > fine to be dependent on MR. MR is the right framework for such. We should
> >> > > also do compactions using MR (just saying :) )
> >> > >
> >> > > From: Ted Yu
> >> > > Sent: Thursday, September 22, 2016 2:00 PM
> >> > > To: dev@hbase.apache.org
> >> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >> > >
> >> > > I agree - backup / restore is in the same category as import / export.
> >> > >
> >> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >> > >
> >> > > > Backup is extra tooling around core in my opinion. Like import or export.
> >> > > > Or the optional MOB tool. It's fine.
> >> > > >
> >> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
> >> > > > >
> >> > > > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
> >> > > > >
> >> > > > > I remember in the past that there was discussion about not having MR has
> >> > > > > direct dependency of hbase.
> >> > > > >
> >> > > > > I think some of discussion where around MOB that had a MR job to compact,
> >> > > > > that later was transformed in a non-MR job to be merged, I think we had a
> >> > > > > similar discussion for log split/replay.
> >> > > > >
> >> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> >> > > > > the master to copy data or restore data.
> >> > > > > (backup is also "not really core" as in.. if you don't use backup you'll
> >> > > > > not end up running MR jobs, but this was probably true for MOB as in "if
> >> > > > > you don't enable MOB you don't need MR")
> >> > > > >
> >> > > > > any thoughts? do we a rule that says "we don't want to have hbase run MR
> >> > > > > jobs, only tool started manually by the user can do that". or can we start
> >> > > > > adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
Forgot WALPlayer :)

-Vlad

On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov wrote:

> >> and
> >> backups too, but don't want to bother having to install and configure YARN
> >> just for that, as well as removing resources from HBase to give it to
>
> Any suggestions on how to do bulk data move with transformation from/to
> HBase cluster w/o MapReduce?
>
> Opposition to M/R does not make sense imo, as since we have a lot of tools
> in HBase which depend on MapReduce:
>
> CountRows
> CountCells
> Import
> Export
> ImportTsv
> ExportTsv
> CopyTable
> VerifyReplication
> ExportSnapshot
>
> and new backup create/restore of course.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
>> My 2¢: I have a strong preference for NOT having a dependency on MR
>> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
>> all the features that we built. Would love to be able to use MOBs and
>> backups too, but don't want to bother having to install and configure YARN
>> just for that, as well as removing resources from HBase to give it to yarn
>>
>> JMS
>>
>> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:
>>
>> > just a remark. my query was not about tools using MR (everyone i think is
>> > ok with those).
>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>> > code?" since this will be the first time we do this
>> >
>> > Matteo
>> >
>> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
>> >
>> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>> > > fine to be dependent on MR. MR is the right framework for such. We should
>> > > also do compactions using MR (just saying :) )
>> > >
>> > > From: Ted Yu
>> > > Sent: Thursday, September 22, 2016 2:00 PM
>> > > To: dev@hbase.apache.org
>> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> > >
>> > > I agree - backup / restore is in the same category as import / export.
>> > >
>> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>> > >
>> > > > Backup is extra tooling around core in my opinion. Like import or export.
>> > > > Or the optional MOB tool. It's fine.
>> > > >
>> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mberto...@apache.org> wrote:
>> > > > >
>> > > > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
>> > > > >
>> > > > > I remember in the past that there was discussion about not having MR has
>> > > > > direct dependency of hbase.
>> > > > >
>> > > > > I think some of discussion where around MOB that had a MR job to compact,
>> > > > > that later was transformed in a non-MR job to be merged, I think we had a
>> > > > > similar discussion for log split/replay.
>> > > > >
>> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
>> > > > > the master to copy data or restore data.
>> > > > > (backup is also "not really core" as in.. if you don't use backup you'll
>> > > > > not end up running MR jobs, but this was probably true for MOB as in "if
>> > > > > you don't enable MOB you don't need MR")
>> > > > >
>> > > > > any thoughts? do we a rule that says "we don't want to have hbase run MR
>> > > > > jobs, only tool started manually by the user can do that". or can we start
>> > > > > adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
>> and
>> backups too, but don't want to bother having to install and configure YARN
>> just for that, as well as removing resources from HBase to give it to

Any suggestions on how to do bulk data move with transformation from/to HBase cluster w/o MapReduce?

Opposition to M/R does not make sense imo, since we have a lot of tools in HBase which depend on MapReduce:

CountRows
CountCells
Import
Export
ImportTsv
ExportTsv
CopyTable
VerifyReplication
ExportSnapshot

and new backup create/restore of course.

-Vlad

On Thu, Sep 22, 2016 at 4:15 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> My 2¢: I have a strong preference for NOT having a dependency on MR
> anywhere :( I run my HBase cluste without YARN. Just HBase and HDFS. I like
> all the features that we built. Would love to be able to use MOBs and
> backups too, but don't want to bother having to install and configure YARN
> just for that, as well as removing resources from HBase to give it to yarn
>
> JMS
>
> 2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:
>
> > just a remark. my query was not about tools using MR (everyone i think is
> > ok with those).
> > the topic was about: "are we ok with running MR jobs from Master and RSs
> > code?" since this will be the first time we do this
> >
> > Matteo
> >
> > On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:
> >
> > > Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> > > fine to be dependent on MR. MR is the right framework for such. We should
> > > also do compactions using MR (just saying :) )
> > > ____________
> > > From: Ted Yu
> > > Sent: Thursday, September 22, 2016 2:00 PM
> > > To: dev@hbase.apache.org
> > > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >
> > > I agree - backup / restore is in the same category as import / export.
> > >
> > > On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >
> > > > Backup is extra tooling around core in my opinion. Like import or export.
> > > > Or the optional MOB tool. It's fine.
> > > >
> > > > > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
> > > > >
> > > > > What's the latest opinion around running MR jobs from hbase (Master or RS)?
> > > > >
> > > > > I remember in the past that there was discussion about not having MR has
> > > > > direct dependency of hbase.
> > > > >
> > > > > I think some of discussion where around MOB that had a MR job to compact,
> > > > > that later was transformed in a non-MR job to be merged, I think we had a
> > > > > similar discussion for log split/replay.
> > > > >
> > > > > the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> > > > > the master to copy data or restore data.
> > > > > (backup is also "not really core" as in.. if you don't use backup you'll
> > > > > not end up running MR jobs, but this was probably true for MOB as in "if
> > > > > you don't enable MOB you don't need MR")
> > > > >
> > > > > any thoughts? do we a rule that says "we don't want to have hbase run MR
> > > > > jobs, only tool started manually by the user can do that". or can we start
> > > > > adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
I think the rationale behind the decision to move code to Master was
security/access control. Enis will correct me if I am wrong.

Master does not have a direct dependency on mapreduce - only on backup.

-Vlad

On Thu, Sep 22, 2016 at 3:44 PM, Matteo Bertozzi wrote:

> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
Re: [DISCUSSION] MR jobs started by Master or RS
My 2¢: I have a strong preference for NOT having a dependency on MR
anywhere :( I run my HBase cluster without YARN. Just HBase and HDFS. I like
all the features that we built. Would love to be able to use MOBs and
backups too, but don't want to bother having to install and configure YARN
just for that, as well as removing resources from HBase to give it to
yarn

JMS

2016-09-22 18:44 GMT-04:00 Matteo Bertozzi:

> just a remark. my query was not about tools using MR (everyone i think is
> ok with those).
> the topic was about: "are we ok with running MR jobs from Master and RSs
> code?" since this will be the first time we do this
>
> Matteo
Re: [DISCUSSION] MR jobs started by Master or RS
just a remark. my query was not about tools using MR (everyone i think is
ok with those).
the topic was about: "are we ok with running MR jobs from Master and RSs
code?" since this will be the first time we do this

Matteo

On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das wrote:

> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> fine to be dependent on MR. MR is the right framework for such. We should
> also do compactions using MR (just saying :) )
Re: [DISCUSSION] MR jobs started by Master or RS
Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine
to be dependent on MR. MR is the right framework for such. We should also do
compactions using MR (just saying :) )

From: Ted Yu
Sent: Thursday, September 22, 2016 2:00 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

I agree - backup / restore is in the same category as import / export.
Re: [DISCUSSION] MR jobs started by Master or RS
I agree - backup / restore is in the same category as import / export.

On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote:

> Backup is extra tooling around core in my opinion. Like import or export.
> Or the optional MOB tool. It's fine.
Re: [DISCUSSION] MR jobs started by Master or RS
Backup is extra tooling around core in my opinion. Like import or export.
Or the optional MOB tool. It's fine.

> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
>
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>
> I remember in the past that there was discussion about not having MR as a
> direct dependency of hbase.
>
> I think some of the discussion was around MOB, which had an MR job to compact
> that was later transformed into a non-MR job in order to be merged. I think
> we had a similar discussion for log split/replay.
>
> the latest is the new Backup feature (HBASE-7912), which runs an MR job from
> the master to copy data or restore data.
> (backup is also "not really core" as in.. if you don't use backup you'll
> not end up running MR jobs, but this was probably true for MOB as in "if
> you don't enable MOB you don't need MR")
>
> any thoughts? do we have a rule that says "we don't want to have hbase run MR
> jobs, only tools started manually by the user can do that"? or can we start
> adding MR calls around without problems?
Re: [DISCUSSION] MR jobs started by Master or RS
I would be -1 on a requirement for MR for something core to HBase.

> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote:
>
> What's the latest opinion around running MR jobs from hbase (Master or RS)?
>
> I remember in the past that there was discussion about not having MR as a
> direct dependency of hbase.
>
> I think some of the discussion was around MOB, which had an MR job to compact
> that was later transformed into a non-MR job in order to be merged. I think
> we had a similar discussion for log split/replay.
>
> the latest is the new Backup feature (HBASE-7912), which runs an MR job from
> the master to copy data or restore data.
> (backup is also "not really core" as in.. if you don't use backup you'll
> not end up running MR jobs, but this was probably true for MOB as in "if
> you don't enable MOB you don't need MR")
>
> any thoughts? do we have a rule that says "we don't want to have hbase run MR
> jobs, only tools started manually by the user can do that"? or can we start
> adding MR calls around without problems?
[DISCUSSION] MR jobs started by Master or RS
What's the latest opinion around running MR jobs from hbase (Master or RS)?

I remember in the past that there was discussion about not having MR as a
direct dependency of hbase.

I think some of the discussion was around MOB, which had an MR job to compact
that was later transformed into a non-MR job in order to be merged. I think
we had a similar discussion for log split/replay.

the latest is the new Backup feature (HBASE-7912), which runs an MR job from
the master to copy data or restore data.
(backup is also "not really core" as in.. if you don't use backup you'll
not end up running MR jobs, but this was probably true for MOB as in "if
you don't enable MOB you don't need MR")

any thoughts? do we have a rule that says "we don't want to have hbase run MR
jobs, only tools started manually by the user can do that"? or can we start
adding MR calls around without problems?