Re: Hadoop 3.1.0 release discussion

2018-01-30 Thread Gangumalla, Uma
Hi Wangda,

Daryn has provided his feedback, and we would like to handle some of it 
before the merge.
That will take a couple more days for us to go through review cycles, etc.
For the next couple of days I will be traveling, so I will not be able to put 
my effort into it. Meanwhile, Rakesh/Surendra will update if there are any 
changes to the plan.

So, to conclude, you can proceed with cutting the branch, and we will merge 
after the reviews are closed; that may go into 3.2.

Regards,
Uma

From: Wangda Tan <wheele...@gmail.com>
Date: Tuesday, January 30, 2018 at 6:24 AM
To: Uma Gangumalla <uma.ganguma...@intel.com>
Cc: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>, 
"yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, 
"common-...@hadoop.apache.org" <common-...@hadoop.apache.org>, 
"mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>, Vinod 
Kumar Vavilapalli <vino...@hortonworks.com>
Subject: Re: Hadoop 3.1.0 release discussion

Thanks for the update.

On Tue, Jan 30, 2018 at 4:51 PM, Gangumalla, Uma 
<uma.ganguma...@intel.com> wrote:
Hi Wangda,

Sorry that we have not started the vote on the 29th. Daryn is reviewing the 
branch and needs a day to finalize his review; then we have to prioritize the 
comments and decide.
Will keep you updated here.

Regards,
Uma


On 1/28/18, 5:31 PM, "Wangda Tan" <wheele...@gmail.com> wrote:

Hi Uma,

Thanks. I saw HDFS-13050 was resolved 4 hours ago, and I don't see any
other blockers under HDFS-10285. I think you should be able to start the
voting thread in time for merging to trunk.

- Wangda

On Mon, Jan 29, 2018 at 3:12 AM, Gangumalla, Uma 
<uma.ganguma...@intel.com>
wrote:

> Hi Wangda,
>
>
>
> Sorry for the delay.
>
>
>
> >>* (Uma) HDFS-10285: HDFS SPS. There're two remaining blockers:
> HDFS-12995/HDFS-13050. @Uma could you update what's the ETA of the two
> JIRAs?
>
> We have only one blocker now, HDFS-13050, and we finished the key
> implementation from HDFS-12995 via HDFS-13075.
>
> We are planning to start the vote by tomorrow (the 29th, PST). So, we request
> that you give us time to run the vote. We will keep SPS off by default, so
> only interested users can enable it explicitly.
>
>
>
> Regards,
>
> Uma
>
>
>
> *From: *Wangda Tan <wheele...@gmail.com>
> *Date: *Friday, January 26, 2018 at 5:21 PM
> *To: *Uma Gangumalla <uma.ganguma...@intel.com>
> *Cc: *"hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>, "
> yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, "
> common-...@hadoop.apache.org" <common-...@hadoop.apache.org>, "
> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>, Vinod
> Kumar Vavilapalli <vino...@hortonworks.com>
> *Subject: *Re: Hadoop 3.1.0 release discussion
>
>
>
> Hi All,
>
>
>
> Just a reminder about feature freeze date.
>
>
>
> Feature freeze date for the 3.1.0 release is Jan 30 PST (about 4 days from
> today). If you have any features which live in a branch and are targeted to
> 3.1.0, please reply to this email thread. Ideally, we should finish branch
> merging before the feature freeze date.
>
>
>
> Here's an updated 3.1.0 feature status:
>
>
>
> 1. Merged features:
>
> * (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
>
> * (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
> end-to-end.
>
> * (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
>
> * (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
> commits.
>
> * (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
> Queues While Doing Queue Mapping.
>
> * (Chris Douglas) HDFS-9806: HDFS Tiered Storage.
>
> * (Zhankun) YARN-5983: FPGA support. Majority implementations completed
> and merged to trunk. Except for UI/documentation.

Re: Hadoop 3.1.0 release discussion

2018-01-30 Thread Gangumalla, Uma
Hi Wangda,

Sorry that we have not started the vote on the 29th. Daryn is reviewing the 
branch and needs a day to finalize his review; then we have to prioritize the 
comments and decide.
Will keep you updated here.

Regards,
Uma


On 1/28/18, 5:31 PM, "Wangda Tan" <wheele...@gmail.com> wrote:

Hi Uma,

Thanks. I saw HDFS-13050 was resolved 4 hours ago, and I don't see any
other blockers under HDFS-10285. I think you should be able to start the
voting thread in time for merging to trunk.

- Wangda

On Mon, Jan 29, 2018 at 3:12 AM, Gangumalla, Uma <uma.ganguma...@intel.com>
wrote:

> Hi Wangda,
>
>
>
> Sorry for the delay.
>
>
>
> >>* (Uma) HDFS-10285: HDFS SPS. There're two remaining blockers:
> HDFS-12995/HDFS-13050. @Uma could you update what's the ETA of the two
> JIRAs?
>
> We have only one blocker now, HDFS-13050, and we finished the key
> implementation from HDFS-12995 via HDFS-13075.
>
> We are planning to start the vote by tomorrow (the 29th, PST). So, we request
> that you give us time to run the vote. We will keep SPS off by default, so
> only interested users can enable it explicitly.
>
>
>
> Regards,
>
> Uma
>
>
>
> *From: *Wangda Tan <wheele...@gmail.com>
> *Date: *Friday, January 26, 2018 at 5:21 PM
> *To: *Uma Gangumalla <uma.ganguma...@intel.com>
> *Cc: *"hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>, "
> yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, "
> common-...@hadoop.apache.org" <common-...@hadoop.apache.org>, "
> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>, Vinod
> Kumar Vavilapalli <vino...@hortonworks.com>
> *Subject: *Re: Hadoop 3.1.0 release discussion
>
>
>
> Hi All,
>
>
>
> Just a reminder about feature freeze date.
>
>
>
> Feature freeze date for the 3.1.0 release is Jan 30 PST (about 4 days from
> today). If you have any features which live in a branch and are targeted to
> 3.1.0, please reply to this email thread. Ideally, we should finish branch
> merging before the feature freeze date.
>
>
>
> Here's an updated 3.1.0 feature status:
>
>
>
> 1. Merged features:
>
> * (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
>
> * (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
> end-to-end.
>
> * (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
>
> * (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
> commits.
>
> * (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
> Queues While Doing Queue Mapping.
>
> * (Chris Douglas) HDFS-9806: HDFS Tiered Storage.
>
> * (Zhankun) YARN-5983: FPGA support. Majority implementations completed
> and merged to trunk. Except for UI/documentation.
>
>
>
> 2. Features close to finish:
>
> * (Uma) HDFS-10285: HDFS SPS. There're two remaining
> blockers: HDFS-12995/HDFS-13050. @Uma could you update what's the ETA of
> the two JIRAs?
>
> * (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
> anti-affinity support. (Voting thread started).
>
>
>
> 3. Tentative features:
>
> * (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
> containers. Only one pending patch. Plan to finish before Jan 7th.
>
> * (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to
> be done before Jan 2018.
>
> * (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
> challenging to be done before Jan 2018.
>
> * (Varun V) YARN-5673: container-executor write. Given security
> refactoring of c-e (YARN-6623) is already landed, IMHO other stuff may be
> moved to 3.2.
>
>
>
> Thanks,
>
> Wangda
>
    >
>
>
>
> On Mon, Jan 22, 2018 at 1:49 PM, Gangumalla, Uma 
<uma.ganguma...@intel.com>
> wrote:
>
> Sure, Wangda.
>
> Regards,
> Uma
>
>
> On 1/18/18, 10:19 AM, "Wangda Tan" <wheele...@gmail.com> wrote:
>
> Thanks Uma,
>
> Could you update this thread once the merge vote started?
>
> Best,
> Wangda
>
> 

Re: Hadoop 3.1.0 release discussion

2018-01-28 Thread Gangumalla, Uma
Hi Wangda,

Sorry for the delay.

>>* (Uma) HDFS-10285: HDFS SPS. There're two remaining blockers: 
>>HDFS-12995/HDFS-13050. @Uma could you update what's the ETA of the two JIRAs?
We have only one blocker now, HDFS-13050, and we finished the key 
implementation from HDFS-12995 via HDFS-13075.
We are planning to start the vote by tomorrow (the 29th, PST). So, we request 
that you give us time to run the vote. We will keep SPS off by default, so 
only interested users can enable it explicitly.

Regards,
Uma

From: Wangda Tan <wheele...@gmail.com>
Date: Friday, January 26, 2018 at 5:21 PM
To: Uma Gangumalla <uma.ganguma...@intel.com>
Cc: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>, 
"yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, 
"common-...@hadoop.apache.org" <common-...@hadoop.apache.org>, 
"mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>, Vinod 
Kumar Vavilapalli <vino...@hortonworks.com>
Subject: Re: Hadoop 3.1.0 release discussion

Hi All,

Just a reminder about feature freeze date.

Feature freeze date for the 3.1.0 release is Jan 30 PST (about 4 days from 
today). If you have any features which live in a branch and are targeted to 
3.1.0, please reply to this email thread. Ideally, we should finish branch 
merging before the feature freeze date.

Here's an updated 3.1.0 feature status:

1. Merged features:
* (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
* (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works 
end-to-end.
* (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
* (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename commits.
* (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf Queues 
While Doing Queue Mapping.
* (Chris Douglas) HDFS-9806: HDFS Tiered Storage.
* (Zhankun) YARN-5983: FPGA support. Majority implementations completed and 
merged to trunk. Except for UI/documentation.

2. Features close to finish:
* (Uma) HDFS-10285: HDFS SPS. There're two remaining blockers: 
HDFS-12995/HDFS-13050. @Uma could you update what's the ETA of the two JIRAs?
* (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and 
anti-affinity support. (Voting thread started).

3. Tentative features:
* (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic containers. 
Only one pending patch. Plan to finish before Jan 7th.
* (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to be 
done before Jan 2018.
* (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks challenging 
to be done before Jan 2018.
* (Varun V) YARN-5673: container-executor write. Given security refactoring of 
c-e (YARN-6623) is already landed, IMHO other stuff may be moved to 3.2.

Thanks,
Wangda


On Mon, Jan 22, 2018 at 1:49 PM, Gangumalla, Uma 
<uma.ganguma...@intel.com> wrote:
Sure, Wangda.

Regards,
Uma

On 1/18/18, 10:19 AM, "Wangda Tan" <wheele...@gmail.com> wrote:

Thanks Uma,

Could you update this thread once the merge vote started?

Best,
Wangda

On Wed, Jan 17, 2018 at 4:30 PM, Gangumalla, Uma 
<uma.ganguma...@intel.com>
wrote:

> Hi Wangda,
>
>  Thank you for the heads-up mail.
>  We are working in the branch (HDFS-10285) and trying to push the tasks
> through before the deadline.
>
> Regards,
> Uma
>
> On 1/17/18, 11:35 AM, "Wangda Tan" <wheele...@gmail.com> wrote:
>
> Hi All,
>
> We're fast approaching the previously proposed feature freeze date (Jan
> 30, about 13 days from today). If you have any features which live in a
> branch and are targeted to 3.1.0, please reply to this email thread.
> Ideally, we should finish branch merging before the feature freeze date.
>
> Here's an updated 3.1.0 feature status:
>
> 1. Merged & Completed features:
> * (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
> * (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
> end-to-end.
> * (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
> * (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
> commits.
> * (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
> Queues While Doing Queue Mapping.
> * (Chris Douglas) HDFS-9806: HDFS Tiered Storage.
>
> 2. Features close to finish:
> * (Zhankun) YARN-5983: FPGA support. Majority implementations
> completed and
> merged to trunk. Except for UI/documentation.

Re: Hadoop 3.1.0 release discussion

2018-01-21 Thread Gangumalla, Uma
Sure, Wangda.

Regards,
Uma

On 1/18/18, 10:19 AM, "Wangda Tan" <wheele...@gmail.com> wrote:

Thanks Uma,

Could you update this thread once the merge vote started?

Best,
Wangda

On Wed, Jan 17, 2018 at 4:30 PM, Gangumalla, Uma <uma.ganguma...@intel.com>
wrote:

> Hi Wangda,
>
>  Thank you for the heads-up mail.
>  We are working in the branch (HDFS-10285) and trying to push the tasks
> through before the deadline.
>
> Regards,
> Uma
>
> On 1/17/18, 11:35 AM, "Wangda Tan" <wheele...@gmail.com> wrote:
>
> Hi All,
>
> We're fast approaching the previously proposed feature freeze date (Jan
> 30, about 13 days from today). If you have any features which live in a
> branch and are targeted to 3.1.0, please reply to this email thread.
> Ideally, we should finish branch merging before the feature freeze date.
>
> Here's an updated 3.1.0 feature status:
>
> 1. Merged & Completed features:
> * (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
> * (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
> end-to-end.
> * (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
> * (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
> commits.
> * (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
> Queues While Doing Queue Mapping.
> * (Chris Douglas) HDFS-9806: HDFS Tiered Storage.
>
> 2. Features close to finish:
> * (Zhankun) YARN-5983: FPGA support. Majority implementations
> completed and
> merged to trunk. Except for UI/documentation.
> * (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
> discussions going on about implementation.
> * (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest 
and
> anti-affinity support. Close to finish, on track to be merged before
> Jan 30.
>
> 3. Tentative features:
> * (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
> containers. Only one pending patch. Plan to finish before Jan 7th.
> * (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging
> to be
> done before Jan 2018.
> * (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
> challenging to be done before Jan 2018.
> * (Varun V) YARN-5673: container-executor write. Given security
> refactoring
> of c-e (YARN-6623) is already landed, IMHO other stuff may be moved to
> 3.2.
>
> Thanks,
> Wangda
>
>
>
>
> On Fri, Dec 15, 2017 at 1:20 PM, Wangda Tan <wheele...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Congratulations on the 3.0.0-GA release!
> >
> > As we discussed in the previous email thread [1], I'd like to 
restart
> > 3.1.0 release plans.
> >
> > a) Quick summary:
> > a.1 Release status
> > We started 3.1 release discussion on Sep 6, 2017 [1]. As of today,
> > there’re 232 patches loaded on 3.1.0 alone [2], besides 6 open
> blockers and
> > 22 open critical issues.
> >
> > a.2 Release date update
> > Considering delays of 3.0-GA release by month-and-a-half, I propose
> to
> > move the dates as follows
> >  - feature freeze date from Dec 15, 2017, to Jan 30, 2018 - last
> date for
> > any branches to get merged too;
> >  - code freeze (blockers & critical only) date to Feb 08, 2018;
> >  - release voting start by Feb 18, 2018, leaving time for at least
> two RCx
> >  - release date from Jan 15, 2018, to Feb 28, 2018;
> >
> > Unlike before, I added an additional milestone for
> release-vote-start so
> > that we can account for voting time-period also.
> >
> > This overall is still 5 1/2 months of release-timeline unlike the
> faster
> > cadence we hoped for, but this, in my opinion, is the best-updated
> timeline
> > given the delays of the final release of 3.0-GA.
> >
> > b) Individual feature status:
> > I spoke to several feature owners and checked the status of
> un-finished
> > features, following are status of features planned to 3.1.0:

Re: Hadoop 3.1.0 release discussion

2018-01-17 Thread Gangumalla, Uma
Hi Wangda,

 Thank you for the heads-up mail.
 We are working in the branch (HDFS-10285) and trying to push the tasks through 
before the deadline.

Regards,
Uma

On 1/17/18, 11:35 AM, "Wangda Tan"  wrote:

Hi All,

We're fast approaching the previously proposed feature freeze date (Jan
30, about 13 days from today). If you have any features which live in a
branch and are targeted to 3.1.0, please reply to this email thread. Ideally, we
should finish branch merging before the feature freeze date.

Here's an updated 3.1.0 feature status:

1. Merged & Completed features:
* (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
* (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
end-to-end.
* (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
* (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename commits.
* (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
Queues While Doing Queue Mapping.
* (Chris Douglas) HDFS-9806: HDFS Tiered Storage.

2. Features close to finish:
* (Zhankun) YARN-5983: FPGA support. Majority implementations completed and
merged to trunk. Except for UI/documentation.
* (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
discussions going on about implementation.
* (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
anti-affinity support. Close to finish, on track to be merged before Jan 30.

3. Tentative features:
* (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
containers. Only one pending patch. Plan to finish before Jan 7th.
* (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to be
done before Jan 2018.
* (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
challenging to be done before Jan 2018.
* (Varun V) YARN-5673: container-executor write. Given security refactoring
of c-e (YARN-6623) is already landed, IMHO other stuff may be moved to 3.2.

Thanks,
Wangda




On Fri, Dec 15, 2017 at 1:20 PM, Wangda Tan  wrote:

> Hi all,
>
> Congratulations on the 3.0.0-GA release!
>
> As we discussed in the previous email thread [1], I'd like to restart
> 3.1.0 release plans.
>
> a) Quick summary:
> a.1 Release status
> We started 3.1 release discussion on Sep 6, 2017 [1]. As of today,
> there’re 232 patches loaded on 3.1.0 alone [2], besides 6 open blockers 
and
> 22 open critical issues.
>
> a.2 Release date update
> Considering delays of 3.0-GA release by month-and-a-half, I propose to
> move the dates as follows
>  - feature freeze date from Dec 15, 2017, to Jan 30, 2018 - last date for
> any branches to get merged too;
>  - code freeze (blockers & critical only) date to Feb 08, 2018;
>  - release voting start by Feb 18, 2018, leaving time for at least two RCx
>  - release date from Jan 15, 2018, to Feb 28, 2018;
>
> Unlike before, I added an additional milestone for release-vote-start so
> that we can account for voting time-period also.
>
> This overall is still 5 1/2 months of release-timeline unlike the faster
> cadence we hoped for, but this, in my opinion, is the best-updated 
timeline
> given the delays of the final release of 3.0-GA.
>
> b) Individual feature status:
> I spoke to several feature owners and checked the status of un-finished
> features, following are status of features planned to 3.1.0:
>
> b.1 Merged & Completed features:
> * (Sunil) YARN-5881: Support absolute value in CapacityScheduler.
> * (Wangda) YARN-6223: GPU support on YARN. Features in trunk and works
> end-to-end.
> * (Jian) YARN-5079,YARN-4793,YARN-4757,YARN-6419 YARN native services.
> * (Steve Loughran): HADOOP-13786: S3Guard committer for zero-rename
> commits.
> * (Suma): YARN-7117: Capacity Scheduler: Support Auto Creation of Leaf
> Queues While Doing Queue Mapping.
>
> b.2 Features close to finish:
> * (Chris Douglas) HDFS-9806: HDFS Tiered Storage. Being voting now.
> * (Zhankun) YARN-5983: FPGA support. Majority implementations completed
> and merged to trunk. Except for UI/documentation.
> * (Uma) HDFS-10285: HDFS SPS. Majority implementations are done, some
> discussions going on about implementation.
>
> b.3 Tentative features:
> * (Arun Suresh). YARN-5972: Support pausing/freezing opportunistic
> containers. Only one pending patch. Plan to finish before Jan 7th.
> * (Haibo Chen). YARN-1011: Resource overcommitment. Looks challenging to
> be done before Jan 2018.
> * (Arun Suresh / Kostas / Wangda). YARN-6592: New SchedulingRequest and
> anti-affinity support. Tentative will figure out by Jan 1st.
> * (Anu): HDFS-7240: Ozone. Given the discussion on HDFS-7240. Looks
> challenging to be done before Jan 2018.

Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-13 Thread Gangumalla, Uma
Here is my +1 (binding) too.
Sorry for the late vote.

Verified signatures of the source tarball.
Built from source.
Set up a 2-node test cluster.
Tested via HDFS commands and Java API: wrote a bunch of files and read them back.
Ran a basic MR job.

Thanks Andrew and others for the hard work for getting Hadoop 3.0 out.

Regards,
Uma

On 12/13/17, 1:05 PM, "Andrew Wang"  wrote:

Hi folks,

To close this out, the vote passes successfully with 13 binding +1s, 5
non-binding +1s, and no -1s. Thanks everyone for voting! I'll work on
staging.

I'm hoping we can address YARN-7588 and any remaining rolling upgrade
issues in 3.0.x maintenance releases. Beyond a wiki page, it would be
really great to get JIRAs filed and targeted for tracking as soon as
possible.

Vinod, what do you think we need to do regarding caveating rolling upgrade
support? We haven't advertised rolling upgrade support between major
releases outside of dev lists and JIRA. As a new major release, our compat
guidelines allow us to break compatibility, so I don't think it's expected
by users.

Best,
Andrew

On Wed, Dec 13, 2017 at 12:37 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> I was waiting for Daniel to post the minutes from the YARN meetup to talk
> about this. Anyways, in that discussion, we identified a bunch of key
> upgrade-related scenarios that no one seems to have validated - at least
> from the representation in the YARN meetup. I'm going to create a wiki-page
> listing all these scenarios.
>
> But back to the bug that Junping raised. At this point, we don't have a
> clear path towards running 2.x applications on 3.0.0 clusters. So, our
> claim of rolling-upgrades already working is not accurate.
>
> One of the two options that Junping proposed should be pursued before we
> close the release. I'm in favor of calling out rolling-upgrade support as
> withdrawn or caveated, and pushing for progress instead of blocking the
> release.
>
> Thanks
> +Vinod
>
> > On Dec 12, 2017, at 5:44 PM, Junping Du  wrote:
> >
> > Thanks Andrew for pushing the new RC for 3.0.0. I was out last week, and
> just got a chance to validate the new RC now.
> >
> > Basically, I found two critical issues with the same rolling upgrade
> scenario as where HADOOP-15059 get found previously:
> > HDFS-12920, we changed value format for some hdfs configurations that
> old version MR client doesn't understand when fetching these
> configurations. Some quick workarounds are to add old value (without time
> unit) in hdfs-site.xml to override new default values but will generate
> many annoying warnings. I provided my fix suggestions on the JIRA already
> for more discussion.
> > The other one is YARN-7646. After we workaround HDFS-12920, will hit the
> issue that old version MR AppMaster cannot communicate with new version of
> YARN RM - could be related to resource profile changes from YARN side but
> root cause are still in investigation.
> >
> > The first issue may not be a blocker given we can work around it
> without a code change. I am not sure if we can work around the 2nd issue so
> far. If not, we may have to fix it or compromise by withdrawing support of
> rolling upgrade or calling it a stable release.
> >
> >
> > Thanks,
> >
> > Junping
> >
> > 
> > From: Robert Kanter 
> > Sent: Tuesday, December 12, 2017 3:10 PM
> > To: Arun Suresh
> > Cc: Andrew Wang; Lei Xu; Wei-Chiu Chuang; Ajay Kumar; Xiao Chen; Aaron
> T. Myers; common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 3.0.0 RC1
> >
> > +1 (binding)
> >
> > + Downloaded the binary release
> > + Deployed on a 3 node cluster on CentOS 7.3
> > + Ran some MR jobs, clicked around the UI, etc
> > + Ran some CLI commands (yarn logs, etc)
> >
> > Good job everyone on Hadoop 3!
> >
> >
> > - Robert
> >
> > On Tue, Dec 12, 2017 at 1:56 PM, Arun Suresh  wrote:
> >
> >> +1 (binding)
> >>
> >> - Verified signatures of the source tarball.
> >> - built from source - using the docker build environment.
> >> - set up a pseudo-distributed test cluster.
> >> - ran basic HDFS commands
> >> - ran some basic MR jobs
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Tue, Dec 12, 2017 at 1:52 PM, Andrew Wang 
> >> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> As a reminder, this vote closes tomorrow at 12:31pm, so please give it
> a
> >>> whack if you 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Gangumalla, Uma
Plan looks good to me.

+1

Regards,
Uma

On 8/25/17, 10:36 AM, "Andrew Wang"  wrote:

>Hi folks,
>
>With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
>branching strategy.
>
>In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
>branch-2 and trunk were virtually identical, and increased backport
>complexity. Until we need to make incompatible changes, there's no need
>for
>a Hadoop 4.0 version.
>
>Thus, here's a proposal of branches and versions:
>
>trunk: 3.1.0-SNAPSHOT
>branch-3.0: 3.0.0-beta1-SNAPSHOT
>branch-2 and etc: remain as is
>
>LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
>build consensus quickly since we have a number of open VOTEs for branch
>merges.
>
>Thanks,
>Andrew





Re: [DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] feature branch to trunk

2017-08-19 Thread Gangumalla, Uma
Hi Andrew,

>Great to hear. It'd be nice to define which use cases are met by the current 
>version of SPS, and which will be handled after the merge.
After the discussions in JIRA, we planned to support a recursive API as well. 
The primary use case we planned for was HBase. Please check the next point for 
use case details.

>A bit more detail in the design doc on how HBase would use this feature would 
>also be helpful. Is there an HBase JIRA already?
Please find the use case details at this comment in JIRA: 
https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16120227&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16120227

>I also spent some more time with the design doc and posted a few questions on 
>the JIRA.
Thank you for the reviews.

To summarize the discussions in JIRA:
1. After the feedback from Andrew, Eddy, and Xiao in JIRA reviews, we planned 
to take up recursive API support: HDFS-12291 (Rakesh started the work on it).
2. Xattr optimizations: HDFS-12225 (patch available).
3. A few other review comments were already fixed and committed: HDFS-12214.

For tracking the follow-up tasks we filed HDFS-12226; they should not be 
critical for the merge.

Regards,
Uma

From: Andrew Wang
Date: Friday, July 28, 2017 at 11:33 AM
To: Uma Gangumalla
Cc: "hdfs-dev@hadoop.apache.org"
Subject: Re: [DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] 
feature branch to trunk

Hi Uma,

> If there are still plans to make changes that affect compatibility (the 
> hybrid RPC and bulk DN work mentioned sound like they would), then we can cut 
> branch-3 first, or wait to merge until after these tasks are finished.
[Uma] We don’t see those 2 items as high priority for the feature. Users would 
be able to use the feature with the current code base and API. So, we would 
consider them after branch-3 only. That should be perfectly fine IMO. The 
current API is very useful for the HBase scenario. In the HBase case, they will 
rename files into a different policy directory. They will not always set the 
policies. So, when they rename files into a different policy directory, they 
can simply call satisfyStoragePolicy; they don’t need any hybrid API.

Great to hear. It'd be nice to define which use cases are met by the current 
version of SPS, and which will be handled after the merge.

A bit more detail in the design doc on how HBase would use this feature would 
also be helpful. Is there an HBase JIRA already?

I also spent some more time with the design doc and posted a few questions on 
the JIRA.

Best,
Andrew


Re: [DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] feature branch to trunk

2017-07-28 Thread Gangumalla, Uma
Hi Andrew, Thanks a lot for reviewing.

Your understanding of the 2 factors is totally right. More than 90% of the code 
was newly added, and a very small portion of existing code was touched, namely 
the NN RPCs and DN messages. We can see that in the combined patch stats (only 
45 lines with “-”).

> If there are still plans to make changes that affect compatibility (the 
> hybrid RPC and bulk DN work mentioned sound like they would), then we can cut 
> branch-3 first, or wait to merge until after these tasks are finished.
[Uma] We don’t see those 2 items as high priority for the feature. Users would 
be able to use the feature with the current code base and API. So, we would 
consider them after branch-3 only. That should be perfectly fine IMO. The 
current API is very useful for the HBase scenario. In the HBase case, they will 
rename files into a different policy directory. They will not always set the 
policies. So, when they rename files into a different policy directory, they 
can simply call satisfyStoragePolicy; they don’t need any hybrid API.

>* Possible impact when this feature is disabled
[Uma] Related to this point, I wanted to highlight the dynamic activation and 
deactivation of the feature. That means the feature can be disabled/enabled 
without restarting the Namenode.
If the feature is disabled, there should be 0 impact. As we have dynamic 
enabling of the feature, we will not even initialize the threads if the feature 
is disabled. The service will be initialized when enabled. For an easy review, 
please look at the last section in this documentation: 
ArchivalStorage.html<https://issues.apache.org/jira/secure/attachment/12877327/ArchivalStorage.html>
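(For illustration, enabling the feature is just a configuration flip. A minimal 
sketch follows; the property name is an assumption based on the branch-era 
design doc, not a settled key, so verify it against the merged documentation.)

    import org.apache.hadoop.conf.Configuration;

    public class SpsToggleSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed branch-era key (an assumption, not a guaranteed name).
        conf.setBoolean("dfs.storage.policy.satisfier.enabled", true);
        System.out.println("SPS enabled: "
            + conf.getBoolean("dfs.storage.policy.satisfier.enabled", false));
      }
    }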
Also, the tiered storage + HDFS mounts solution wants to use the SPS feature: 
https://issues.apache.org/jira/browse/HDFS-12090. So, having SPS upstream 
would allow the dependent HDFS-12090 feature to proceed. (I don’t say we have 
to merge because of this reason alone, but I would just like to mention it as 
an endorsement of the feature. :-))

Regards,
Uma

From: Andrew Wang <andrew.w...@cloudera.com>
Date: Thursday, July 27, 2017 at 12:15 PM
To: Uma Gangumalla <uma.ganguma...@intel.com>
Cc: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Subject: Re: [DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] 
feature branch to trunk

Hi Uma, Rakesh,

First off, I like the idea of this feature. It'll definitely make HSM easier to 
use.

With my RM hat on, I gave the patch a quick skim looking for:

* Possible impact when this feature is disabled
* API stability and other compat concerns

At a high-level, it looks like it uses xattrs rather than new edit log ops to 
track files being moved. Some new NN RPCs and DN messages added to interact 
with the feature. Almost entirely new code that doesn't modify the guts of HDFS 
much.

Could you comment further on these two concerns? We're closing in on 
3.0.0-beta1, so the merge of any large amount of new code makes me wary. If 
there are still plans to make changes that affect compatibility (the hybrid RPC 
and bulk DN work mentioned sound like they would), then we can cut branch-3 
first, or wait to merge until after these tasks are finished.

Best,
Andrew



On Mon, Jul 24, 2017 at 11:35 PM, Gangumalla, Uma 
<uma.ganguma...@intel.com> wrote:
Dear All,

I would like to propose the Storage Policy Satisfier (SPS) feature merge into 
trunk. We have been working on this feature for the last several months. This 
feature received contributions from different companies. All of the feature 
development happened smoothly and collaboratively in JIRAs.

Detailed design document is available in JIRA: 
Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf<https://issues.apache.org/jira/secure/attachment/12873642/Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf>
Test report attached to JIRA: 
HDFS-SPS-TestReport-20170708.pdf<https://issues.apache.org/jira/secure/attachment/12876256/HDFS-SPS-TestReport-20170708.pdf>

Short Description of the feature:-
   The Storage Policy Satisfier feature aims to let distributed HDFS 
applications schedule block movements easily.
   When a storage policy change happens, the user can invoke the 
satisfyStoragePolicy API to trigger the block storage movements.
   Block movement tasks will be assigned to datanodes, and movements will 
happen in a distributed fashion.
   Block-level movement tracking has also been distributed to DNs to avoid 
load on Namenodes.
   A coordinator Datanode tracks all the blocks associated with a 
blockCollection and sends the consolidated final results to the Namenode.
   If the movement result is a failure, the Namenode will re-schedule the 
block movements.

[DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] feature branch to trunk

2017-07-25 Thread Gangumalla, Uma
Dear All,

I would like to propose the Storage Policy Satisfier (SPS) feature merge into 
trunk. We have been working on this feature for the last several months. This 
feature received contributions from different companies. All of the feature 
development happened smoothly and collaboratively in JIRAs.

Detailed design document is available in JIRA: 
Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf
Test report attached to JIRA: 
HDFS-SPS-TestReport-20170708.pdf

Short Description of the feature:-
   The Storage Policy Satisfier feature aims to let distributed HDFS 
applications schedule block movements easily.
   When a storage policy change happens, the user can invoke the 
satisfyStoragePolicy API to trigger the block storage movements.
   Block movement tasks will be assigned to datanodes, and movements will 
happen in a distributed fashion.
   Block-level movement tracking has also been distributed to DNs to avoid 
load on Namenodes.
   A coordinator Datanode tracks all the blocks associated with a 
blockCollection and sends the consolidated final results to the Namenode.
   If the movement result is a failure, the Namenode will re-schedule the 
block movements.
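To make the client-side flow concrete, here is a minimal sketch (assuming a 
DistributedFileSystem handle; satisfyStoragePolicy is the API as proposed on 
this branch, so exact signatures may differ after the merge):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class SpsClientSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path dir = new Path("hdfs://namenode:8020/archive/table1"); // illustrative
        try (DistributedFileSystem dfs =
            (DistributedFileSystem) dir.getFileSystem(conf)) {
          dfs.setStoragePolicy(dir, "COLD"); // declare the desired policy
          dfs.satisfyStoragePolicy(dir);     // ask the NN to schedule block moves
        }
      }
    }

The two calls map directly to the description above: declare the policy, then 
ask SPS to migrate the existing replicas to match it.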

Development branch is: HDFS-10285
No of JIRAs Resolved: 38
Pending JIRAs: 4 (I don’t think they are blockers for merge)

We have posted a combined patch for easy merge reviews. Jenkins job test 
results look good on the combined patch.
Quick stats on combined Patch:
  67 files changed, 7001 insertions(+), 45 deletions(-)
  Added/modified testcases= ~70


Thanks to all helpers, namely Andrew Wang, Anoop Sam John, Du Jingcheng, Ewan 
Higgs, Jing Zhao, Kai Zheng, Rakesh R, Ramakrishna, Surendra Singh Lilhore, 
Uma Maheswara Rao G, Wei Zhou, and Yuanbo Liu. Without these members' efforts, 
this feature might not have reached this state.

We will continue work on the following future work items:

  1.  Presently the user has to set & satisfy the policy in separate RPC calls. 
The idea is to provide a hybrid API, dfs#setStoragePolicy(src, policy), which 
should do set and satisfy in one RPC call to the namenode (reference HDFS-11669).
  2.  Presently BlockStorageMovementCommand sends all the blocks under a 
trackID over a single heartbeat response. If there are many blocks under a 
given trackID (for example, a file contains many blocks), then that bulk 
information goes to the DN in a single network call and comes with a lot of 
overhead. One idea is to use smaller batches of BlockMovingInfo in the block 
storage movement command (reference HDFS-11125).
  3.  Build a mechanism to throttle the number of concurrent moves at the 
datanode.
  4.  Allow specifying an initial delay in seconds before the source file is 
scheduled for satisfying the storage policy. For example, in HBase the interval 
between archive (moving files between different storages) and file deletion is 
not large. In that case it may not be required to immediately schedule the 
satisfy-policy task.
  5.  SPS-related metrics to be covered.

So, I feel this branch is ready to merge into trunk. Please provide your 
feedback. If there are no objections, I will proceed to a vote.

Regards,
Uma & Rakesh


Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-31 Thread Gangumalla, Uma
+1 (binding).

Overall it's a great effort, Andrew. Thank you for putting in all the energy.

Downloaded and built.
Ran some sample jobs.

I would love to see all these efforts lead to a GA from Hadoop
3.x soon.

Regards,
Uma


On 8/30/16, 8:51 AM, "Andrew Wang"  wrote:

>Hi all,
>
>Thanks to the combined work of many, many contributors, here's an RC0 for
>3.0.0-alpha1:
>
>http://home.apache.org/~wang/3.0.0-alpha1-RC0/
>
>alpha1 is the first in a series of planned alpha releases leading up to
>GA.
>The objective is to get an artifact out to downstreams for testing and to
>iterate quickly based on their feedback. So, please keep that in mind when
>voting; hopefully most issues can be addressed by future alphas rather
>than
>future RCs.
>
>Sorry for getting this out on a Tuesday, but I'd still like this vote to
>run the normal 5 days, thus ending Saturday (9/3) at 9AM PDT. I'll extend
>if we lack the votes.
>
>Please try it out and let me know what you think.
>
>Best,
>Andrew





Re: [DISCUSS] Retire BKJM from trunk?

2016-07-27 Thread Gangumalla, Uma
For Huawei, Vinay/Brahma should know about their usage. As far as I know, they 
also adopted QJM after it stabilized and was ready, but they would know more 
than me as I left that employer a while ago.

If no one is using it, it is OK to remove.

Regards,
Uma

On 7/27/16, 9:49 PM, "Rakesh Radhakrishnan"  wrote:

>If I remember correctly, Huawei also adopted the QJM component. I hope @Vinay
>might have discussed it internally in Huawei before starting this e-mail
>discussion thread. I'm +1 for removing the bkjm contrib from the trunk
>code.
>
>Also, there are quite a few open sub-tasks under the HDFS-3399 umbrella jira,
>which was used at the time of the BKJM implementation. How about closing
>these jiras by marking them as "Won't Fix"?
>
>Thanks,
>Rakesh
>Intel
>
>On Thu, Jul 28, 2016 at 1:53 AM, Sijie Guo  wrote:
>
>> + Rakesh and Uma
>>
>> Rakesh and Uma might have a better idea on this. I think Huawei was
>>using
>> it when Rakesh and Uma worked there.
>>
>> - Sijie
>>
>> On Wed, Jul 27, 2016 at 12:06 PM, Chris Nauroth
>>
>> wrote:
>>
>> > I recommend including the BookKeeper community in this discussion. I've
>> > added their user@ and dev@ lists to this thread.
>> >
>> > I do not see BKJM being used in practice. Removing it from trunk would be
>> > attractive in terms of less code for Hadoop to maintain and build, but if
>> > we find existing users that want to keep it, I wouldn't object.
>> >
>> > --Chris Nauroth
>> >
>> > On 7/26/16, 11:14 PM, "Vinayakumar B" 
>>wrote:
>> >
>> > Hi All,
>> >
>> > BKJM was active and was made quite stable when NameNode HA was
>> > implemented and QJM did not exist yet.
>> >Now QJM is present and is much more stable, and it is adopted by many
>> > production environments.
>> >I wonder whether it would be a good time to retire BKJM from
>> trunk?
>> >
>> >Are there any existing users of BKJM?
>> >
>> > -Vinay
>> >
>> >
>> >
>>





Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Gangumalla, Uma


On 6/13/16, 12:41 PM, "Anu Engineer"  wrote:

>Hi Colin,
>
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence or absence of branches that causes the need to
>>merge or rebase, but the presence or absence of "churn."
>
>You are perfectly right on this technically. The issue is when a
>branch developer gets caught in Commit, Revert, let-us-commit-again,
>oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle.
>
>I was hoping that branches will be exposed to less of this if everyone
>had private branches and got some time to test and bake the feature
>instead of just directly committing to trunk and then test.
>
>Once again, I agree with your point that in a perfect world, merges should
>be about the churn, but trunk is often treated as development branch,
>So my point is that it gets unnecessary churn. I really appreciate the
>thought in the thread - that is - let us be more responsible about how we
>treat trunk.
>
>> I thought the feature branch merge voting period had been shortened to 5
>>days rather than 7?  We should probably spell this out on
>>https://hadoop.apache.org/bylaws.html
>
>Thanks for the link, right now it says 7 days. That is why I assumed it
>is 7. 
>Would you be kind enough to point me to a thread that says it is 5 days
>for a merge Vote? 
>I did a google search, but was not able to find a thread like that.
>Thanks in advance.
I remember the 5-day voting was related to releases. I am not sure we
discussed branch merge voting time then.
Here is the link: 
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201406.mbox/%3C64A2c234-dd6a-4e4c-b52d-e91d5d472...@hortonworks.com%3E
>
>Thanks
>Anu
>
>
>On 6/13/16, 11:51 AM, "Colin McCabe"  wrote:
>
>>On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
>>> > On 10 Jun 2016, at 20:37, Anu Engineer 
>>>wrote:
>>> > 
>>> > I actively work on two branches (Diskbalancer and ozone) and I agree
>>>with most of what Sangjin said.
>>> > There is an overhead in working with branches, there are both
>>>technical costs and administrative issues
>>> > which discourages developers from using branches.
>>> > 
>>> > I think the biggest issue with branch based development is that fact
>>>that other developers do not use a branch.
>>> > If a small feature appears as a series of commits to
>>>"datanode.java", the branch based developer ends up rebasing
>>> > and paying this price of rebasing many times. If everyone followed a
>>>model of branch + Pull request, other branches
>>> > would not have to deal with continues rebasing to trunk commits. If
>>>we are moving to a branch based
>>
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence or absence of branches that causes the need to
>>merge or rebase, but the presence or absence of "churn."
>>
>>We try to minimize "churn" in many ways.  For example, we discourage
>>people from making trivial whitespace changes to parts of the code
>>they're not modifying in their patch.  Or doing things like letting
>>their editor change the line ending of files from LF to CR/LF.  However,
>>in the final analysis, churn will always exist because development
>>exists.
>>
>>> > development, we should probably move to that model for most
>>>development to avoid this tax on people who
>>> > actually end up working in the branches.
>>> > 
>>> > I do have a question in my mind though: What is being proposed is
>>>that we move active development to branches
>>> > if the feature is small or incomplete, however keep the trunk open
>>>for check-ins. One of the biggest reason why we
>>> > check-in into trunk and not to branch-2 is because it is a change
>>>that will break backward compatibility. So do we
>>> > have an expectation of backward compatibility thru the 3.0-alpha
>>>series (I personally vote No, since 3.0 is experimental
>>> > at this stage), but if we decide to support some sort of
>>>backward-compat then willy-nilly committing to trunk
>>> > and still maintaining the expectation we can release Alphas from 3.0
>>>does not look possible.
>>> > 
>>> > And then comes the question, once 3.0 becomes official, where do we
>>>check-in a change,  if that would break something?
>>> > so this will lead us back to trunk being the unstable – 3.0 being
>>>the new “branch-2”.
>>
>>I'm not sure I really understand the goal of the "trunk-incompat"
>>proposal.  Like Karthik asked earlier in this thread, isn't it really
>>just a rename of the existing trunk branch?
>>It sounds like the policy is going to be exactly the same as now:
>>incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
>>changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
>>etc.
>>
>>I think we should 

Re: [DISCUSS] Set minimum version of Hadoop 3 to JDK8 (HADOOP-11858)

2016-05-11 Thread Gangumalla, Uma
+1

Regards,
Uma

On 5/10/16, 2:24 PM, "Andrew Wang"  wrote:

>+1
>
>On Tue, May 10, 2016 at 12:36 PM, Ravi Prakash 
>wrote:
>
>> +1. Thanks for driving this Akira
>>
>> On Tue, May 10, 2016 at 10:25 AM, Tsuyoshi Ozawa 
>>wrote:
>>
>> > > Before cutting 3.0.0-alpha RC, I'd like to drop JDK7 support in
>>trunk.
>> >
>> > Sounds good. To do so, we need to check the blockers of 3.0.0-alpha
>> > RC, especially upgrading all dependencies which use refractions at
>> > first.
>> >
>> > Thanks,
>> > - Tsuyoshi
>> >
>> > On Tue, May 10, 2016 at 8:32 AM, Akira AJISAKA
>> >  wrote:
>> > > Hi developers,
>> > >
>> > > Before cutting 3.0.0-alpha RC, I'd like to drop JDK7 support in
>>trunk.
>> > > Given this is a critical change, I'm thinking we should get the
>> consensus
>> > > first.
>> > >
>> > > One concern I think is, when the minimum version is set to JDK8, we
>> need
>> > to
>> > > configure Jenkins to disable multi JDK test only in trunk.
>> > >
>> > > Any thoughts?
>> > >
>> > > Thanks,
>> > > Akira
>> > >
>> > >
>> >
>> >
>> >
>>





Re: Looking to a Hadoop 3 release

2016-02-18 Thread Gangumalla, Uma
Yes. I think starting the 3.0 release with an alpha is a good idea. That would 
give it some time to reach beta or GA.

+1 for the plan.

For compatibility purposes, and as the current stable line, we should
continue the 2.x releases anyway.

Thanks Andrew for starting the thread.

Regards,
Uma

On 2/18/16, 3:04 PM, "Andrew Wang"  wrote:

>Hi Kihwal,
>
>I think there's still value in continuing the 2.x releases. 3.x comes with
>the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>be beta or GA for some number of months. In the meanwhile, it'd be good to
>keep putting out regular, stable 2.x releases.
>
>Best,
>Andrew
>
>
>On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee 
>wrote:
>
>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>>motivations,
>> are we getting rid of branch-2.8?
>>
>> Kihwal
>>
>>   From: Andrew Wang 
>>  To: "common-...@hadoop.apache.org" 
>> Cc: "yarn-...@hadoop.apache.org" ; "
>> mapreduce-...@hadoop.apache.org" ;
>> hdfs-dev 
>>  Sent: Thursday, February 18, 2016 4:35 PM
>>  Subject: Re: Looking to a Hadoop 3 release
>>
>> Hi all,
>>
>> Reviving this thread. I've seen renewed interest in a trunk release
>>since
>> HDFS erasure coding has not yet made it to branch-2. Along with JDK8,
>>the
>> shell script rewrite, and many other improvements, I think it's time to
>> revisit Hadoop 3.0 release plans.
>>
>> My overall plan is still the same as in my original email: a series of
>> regular alpha releases leading up to beta and GA. Alpha releases make it
>> easier for downstreams to integrate with our code, and making them
>>regular
>> means features can be included when they are ready.
>>
>> I know there are some incompatible changes waiting in the wings
>> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
>> HADOOP-9991 bumping dependency versions) that would be good to get in.
>>If
>> you have changes like this, please set the target version to 3.0.0 and
>>mark
>> them "Incompatible". We can use this JIRA query to track:
>>
>>
>> 
>>https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>>
>> There's some release-related stuff that needs to be sorted out (namely,
>>the
>> new CHANGES.txt and release note generation from Yetus), but I'd
>> tentatively like to roll the first alpha a month out, so third week of
>> March.
>>
>> Best,
>> Andrew
>>
>> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata 
>>wrote:
>>
>> > Avoiding the use of JDK8 language features (and, presumably, APIs)
>> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
>> > source version to JDK8.
>> >
>> > Also, note that releasing from trunk is a way of achieving #3, it's
>> > not a way of abandoning it.
>> >
>> >
>> >
>> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang 
>> > wrote:
>> > > Hi Raymie,
>> > >
>> > > Konst proposed just releasing off of trunk rather than cutting a
>> > branch-2,
>> > > and there was general agreement there. So, consider #3 abandoned.
>>1&2
>> can
>> > > be achieved at the same time, we just need to avoid using JDK8
>>language
>> > > features in trunk so things can be backported.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata 
>> > wrote:
>> > >
>> > >> In this (and the related threads), I see the following three
>> > requirements:
>> > >>
>> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
>> > >>
>> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
>> > >> feature sets as 3.x."
>> > >>
>> > >> 3. Avoid the "risk of split-brain behavior" by "minimize
>>backporting
>> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already
>>tedious.
>> > >> Adding a branch-3, branch-3.x would be obnoxious."
>> > >>
>> > >> These three cannot be achieved at the same time.  Which do we
>>abandon?
>> > >>
>> > >>
>> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia
>>
>> > >> wrote:
>> > >> >
>> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth 
>> wrote:
>> > >> >>
>> > >> >> 2) Simplification of configs - potentially separating client
>>side
>> > >> configs
>> > >> >> and those used by daemons. This is another source of perpetual
>> > confusion
>> > >> >> for users.
>> > >> > + 1 on this.
>> > >> >
>> > >> > sanjay
>> > >>
>> >
>>
>>
>>



Re: Hadoop encryption module as Apache Chimera incubator project

2016-02-11 Thread Gangumalla, Uma
Thanks, Haifeng. I was just waiting to see if there were any more comments. If 
there are no further objections, I will initiate a discussion thread in Apache 
Commons in a day's time and will also CC hadoop common.

Regards,
Uma

On 2/11/16, 6:13 PM, "Chen, Haifeng" <haifeng.c...@intel.com> wrote:

>Thanks to all the folks participating in this discussion and providing
>valuable suggestions and options.
>
>I suggest we take it forward and make a proposal to the Apache Commons
>community. 
>
>Thanks,
>Haifeng
>
>-Original Message-
>From: Chen, Haifeng [mailto:haifeng.c...@intel.com]
>Sent: Friday, February 5, 2016 10:06 AM
>To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
>Subject: RE: Hadoop encryption module as Apache Chimera incubator project
>
>> [Chirs] Yes, but even if the artifact is widely consumed, as a TLP it
>>would need to sustain a community. If the scope is too narrow, then it
>>will quickly fall into maintenance mode, its contributors will move on,
>>and it will retire to the attic. Alone, I doubt its viability as a TLP.
>>So as a first option, donating only this code to Apache Commons would
>>accomplish some immediate goals in a sustainable forum.
>Totally agree. As a TLP it needs nice scope and roadmap to sustain a
>development community.
>
>Thanks,
>Haifeng
>
>-Original Message-
>From: Chris Douglas [mailto:cdoug...@apache.org]
>Sent: Friday, February 5, 2016 6:28 AM
>To: common-...@hadoop.apache.org
>Cc: hdfs-dev@hadoop.apache.org
>Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
>On Thu, Feb 4, 2016 at 12:06 PM, Gangumalla, Uma
><uma.ganguma...@intel.com> wrote:
>
>> [UMA] Ok. Great. You are right. I have cc'ed hadoop common. (You
>> mean to cc Apache Commons as well?)
>
>I meant, if you start a discussion with Apache Commons, please CC
>common-dev@hadoop to coordinate.
>
>> [UMA] Right now the encryption libraries are the only ones we planned,
>> and we see a lot of interest from other projects like Spark in using
>> them. A challenge I see when we bring a lot of other common code into
>> this project is that it would all have different requirements and maybe
>> different expected timelines for releases, etc. Some projects may want
>> to use only the encryption interfaces, but not all of them.
>> As they are completely independent pieces of code, it may be better to
>> scope this out clearly.
>
>Yes, but even if the artifact is widely consumed, as a TLP it would need
>to sustain a community. If the scope is too narrow, then it will quickly
>fall into maintenance mode, its contributors will move on, and it will
>retire to the attic. Alone, I doubt its viability as a TLP. So as a first
>option, donating only this code to Apache Commons would accomplish some
>immediate goals in a sustainable forum.
>
>APR has a similar scope. As a second option, that may also be a
>reasonable home, particularly if some of the native bits could integrate
>with APR.
>
>If the scope is broader, the effort could sustain prolonged development.
>The current code is developing a strategy for packing native libraries on
>multiple platforms, a capability that, say, the native compression codecs
>(AFAIK) still lack. While java.nio is improving, many projects would
>benefit from a better, native interface to the filesystem (e.g.,
>NativeIO). We could avoid duplicating effort and collaborate on a common
>library.
>
>As a third option, Hadoop already implements some useful native
>libraries, which is why a subproject might be a sound course. That would
>enable the subproject to coordinate with Hadoop on migrating its native
>functionality to a separable, reusable component, then move to a TLP when
>we can rely on it exclusively (if it has a well-defined, independent
>community). It could control its release cadence and limit its
>dependencies.
>
>Finally, this is beside the point if nobody is interested in doing the
>work on such a project. It's rude to pull code out of Hadoop and donate
>it to another project so Spark can avoid a dependency, but this instance
>seems reasonable to me. -C
>
>[1] https://apr.apache.org/
>
>> On 2/3/16, 6:46 PM, "Chen, Haifeng" <haifeng.c...@intel.com> wrote:
>>
>>>Thanks Chris.
>>>
>>>>> I went through the repository, and now understand the reasoning
>>>>>that would locate this code in Apache Commons. This isn't proposing
>>>>>to extract much of the implementation and it takes none of the
>>>>>integration. It's limited to interfaces to crypto libraries and
>>>>>streams/configuration.
>>>Exactly.
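(For context, the crypto streams under discussion wrap ordinary Java streams 
with AES/CTR encryption. Below is a minimal sketch of that idea using only 
standard JCE classes; it is illustrative and is not Chimera's actual API.)

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class CryptoStreamSketch {
      public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // demo key; real code fetches keys from a KMS
        byte[] iv = new byte[16];  // demo IV; must be unique per key in practice
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, cipher)) {
          out.write("hello, encrypted world".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("ciphertext bytes: " + sink.size());
      }
    }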

Re: Hadoop encryption module as Apache Chimera incubator project

2016-02-04 Thread Gangumalla, Uma
>historical baggage for other projects to rely on.
>I agree with Colin/Steve: we don't want this to grow into another
>guava-like dependency that creates more work in conflicts than it saves
>in implementation...
>
>Would it make sense to also package some of the compression libraries,
>and maybe some of the text processing from MapReduce? Evolving some of
>this code to a common library with few/no dependencies would be generally
>useful. As a subproject, it could have a broader scope that could evolve
>into a viable TLP. If the encryption libraries are the only ones you're
>interested in pulling out, then Apache Commons does seem like a better
>target than a separate project. -C
>
>
>On Wed, Feb 3, 2016 at 1:49 AM, Chris Douglas <cdoug...@apache.org> wrote:
>> On Wed, Feb 3, 2016 at 12:48 AM, Gangumalla, Uma
>> <uma.ganguma...@intel.com> wrote:
>>>>For a shared, fundamental piece of code like this, I do think Apache
>>>>Commons might be the best direction to try as a first effort. In that
>>>>direction, we still need to work with the Apache Commons community on
>>>>buying into and accepting the proposal.
>>> Make sense.
>>
>> Makes sense how?
>>
>>> For this we should define independent release cycles for the project,
>>> and it would simply live under the Hadoop tree if we all conclude on
>>> this option at the end.
>>
>> Yes.
>>
>>> [Chris]
>>>>If Chimera is not successful as an independent project or stalls,
>>>>Hadoop and/or Spark and/or $project will have to reabsorb it as
>>>>maintainers.
>>>>
>>> I am not so strong on this point. If we assume the project would be
>>> unsuccessful, it could be just as unsuccessful (less maintained) under
>>> Hadoop. But if other projects depend on this piece, they would get
>>> less support either way. Of course, right now we feel this piece of
>>> code is very important, and we expect it can be successful as an
>>> independent project, whether it lives outside Hadoop or inside. So I
>>> feel this point should not really decide the discussion.
>>
>> Sure; code can idle anywhere, but that wasn't the point I was after.
>> You propose to extract code from Hadoop, but if Chimera fails then
>> what recourse do we have among the other projects taking a dependency
>> on it? Splitting off another project is feasible, but Chimera should
>> be sustainable before this PMC can divest itself of responsibility for
>> security libraries. That's a pretty low bar.
>>
>> Bundling the library with the jar is helpful; I've used that before.
>> It should prefer (updated) libraries from the environment, if
>> configured. Otherwise it's a pain (or impossible) for ops to patch
>> security bugs. -C
>>
>>>>-Original Message-
>>>>From: Colin P. McCabe [mailto:cmcc...@apache.org]
>>>>Sent: Wednesday, February 3, 2016 4:56 AM
>>>>To: hdfs-dev@hadoop.apache.org
>>>>Subject: Re: Hadoop encryption module as Apache Chimera incubator
>>>>project
>>>>
>>>>It's great to see interest in improving this functionality.  I think
>>>>Chimera could be successful as an Apache project.  I don't have a
>>>>strong opinion one way or the other as to whether it belongs as part
>>>>of Hadoop or separate.
>>>>
>>>>I do think there will be some challenges splitting this functionality
>>>>out into a separate jar, because of the way our CLASSPATH works right
>>>>now.
>>>>For example, let's say that Hadoop depends on Chimera 1.2 and Spark
>>>>depends on Chimera 1.1.  Now Spark jobs have two different versions
>>>>fighting it out on the classpath, similar to the situation with Guava
>>>>and other libraries.  Perhaps if Chimera adopts a policy of strong
>>>>backwards compatibility, we can just always use the latest jar, but
>>>>it still seems likely that there will be problems.  There are various
>>>>classpath isolation ideas that could help here, but they are big
>>>>projects in their own right and we don't have a clear timeline for
>>>>them.  If this does end up being a separate jar, we may need to shade
>>>>it to avoid all these issues.
>>>>
>>>>Bundling the JNI glue code in the jar itself is an interesting idea,
>>>>which we have talked about before for libhadoop.so.  It doesn't
>>>>really have anything 

Re: Hadoop encryption module as Apache Chimera incubator project

2016-02-03 Thread Gangumalla, Uma
Thanks guys for the opinions. Below are my responses for some questions or
thoughts.

On 2/3/16, 12:07 AM, "Chen, Haifeng"  wrote:

>Thanks Chris and Colin for your opinions.
>
>>> [Chris] If Chimera is not successful as an independent project or
>>>stalls, Hadoop and/or Spark and/or $project will have to reabsorb it as
>>>maintainers. 
>Understand the concern. One point to consider: Chimera is dedicated to a
>specific domain, optimized cryptography, much as Apache Commons Logging is
>dedicated to logging. It is not as dynamic as other Apache projects.
>Of course, as to whether it should be part of Hadoop or separate, both
>ways have uncertainties. I am not strongly opposed to one way or the other.
>
>For a shared, fundamental piece of code like this, I do think Apache
>Commons might be the best direction to try as a first effort. In that
>direction, we still need to work with the Apache Commons community on
>buying into and accepting the proposal.
Make sense.
>
>On the other hand, for the direction of a subproject within Hadoop, I am
>uncertain about where the subproject would live and how it would manage
>its own release cadence in Hadoop. Hadoop has modules like Hadoop Common,
>Hadoop HDFS, Hadoop YARN, and Hadoop MapReduce, and these modules have the
>same release cycle and are released together. Am I right?
For this we should define independent release cycles for the project,
and it would simply live under the Hadoop tree if we all conclude on
this option at the end.
>
>>> [Colin] I do think there will be some challenges splitting this
>>>functionality out into a separate jar, because of the way our CLASSPATH
>>>works right now.
>Yes, these challenges are common for shared libraries in Java. Just as
>you mentioned, keeping API compatibility and using classpath isolation
>are two practical approaches.
>
>>> [Colin] The really complicated part of bundling JNI code in a jar is
>>>that you need to create jars for every cross product.
>Building does get complex for cross-platform support, but it might not be
>as complex as described where the native code is concerned. First,
>building with JDK7 or JDK8 is a common consideration for all Java
>libraries, I think; it is not specific to building the JNI code (correct
>me if I am wrong). Second, it is still possible to isolate the native
>build so that you don't have to build different versions for Ubuntu and
>RHEL. Third, if the library links dynamically to openssl and the openssl
>API it uses does not change across versions, we don't have to build
>different versions for that either.
>
>So the build matrix might be Linux32, Linux64, Windows32, Windows64,
>Mac...
>
>>>[Colin] So probably we would not choose to bundle openssl.
>Agree. Bundling openssl is not a good idea, considering the need to
>upgrade for vulnerabilities.
Agreed too.

[Chris]
>If Chimera is not successful as an independent project or stalls,
>Hadoop and/or Spark and/or $project will have to reabsorb it as
>maintainers.
>
I am not so strong on this point. If we assume the project would be
unsuccessful, it could be just as unsuccessful (less maintained) under
Hadoop. But if other projects depend on this piece, they would get less
support either way. Of course, right now we feel this piece of code is
very important, and we expect it can be successful as an independent
project, whether it lives outside Hadoop or inside. So I feel this point
should not really decide the discussion.
>
>
>Regards,
>Haifeng
>
>-Original Message-
>From: Colin P. McCabe [mailto:cmcc...@apache.org]
>Sent: Wednesday, February 3, 2016 4:56 AM
>To: hdfs-dev@hadoop.apache.org
>Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
>It's great to see interest in improving this functionality.  I think
>Chimera could be successful as an Apache project.  I don't have a strong
>opinion one way or the other as to whether it belongs as part of Hadoop
>or separate.
>
>I do think there will be some challenges splitting this functionality out
>into a separate jar, because of the way our CLASSPATH works right now.
>For example, let's say that Hadoop depends on Chimera 1.2 and Spark
>depends on Chimera 1.1.  Now Spark jobs have two different versions
>fighting it out on the classpath, similar to the situation with Guava and
>other libraries.  Perhaps if Chimera adopts a policy of strong backwards
>compatibility, we can just always use the latest jar, but it still seems
>likely that there will be problems.  There are various classpath
>isolation ideas that could help here, but they are big projects in their
>own right and we don't have a clear timeline for them.  If this does end
>up being a separate jar, we may need to shade it to avoid all these
>issues.
>
>Bundling the JNI glue code in the jar itself is an interesting idea,
>which we have talked about before for libhadoop.so.  It doesn't really
>have anything to do with the question of TLP vs. non-TLP, of 

Re: [Release thread] 2.8.0 release activities

2016-02-03 Thread Gangumalla, Uma
Thanks Vinod. +1 for 2.8 release start.

Regards,
Uma

On 2/3/16, 3:53 PM, "Vinod Kumar Vavilapalli"  wrote:

>Seems like all the features listed in the Roadmap wiki are in. I'm going
>to try cutting an RC this weekend for a first/non-stable release off of
>branch-2.8.
>
>Let me know if anyone has any objections/concerns.
>
>Thanks
>+Vinod
>
>> On Nov 25, 2015, at 5:59 PM, Vinod Kumar Vavilapalli
>> wrote:
>> 
>> Branch-2.8 is created.
>> 
>> As mentioned before, the goal on branch-2.8 is to put improvements /
>>fixes to existing features with a goal of converging on an alpha release
>>soon.
>> 
>> Thanks
>> +Vinod
>> 
>> 
>>> On Nov 25, 2015, at 5:30 PM, Vinod Kumar Vavilapalli
>>> wrote:
>>> 
>>> Forking threads now in order to track all things related to the
>>>release.
>>> 
>>> Creating the branch now.
>>> 
>>> Thanks
>>> +Vinod
>>> 
>>> 
 On Nov 25, 2015, at 11:37 AM, Vinod Kumar Vavilapalli
 wrote:
 
 I think we've converged at a high level w.r.t 2.8. And as I just sent
out an email, I updated the Roadmap wiki reflecting the same:
https://wiki.apache.org/hadoop/Roadmap

 
 I plan to create a 2.8 branch EOD today.
 
 The goal for all of us should be to restrict improvements & fixes to
only (a) the feature-set documented under 2.8 in the RoadMap wiki and
(b) other minor features that are already in 2.8.
 
 Thanks
 +Vinod
 
 
> On Nov 11, 2015, at 12:13 PM, Vinod Kumar Vavilapalli
>> wrote:
> 
> - Cut a branch about two weeks from now
> - Do an RC mid next month (leaving ~4weeks since branch-cut)
> - As with the 2.7.x series, the first release will still be called an
>early / alpha release in the interest of
>   - gaining downstream adoption
>   - wider testing,
>   - yet reserving our right to fix any inadvertent incompatibilities
>introduced.
 
>>> 
>> 
>



Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-27 Thread Gangumalla, Uma
I think Chimera's goal is to serve other use cases too. For Hadoop, CTR
mode should be enough today, but we will want to support other modes for
other users (for example, network encryption or data-transfer encryption
over the wire doesn't necessarily need CTR; other modes such as CBC and
GCM are better for those use cases, and Hadoop can use them as well in
some form later). I think a separate module with an independent release
is a good idea, but I am not so strong on the point of keeping it under
Hadoop. It may be good to keep it in a generalized place (as in the
discussion, we thought that place could be Apache Commons). But let's see
what others feel. Getting attention to more Hadoop use cases will depend
on the community, IMO; irrespective of where we place it, if the
community is the same then the attention would be the same, right?
To summarize my points:
1. Adding a subproject with an independent release in Hadoop actually
signals that it is something that may be widely shared; it may fit more
naturally in Apache Commons.
2. Subprojects with independent release cycles may complicate Hadoop
releases and versioning, and there is no existing practice of this in
Hadoop yet.

Others, please also comment on what you feel, so we can conclude on next steps.

Regards,
Uma



On 1/26/16, 11:26 AM, "Owen O'Malley"  wrote:

>Sorry to be coming in to this discussion late. Rather than pull the code
>out of Hadoop, may I suggest instead making it a separate subproject
>within
>Hadoop itself? I'd suggest letting it release independently of Hadoop,
>since it will need a much faster cadence than Hadoop proper does. It
>should
>also keep the number of dependencies to a very small set (empty?).
>
>Thoughts?
>   Owen



Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-27 Thread Gangumalla, Uma
Thanks for the inputs Owen.

On 1/27/16, 11:31 AM, "Owen O'Malley" <omal...@apache.org> wrote:

>On Wed, Jan 27, 2016 at 9:59 AM, Gangumalla, Uma
><uma.ganguma...@intel.com>
>wrote:
>
>> I think Chimera's goal is to serve other use cases too.
>
>
>Naturally.
>
>
>> For Hadoop, CTR mode should be enough today,
>
>
>This isn't true. Hadoop should use better encryption for RPC and shuffle,
>both of which should not use CTR.
|| Yes, I said later Hadoop could use other options too.
>
>
>> I think a separate module with an independent release is a good idea,
>> but I am not so strong on the point of keeping it under Hadoop.
>
>
>I believe encryption is becoming a core part of Hadoop. I think that
>moving
>core components out of Hadoop is bad from a project management
>perspective.
>To put it another way, a bug in the encryption routines will likely become
>a security problem that security@hadoop needs to hear about. I don't think
>adding a separate project in the middle of that communication chain is a
>good idea. The same applies to data corruption problems, and so on...
|| I agree that security-related discussions need a dedicated channel.
Thanks for this point.
>
>
>> It may be good to keep it in a generalized place (as in the
>> discussion, we thought that place could be Apache Commons).
>
>
>Apache Commons is a collection of *Java* projects, so Chimera as a
>JNI-based library isn't a natural fit. Furthermore, Apache Commons doesn't
>have its own security list so problems will go to the generic
>secur...@apache.org.
||I see some projects include native code too, for example Commons Daemon.
||But, yes, I now notice that Apache Commons Proper is described as being
for reusable Java components.
>
>Why do you think that Apache Commons is a better home than Hadoop?
>
>.. Owen


@ATM, Andrew, Chris, Yi: do you want to comment on this proposal?

Regards,
Uma



Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-21 Thread Gangumalla, Uma
>Uma and everyone, thank you for the proposal.  +1 to proceed.
Thanks Chris for your feedback.

Kai Wrote:
I believe Haifeng had mentioned the problem in a call when discussing the
erasure coding work, but only now do I understand what the problem is and
how Chimera or Snappy Java solved it. It looks like there can be some thin
clients that don't rely on a Hadoop installation, so no libhadoop.so is
available to use on the client host. The approach mentioned here is to
bundle the library file (*.so) into a jar and dynamically extract the file
when loading it. When no library file is contained in the jar, it falls
back to the normal case, loading it from an installation. It's smart and
nice! My question is: could we consider adopting this approach for the
libhadoop.so library? It might be worth discussing because we're bundling
more and more things into the library (recently we just put Intel ISA-L
support into it), and such things may be desired for such clients. It may
also be helpful for development, because unit tests that involve native
code sometimes fail and complain that libhadoop.so cannot be found. Thanks.
[UMA] Good points, Kai. It is worth thinking about and investing some
effort to solve the libhadoop.so part. As Chris suggested, taking this
discussion to JIRA HADOOP-11127 is the more appropriate thing to do.


Regards,
Uma
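
Below is a minimal, hedged sketch of the bundle-and-extract approach Kai
describes: look for the .so inside the jar, extract it to a temp file and
load it, falling back to the normal java.library.path lookup when no
bundled copy exists. The class name, resource path and library name are
illustrative, not the actual Chimera or Snappy Java code.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLoader {
  public static void load() {
    try (InputStream in =
        NativeLoader.class.getResourceAsStream("/native/libchimera.so")) {
      if (in == null) {
        // No bundled copy in the jar: fall back to the normal lookup
        // against java.library.path (i.e. a local installation).
        System.loadLibrary("chimera");
        return;
      }
      // Extract the bundled library to a temp file and load it from there.
      Path tmp = Files.createTempFile("libchimera", ".so");
      tmp.toFile().deleteOnExit();
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
      System.load(tmp.toAbsolutePath().toString());
    } catch (Exception e) {
      throw new RuntimeException("Failed to load native library", e);
    }
  }
}

As noted elsewhere in this discussion, a production loader should prefer an
(updated) library from the environment when configured, so that ops can
patch security bugs without rebuilding the jar.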


On 1/21/16, 12:18 PM, "Chris Nauroth" <cnaur...@hortonworks.com> wrote:

>> My question is: could we consider adopting this approach for the
>> libhadoop.so library?
>
>
>This is something that I have proposed already in HADOOP-11127.  There is
>not consensus on proceeding with it from the contributors in that
>discussion.  There are some big challenges around how it would impact the
>release process.  I also have not had availability to prototype an
>implementation to make a stronger case for feasibility.  Kai, if this is
>something that you're interested in, then I encourage you to join the
>discussion in HADOOP-11127 or even pick up prototyping work if you'd like.
> Since we have that existing JIRA, let's keep this mail thread focused
>just on Chimera.  Thank you!
>
>Uma and everyone, thank you for the proposal.  +1 to proceed.
>
>--Chris Nauroth
>
>
>
>
>On 1/20/16, 11:16 PM, "Zheng, Kai" <kai.zh...@intel.com> wrote:
>
>>Thanks Uma. 
>>
>>I have a question, by the way; it's not about the Chimera project, but
>>about the mentioned advantage 1 and the libhadoop.so installation
>>problem. I copied the relevant text below for convenience.
>>
>>>>1. As Chimera embeds the native library in its jar (similar to Snappy
>>>>Java), it solves the current issue in Hadoop that an HDFS client has
>>>>to depend on libhadoop.so if the client needs to read an encryption
>>>>zone in HDFS. This means an HDFS client may have to depend on a Hadoop
>>>>installation on the local machine. For example, HBase depends on the
>>>>HDFS client jar rather than on a Hadoop installation and then has no
>>>>access to libhadoop.so, so HBase cannot use an encryption zone, or it
>>>>causes errors.
>>
>>I believe Haifeng had mentioned the problem in a call when discussing the
>>erasure coding work, but only now do I understand what the problem is and
>>how Chimera or Snappy Java solved it. It looks like there can be some thin
>>clients that don't rely on a Hadoop installation, so no libhadoop.so is
>>available to use on the client host. The approach mentioned here is to
>>bundle the library file (*.so) into a jar and dynamically extract the file
>>when loading it. When no library file is contained in the jar, it falls
>>back to the normal case, loading it from an installation. It's smart and
>>nice! My question is: could we consider adopting this approach for the
>>libhadoop.so library? It might be worth discussing because we're bundling
>>more and more things into the library (recently we just put Intel ISA-L
>>support into it), and such things may be desired for such clients. It may
>>also be helpful for development, because unit tests that involve native
>>code sometimes fail and complain that libhadoop.so cannot be found. Thanks.
>>
>>Regards,
>>Kai
>>
>>-Original Message-
>>From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
>>Sent: Thursday, January 21, 2016 11:20 AM
>>To: hdfs-dev@hadoop.apache.org
>>Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>>
>>Hi All,
>>Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the
>>release stuff.
>>
>>Please find my responses below.
>>
>>Andrew wrote:
>>If it becomes part of Apache Commons, could we make Chimera a se

Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-20 Thread Gangumalla, Uma
Hi All, 
Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the
release stuff.

Please find my responses below.

Andrew wrote:
If it becomes part of Apache Commons, could we make Chimera a separate
JAR? We have real difficulties bumping dependency versions right now, so
ideally we don't need to bump our existing Commons dependencies to use
Chimera.
[UMA] Yes, we plan to make it a separate jar.

Andrew wrote:
With this refactoring, do we have confidence that we can get our desired
changes merged and released in a timely fashion? e.g. if we find another
bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a
new Chimera release, then bump Hadoop's Chimera dependency. This also
relates to the previous point, it's easier to do this dependency bump if
Chimera is a separate JAR.
[UMA] Yes, and the main target users for this project are Hadoop and Spark
right now. So Hadoop requirements would be the priority tasks for it.


ATM wrote:
Uma, would you be up for approaching the Apache Commons folks saying that
you'd like to contribute Chimera? I'd recommend saying that Hadoop and
Spark are both on board to depend on this.
[UMA] Yes, will do that.


Kai wrote:
Just a question: does becoming a separate jar/module in Apache Commons
mean Chimera (or the module) can be released separately and in a timely
manner, without coupling to other modules' releases in the project? Thanks.

[Haifeng] From the Apache Commons project web site
(https://commons.apache.org/), we see there is already a long list of
components in its Apache Commons Proper list. Each component has its own
release version and date. The target is to join and become one of that list.

Larry wrote:
If what we are looking for is some level of autonomy then it would need to
be a module with its own release train - or at least be able to.

[UMA] Yes. Agree

Kai wrote:
So far I saw it's mainly about AES-256. I suggest the scope can be
expanded a little bit, perhaps a dedicated high performance encryption
library, then we would have quite much to contribute to it, like other
ciphers, MACs, PRNGs and so on. Then both Hadoop and Spark can benefit
from it.

[UMA] Yes, once development starts as a separate project, it is free to
evolve and add improvements to support a wider user space for encryption,
based on demand. Haifeng, would you add some points here?


Regards,
Uma

On 1/20/16, 4:31 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:

>Thanks Uma for putting together this proposal. Overall sounds good to me,
>+1 for these improvements. A few comments/questions:
>
>* If it becomes part of Apache Commons, could we make Chimera a separate
>JAR? We have real difficulties bumping dependency versions right now, so
>ideally we don't need to bump our existing Commons dependencies to use
>Chimera.
>* With this refactoring, do we have confidence that we can get our desired
>changes merged and released in a timely fashion? e.g. if we find another
>bug like HADOOP-11343, we'll first need to get the fix into Chimera, have
>a
>new Chimera release, then bump Hadoop's Chimera dependency. This also
>relates to the previous point, it's easier to do this dependency bump if
>Chimera is a separate JAR.
>
>Best,
>Andrew
>
>On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
><uma.ganguma...@intel.com>
>wrote:
>
>> Hi Devs,
>>
>>   Some of our Hadoop developers have been working with the Spark
>> community to implement shuffle encryption. While implementing it, they
>> realized that some/most of the Hadoop encryption code and their
>> implementation code would have to be duplicated. This led to the idea
>> of creating a separate library, named Chimera
>> (https://github.com/intel-hadoop/chimera). It is an optimized
>> cryptographic library. It provides Java APIs at both the cipher level
>> and the Java stream level to help developers implement high-performance
>> AES encryption/decryption with minimal code and effort. Chimera was
>> originally based on Hadoop crypto code but has been improved and
>> generalized a lot to support a wider scope of data encryption needs for
>> more components in the community.
>>
>> So now the team is thinking of making this library an open-source
>> project via Apache incubation. The proposal is for Chimera to join
>> Apache as an incubating project, or Apache Commons, to facilitate its
>> adoption.
>>
>> In general this will get the following advantages:
>> 1. As Chimera embeds the native library in its jar (similar to Snappy
>> Java), it solves the current issue in Hadoop that an HDFS client has to
>> depend on libhadoop.so if the client needs to read an encryption zone
>> in HDFS. This means an HDFS client may have to depend on a Hadoop
>> installation on the local machine. For example, HBase depends on the
>> HDFS client jar rather than on a Hadoop installation

Hadoop encryption module as Apache Chimera incubator project

2016-01-18 Thread Gangumalla, Uma
Hi Devs,

  Some of our Hadoop developers have been working with the Spark community
to implement shuffle encryption. While implementing it, they realized that
some/most of the Hadoop encryption code and their implementation code would
have to be duplicated. This led to the idea of creating a separate library,
named Chimera (https://github.com/intel-hadoop/chimera). It is an optimized
cryptographic library. It provides Java APIs at both the cipher level and
the Java stream level to help developers implement high-performance AES
encryption/decryption with minimal code and effort. Chimera was originally
based on Hadoop crypto code but has been improved and generalized a lot to
support a wider scope of data encryption needs for more components in the
community.

So now the team is thinking of making this library an open-source project
via Apache incubation. The proposal is for Chimera to join Apache as an
incubating project, or Apache Commons, to facilitate its adoption.

In general this will get the following advantages:
1. As Chimera embeds the native library in its jar (similar to Snappy
Java), it solves the current issue in Hadoop that an HDFS client has to
depend on libhadoop.so if the client needs to read an encryption zone in
HDFS. This means an HDFS client may have to depend on a Hadoop installation
on the local machine. For example, HBase depends on the HDFS client jar
rather than on a Hadoop installation and then has no access to
libhadoop.so, so HBase cannot use an encryption zone, or it causes errors.
2. Apache Spark shuffle and spill encryption could be another example
where we can use Chimera. Stream encryption for Spark shuffle and spill
doesn't require a stream cipher like AES/CTR, although the code shares the
common characteristics of a stream-style API. We also see the need for an
optimized cipher for non-stream-style use cases, such as network encryption
(e.g., RPC). These improvements can be shared by other projects that need
them.

3. Simplified code in Hadoop by using a dedicated library, which also
drives more improvements. For example, the current Hadoop crypto code API
is entirely based on AES/CTR, although it has cipher-suite configurations.

AES/CTR fits HDFS data encryption at rest, but it doesn't need to be
AES/CTR for all cases, such as data-transfer encryption and
intermediate-file encryption.
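
For reference, a minimal JCE sketch of the AES/CTR stream mode the current
Hadoop code is built around; this uses only the standard javax.crypto API,
not Chimera's actual interface, and the key/IV handling is simplified for
illustration. CTR turns AES into a seekable stream cipher (the counter can
be advanced to any block offset), which is why it suits random-access
at-rest data, while authenticated modes like GCM fit wire encryption better.

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrDemo {
  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16];   // AES-128 key; demo only, not key management
    byte[] iv = new byte[16];    // initial counter block
    SecureRandom rng = new SecureRandom();
    rng.nextBytes(key);
    rng.nextBytes(iv);

    Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
    enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    byte[] ct = enc.doFinal("hello hdfs".getBytes(StandardCharsets.UTF_8));

    // CTR decryption applies the same keystream XOR, re-initialized with
    // the same key and counter.
    Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
    dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    System.out.println(new String(dec.doFinal(ct), StandardCharsets.UTF_8));
  }
}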



 So, we wanted to check with the Hadoop community about this proposal.
Please provide your feedback on it.

Regards,
Uma


Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]

2015-11-02 Thread Gangumalla, Uma
+1 for EC to go into 2.9. Yes, 3.x would be a long way off when we plan
to have 2.8 and 2.9 releases.

Regards,
Uma

On 11/2/15, 11:49 AM, "Vinod Vavilapalli" <vino...@hortonworks.com> wrote:

>Forking the thread. Started looking at the 2.8 list, various features'
>status and arrived here.
>
>While I understand the pervasive nature of EC and a need for a
>significant bake-in, moving this to a 3.x release is not a good idea. We
>will surely get a 2.8 out this year and, as needed, I can even spend time
>getting started on a 2.9. OTOH, 3.x is a long way off, and given all the
>incompatibilities there, it would be a while before users can get their
>hands on EC if it were to be only on 3.x. At best, this may force sites
>that want EC to backport the entire EC feature to older releases; at
>worst, this will repeat the mess of the 0.20 security release forks.
>
>If we think adding this to 2.8 (even if switched off) is too much risk
>per our original plan, let's move this to 2.9, thereby leaving enough
>time for stability, integration testing and bake-in, and a realistic
>chance of having it end up on users' clusters soonish.
>
>+Vinod
>
>> On Oct 19, 2015, at 1:44 PM, Andrew Wang <andrew.w...@cloudera.com>
>>wrote:
>> 
>> I think our plan thus far has been to target this for 3.0. I'm okay with
>> putting it in branch-2 if we've given a hard look at compatibility, but
>> I'll note though that 2.8 is already looking like quite a large release,
>> and our release bandwidth has been focused on the 2.6 and 2.7
>>maintenance
>> releases. Adding another multi-hundred JIRAs to 2.8 might make it too
>> unwieldy to get out the door. If we bump EC past that, 3.0 might very
>>well
>> be our next release vehicle. I do plan to revive the 3.0 schedule some
>>time
>> next year. With EC and JDK8 in a good spot, the only big feature
>>remaining
>> is classpath isolation.
>> 
>> EC is also a pretty fundamental change to HDFS. Even if it's
>>compatible, in
>> terms of size and impact it might best belong in a new major release.
>> 
>> Best,
>> Andrew
>> 
>> On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B <
>> vinayakumarb.apa...@gmail.com> wrote:
>> 
>>> Does anyone else also think the feature is ready to go to branch-2 as
>>> well?
>>>
>>> It's been more than 2 weeks since EC landed on trunk. IMO it looks
>>> quite stable since then and ready to go into branch-2.
>>> 
>>> -Vinay
>>> On Oct 6, 2015 12:51 AM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:
>>> 
>>>> Thanks Vinay for capturing the issue and Uma for offering the help.
>>>> 
>>>> ---
>>>> Zhe Zhang
>>>> 
>>>> On Mon, Oct 5, 2015 at 12:19 PM, Gangumalla, Uma <
>>> uma.ganguma...@intel.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Vinay,
>>>>> 
>>>>> 
>>>>> I would merge them as part of HDFS-9182.
>>>>> 
>>>>> Thanks,
>>>>> Uma
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/5/15, 12:48 AM, "Vinayakumar B" <vinayakum...@apache.org>
>>>>>wrote:
>>>>> 
>>>>>> Hi Andrew,
>>>>>> I see CHANGES.txt entries not yet merged from
>>> CHANGES-HDFS-EC-7285.txt.
>>>>>> 
>>>>>> Was this intentional?
>>>>>> 
>>>>>> Regards,
>>>>>> Vinay
>>>>>> 
>>>>>> On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <
>>> andrew.w...@cloudera.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Branch has been merged to trunk, thanks again to everyone who
>>>>>>>worked
>>>> on
>>>>>>> the
>>>>>>> feature!
>>>>>>> 
>>>>>>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang <zhezh...@cloudera.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks everyone who has participated in this discussion.
>>>>>>>> 
>>>>>>>> With 7 +1's (5 binding and 2 non-binding), and no -1, this vote
>>> has
>>>>>>> passed.
>>>>>>>> I will do a final 'git merge' with trunk and work with Andrew to
>>>> merge
>>>>>>> the
>>>>>>>> branch to trunk. I'll u

Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

2015-10-05 Thread Gangumalla, Uma
Vinay,


 I would merge them as part of HDFS-9182.

Thanks,
Uma



On 10/5/15, 12:48 AM, "Vinayakumar B" <vinayakum...@apache.org> wrote:

>Hi Andrew,
> I see CHANGES.txt entries not yet merged from CHANGES-HDFS-EC-7285.txt.
>
> Was this intentional?
>
>Regards,
>Vinay
>
>On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <andrew.w...@cloudera.com>
>wrote:
>
>> Branch has been merged to trunk, thanks again to everyone who worked on
>>the
>> feature!
>>
>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang <zhezh...@cloudera.com>
>>wrote:
>>
>> > Thanks everyone who has participated in this discussion.
>> >
>> > With 7 +1's (5 binding and 2 non-binding), and no -1, this vote has
>> passed.
>> > I will do a final 'git merge' with trunk and work with Andrew to merge
>> the
>> > branch to trunk. I'll update on this thread when the merge is done.
>> >
>> > ---
>> > Zhe Zhang
>> >
>> > On Thu, Sep 24, 2015 at 11:08 PM, Liu, Yi A <yi.a@intel.com>
>>wrote:
>> >
>> > > (Change it to binding.)
>> > >
>> > > +1
>> > > I have been involved in the development and code review on the
>>feature
>> > > branch. It's a great feature and I think it's ready to merge it into
>> > trunk.
>> > >
>> > > Thanks all for the contribution.
>> > >
>> > > Regards,
>> > > Yi Liu
>> > >
>> > >
>> > > -Original Message-
>> > > From: Liu, Yi A
>> > > Sent: Friday, September 25, 2015 1:51 PM
>> > > To: hdfs-dev@hadoop.apache.org
>> > > Subject: RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
>> > >
>> > > +1 (non-binding)
>> > > I have been involved in the development and code review on the
>>feature
>> > > branch. It's a great feature and I think it's ready to merge it into
>> > trunk.
>> > >
>> > > Thanks all for the contribution.
>> > >
>> > > Regards,
>> > > Yi Liu
>> > >
>> > >
>> > > -Original Message-
>> > > From: Vinayakumar B [mailto:vinayakum...@apache.org]
>> > > Sent: Friday, September 25, 2015 12:21 PM
>> > > To: hdfs-dev@hadoop.apache.org
>> > > Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
>> > >
>> > > +1,
>> > >
>> > > I've been involved starting from design and development of
>> ErasureCoding.
>> > > I think phase 1 of this development is ready to be merged to trunk.
>> > > It had come a long way to the current state with significant effort
>>of
>> > > many Contributors and Reviewers for both design and code.
>> > >
>> > > Thanks Everyone for the efforts.
>> > >
>> > > Regards,
>> > > Vinay
>> > >
>> > > On Wed, Sep 23, 2015 at 10:53 PM, Jing Zhao <ji...@apache.org>
>>wrote:
>> > >
>> > > > +1
>> > > >
>> > > > I've been involved in both development and review on the branch,
>>and
>> I
>> > > > believe it's now ready to get merged into trunk. Many thanks to
>>all
>> > > > the contributors and reviewers!
>> > > >
>> > > > Thanks,
>> > > > -Jing
>> > > >
>> > > > On Tue, Sep 22, 2015 at 6:17 PM, Zheng, Kai <kai.zh...@intel.com>
>> > wrote:
>> > > >
>> > > > > Non-binding +1
>> > > > >
>> > > > > According to our extensive performance tests, striping + ISA-L
>> coder
>> > > > based
>> > > > > erasure coding not only can save storage, but also can increase
>>the
>> > > > > throughput of a client or a cluster. It will be a great
>>addition to
>> > > > > HDFS and its users. Based on the latest branch codes, we also
>> > > > > observed it's
>> > > > very
>> > > > > reliable in the concurrent tests. We'll provide the perf test
>> report
>> > > > after
>> > > > > it's sorted out and hope it helps.
>> > > > > Thanks!
>> > > > >
>> > > > > Regards,
>> > > > > Kai
>> > > > >
>> > > > > -Original Message-
>&g

Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

2015-09-22 Thread Gangumalla, Uma
+1 

Great addition to HDFS. Thanks all contributors for the nice work.

Regards,
Uma

On 9/22/15, 3:40 PM, "Zhe Zhang"  wrote:

>Hi,
>
>I'd like to propose a vote to merge the HDFS-7285 feature branch back to
>trunk. Since November 2014 we have been designing and developing this
>feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and have
>committed approximately 210 patches.
>
>The HDFS-7285 feature branch was created to support the first phase of
>HDFS
>erasure coding (HDFS-EC). The objective of HDFS-EC is to significantly
>reduce storage space usage in HDFS clusters. Instead of always creating 3
>replicas of each block with 200% storage space overhead, HDFS-EC provides
>data durability through parity data blocks. With most EC configurations,
>the storage overhead is no more than 50%. Based on profiling results of
>production clusters, we decided to support EC with the striped block
>layout
>in the first phase, so that small files can be better handled. This means
>dividing each logical HDFS file block into smaller units (striping cells)
>and spreading them on a set of DataNodes in round-robin fashion. Parity
>cells are generated for each stripe of original data cells. We have made
>changes to NameNode, client, and DataNode to generalize the block concept
>and handle the mapping between a logical file block and its internal
>storage blocks. For further details please see the design doc on
>HDFS-7285.
>HADOOP-11264 focuses on providing flexible and high-performance codec
>calculation support.
>
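
As a toy illustration of the parity idea behind this feature (not the
actual Reed-Solomon codec work in HADOOP-11264), a single XOR parity cell
is enough to rebuild any one lost data cell in a stripe; cell contents and
sizes below are made up:

public class XorStripeDemo {
  public static void main(String[] args) {
    // Three equal-sized data cells of one stripe (sizes are illustrative).
    byte[][] cells = {
        "cell-0".getBytes(), "cell-1".getBytes(), "cell-2".getBytes()
    };
    // Parity cell = XOR of all data cells.
    byte[] parity = new byte[cells[0].length];
    for (byte[] cell : cells)
      for (int i = 0; i < parity.length; i++) parity[i] ^= cell[i];

    // Simulate losing cell 1 and rebuilding it from survivors + parity.
    byte[] rebuilt = parity.clone();
    for (int i = 0; i < rebuilt.length; i++)
      rebuilt[i] ^= (byte) (cells[0][i] ^ cells[2][i]);
    System.out.println(new String(rebuilt));   // prints "cell-1"
  }
}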
>The nightly Jenkins job of the branch has reported several successful
>runs,
>and doesn't show new flaky tests compared with trunk. We have posted
>several versions of the test plan including both unit testing and cluster
>testing, and have executed most tests in the plan. The most basic
>functionalities have been extensively tested and verified in several real
>clusters with different hardware configurations; results have been very
>stable. We have created follow-on tasks for more advanced error handling
>and optimization under the umbrella HDFS-8031. We also plan to implement
>or
>harden the integration of EC with existing features such as WebHDFS,
>snapshot, append, truncate, hflush, hsync, and so forth.
>
>Development of this feature has been a collaboration across many companies
>and institutions. I'd like to thank J. Andreina, Takanobu Asanuma,
>Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, Rui Li, Yi
>Liu,
>Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai Sasaki, Walter Su, Tsz Wo
>Nicholas Sze, Andrew Wang, Yong Zhang, Jing Zhao, Hui Zheng and Kai Zheng
>for their code contributions and reviews. Andrew and Kai Zheng also made
>fundamental contributions to the initial design. Rui Li, Gao Rui, Kai
>Sasaki, Kai Zheng and many other contributors have made great efforts in
>system testing. Many thanks go to Weihua Jiang for proposing the JIRA, and
>ATM, Todd Lipcon, Silvius Rus, Suresh, as well as many others for
>providing
>helpful feedbacks.
>
>Following the community convention, this vote will last for 7 days (ending
>September 29th). Votes from Hadoop committers are binding but non-binding
>votes are very welcome as well. And here's my non-binding +1.
>
>Thanks,
>---
>Zhe Zhang



RE: Block creation in HDFS

2015-02-17 Thread Gangumalla, Uma
Hi,

HDFS stores the data the way you write it; it will not reorganize the
data. HDFS has flexibility in terms of placement.
If you want to write in this fashion, a bunch of blocks would have to be
allocated at once and the client would write to all of them based on your
partitioning, which sounds similar to a striping approach. That is not
supported in HDFS right now; it is being developed on the erasure coding
branch, HDFS-7285. Correct me if I misunderstood your needs here.

Regards,
Uma

-Original Message-
From: Abhishek Das [mailto:abhishek.b...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:14 PM
To: hdfs-dev
Subject: Re: Block creation in HDFS

Hi,

Thanks Vinay for your response. I don't need blocks of variable size, but
setting only the block size probably won't help in my case. Let me give an
example to explain what I am trying to do.

Let's say the main file has 12 integers, 1 to 12, and the block size is
such that each block holds 3 integers. If I ask HDFS to create the blocks,
it would create 4 blocks: the first would have 1-3, the second 4-6, and so
on. According to my requirement, the data in the main file is partitioned
into 3 clusters: (1,2,3,4), (5,6,7,8) and (9,10,11,12). When the blocks
are created, I need data from every partition represented in each block.
So in this case, the first block would have (1,5,9), the second (2,6,10),
etc. I want to change how the data is allocated in each of the blocks.

Is it feasible to change the default block creation policy in the current
implementation?
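
The layout described above is plain round-robin striping; a minimal sketch
of the mapping, using the numbers from the example (all names illustrative):

public class StripeLayoutDemo {
  public static void main(String[] args) {
    int[][] partitions = {
        {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}
    };
    int blocksPerFile = partitions[0].length;   // 4 blocks of 3 ints each
    // Block j holds the j-th element of every partition:
    // block 0 -> (1,5,9), block 1 -> (2,6,10), ...
    for (int j = 0; j < blocksPerFile; j++) {
      StringBuilder block = new StringBuilder("block " + j + ":");
      for (int[] partition : partitions)
        block.append(' ').append(partition[j]);
      System.out.println(block);
    }
  }
}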

Regards,
Abhishek Das

On Tue, Feb 17, 2015 at 2:25 AM, Vinayakumar B vinayakum...@apache.org
wrote:

 Hi Abhishek,
 Are your partitions all the same size? If yes, then you can set that as
 the block size.

 If not, you can use the latest feature, variable block size, to verify
 your use case: you can close the current block after each partition's
 data is written and append to a new block for the new partition's data.
 This feature is not yet available in any release; hope to see it in the
 future 2.7 release. As of now you can verify it in any of the
 trunk/branch-2 builds.

 Hope this helps.

 -Vinay
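
A hedged sketch of Vinay's first suggestion (equal-sized partitions):
create the file with a per-file block size equal to the partition size, so
each partition fills exactly one block. The path, replication factor and
sizes here are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionSizedBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long partitionBytes = 64L * 1024 * 1024;   // assumed partition size
    // create(path, overwrite, bufferSize, replication, blockSize)
    try (FSDataOutputStream out = fs.create(
        new Path("/user/demo/partitioned.dat"), true, 4096,
        (short) 3, partitionBytes)) {
      // Write partition 0, then partition 1, ...; each fills one block.
      // out.write(...);
    }
  }
}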
 On Feb 17, 2015 8:30 AM, Abhishek Das abhishek.b...@gmail.com wrote:

  Hi,
 
  I am new to this group. I had a question regarding block creation in
  HDFS. By default the file is split into multiple blocks of size equal
  to the block size. I need to introduce a new block creation policy into
  the system. In my case the main file is divided into multiple
  partitions. My goal is to create blocks where data from each partition
  of the file is represented. Is it possible to introduce the new policy?
  If yes, what would be the starting point in the code I should look at?
 
  Regards,
  Abhishek Das
 



RE: [VOTE] Migration from subversion to git for version control

2014-08-11 Thread Gangumalla, Uma
+1

Regards,
Uma

-Original Message-
From: Karthik Kambatla [mailto:ka...@cloudera.com] 
Sent: Saturday, August 09, 2014 8:27 AM
To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: [VOTE] Migration from subversion to git for version control

I have put together this proposal based on recent discussion on this topic.

Please vote on the proposal. The vote runs for 7 days.

   1. Migrate from subversion to git for version control.
   2. Force-push to be disabled on trunk and branch-* branches. Applying
   changes from any of trunk/branch-* to any of branch-* should be through
   git cherry-pick -x.
   3. Force-push on feature-branches is allowed. Before pulling in a
   feature, the feature-branch should be rebased on latest trunk and the
   changes applied to trunk through git rebase --onto or git cherry-pick
   commit-range.
   4. Every time a feature branch is rebased on trunk, a tag that
   identifies the state before the rebase needs to be created (e.g.
   tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
   the feature is pulled into trunk and the tags are no longer useful.
   5. The relevance/use of tags stay the same after the migration.

Thanks
Karthik

PS: Per Andrew Wang, this should be an "Adoption of New Codebase" kind of
vote and will be a lazy 2/3 majority of PMC members.


RE: [VOTE] Change by-laws on release votes: 5 days instead of 7

2014-06-29 Thread Gangumalla, Uma
+1

Regards,
Uma

-Original Message-
From: Arun C Murthy [mailto:a...@hortonworks.com] 
Sent: Tuesday, June 24, 2014 2:24 PM
To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: [VOTE] Change by-laws on release votes: 5 days instead of 7 

Folks,

 As discussed, I'd like to call a vote on changing our by-laws to change 
release votes from 7 days to 5.

 I've attached the change to by-laws I'm proposing.

 Please vote; the vote will run for the usual period of 7 days.

thanks,
Arun



[main]$ svn diff
Index: author/src/documentation/content/xdocs/bylaws.xml
===
--- author/src/documentation/content/xdocs/bylaws.xml   (revision 1605015)
+++ author/src/documentation/content/xdocs/bylaws.xml   (working copy)
@@ -344,7 +344,16 @@
 <p>Votes are open for a period of 7 days to allow all active
 voters time to consider the vote. Votes relating to code
 changes are not subject to a strict timetable but should be
-made as timely as possible.</p></li>
+made as timely as possible.</p>
+
+ <ul>
+ <li> <strong>Product Release - Vote Timeframe</strong>
+   <p>Release votes, alone, run for a period of 5 days. All other
+ votes are subject to the above timeframe of 7 days.</p>
+ </li>
+   </ul>
+   </li>
+
 </ul>
 </section>
 </body>


RE: [DISCUSS] Change by-laws on release votes: 5 days instead of 7

2014-06-24 Thread Gangumalla, Uma
Thanks Arun. 

+1 

Regards,
Uma

-Original Message-
From: Arun C. Murthy [mailto:a...@hortonworks.com] 
Sent: Saturday, June 21, 2014 11:07 PM
To: hdfs-dev@hadoop.apache.org
Cc: common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] Change by-laws on release votes: 5 days instead of 7

Uma,

 Voting periods are defined in *minimum* terms, so it already covers what
you'd like to see, i.e., the vote can continue longer.

thanks,
Arun

 On Jun 21, 2014, at 2:19 AM, Gangumalla, Uma uma.ganguma...@intel.com 
 wrote:
 
 How about proposing the vote for 5 days and giving the RM the chance to
 extend it by 2 more days (7 days total) if the RC did not receive enough
 votes within 5 days? If an RC receives enough votes in 5 days, the RM can
 close the vote.
 I can see an advantage of 7-day voting: it covers all the weekdays and
 the weekend. So if someone wants to test on the weekend (due to weekday
 schedules), that gives them the chance.
 
 Regards,
 Uma
 
 -Original Message-
 From: Arun C Murthy [mailto:a...@hortonworks.com]
 Sent: Saturday, June 21, 2014 11:25 AM
 To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org; 
 yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
 Subject: [DISCUSS] Change by-laws on release votes: 5 days instead of 
 7
 
 Folks,
 
 I'd like to propose we change our by-laws to reduce our voting periods on new 
 releases from 7 days to 5.
 
 Currently, it just takes too long to turn around releases; particularly if we 
 have critical security fixes etc.
 
 Thoughts?
 
 thanks,
 Arun
 
 



RE: [DISCUSS] Change by-laws on release votes: 5 days instead of 7

2014-06-21 Thread Gangumalla, Uma
How about proposing the vote for 5 days and giving the RM the chance to
extend it by 2 more days (7 days total) if the RC did not receive enough
votes within 5 days? If an RC receives enough votes in 5 days, the RM can
close the vote.
I can see an advantage of 7-day voting: it covers all the weekdays and the
weekend. So if someone wants to test on the weekend (due to weekday
schedules), that gives them the chance.

Regards,
Uma

-Original Message-
From: Arun C Murthy [mailto:a...@hortonworks.com] 
Sent: Saturday, June 21, 2014 11:25 AM
To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: [DISCUSS] Change by-laws on release votes: 5 days instead of 7

Folks,

 I'd like to propose we change our by-laws to reduce our voting periods on new 
releases from 7 days to 5.

 Currently, it just takes too long to turn around releases; particularly if we 
have critical security fixes etc.

 Thoughts?

thanks,
Arun




RE: hadoop-2.5 - June end?

2014-06-11 Thread Gangumalla, Uma
Yes. Suresh.

I have merged HDFS-2006 (extended attributes) to branch-2, so that it will
be included in the 2.5 release.

Regards,
Uma

-Original Message-
From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
Sent: Tuesday, June 10, 2014 10:15 PM
To: mapreduce-...@hadoop.apache.org
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: hadoop-2.5 - June end?

We should also include the extended attributes feature for HDFS, from
HDFS-2006, in release 2.5.


On Mon, Jun 9, 2014 at 9:39 AM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

  As you can see from the Roadmap wiki, it looks like several items are 
 still a bit away from being ready.

  I think rather than wait for them, it will be useful to create an 
 intermediate release (2.5) this month - I think ATS security is pretty 
 close, so we can ship that. I'm thinking of creating hadoop-2.5 by end 
 of the month, with a branch a couple of weeks prior.

  Thoughts?

 thanks,
 Arun






--
http://hortonworks.com/download/



RE: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-06-11 Thread Gangumalla, Uma
I have merged this feature to branch-2 now.

From now on, if there are any issues related to XAttrs, please merge them
to branch-2 if needed.

Tomorrow I will merge the remaining JIRAs that are related to the XAttrs
feature but were handled as top-level JIRAs, e.g. DistCp support
(MAPREDUCE-5898).

Regards,
Uma

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Wednesday, May 21, 2014 8:06 PM
To: hdfs-dev@hadoop.apache.org
Subject: RE: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

Thanks a lot for the great work on the branch and the support.
I have just completed the merge of the HDFS extended attributes branch
(HDFS-2006) to trunk.

Regards,
Uma

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Wednesday, May 21, 2014 6:38 PM
To: hdfs-dev@hadoop.apache.org
Subject: RE: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

Thanks a lot for participating in this vote.

With 4 +1's (from me, Andrew Wang, Chris and Colin) and no -1, the vote
for the merge has passed.

I will do the merge shortly to trunk.

Regards,
Uma

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Wednesday, May 14, 2014 6:17 PM
To: hdfs-dev@hadoop.apache.org
Subject: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

Hello HDFS Devs,
  I would like to call for a vote to merge the HDFS Extended Attributes 
(XAttrs) feature from the HDFS-2006 branch to the trunk.
  XAttrs are already widely supported on many operating systems, including
Linux, Windows, and Mac OS. This will allow storing attributes on HDFS
files/directories.
  An XAttr consists of a name and a value and exists in one of 4
namespaces: user, trusted, security, and system. An XAttr name is prefixed
with one of these namespaces, for example user.myxattr.
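
A minimal client-side sketch of the API shape (set, get and remove through
FileSystem); the path is hypothetical and the file is assumed to exist:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class XAttrDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/demo/file.txt");   // hypothetical, must exist

    // Set, read back and remove a user-namespace extended attribute.
    fs.setXAttr(p, "user.myxattr",
        "some-value".getBytes(StandardCharsets.UTF_8));
    byte[] value = fs.getXAttr(p, "user.myxattr");
    System.out.println(new String(value, StandardCharsets.UTF_8));
    fs.removeXAttr(p, "user.myxattr");
  }
}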
  Consistent with ongoing awareness of Namenode memory usage, the maximum 
number and size of XAttrs on a file/directory are limited by a configuration 
parameter.
  The design document contains more details and can be found here: 
https://issues.apache.org/jira/secure/attachment/12644341/HDFS-XAttrs-Design-3.pdf
  Development of this feature has been tracked in JIRA HDFS-2006: 
https://issues.apache.org/jira/browse/HDFS-2006
  All of the development work for the feature is contained in the HDFS-2006 
branch: https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2006
 As the last tasks, we are working to support XAttrs via libhdfs and
WebHDFS, as well as other minor improvements.
  We intend to finish those enhancements before the vote completes;
otherwise we can move them to top-level JIRAs, as they can be tracked
independently. The user documentation is also ready for this feature.
  Here the doc attached in JIRA:  
https://issues.apache.org/jira/secure/attachment/12644787/ExtendedAttributes.html
 The XAttrs feature is backwards-compatible and enabled by default. A cluster 
administrator can disable it.
Testing:
 We've developed more than 70 new tests which cover the XAttrs get, set
and remove APIs through DistributedFileSystem and WebHdfsFileSystem, the
new XAttr CLI commands, HA, and XAttr persistence in the fsimage, among
related areas.
  Additional  testing plans are documented in: 
https://issues.apache.org/jira/secure/attachment/12644342/Test-Plan-for-Extended-Attributes-1.pdf
  Thanks a lot to the contributors who have helped and participated in the 
branch development.
  Code contributors are Yi Liu, Charles Lamb, Andrew Wang and Uma Maheswara Rao 
G.
 The design document incorporates feedback from many community members: Chris 
Nauroth, Andrew Purtell, Tianyou Li, Avik Dey, Charles Lamb, Alejandro, Andrew 
Wang, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.
 Code reviewers on individual patches include Chris Nauroth, Alejandro, Andrew 
Wang, Charles Lamb, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.

  Also thanks to Dhruba for bringing up this JIRA and thanks to others who 
participated for discussions.
This vote will run for a week and close on 5/21/2014 at 06:16 pm IST.

Here is my +1 to start with.
Regards,
Uma
(umamah...@apache.orgmailto:umamah...@apache.org)





RE: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-21 Thread Gangumalla, Uma
Thanks a lot for participating in this vote.

With 4 +1's (from me, Andrew Wang, Chris and Colin) and no -1, the vote
for the merge has passed.

I will do the merge shortly to trunk.

Regards,
Uma

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Wednesday, May 14, 2014 6:17 PM
To: hdfs-dev@hadoop.apache.org
Subject: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

Hello HDFS Devs,
  I would like to call for a vote to merge the HDFS Extended Attributes 
(XAttrs) feature from the HDFS-2006 branch to the trunk.
  XAttrs are already widely supported on many operating systems, including
Linux, Windows, and Mac OS. This will allow storing attributes on HDFS
files/directories.
  An XAttr consists of a name and a value and exists in one of 4
namespaces: user, trusted, security, and system. An XAttr name is prefixed
with one of these namespaces, for example user.myxattr.
  Consistent with ongoing awareness of Namenode memory usage, the maximum 
number and size of XAttrs on a file/directory are limited by a configuration 
parameter.
  The design document contains more details and can be found here: 
https://issues.apache.org/jira/secure/attachment/12644341/HDFS-XAttrs-Design-3.pdf
  Development of this feature has been tracked in JIRA HDFS-2006: 
https://issues.apache.org/jira/browse/HDFS-2006
  All of the development work for the feature is contained in the HDFS-2006 
branch: https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2006
 As the last tasks, we are working to support XAttrs via libhdfs and
WebHDFS, as well as other minor improvements.
  We intend to finish those enhancements before the vote completes;
otherwise we can move them to top-level JIRAs, as they can be tracked
independently. The user documentation is also ready for this feature.
  Here the doc attached in JIRA:  
https://issues.apache.org/jira/secure/attachment/12644787/ExtendedAttributes.html
 The XAttrs feature is backwards-compatible and enabled by default. A cluster 
administrator can disable it.
Testing:
 We've developed more than 70 new tests which cover the XAttrs get, set and 
remove APIs through DistributedFileSystem and WebHdfsFileSystem, the new XAttr 
CLI commands, HA, XAttr persistence in the fsimage and related.
  Additional  testing plans are documented in: 
https://issues.apache.org/jira/secure/attachment/12644342/Test-Plan-for-Extended-Attributes-1.pdf
  Thanks a lot to the contributors who have helped and participated in the 
branch development.
  Code contributors are Yi Liu, Charles Lamb, Andrew Wang and Uma Maheswara Rao 
G.
 The design document incorporates feedback from many community members: Chris 
Nauroth, Andrew Purtell, Tianyou Li, Avik Dey, Charles Lamb, Alejandro, Andrew 
Wang, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.
 Code reviewers on individual patches include Chris Nauroth, Alejandro, Andrew 
Wang, Charles Lamb, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.

  Also thanks to Dhruba for bringing up this JIRA and thanks to others who 
participated for discussions.
This vote will run for a week and close on 5/21/2014 at 06:16 pm IST.

Here is my +1 to start with.
Regards,
Uma
(umamah...@apache.orgmailto:umamah...@apache.org)





Please hold off the commits into HDFS and Common for some time

2014-05-21 Thread Gangumalla, Uma
Hi Committers,

Please hold off on commits to trunk until I send a follow-up note: I am merging 
the HDFS-2006 branch into trunk.

Regards,
Uma


RE: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-21 Thread Gangumalla, Uma
Thanks a lot for the great work on the branch and for the support.
I have just completed the merge of the HDFS Extended Attributes branch 
(HDFS-2006) into trunk.

Regards,
Uma






RE: Please hold off the commits into HDFS and Common for some time

2014-05-21 Thread Gangumalla, Uma
Hi,

I have done the merge. Please proceed with your commits.

Regards,
Uma



[Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-15 Thread Gangumalla, Uma
Hello HDFS Devs,
  I would like to call for a vote to merge the HDFS Extended Attributes 
(XAttrs) feature from the HDFS-2006 branch to the trunk.
  XAttrs are already widely supported on many operating systems, including 
Linux, Windows, and Mac OS. This feature allows storing arbitrary attributes on 
HDFS files and directories.
  An XAttr consists of a name and a value and lives in one of four namespaces: 
user, trusted, security, and system. An XAttr name is prefixed with its 
namespace, for example user.myxattr.
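
As a concrete illustration, here is a sketch of manipulating a user-namespace 
attribute from the shell. The -setfattr/-getfattr commands are patterned on the 
Linux tools of the same names; the exact flags are my assumption and should be 
verified against the user documentation (the example path is arbitrary):

    # set an attribute, then read it back
    hadoop fs -setfattr -n user.myxattr -v myvalue /user/uma/demo.txt
    hadoop fs -getfattr -n user.myxattr /user/uma/demo.txt
    # dump all attributes visible to the caller
    hadoop fs -getfattr -d /user/uma/demo.txt
    # remove the attribute
    hadoop fs -setfattr -x user.myxattr /user/uma/demo.txt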
  Consistent with ongoing attention to NameNode memory usage, the maximum number 
and size of XAttrs on a file/directory are limited by configuration parameters.
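
A sketch of how those limits could look in hdfs-site.xml follows. The key names 
and default values are my recollection rather than something stated in this 
thread, so verify them against the branch:

    <property>
      <name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
      <value>32</value>  <!-- assumed default: max XAttrs per file/directory -->
    </property>
    <property>
      <name>dfs.namenode.fs-limits.max-xattr-size</name>
      <value>16384</value>  <!-- assumed default: max name+value size in bytes -->
    </property>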
  The design document contains more details and can be found here: 
https://issues.apache.org/jira/secure/attachment/12644341/HDFS-XAttrs-Design-3.pdf
  Development of this feature has been tracked in JIRA HDFS-2006: 
https://issues.apache.org/jira/browse/HDFS-2006
  All of the development work for the feature is contained in the HDFS-2006 
branch: https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2006
 As final tasks, we are working to support XAttrs via libhdfs and WebHDFS, along 
with other minor improvements.
  We intend to finish these enhancements before the vote completes; otherwise, we 
will move them to top-level JIRAs so they can be tracked independently. The user 
documentation is also ready for this feature.
  Here is the user documentation attached to the JIRA: 
https://issues.apache.org/jira/secure/attachment/12644787/ExtendedAttributes.html
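
Regarding the in-progress WebHDFS support mentioned above, here is a rough 
sketch of what the REST calls could look like. The operation and parameter 
names are assumptions patterned on existing WebHDFS conventions, not a 
confirmed API, and the NameNode host/port is a placeholder:

    # set an XAttr over WebHDFS (operation/parameter names assumed)
    curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/uma/demo.txt?op=SETXATTR&xattr.name=user.myxattr&xattr.value=myvalue&flag=CREATE"
    # read it back
    curl -i "http://namenode:50070/webhdfs/v1/user/uma/demo.txt?op=GETXATTRS&xattr.name=user.myxattr&encoding=text"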
 The XAttrs feature is backwards-compatible and enabled by default. A cluster 
administrator can disable it.
Testing:
 We've developed more than 70 new tests covering the XAttr get, set, and remove 
APIs through DistributedFileSystem and WebHdfsFileSystem, the new XAttr CLI 
commands, HA, XAttr persistence in the fsimage, and related areas.
  Additional testing plans are documented in: 
https://issues.apache.org/jira/secure/attachment/12644342/Test-Plan-for-Extended-Attributes-1.pdf
  Thanks a lot to all the contributors who helped with and participated in the 
branch development.
  Code contributors are Yi Liu, Charles Lamb, Andrew Wang and Uma Maheswara Rao 
G.
 The design document incorporates feedback from many community members: Chris 
Nauroth, Andrew Purtell, Tianyou Li, Avik Dey, Charles Lamb, Alejandro, Andrew 
Wang, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.
 Code reviewers on individual patches include Chris Nauroth, Alejandro, Andrew 
Wang, Charles Lamb, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.

  Also, thanks to Dhruba for filing this JIRA, and thanks to everyone who 
participated in the discussions.
This vote will run for a week and close on 5/21/2014 at 06:16 pm IST.

Here is my +1 to start with.
Regards,
Uma
(umamah...@apache.org)