Forgot to update this thread. I created branch-2.8 last week, so we can now go
ahead and merge HDFS-7285 into branch-2 (version 2.9), as we discussed
before.

Thanks
+Vinod


> On Nov 3, 2015, at 4:40 PM, Vinod Kumar Vavilapalli <vino...@hortonworks.com> wrote:
> 
> That makes sense.
> 
> Thanks for the discussion, everyone. Let’s stick to this tentative plan of
> targeting EC for 2.9.
> 
> I just updated the Roadmap wiki to reflect this.
> 
> +Vinod
> 
> 
>> On Nov 2, 2015, at 4:26 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>> 
>> Yeah, so for the issues we recently resolved on trunk and are addressing as
>> follow-on tasks in Phase I, we could label them with "erasure coding" and
>> maybe also set the target version to "2.9" for convenience?
>> 
>> -----Original Message-----
>> From: Jing Zhao [mailto:ji...@apache.org] 
>> Sent: Tuesday, November 03, 2015 8:04 AM
>> To: hdfs-dev@hadoop.apache.org
>> Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]
>> 
>> +1 for the plan about Phase I & II.
>> 
>> BTW, this may be out of the scope of this thread, but I just want to mention
>> that we should either move the JIRAs under HDFS-8031 or set the JIRA
>> component to "erasure-coding" when making further improvements or fixing
>> bugs in EC. That way it will be easier to backport EC to 2.9 later.
>> 
>> On Mon, Nov 2, 2015 at 3:48 PM, Vinayakumar B <vinayakumarb.apa...@gmail.com> wrote:
>> 
>>> +1 for the idea.
>>> On Nov 3, 2015 07:22, "Zheng, Kai" <kai.zh...@intel.com> wrote:
>>> 
>>>> Sounds good to me. Once it's decided to include EC in the 2.9 release,
>>>> it would be good to have a rough release date, as Zhe asked, so that the
>>>> scope of EC can be discussed accordingly. We still have quite a few
>>>> Phase I follow-on tasks to do before EC can be deployed in a production
>>>> system. Phase II, which develops non-striped EC for cold data, would
>>>> probably be started after that. Depending on the rough release date, we
>>>> might consider including only Phase I and leaving Phase II for the next
>>>> release.
>>>> 
>>>> Regards,
>>>> Kai
>>>> 
>>>> -----Original Message-----
>>>> From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
>>>> Sent: Tuesday, November 03, 2015 5:41 AM
>>>> To: hdfs-dev@hadoop.apache.org
>>>> Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]
>>>> 
>>>> +1 for EC to go into 2.9. Yes, 3.x would be a long way off when we plan
>>>> to have 2.8 and 2.9 releases.
>>>> 
>>>> Regards,
>>>> Uma
>>>> 
>>>> On 11/2/15, 11:49 AM, "Vinod Vavilapalli" <vino...@hortonworks.com> wrote:
>>>> 
>>>>> Forking the thread. I started looking at the 2.8 list and the status of
>>>>> various features, and arrived here.
>>>>> 
>>>>> While I understand the pervasive nature of EC and the need for a
>>>>> significant bake-in, moving this to a 3.x release is not a good idea.
>>>>> We will surely get a 2.8 out this year and, as needed, I can even
>>>>> spend time getting started on a 2.9. OTOH, 3.x is a long way off,
>>>>> and given all the incompatibilities there, it would be a while
>>>>> before users could get their hands on EC if it were only in 3.x.
>>>>> At best, this may force sites that want EC to backport the entire EC
>>>>> feature to older releases; at worst, it will repeat the mess of the
>>>>> 0.20 security release forks.
>>>>> 
>>>>> If we think adding this to 2.8 (even if it is switched off) is too
>>>>> much risk per our original plan, let's move it to 2.9, thereby
>>>>> leaving enough time for stability, integration testing, and bake-in,
>>>>> and a realistic chance of having it end up on users' clusters soonish.
>>>>> 
>>>>> +Vinod
>>>>> 
>>>>>> On Oct 19, 2015, at 1:44 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>>>>>> 
>>>>>> I think our plan thus far has been to target this for 3.0. I'm okay
>>>>>> with putting it in branch-2 if we've taken a hard look at
>>>>>> compatibility, but I'll note that 2.8 is already looking like quite a
>>>>>> large release, and our release bandwidth has been focused on the 2.6
>>>>>> and 2.7 maintenance releases. Adding several hundred more JIRAs to 2.8
>>>>>> might make it too unwieldy to get out the door. If we bump EC past
>>>>>> that, 3.0 might very well be our next release vehicle. I do plan to
>>>>>> revive the 3.0 schedule some time next year. With EC and JDK8 in a
>>>>>> good spot, the only big feature remaining is classpath isolation.
>>>>>> 
>>>>>> EC is also a pretty fundamental change to HDFS. Even if it's
>>>>>> compatible, in terms of size and impact it might best belong in a
>>>>>> new major release.
>>>>>> 
>>>>>> Best,
>>>>>> Andrew
>>>>>> 
>>>>>> On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B <vinayakumarb.apa...@gmail.com> wrote:
>>>>>> 
>>>>>>> Does anyone else also think that the feature is ready to go to
>>>>>>> branch-2 as well?
>>>>>>>
>>>>>>> It's been more than 2 weeks since EC landed on trunk. IMO it looks
>>>>>>> quite stable since then and ready to go into branch-2.
>>>>>>> 
>>>>>>> -Vinay
>>>>>>> On Oct 6, 2015 12:51 AM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:
>>>>>>> 
>>>>>>>> Thanks Vinay for capturing the issue and Uma for offering the help.
>>>>>>>> 
>>>>>>>> ---
>>>>>>>> Zhe Zhang
>>>>>>>> 
>>>>>>>> On Mon, Oct 5, 2015 at 12:19 PM, Gangumalla, Uma <uma.ganguma...@intel.com> wrote:
>>>>>>>> 
>>>>>>>>> Vinay,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I would merge them as part of HDFS-9182.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Uma
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 10/5/15, 12:48 AM, "Vinayakumar B" <vinayakum...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Andrew,
>>>>>>>>>> I see that the CHANGES.txt entries have not yet been merged from
>>>>>>>>>> CHANGES-HDFS-EC-7285.txt.
>>>>>>>>>> 
>>>>>>>>>> Was this intentional?
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Vinay
>>>>>>>>>> 
>>>>>>>>>> On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> The branch has been merged to trunk. Thanks again to everyone who
>>>>>>>>>>> worked on the feature!
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang <zhezh...@cloudera.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks to everyone who participated in this discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> With 7 +1s (5 binding and 2 non-binding) and no -1s, this vote has
>>>>>>>>>>>> passed. I will do a final 'git merge' with trunk and work with
>>>>>>>>>>>> Andrew to merge the branch to trunk. I'll update this thread when
>>>>>>>>>>>> the merge is done.
>>>>>>>>>>>> 
>>>>>>>>>>>> ---
>>>>>>>>>>>> Zhe Zhang
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Sep 24, 2015 at 11:08 PM, Liu, Yi A <yi.a....@intel.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> (Changing my vote to binding.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> I have been involved in the development and code review of the
>>>>>>>>>>>>> feature branch. It's a great feature and I think it's ready to be
>>>>>>>>>>>>> merged into trunk.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, all, for the contributions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Yi Liu
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Liu, Yi A
>>>>>>>>>>>>> Sent: Friday, September 25, 2015 1:51 PM
>>>>>>>>>>>>> To: hdfs-dev@hadoop.apache.org
>>>>>>>>>>>>> Subject: RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>>> I have been involved in the development and code review of the
>>>>>>>>>>>>> feature branch. It's a great feature and I think it's ready to be
>>>>>>>>>>>>> merged into trunk.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, all, for the contributions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Yi Liu
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Vinayakumar B [mailto:vinayakum...@apache.org]
>>>>>>>>>>>>> Sent: Friday, September 25, 2015 12:21 PM
>>>>>>>>>>>>> To: hdfs-dev@hadoop.apache.org
>>>>>>>>>>>>> Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been involved in ErasureCoding from the design through the
>>>>>>>>>>>>> development. I think Phase 1 of this development is ready to be
>>>>>>>>>>>>> merged to trunk. It has come a long way to its current state, with
>>>>>>>>>>>>> significant effort from many contributors and reviewers on both
>>>>>>>>>>>>> design and code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, everyone, for the efforts.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Vinay
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Sep 23, 2015 at 10:53 PM, Jing Zhao <ji...@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I've been involved in both development and review on the branch,
>>>>>>>>>>>>>> and I believe it's now ready to get merged into trunk. Many thanks
>>>>>>>>>>>>>> to all the contributors and reviewers!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -Jing
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Sep 22, 2015 at 6:17 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Non-binding +1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> According to our extensive performance tests, erasure coding based
>>>>>>>>>>>>>>> on striping + the ISA-L coder not only saves storage but can also
>>>>>>>>>>>>>>> increase the throughput of a client or a cluster. It will be a
>>>>>>>>>>>>>>> great addition to HDFS and its users. Based on the latest branch
>>>>>>>>>>>>>>> code, we also observed that it's very reliable in concurrent tests.
>>>>>>>>>>>>>>> We'll provide the perf test report once it's sorted out, and hope
>>>>>>>>>>>>>>> it helps.
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Kai
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
>>>>>>>>>>>>>>> Sent: Wednesday, September 23, 2015 8:50 AM
>>>>>>>>>>>>>>> To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
>>>>>>>>>>>>>>> Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Great addition to HDFS. Thanks to all the contributors for the nice
>>>>>>>>>>>>>>> work.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Uma
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 9/22/15, 3:40 PM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'd like to propose a vote to merge the HDFS-7285 feature branch
>>>>>>>>>>>>>>>> back to trunk. Since November 2014 we have been designing and
>>>>>>>>>>>>>>>> developing this feature under the umbrella JIRAs HDFS-7285 and
>>>>>>>>>>>>>>>> HADOOP-11264, and have committed approximately 210 patches.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The HDFS-7285 feature branch was created to support the first phase
>>>>>>>>>>>>>>>> of HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to
>>>>>>>>>>>>>>>> significantly reduce storage space usage in HDFS clusters. Instead
>>>>>>>>>>>>>>>> of always creating 3 replicas of each block, with 200% storage
>>>>>>>>>>>>>>>> space overhead, HDFS-EC provides data durability through parity
>>>>>>>>>>>>>>>> data blocks. With most EC configurations, the storage overhead is
>>>>>>>>>>>>>>>> no more than 50%. Based on profiling results of production
>>>>>>>>>>>>>>>> clusters, we decided to support EC with the striped block layout in
>>>>>>>>>>>>>>>> the first phase, so that small files can be handled better. This
>>>>>>>>>>>>>>>> means dividing each logical HDFS file block into smaller units
>>>>>>>>>>>>>>>> (striping cells) and spreading them across a set of DataNodes in
>>>>>>>>>>>>>>>> round-robin fashion. Parity cells are generated for each stripe of
>>>>>>>>>>>>>>>> original data cells. We have made changes to the NameNode, client,
>>>>>>>>>>>>>>>> and DataNode to generalize the block concept and handle the mapping
>>>>>>>>>>>>>>>> between a logical file block and its internal storage blocks. For
>>>>>>>>>>>>>>>> further details, please see the design doc on HDFS-7285.
>>>>>>>>>>>>>>>> HADOOP-11264 focuses on providing flexible and high-performance
>>>>>>>>>>>>>>>> codec calculation support.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The nightly Jenkins job of the branch has reported several
>>>>>>>>>>>>>>>> successful runs, and doesn't show new flaky tests compared with
>>>>>>>>>>>>>>>> trunk. We have posted several versions of the test plan, including
>>>>>>>>>>>>>>>> both unit testing and cluster testing, and have executed most tests
>>>>>>>>>>>>>>>> in the plan. The most basic functionalities have been extensively
>>>>>>>>>>>>>>>> tested and verified in several real clusters with different
>>>>>>>>>>>>>>>> hardware configurations; results have been very stable. We have
>>>>>>>>>>>>>>>> created follow-on tasks for more advanced error handling and
>>>>>>>>>>>>>>>> optimization under the umbrella HDFS-8031. We also plan to
>>>>>>>>>>>>>>>> implement or harden the integration of EC with existing features
>>>>>>>>>>>>>>>> such as WebHDFS, snapshot, append, truncate, hflush, hsync, and so
>>>>>>>>>>>>>>>> forth.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Development of this feature has been a collaboration across many
>>>>>>>>>>>>>>>> companies and institutions. I'd like to thank J. Andreina, Takanobu
>>>>>>>>>>>>>>>> Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao
>>>>>>>>>>>>>>>> G, Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai
>>>>>>>>>>>>>>>> Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang,
>>>>>>>>>>>>>>>> Jing Zhao, Hui Zheng and Kai Zheng for their code contributions and
>>>>>>>>>>>>>>>> reviews. Andrew and Kai Zheng also made fundamental contributions
>>>>>>>>>>>>>>>> to the initial design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and
>>>>>>>>>>>>>>>> many other contributors have made great efforts in system testing.
>>>>>>>>>>>>>>>> Many thanks go to Weihua Jiang for proposing the JIRA, and to ATM,
>>>>>>>>>>>>>>>> Todd Lipcon, Silvius Rus, Suresh, as well as many others for
>>>>>>>>>>>>>>>> providing helpful feedback.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Following the community convention, this vote will last for 7 days
>>>>>>>>>>>>>>>> (ending September 29th). Votes from Hadoop committers are binding,
>>>>>>>>>>>>>>>> but non-binding votes are very welcome as well. And here's my
>>>>>>>>>>>>>>>> non-binding +1.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>> Zhe Zhang
> 
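
The striped layout and the overhead figures in Zhe Zhang's vote message quoted above can be illustrated with a small sketch. It assumes a Reed-Solomon-style scheme with k data cells and m parity cells per stripe; the `StripedLayout` class, `replication_overhead`, the RS(6,3) parameters, and the 64 KiB cell size are hypothetical choices for illustration, not the HDFS-7285 implementation. It maps a byte offset in a logical block to the internal data block holding it (round-robin by cell), and compares parity overhead with 3-way replication.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StripedLayout:
    """Hypothetical model of a striped EC block group (illustrative only).

    Each stripe holds `data_units` data cells plus `parity_units` parity
    cells; every cell is `cell_size` bytes. Data cells are assigned to
    internal blocks round-robin, one cell per internal block per stripe.
    """

    data_units: int    # k: data cells (internal data blocks) per stripe
    parity_units: int  # m: parity cells (internal parity blocks) per stripe
    cell_size: int     # bytes per striping cell

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a byte offset in the logical block to
        (internal data block index, byte offset inside that internal block)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        cell_index, offset_in_cell = divmod(offset, self.cell_size)
        # Round-robin: cell i lands on internal block i % k and is the
        # (i // k)-th cell stored in that internal block.
        stripe, block_index = divmod(cell_index, self.data_units)
        return block_index, stripe * self.cell_size + offset_in_cell

    def storage_overhead(self) -> float:
        """Extra storage relative to the raw data size: m / k."""
        return self.parity_units / self.data_units


def replication_overhead(replicas: int) -> float:
    """Extra storage for n-way replication: (n - 1) extra copies of the data."""
    return float(replicas - 1)


if __name__ == "__main__":
    # Assumed RS(6,3) with 64 KiB cells, chosen only for illustration.
    layout = StripedLayout(data_units=6, parity_units=3, cell_size=64 * 1024)
    print(layout.locate(7 * 64 * 1024 + 100))  # (1, 65636)
    print(f"{replication_overhead(3):.0%}")    # 200%
    print(f"{layout.storage_overhead():.0%}")  # 50%
```

With k = 6 and m = 3, three parity cells per six data cells give 50% overhead, consistent with the "no more than 50%" figure in the vote message, while 3-way replication stores two extra copies (200%). Wider schemes such as k = 10, m = 4 would lower the overhead further, to 40%.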
