This week's APAC Hadoop storage community online sync

2020-03-10 Thread Wei-Chiu Chuang
Hi!

Gentle reminder: Tomorrow's the APAC Hadoop storage community online sync.

Date/time:
March 12th 1PM (China) / 2PM (Japan) / 10:30AM (India)
March 11th 10PM (US West Coast)

Zoom link: https://cloudera.zoom.us/j/880548968

Past meeting minutes:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing

See you tomorrow!


This week's Hadoop storage community online sync (APAC Mandarin)

2020-02-26 Thread Wei-Chiu Chuang
Hi!

It's that time again. I'd like to lead this week's APAC Mandarin community
sync discussion. There are a few things to discuss/announce:

(1) user-zh mailing list.
(2) Fate of Hadoop 2.x / Hadoop 3.x adoption.
(3) Apache jiras pending reviews

Zoom link: https://cloudera.zoom.us/j/880548968

Time/Date:
Feb 26 10PM (US West Coast) / Feb 27 2PM (Beijing)

Past meeting minutes:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing


Re: This week's Hadoop storage community online sync

2019-10-30 Thread Wei-Chiu Chuang
Thanks Yiqun for sharing with us this morning.
Below are my notes from today. Feel free to update the Google doc in
case I missed something.

https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
11/30/2019

10/30

Attendees: Yiqun, Weichiu, Chao, Matt, Craig, Pifta

Yiqun presented the configurations and jiras that help optimize and
stabilize large-scale clusters at eBay.

HDFS-13183 was brought up. It runs well at eBay. Even though the community
thinks Consistent Read from Standby (CRFS) would supersede this improvement,
there is still value in adding it, since some users do not use CRFS
or are running older releases.

eBay: Queue time < 10 ms

Uber avg Queue time > 100 ms

eBay plans to look into consistent read from standby soon. RBF and Ozone
each appear to be a big change, so they are not being considered.
Additionally, federation is running well at eBay, so RBF is not a priority
now.

Looking into upgrading to Hadoop 2.9 or Hadoop 3

Ozone tested by Pinduoduo and JD in China.

Talked about recent upstream Hadoop development: NameNode fine-grained
locking, OpenTracing, JDK11


On Wed, Oct 30, 2019 at 9:30 AM Wei-Chiu Chuang  wrote:

> Gentle reminder. Yiqun will present in 30 minutes!
>
> On Mon, Oct 28, 2019 at 7:41 PM Wei-Chiu Chuang 
> wrote:
>
>> Hello, I am super stoked to have Yiqun Lin with us this Wednesday morning
>> Oct 30 US Pacific 10am/CET (Budapest) 6pm/ IST (Bangalore) 10:30pm/ CST
>> (Beijing) Oct 31 1am / JST (Tokyo) 2am to talk about “HDFS Cluster
>> Optimization in eBay” — Yiqun happens to be in the bay area this week and
>> this is the same talk that he is going to present Tuesday night at Yahoo
>> this week.
>>
>> HDFS Cluster Optimization in eBay
>>
>>
>>
>> Yiqun Lin, Hadoop Team, eBay + Apache Hadoop Committer / PMC member
>>> On eBay, we have many large HDFS clusters with thousands of nodes. We
>>> face many stability/data availability problems in our cluster. Today we
>>> want to share some optimizations we did in the system layer or HDFS level
>>> to improve our clusters. Besides, that makes our cluster more stable than
>>> before.
>>
>>
>> Past meeting notes and zoom link:
>>
>> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>>
>> Best,
>> Weichiu
>>
>


Re: This week's Hadoop storage community online sync

2019-10-30 Thread Wei-Chiu Chuang
Gentle reminder. Yiqun will present in 30 minutes!

On Mon, Oct 28, 2019 at 7:41 PM Wei-Chiu Chuang  wrote:

> Hello, I am super stoked to have Yiqun Lin with us this Wednesday morning
> Oct 30 US Pacific 10am/CET (Budapest) 6pm/ IST (Bangalore) 10:30pm/ CST
> (Beijing) Oct 31 1am / JST (Tokyo) 2am to talk about “HDFS Cluster
> Optimization in eBay” — Yiqun happens to be in the bay area this week and
> this is the same talk that he is going to present Tuesday night at Yahoo
> this week.
>
> HDFS Cluster Optimization in eBay
>
>
>
> Yiqun Lin, Hadoop Team, eBay + Apache Hadoop Committer / PMC member
>> On eBay, we have many large HDFS clusters with thousands of nodes. We
>> face many stability/data availability problems in our cluster. Today we
>> want to share some optimizations we did in the system layer or HDFS level
>> to improve our clusters. Besides, that makes our cluster more stable than
>> before.
>
>
> Past meeting notes and zoom link:
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> Best,
> Weichiu
>


This week's Hadoop storage community online sync

2019-10-28 Thread Wei-Chiu Chuang
Hello, I am super stoked to have Yiqun Lin with us this Wednesday morning,
Oct 30, at US Pacific 10am / CET (Budapest) 6pm / IST (Bangalore) 10:30pm /
CST (Beijing) Oct 31 1am / JST (Tokyo) 2am, to talk about “HDFS Cluster
Optimization in eBay”. Yiqun happens to be in the Bay Area this week, and
this is the same talk that he is going to present Tuesday night at Yahoo.

HDFS Cluster Optimization in eBay



Yiqun Lin, Hadoop Team, eBay + Apache Hadoop Committer / PMC member
> At eBay, we have many large HDFS clusters with thousands of nodes, and we
> face many stability and data availability problems in them. Today we want
> to share some optimizations we made at the system layer and the HDFS level
> that improved our clusters and made them more stable than before.


Past meeting notes and zoom link:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing

Best,
Weichiu


Re: Hadoop storage community online sync

2019-08-22 Thread Matt Foley
+1 for publishing notes.  Thanks!

On Aug 21, 2019, at 4:16 PM, Aaron Fabbri  wrote:

Thank you Wei-Chiu for organizing this and sending out notes!

On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang <weic...@apache.org> wrote:

> We had a great turnout today, thanks to Konstantin for leading the
> discussion of the NameNode Fine-Grained Locking proposal.
> 
> There were at least 16 participants joined the call.
> 
> Today's summary can be found here:
> 
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#
> 
> 8/19/2019
> 
> We are moving the sync to 10AM US PDT!
> 
> NameNode Fine-Grained Locking via InMemory Namespace Partitioning
> 
> Attendee:
> 
> Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
> Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
> Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.
> 
> Konstantin led the discussion of HDFS-14703.
> 
> There are three important parts:
> 
> (1) Partition namespace into multiple GSet, different part of namespace can
> be processed in parallel.
> 
> (2) INode Key
> 
> (3) Latch lock
> 
> How to support snapshot —> should be able to get partitioned similarly.
> 
> Balance partition strategies: several possible ways. Dynamic partition
> strategy, Static partitioning strategy —> no need a higher level navigation
> lock.
> 
> Dynamic strategy: starting with 1, and grow.
> 
> And: why does the design doc use static partitioning? determining the size
> of partitions is hard. what about starting with 1024 partitions.
> 
> Hotspot problem
> 
> A related task, HDFS-14617 (Improve fsimage load time by writing
> sub-sections to the fsimage index), writes multiple inode sections and
> inode directory sections, and loads the sections in parallel. It
> sounds like we can combine it with the fine-grained locking and partition
> inode/inode directory sections by the namespace partitions.
> 
> Anu: snapshot complicates design. Renames. Copy on write?
> 
> Anu: suggest to implement this feature without snapshot support to simplify
> design and implementation.
> 
> Konstantin: will develop in a feature branch. Feel free to pick up jiras or
> share thoughts.
> 
> FoldedTreeSet, implemented in HDFS-9260, is relevant. Need to fix
> or revert before developing the namespace partitioning feature.
> 
> On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang wrote:
>
>> For this week,
>> We will have Konstantin and the LinkedIn folks to discuss a recent project
>> that's been baking for quite a while. This is an exciting project as it has
>> the potential to improve NameNode's throughput by 40%.
>>
>> HDFS-14703 NameNode Fine-Grained Locking
>>
>> Access instruction, and the past sync notes are available here:
>> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>>
>> Reminder: We have Bi-weekly Hadoop storage online sync every other
>> Wednesday.
>> If there are no objections, I'd like to move the time to 10AM US pacific
>> time (GMT-8)



Re: Hadoop storage community online sync

2019-08-21 Thread Aaron Fabbri
Thank you Wei-Chiu for organizing this and sending out notes!

On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang  wrote:

> We had a great turnout today, thanks to Konstantin for leading the
> discussion of the NameNode Fine-Grained Locking proposal.
>
> There were at least 16 participants joined the call.
>
> Today's summary can be found here:
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#
>
> 8/19/2019
>
> We are moving the sync to 10AM US PDT!
>
> NameNode Fine-Grained Locking via InMemory Namespace Partitioning
>
> Attendee:
>
> Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
> Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
> Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.
>
> Konstantin led the discussion of HDFS-14703.
>
> There are three important parts:
>
> (1) Partition namespace into multiple GSet, different part of namespace can
> be processed in parallel.
>
> (2) INode Key
>
> (3) Latch lock
>
> How to support snapshot —> should be able to get partitioned similarly.
>
> Balance partition strategies: several possible ways. Dynamic partition
> strategy, Static partitioning strategy —> no need a higher level navigation
> lock.
>
> Dynamic strategy: starting with 1, and grow.
>
> And: why does the design doc use static partitioning? determining the size
> of partitions is hard. what about starting with 1024 partitions.
>
> Hotspot problem
>
> A related task, HDFS-14617 (Improve fsimage load time by writing
> sub-sections to the fsimage index), writes multiple inode sections and
> inode directory sections, and loads the sections in parallel. It
> sounds like we can combine it with the fine-grained locking and partition
> inode/inode directory sections by the namespace partitions.
>
> Anu: snapshot complicates design. Renames. Copy on write?
>
> Anu: suggest to implement this feature without snapshot support to simplify
> design and implementation.
>
> Konstantin: will develop in a feature branch. Feel free to pick up jiras or
> share thoughts.
>
> FoldedTreeSet, implemented in HDFS-9260, is relevant. Need to fix
> or revert before developing the namespace partitioning feature.
>
> On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang wrote:
>
> > For this week,
> > We will have Konstantin and the LinkedIn folks to discuss a recent
> > project that's been baking for quite a while. This is an exciting project
> > as it has the potential to improve NameNode's throughput by 40%.
> >
> > HDFS-14703 NameNode Fine-Grained Locking
> >
> > Access instruction, and the past sync notes are available here:
> > https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
> >
> > Reminder: We have Bi-weekly Hadoop storage online sync every other
> > Wednesday.
> > If there are no objections, I'd like to move the time to 10AM US pacific
> > time (GMT-8)
> >
>


Re: Hadoop storage community online sync

2019-08-21 Thread Wei-Chiu Chuang
We had a great turnout today, thanks to Konstantin for leading the
discussion of the NameNode Fine-Grained Locking proposal.

At least 16 participants joined the call.

Today's summary can be found here:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#

8/19/2019

We are moving the sync to 10AM US PDT!

NameNode Fine-Grained Locking via InMemory Namespace Partitioning

Attendees:

Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.

Konstantin led the discussion of HDFS-14703.

There are three important parts:

(1) Partition the namespace into multiple GSets, so that different parts of
the namespace can be processed in parallel.

(2) INode Key

(3) Latch lock
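
As a rough illustration of part (1) above, the sketch below splits an
in-memory map into a fixed number of partitions, each guarded by its own
read-write lock, so that operations on different partitions do not contend.
It is not the HDFS-14703 design: the class name PartitionedNamespace, the
modulo mapping in partitionOf, and the use of HashMap in place of GSet are
all assumptions made for the example, and the INode key and latch lock from
parts (2) and (3) are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: a map split into fixed partitions, one lock per partition. */
class PartitionedNamespace {
    private final List<Map<Long, String>> partitions = new ArrayList<>();
    private final List<ReentrantReadWriteLock> locks = new ArrayList<>();

    PartitionedNamespace(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
            locks.add(new ReentrantReadWriteLock());
        }
    }

    /** Hypothetical mapping from an inode id to a partition index. */
    int partitionOf(long inodeId) {
        return (int) Math.floorMod(inodeId, (long) partitions.size());
    }

    /** Writes take only the lock of the partition that owns the key. */
    void put(long inodeId, String name) {
        int p = partitionOf(inodeId);
        locks.get(p).writeLock().lock();
        try {
            partitions.get(p).put(inodeId, name);
        } finally {
            locks.get(p).writeLock().unlock();
        }
    }

    /** Reads take the read lock of the owning partition; returns null if absent. */
    String get(long inodeId) {
        int p = partitionOf(inodeId);
        locks.get(p).readLock().lock();
        try {
            return partitions.get(p).get(inodeId);
        } finally {
            locks.get(p).readLock().unlock();
        }
    }

    /** Sums partition sizes, locking one partition at a time. */
    int size() {
        int total = 0;
        for (int p = 0; p < partitions.size(); p++) {
            locks.get(p).readLock().lock();
            try {
                total += partitions.get(p).size();
            } finally {
                locks.get(p).readLock().unlock();
            }
        }
        return total;
    }
}
```

With a single global lock, every put would serialize; here two threads
writing keys that map to different partitions can proceed at the same time.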

How to support snapshots -> they should be able to be partitioned similarly.

Balancing partitions: several strategies are possible, dynamic or static.
Static partitioning strategy -> no need for a higher-level navigation
lock.

Dynamic strategy: start with 1 partition and grow.

And: why does the design doc use static partitioning? Determining the size
of partitions is hard. What about starting with 1024 partitions?

Hotspot problem

A related task, HDFS-14617 (Improve fsimage load time by writing
sub-sections to the fsimage index), writes multiple inode sections and
inode directory sections, and loads the sections in parallel. It sounds
like we can combine it with the fine-grained locking and partition the
inode/inode directory sections by the namespace partitions.
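
To make the idea of loading sub-sections in parallel concrete, here is a
minimal sketch. It does not use the fsimage format or any HDFS-14617 code:
the class ParallelSectionLoader, the split helper, and the stand-in "load"
step (which only counts entries) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: split a list into sub-sections and load them in parallel. */
class ParallelSectionLoader {

    /** Splits ids into at most numSections contiguous sub-sections. */
    static List<List<Long>> split(List<Long> ids, int numSections) {
        if (numSections < 1) {
            throw new IllegalArgumentException("numSections must be >= 1");
        }
        List<List<Long>> sections = new ArrayList<>();
        // Ceiling division so that every entry lands in some sub-section.
        int chunk = Math.max(1, (ids.size() + numSections - 1) / numSections);
        for (int start = 0; start < ids.size(); start += chunk) {
            sections.add(ids.subList(start, Math.min(ids.size(), start + chunk)));
        }
        return sections;
    }

    /** Submits one task per sub-section and returns the total entries loaded. */
    static int loadAll(List<List<Long>> sections, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Long> section : sections) {
                // Stand-in for parsing one sub-section; a real loader would
                // deserialize entries here.
                Callable<Integer> task = () -> section.size();
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("section load failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If the sub-section boundaries matched the namespace partitions, each loader
task could fill exactly one partition without touching the others.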

Anu: snapshots complicate the design. Renames. Copy on write?

Anu: suggests implementing this feature without snapshot support to simplify
design and implementation.

Konstantin: will develop in a feature branch. Feel free to pick up jiras or
share thoughts.

FoldedTreeSet, implemented in HDFS-9260, is relevant. It needs to be fixed
or reverted before developing the namespace partitioning feature.

On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang wrote:

> For this week,
> We will have Konstantin and the LinkedIn folks to discuss a recent project
> that's been baking for quite a while. This is an exciting project as it has
> the potential to improve NameNode's throughput by 40%.
>
> HDFS-14703 NameNode Fine-Grained Locking
>
> Access instruction, and the past sync notes are available here:
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> Reminder: We have Bi-weekly Hadoop storage online sync every other
> Wednesday.
> If there are no objections, I'd like to move the time to 10AM US pacific
> time (GMT-8)
>


Re: Hadoop storage community online sync

2019-08-20 Thread Wei-Chiu Chuang
Great question!
Currently we are on Pacific Daylight Time (UTC-7); Pacific Standard Time
(UTC-8) doesn't start until November 3rd.
I am being too US-centric, but if the purpose is to invite more people,
many of whom are based on the US west coast, we should follow the
US Pacific time zone (more specifically, California).

So GMT-7 it is.
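
For anyone who wants to double-check the offset for a given meeting date, a
small sketch using java.time is below. The class name PacificOffset and the
10AM meeting time are only illustrative; America/Los_Angeles is the tz
database zone for California.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Illustrative only: look up the UTC offset of US Pacific time on a date. */
class PacificOffset {

    /** Returns the UTC offset in effect at 10AM Pacific on the given date. */
    static ZoneOffset offsetOn(LocalDate date) {
        ZoneId pacific = ZoneId.of("America/Los_Angeles");
        return date.atTime(10, 0).atZone(pacific).getOffset();
    }

    public static void main(String[] args) {
        // During daylight saving time (the Aug 21, 2019 sync).
        System.out.println(offsetOn(LocalDate.of(2019, 8, 21)));  // prints -07:00
        // After daylight saving time ends on Nov 3, 2019.
        System.out.println(offsetOn(LocalDate.of(2019, 11, 6)));  // prints -08:00
    }
}
```

Pinning the sync to "10AM US Pacific" rather than a fixed GMT offset means
the UTC time of the meeting shifts by an hour when daylight saving ends.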

On Mon, Aug 19, 2019 at 11:16 PM Akira Ajisaka  wrote:

> Thank you for the information.
>
> Now US pacific time is GMT-7, isn't it?
>
> -Akira
>
> On Tue, Aug 20, 2019 at 6:56 AM Wei-Chiu Chuang
>  wrote:
> >
> > For this week,
> > We will have Konstantin and the LinkedIn folks to discuss a recent
> > project that's been baking for quite a while. This is an exciting project
> > as it has the potential to improve NameNode's throughput by 40%.
> >
> > HDFS-14703 NameNode Fine-Grained Locking
> >
> > Access instruction, and the past sync notes are available here:
> > https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
> >
> > Reminder: We have Bi-weekly Hadoop storage online sync every other
> > Wednesday.
> > If there are no objections, I'd like to move the time to 10AM US pacific
> > time (GMT-8)
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: Hadoop storage community online sync

2019-08-20 Thread Akira Ajisaka
Thank you for the information.

Now US pacific time is GMT-7, isn't it?

-Akira

On Tue, Aug 20, 2019 at 6:56 AM Wei-Chiu Chuang
 wrote:
>
> For this week,
> We will have Konstantin and the LinkedIn folks to discuss a recent project 
> that's been baking for quite a while. This is an exciting project as it has 
> the potential to improve NameNode's throughput by 40%.
>
> HDFS-14703 NameNode Fine-Grained Locking
>
> Access instruction, and the past sync notes are available here: 
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> Reminder: We have Bi-weekly Hadoop storage online sync every other Wednesday.
> If there are no objections, I'd like to move the time to 10AM US pacific time 
> (GMT-8)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Hadoop storage community online sync

2019-08-19 Thread Wei-Chiu Chuang
For this week,
Konstantin and the LinkedIn folks will discuss a recent project
that's been baking for quite a while. This is an exciting project as it has
the potential to improve NameNode's throughput by 40%.

HDFS-14703 NameNode Fine-Grained Locking

Access instructions and the past sync notes are available here:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing

Reminder: We have a bi-weekly Hadoop storage online sync every other
Wednesday.
If there are no objections, I'd like to move the time to 10AM US Pacific
time (GMT-8)