Re: Hadoop storage community online sync

2019-08-22 Thread Matt Foley
+1 for publishing notes.  Thanks!

On Aug 21, 2019, at 4:16 PM, Aaron Fabbri wrote:

Thank you Wei-Chiu for organizing this and sending out notes!




Re: Hadoop storage community online sync

2019-08-21 Thread Aaron Fabbri
Thank you Wei-Chiu for organizing this and sending out notes!



Re: Hadoop storage community online sync

2019-08-21 Thread Wei-Chiu Chuang
We had a great turnout today. Thanks to Konstantin for leading the
discussion of the NameNode Fine-Grained Locking proposal.

At least 16 participants joined the call.

Today's summary can be found here:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#

8/19/2019

We are moving the sync to 10AM US PDT!

NameNode Fine-Grained Locking via InMemory Namespace Partitioning

Attendees:

Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.

Konstantin led the discussion of HDFS-14703.

The proposal has three main parts:

(1) Partition the namespace into multiple GSets so that different parts of
the namespace can be processed in parallel.

(2) INode Key

(3) Latch lock
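To make the three parts concrete, here is a minimal Java sketch. It is not the HDFS-14703 implementation, and everything in it is an assumption for illustration: the `INodeKey` record as a (parent id, inode id) pair, routing keys to partitions with a simple modulo on the parent id (so all children of a directory share a partition), and the hypothetical `PartitionedNamespace`/`Partition` classes, which use a plain `HashMap` per partition instead of HDFS's `GSet`. The latch lock is modeled as lock coupling: a short-lived top-level lock is held only while the partition lock is acquired, then released, so operations on different partitions can run in parallel.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical INode key: (parent directory id, inode id). */
record INodeKey(long parentId, long inodeId) {}

/** One namespace partition: a plain map guarded by its own read/write lock. */
final class Partition {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final Map<INodeKey, String> inodes = new HashMap<>(); // value: inode name, for illustration
}

/** Sketch of a namespace split into a fixed number of independently locked partitions. */
final class PartitionedNamespace {
  // Top-level lock, held only long enough to pick a partition and latch it.
  private final ReentrantReadWriteLock topLock = new ReentrantReadWriteLock();
  private final Partition[] partitions;

  PartitionedNamespace(int numPartitions) {
    partitions = new Partition[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      partitions[i] = new Partition();
    }
  }

  // Assumption: route by parent id so that all children of a directory share a partition.
  int partitionOf(INodeKey key) {
    return (int) Math.floorMod(key.parentId(), (long) partitions.length);
  }

  // Latch lock: take the top-level lock, lock the partition, then release the top-level lock.
  private Partition latch(INodeKey key, boolean write) {
    topLock.readLock().lock();
    try {
      Partition p = partitions[partitionOf(key)];
      if (write) {
        p.lock.writeLock().lock();
      } else {
        p.lock.readLock().lock();
      }
      return p;
    } finally {
      topLock.readLock().unlock();
    }
  }

  void put(INodeKey key, String name) {
    Partition p = latch(key, true);
    try {
      p.inodes.put(key, name);
    } finally {
      p.lock.writeLock().unlock();
    }
  }

  String get(INodeKey key) {
    Partition p = latch(key, false);
    try {
      return p.inodes.get(key);
    } finally {
      p.lock.readLock().unlock();
    }
  }
}
```

Because the top-level lock is dropped as soon as a partition is latched, a hypothetical repartitioning step that takes the write side of `topLock` would stop new operations from choosing a partition while operations already inside a partition keep running.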

How to support snapshots: they should be partitionable in the same way.

Partition balancing strategies: there are several possible ways, including
a dynamic partitioning strategy and a static partitioning strategy; with
static partitioning, no higher-level navigation lock is needed.

Dynamic strategy: start with one partition and grow.

And: Why does the design doc use static partitioning? Determining the size
of partitions is hard. What about starting with 1024 partitions?
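The dynamic strategy, which starts with a single partition and grows, can be sketched as a range map that splits any partition that gets too big. This is a hypothetical illustration only: the `DynamicPartitioner` class, the `maxPerPartition` limit, and the split-at-median rule are assumptions, since the notes do not say how or when partitions would grow. Keys here are bare `long` values standing in for INode keys.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical dynamic partitioner: starts with one key range and splits ranges that get too big. */
final class DynamicPartitioner {
  private final int maxPerPartition;
  // Lower bound of each key range -> the keys currently stored in that range.
  private final TreeMap<Long, TreeSet<Long>> partitions = new TreeMap<>();

  DynamicPartitioner(int maxPerPartition) {
    if (maxPerPartition < 1) {
      throw new IllegalArgumentException("maxPerPartition must be >= 1");
    }
    this.maxPerPartition = maxPerPartition;
    partitions.put(Long.MIN_VALUE, new TreeSet<>()); // a single partition covers every key
  }

  void insert(long key) {
    // floorEntry never returns null because the first range starts at Long.MIN_VALUE.
    Map.Entry<Long, TreeSet<Long>> range = partitions.floorEntry(key);
    TreeSet<Long> keys = range.getValue();
    keys.add(key);
    if (keys.size() > maxPerPartition) {
      split(keys);
    }
  }

  // Move the upper half of an over-full range (median key and above) into a new range.
  private void split(TreeSet<Long> keys) {
    long median = new ArrayList<>(keys).get(keys.size() / 2);
    NavigableSet<Long> upper = keys.tailSet(median, true); // live view of the upper half
    partitions.put(median, new TreeSet<>(upper));
    upper.clear(); // clearing the view removes those keys from the original range
  }

  int partitionCount() {
    return partitions.size();
  }

  long lowerBoundOf(long key) {
    return partitions.floorKey(key);
  }
}
```

By contrast, a static scheme such as 1024 fixed partitions never splits, which avoids rebalancing but requires choosing partition sizes up front.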

Hotspot problem

A related task, HDFS-14617 (Improve fsimage load time by writing
sub-sections to the fsimage index), writes multiple inode sections and
inode directory sections and loads them in parallel. It sounds like we
could combine it with fine-grained locking and partition the inode/inode
directory sections by namespace partition.
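The combination can be illustrated with a small Java sketch in which each image sub-section is loaded by its own task into its own partition. This is not the HDFS-14617 code: the `ParallelSectionLoader` class and its `InodeRecord` type are hypothetical stand-ins for already-parsed sub-sections, and a `HashMap` stands in for a namespace partition. What it shows is that when sub-sections line up with namespace partitions, loader threads never write to the same map, so no locking is needed during the load.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical loader: one image sub-section per namespace partition, loaded in parallel. */
final class ParallelSectionLoader {
  /** Stand-in for a parsed inode record from a sub-section. */
  record InodeRecord(long id, String name) {}

  static List<Map<Long, String>> load(List<List<InodeRecord>> subSections, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Map<Long, String>>> futures = new ArrayList<>();
      for (List<InodeRecord> section : subSections) {
        // Each task fills its own map, so the tasks never contend on shared state.
        futures.add(pool.submit(() -> {
          Map<Long, String> partition = new HashMap<>();
          for (InodeRecord r : section) {
            partition.put(r.id(), r.name());
          }
          return partition;
        }));
      }
      List<Map<Long, String>> partitions = new ArrayList<>();
      for (Future<Map<Long, String>> f : futures) {
        partitions.add(f.get()); // waits for each task; keeps sub-section order
      }
      return partitions;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while loading sub-sections", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a sub-section failed to load", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```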

Anu: Snapshots complicate the design (renames, copy-on-write?).

Anu: suggests implementing this feature without snapshot support to simplify
the design and implementation.

Konstantin: will develop this in a feature branch. Feel free to pick up
JIRAs or share thoughts.

FoldedTreeSet, implemented in HDFS-9260, is relevant. It needs to be fixed
or reverted before developing the namespace partitioning feature.



Re: Hadoop storage community online sync

2019-08-20 Thread Wei-Chiu Chuang
Great question!
Pacific Daylight Time is currently UTC-7; Pacific Standard Time (UTC-8)
doesn't start until November 3rd.
I may be too US-centric, but if the purpose is to invite more people, and
many of them are based on the US west coast, we should follow the US
Pacific time zone (more specifically, California).

So GMT-7 it is.
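Pinning the meeting to a named time zone rather than a fixed offset sidesteps the GMT-7/GMT-8 question, because `America/Los_Angeles` switches between PDT (UTC-7) and PST (UTC-8) on its own. A small `java.time` sketch (the `SyncTime` class is hypothetical) converts the 10AM Pacific slot to UTC for a given date:

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Hypothetical helper: where the 10AM US Pacific sync falls in UTC on a given date. */
final class SyncTime {
  static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");
  static final LocalTime START = LocalTime.of(10, 0);

  /** The sync start on the given date in Pacific time (PDT or PST, per the zone rules). */
  static ZonedDateTime inPacific(LocalDate date) {
    return ZonedDateTime.of(date, START, PACIFIC);
  }

  /** The same instant expressed in UTC. */
  static ZonedDateTime inUtc(LocalDate date) {
    return inPacific(date).withZoneSameInstant(ZoneOffset.UTC);
  }

  /** The UTC offset in effect for the sync on that date. */
  static ZoneOffset pacificOffset(LocalDate date) {
    return inPacific(date).getOffset();
  }
}
```

With this, 10AM on Wednesday 2019-08-21 is 17:00 UTC, while 10AM on Wednesday 2019-11-06, after the November 3rd switch to standard time, is 18:00 UTC.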

On Mon, Aug 19, 2019 at 11:16 PM Akira Ajisaka wrote:

> Thank you for the information.
>
> Now US pacific time is GMT-7, isn't it?
>
> -Akira
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: Hadoop storage community online sync

2019-08-19 Thread Akira Ajisaka
Thank you for the information.

Now US pacific time is GMT-7, isn't it?

-Akira

On Tue, Aug 20, 2019 at 6:56 AM Wei-Chiu Chuang wrote:
>
> For this week,
> we will have Konstantin and the LinkedIn folks discuss a recent project
> that's been baking for quite a while. This is an exciting project as it has
> the potential to improve NameNode's throughput by 40%.
>
> HDFS-14703 NameNode Fine-Grained Locking
>
> Access instructions and past sync notes are available here:
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> Reminder: We have a bi-weekly Hadoop storage online sync every other Wednesday.
> If there are no objections, I'd like to move the time to 10AM US Pacific time
> (GMT-8)

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org