Re: Hadoop storage community online sync

2019-08-22 Thread Matt Foley
+1 for publishing notes.  Thanks!

On Aug 21, 2019, at 4:16 PM, Aaron Fabbri wrote:

Thank you Wei-Chiu for organizing this and sending out notes!

On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang <weic...@apache.org> wrote:

> We had a great turnout today, thanks to Konstantin for leading the
> discussion of the NameNode Fine-Grained Locking proposal.
> 
> At least 16 participants joined the call.
> 
> Today's summary can be found here:
> 
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#
> 
> 8/19/2019
> 
> We are moving the sync to 10AM US PDT!
> 
> NameNode Fine-Grained Locking via InMemory Namespace Partitioning
> 
> Attendees:
> 
> Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
> Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
> Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.
> 
> Konstantin led the discussion of HDFS-14703.
> 
> There are three important parts:
> 
> (1) Partition the namespace into multiple GSets, so that different parts of
> the namespace can be processed in parallel.
> 
> (2) INode Key
> 
> (3) Latch lock
> 
> How to support snapshots -> they should be able to be partitioned similarly.
> 
> Balancing partitions: several strategies are possible. Dynamic partitioning
> strategy; static partitioning strategy -> no need for a higher-level
> navigation lock.
> 
> Dynamic strategy: start with 1 partition and grow.
> 
> And: why does the design doc use static partitioning? Determining the size
> of partitions is hard. What about starting with 1024 partitions?
> 
> Hotspot problem
> 
> A related task, HDFS-14617 (Improve fsimage load time by writing
> sub-sections to the fsimage index), writes multiple inode sections and inode
> directory sections, and loads the sections in parallel. It sounds like we can
> combine it with the fine-grained locking and partition the inode and inode
> directory sections by the namespace partitions.
> 
> Anu: snapshots complicate the design. Renames. Copy on write?
> 
> Anu: suggests implementing this feature without snapshot support to simplify
> design and implementation.
> 
> Konstantin: will develop in a feature branch. Feel free to pick up JIRAs or
> share thoughts.
> 
> FoldedTreeSet, implemented in HDFS-9260, is relevant. It needs to be fixed
> or reverted before developing the namespace partitioning feature.
> 
> On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang wrote:
> 
>> For this week,
>> we will have Konstantin and the LinkedIn folks discuss a recent project
>> that's been baking for quite a while. This is an exciting project as it has
>> the potential to improve NameNode's throughput by 40%.
>> 
>> HDFS-14703: NameNode Fine-Grained Locking
>> 
>> Access instructions and the past sync notes are available here:
>> 
>> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>> 
>> Reminder: We have a bi-weekly Hadoop storage online sync every other
>> Wednesday.
>> If there are no objections, I'd like to move the time to 10AM US Pacific
>> time (GMT-8).

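For readers who were not on the call, part (1) of the notes above (split the namespace into several sets so that different parts can be processed in parallel) can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the HDFS-14703 design: the class `PartitionedNamespaceSketch`, the use of a plain inode id as the key, the modulo mapping, and one read/write lock per partition are all hypothetical choices made for illustration. The notes do not spell out how the INode key or the latch lock actually work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: inode id -> name, split into a fixed number of partitions. */
final class PartitionedNamespaceSketch {
    private final List<Map<Long, String>> partitions = new ArrayList<>();
    private final List<ReadWriteLock> locks = new ArrayList<>();

    PartitionedNamespaceSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
            locks.add(new ReentrantReadWriteLock());
        }
    }

    /** Static partitioning (assumed mapping): the index depends only on the key. */
    int partitionOf(long inodeId) {
        return (int) Math.floorMod(inodeId, (long) partitions.size());
    }

    /** Writers lock only the partition that owns the key. */
    void put(long inodeId, String name) {
        int p = partitionOf(inodeId);
        locks.get(p).writeLock().lock();
        try {
            partitions.get(p).put(inodeId, name);
        } finally {
            locks.get(p).writeLock().unlock();
        }
    }

    /** Readers of different partitions never contend with each other. */
    String get(long inodeId) {
        int p = partitionOf(inodeId);
        locks.get(p).readLock().lock();
        try {
            return partitions.get(p).get(inodeId);
        } finally {
            locks.get(p).readLock().unlock();
        }
    }

    public static void main(String[] args) {
        // 1024 is the partition count floated in the notes; the id and path are made up.
        PartitionedNamespaceSketch ns = new PartitionedNamespaceSketch(1024);
        ns.put(16385L, "/user");
        System.out.println(ns.partitionOf(16385L) + " -> " + ns.get(16385L));
    }
}
```

Because the partition index is a pure function of the key, no higher-level lock is needed to find the right partition, which matches the remark in the notes about static partitioning. A dynamic strategy would need some extra coordination when partitions split or grow.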


Re: Hadoop Community Sync Up Schedule

2019-08-22 Thread Matt Foley
Wangda and Eric,
We can express the intent, I think, by scheduling two recurring meetings:
- monthly, on the 2nd Wednesday, and
- monthly, on the 4th Wednesday.

This is pretty easy to understand, and not too onerous to maintain.
But I’m okay with simple bi-weekly too.

I’m neutral on 10 vs 11am, PDT.  But do all participants presently use Daylight 
Savings and change on the same date?  Because I can’t conveniently do 9am PST, 
so if it needs to stay fixed in, say, India Timezone, then 11am PDT / 10am PST 
would be best.
—Matt
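
To make the daylight-saving point concrete, here is a small sketch using `java.time`. It assumes the meeting is pinned to 23:30 in `Asia/Kolkata`, which is the India time that corresponds to the 11am PDT figure above (India is UTC+5:30 year-round). The class name `FixedIndiaTimeSketch` and the two sample dates are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Hypothetical sketch: what a meeting fixed in India time looks like in US Pacific time. */
final class FixedIndiaTimeSketch {
    static final ZoneId INDIA = ZoneId.of("Asia/Kolkata");
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    /** Pacific wall-clock time of a meeting held at 23:30 India time on the given India date. */
    static LocalTime pacificTimeOf(LocalDate indiaDate) {
        ZonedDateTime inIndia = ZonedDateTime.of(indiaDate, LocalTime.of(23, 30), INDIA);
        return inIndia.withZoneSameInstant(PACIFIC).toLocalTime();
    }

    public static void main(String[] args) {
        // A summer date (PDT in effect) and a winter date (PST in effect).
        System.out.println(pacificTimeOf(LocalDate.of(2019, 8, 21)));  // 11:00
        System.out.println(pacificTimeOf(LocalDate.of(2019, 12, 11))); // 10:00
    }
}
```

So a slot that never moves for India participants shows up as 11am in the Pacific summer and 10am in the Pacific winter.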

On Aug 21, 2019, at 7:33 AM, epa...@apache.org wrote:

Let's go with bi-weekly (every 2 weeks). Sometimes this gives us 3 sync-ups in 
one month, which I think is fine.
-Eric Payne
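
For comparison, the two scheduling rules discussed in this thread can be written out with `java.time`. This is only an illustrative sketch: the class `SyncScheduleSketch` is hypothetical, and the start date of Wednesday 2019-08-21 is an assumption picked for the example, not an agreed schedule.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch comparing "every 2 weeks" with "2nd and 4th Wednesday". */
final class SyncScheduleSketch {
    /** Every 14 days starting at first, up to and including last. */
    static List<LocalDate> biWeekly(LocalDate first, LocalDate last) {
        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusWeeks(2)) {
            dates.add(d);
        }
        return dates;
    }

    /** The 2nd and 4th Wednesday of the given month. */
    static List<LocalDate> secondAndFourthWednesday(YearMonth month) {
        LocalDate firstDay = month.atDay(1);
        List<LocalDate> dates = new ArrayList<>();
        dates.add(firstDay.with(TemporalAdjusters.dayOfWeekInMonth(2, DayOfWeek.WEDNESDAY)));
        dates.add(firstDay.with(TemporalAdjusters.dayOfWeekInMonth(4, DayOfWeek.WEDNESDAY)));
        return dates;
    }

    /** How many of the given dates fall in the given month. */
    static long countInMonth(List<LocalDate> dates, YearMonth month) {
        return dates.stream().filter(d -> YearMonth.from(d).equals(month)).count();
    }

    public static void main(String[] args) {
        List<LocalDate> biWeekly =
            biWeekly(LocalDate.of(2019, 8, 21), LocalDate.of(2019, 12, 31));
        // With this start date, October 2019 gets three sync-ups (Oct 2, 16, 30).
        System.out.println(countInMonth(biWeekly, YearMonth.of(2019, 10)));
        // The monthly rule always gives two, here Oct 9 and Oct 23.
        System.out.println(secondAndFourthWednesday(YearMonth.of(2019, 10)));
    }
}
```

The trade-off is that the bi-weekly rule keeps a constant 14-day gap but occasionally puts three meetings in one month, while the 2nd/4th Wednesday rule keeps two per month but sometimes leaves a three-week gap (for example between Oct 23 and Nov 13, 2019).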

On Wednesday, August 21, 2019, 5:01:52 AM CDT, Wangda Tan <wheele...@gmail.com> wrote:
> 
> For folks in other US time zones: would 11am PDT or 10am PDT be better?
> I am fine with either.
> 
> Hi Matt,
> 
> Thanks for mentioning this issue, this is exactly the issue I saw 🤣.
> 
> Basically there’re two options:
> 
> - a. weekly, bi-weekly (for odd/even week) and every four months.
> - b. weekly, 1st/3rd week or 2nd/4th week, x-th week monthly.
> 
> I’m not sure which one is easier for people to understand, given the issue
> you mentioned.
> 
> After thinking about it, I prefer a., since it is more consistent for the
> audience and is not disrupted by the calendar.
> 
> If we choose a., I will redo the proposal and align it with a.
> 
> Thoughts?
> 
> Thanks,
> Wangda

