FYI the PR: https://github.com/apache/flink-web/pull/249
A review from the discussion participants would be appreciated.

On Tue, Aug 20, 2019 at 5:29 PM Andrey Zagrebin <and...@ververica.com>
wrote:

> I created an umbrella issue for the code style guide effort and a subtask
> for this discussion:
> https://issues.apache.org/jira/browse/FLINK-13804
> I will also submit a PR to flink-web based on the conclusion.
>
> On Mon, Aug 19, 2019 at 6:15 PM Stephan Ewen <se...@apache.org> wrote:
>
>> @Andrey Will you open a PR to add this to the code style?
>>
>> On Mon, Aug 19, 2019 at 11:51 AM Andrey Zagrebin <and...@ververica.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > It looks like this proposal has an approval and we can conclude this
>> > discussion.
>> > Additionally, I agree with Piotr we should really force the proven good
>> > reasoning for setting the capacity to avoid confusion, redundancy and
>> other
>> > already mentioned things while reading and maintaining the code.
>> > Ideally the need of setting the capacity should be either immediately
>> clear
>> > (e.g. perf etc) or explained in comments if it is non-trivial.
>> > Although, it can easily enter a grey zone, so I would not demand
>> strictly
>> > performance measurement proof e.g. if the size is known and it is "per
>> > record" code.
>> > At the end of the day it is a decision of the code developer and
>> reviewer.
>> >
>> > The conclusion is then:
>> > Set the initial capacity only if there is a good proven reason to do it.
>> > Otherwise do not clutter the code with it.
>> >
>> > Best,
>> > Andrey
>> >
>> > On Thu, Aug 1, 2019 at 5:10 PM Piotr Nowojski <pi...@ververica.com>
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > > - a bit more code, increases maintenance burden.
>> > >
>> > > I think there is even more to that. It’s almost like a code
>> duplication,
>> > > albeit expressed in very different way, with all of the drawbacks of
>> > > duplicated code: initial capacity can drift out of sync, causing
>> > confusion.
>> > > Also it’s not “a bit more code”, it might be non trivial
>> > > reasoning/calculation how to set the initial value. Whenever we change
>> > > something/refactor the code, "maintenance burden” will mostly come
>> from
>> > > that.
>> > >
>> > > Also I think this just usually falls under a premature optimisation
>> rule.
>> > >
>> > > Besides:
>> > >
>> > > > The conclusion is the following at the moment:
>> > > > Only set the initial capacity if you have a good idea about the
>> > expected
>> > > size.
>> > >
>> > > I would add a clause to set the initial capacity “only for good proven
>> > > reasons”. It’s not about whether we can set it, but whether it makes
>> > sense
>> > > to do so (to avoid the before mentioned "maintenance burden”).
>> > >
>> > > Piotrek
>> > >
>> > > > On 1 Aug 2019, at 14:41, Xintong Song <tonysong...@gmail.com>
>> wrote:
>> > > >
>> > > > +1 on setting initial capacity only when have good expectation on
>> the
>> > > > collection size.
>> > > >
>> > > > Thank you~
>> > > >
>> > > > Xintong Song
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Aug 1, 2019 at 2:32 PM Andrey Zagrebin <
>> and...@ververica.com>
>> > > wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> As you probably already noticed, Stephan has triggered a discussion
>> > > thread
>> > > >> about code style guide for Flink [1]. Recently we were discussing
>> > > >> internally some smaller concerns and I would like start separate
>> > threads
>> > > >> for them.
>> > > >>
>> > > >> This thread is about creating collections always with initial
>> > capacity.
>> > > As
>> > > >> you might have seen, some parts of our code base always initialise
>> > > >> collections with some non-default capacity. You can even activate a
>> > > check
>> > > >> in IntelliJ Idea that can monitor and highlight creation of
>> collection
>> > > >> without initial capacity.
>> > > >>
>> > > >> Pros:
>> > > >> - performance gain if there is a good reasoning about initial
>> capacity
>> > > >> - the capacity is always deterministic and does not depend on any
>> > > changes
>> > > >> of its default value in Java
>> > > >> - easy to follow: always initialise, has IDE support for detection
>> > > >>
>> > > >> Cons (for initialising w/o good reasoning):
>> > > >> - We are trying to outsmart JVM. When there is no good reasoning
>> about
>> > > >> initial capacity, we can rely on JVM default value.
>> > > >> - It is even confusing e.g. for hash maps as the real size depends
>> on
>> > > the
>> > > >> load factor.
>> > > >> - It would only add minor performance gain.
>> > > >> - a bit more code, increases maintenance burden.
>> > > >>
>> > > >> The conclusion is the following at the moment:
>> > > >> Only set the initial capacity if you have a good idea about the
>> > expected
>> > > >> size.
>> > > >>
>> > > >> Please, feel free to share you thoughts.
>> > > >>
>> > > >> Best,
>> > > >> Andrey
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> >
>> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
>> > > >>
>> > >
>> > >
>> >
>>
>

Reply via email to