FYI the PR: https://github.com/apache/flink-web/pull/249 A review from the discussion participants would be appreciated.
On Tue, Aug 20, 2019 at 5:29 PM Andrey Zagrebin <and...@ververica.com> wrote: > I created an umbrella issue for the code style guide effort and a subtask > for this discussion: > https://issues.apache.org/jira/browse/FLINK-13804 > I will also submit a PR to flink-web based on the conclusion. > > On Mon, Aug 19, 2019 at 6:15 PM Stephan Ewen <se...@apache.org> wrote: > >> @Andrey Will you open a PR to add this to the code style? >> >> On Mon, Aug 19, 2019 at 11:51 AM Andrey Zagrebin <and...@ververica.com> >> wrote: >> >> > Hi All, >> > >> > It looks like this proposal has an approval and we can conclude this >> > discussion. >> > Additionally, I agree with Piotr we should really force the proven good >> > reasoning for setting the capacity to avoid confusion, redundancy and >> other >> > already mentioned things while reading and maintaining the code. >> > Ideally the need of setting the capacity should be either immediately >> clear >> > (e.g. perf etc) or explained in comments if it is non-trivial. >> > Although, it can easily enter a grey zone, so I would not demand >> strictly >> > performance measurement proof e.g. if the size is known and it is "per >> > record" code. >> > At the end of the day it is a decision of the code developer and >> reviewer. >> > >> > The conclusion is then: >> > Set the initial capacity only if there is a good proven reason to do it. >> > Otherwise do not clutter the code with it. >> > >> > Best, >> > Andrey >> > >> > On Thu, Aug 1, 2019 at 5:10 PM Piotr Nowojski <pi...@ververica.com> >> wrote: >> > >> > > Hi, >> > > >> > > > - a bit more code, increases maintenance burden. >> > > >> > > I think there is even more to that. It’s almost like a code >> duplication, >> > > albeit expressed in very different way, with all of the drawbacks of >> > > duplicated code: initial capacity can drift out of sync, causing >> > confusion. >> > > Also it’s not “a bit more code”, it might be non trivial >> > > reasoning/calculation how to set the initial value. Whenever we change >> > > something/refactor the code, "maintenance burden” will mostly come >> from >> > > that. >> > > >> > > Also I think this just usually falls under a premature optimisation >> rule. >> > > >> > > Besides: >> > > >> > > > The conclusion is the following at the moment: >> > > > Only set the initial capacity if you have a good idea about the >> > expected >> > > size. >> > > >> > > I would add a clause to set the initial capacity “only for good proven >> > > reasons”. It’s not about whether we can set it, but whether it makes >> > sense >> > > to do so (to avoid the before mentioned "maintenance burden”). >> > > >> > > Piotrek >> > > >> > > > On 1 Aug 2019, at 14:41, Xintong Song <tonysong...@gmail.com> >> wrote: >> > > > >> > > > +1 on setting initial capacity only when have good expectation on >> the >> > > > collection size. >> > > > >> > > > Thank you~ >> > > > >> > > > Xintong Song >> > > > >> > > > >> > > > >> > > > On Thu, Aug 1, 2019 at 2:32 PM Andrey Zagrebin < >> and...@ververica.com> >> > > wrote: >> > > > >> > > >> Hi all, >> > > >> >> > > >> As you probably already noticed, Stephan has triggered a discussion >> > > thread >> > > >> about code style guide for Flink [1]. Recently we were discussing >> > > >> internally some smaller concerns and I would like start separate >> > threads >> > > >> for them. >> > > >> >> > > >> This thread is about creating collections always with initial >> > capacity. >> > > As >> > > >> you might have seen, some parts of our code base always initialise >> > > >> collections with some non-default capacity. You can even activate a >> > > check >> > > >> in IntelliJ Idea that can monitor and highlight creation of >> collection >> > > >> without initial capacity. >> > > >> >> > > >> Pros: >> > > >> - performance gain if there is a good reasoning about initial >> capacity >> > > >> - the capacity is always deterministic and does not depend on any >> > > changes >> > > >> of its default value in Java >> > > >> - easy to follow: always initialise, has IDE support for detection >> > > >> >> > > >> Cons (for initialising w/o good reasoning): >> > > >> - We are trying to outsmart JVM. When there is no good reasoning >> about >> > > >> initial capacity, we can rely on JVM default value. >> > > >> - It is even confusing e.g. for hash maps as the real size depends >> on >> > > the >> > > >> load factor. >> > > >> - It would only add minor performance gain. >> > > >> - a bit more code, increases maintenance burden. >> > > >> >> > > >> The conclusion is the following at the moment: >> > > >> Only set the initial capacity if you have a good idea about the >> > expected >> > > >> size. >> > > >> >> > > >> Please, feel free to share you thoughts. >> > > >> >> > > >> Best, >> > > >> Andrey >> > > >> >> > > >> [1] >> > > >> >> > > >> >> > > >> > >> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E >> > > >> >> > > >> > > >> > >> >