Thank you all for the good discussion. If there are no further concerns or objections, I would like to conclude this thread with the following action items:
- Change default value of "taskmanager.memory.jvm-overhead.min" to 192MB.
- Change default value of "taskmanager.memory.jvm-metaspace.size" to 96MB.
- Change the value of "taskmanager.memory.process.size" in the default "flink-conf.yaml" to 1568MB.
- Relax JVM overhead sanity check, so that the fraction does not need to be strictly followed, as long as the min/max range is respected.
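For readers skimming the thread, here is a rough sketch of what this means for the default "flink-conf.yaml" and the resulting memory breakdown. The derived numbers in the comments assume the other FLIP-49 defaults stay as they are (framework heap / framework off-heap of 128MB each, network fraction 0.1, managed fraction 0.4), so treat them as an illustration rather than the final documented values:

```yaml
# Sketch of the proposed out-of-box setting (illustrative, not the final shipped file).
taskmanager.memory.process.size: 1568m

# New defaults that live in the config option definitions rather than in flink-conf.yaml:
#   taskmanager.memory.jvm-metaspace.size: 96m
#   taskmanager.memory.jvm-overhead.min:   192m
#
# Derived breakdown, assuming the remaining FLIP-49 defaults are unchanged
# (framework heap / off-heap 128m each, network fraction 0.1, managed fraction 0.4):
#   total Flink memory = 1568m - 96m (metaspace) - 192m (JVM overhead) = 1280m
#   network memory     = 0.1 * 1280m = 128m
#   managed memory     = 0.4 * 1280m = 512m
#   JVM heap           = 128m (framework) + 384m (task) = 512m
```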
Thank you~

Xintong Song

On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <tonysong...@gmail.com> wrote:

> There's one more idea from the offline discussion with Andrey.
>
> If we decide to make metaspace 96MB, we can also make process.size 1568MB (1.5G + 32MB).
> According to the spreadsheet
> <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
> 1.5GB process size and 64MB metaspace result in memory sizes whose values are powers of 2.
> When increasing the metaspace from 64MB to 96MB, it would be good to preserve that alignment,
> for better readability when we later explain the memory configuration and calculations in the documentation.
> I believe the difference between 1.5GB and 1.5GB + 32MB is not significant for memory consumption.
>
> Thank you~
>
> Xintong Song
>
> On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <tonysong...@gmail.com> wrote:
>
>> Thanks for the discussion, Stephan, Till and Andrey.
>>
>> +1 for the managed fraction (0.4) and process.size (1.5G).
>>
>>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> small correction for better power 2 alignment of sizes
>>
>> Sorry, this was a typo (and the same for the jira comment, which is copy-pasted). It was 192mb that was used in the tuning report.
>>
>>> *meta space at least 96Mb?*
>>> There is still a concern about JVM metaspace being just 64Mb.
>>> We should confirm that it is not a problem by trying to test it also with the SQL jobs, Blink planner.
>>> Also, by running tpc-ds e2e Flink tests with this setting. Basically, where more classes are generated/loaded.
>>> We can look into this tomorrow.
>>
>> I have already tried setting the metaspace to 64Mb with the e2e tests, where I believe various
>> sql / blink / tpc-ds test cases are included. (See https://travis-ci.com/flink-ci/flink/builds/142970113 )
>> However, I'm also ok with 96Mb, since we are increasing the process.size to 1.5G.
>> My original concern with having a larger metaspace size was that it may result in a too small
>> flink.size for the out-of-box configuration on containerized setups.
>>
>>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are verified with the JVM meta space and overhead,
>>> JVM overhead does not have to be the exact fraction.
>>> It can be just within its min/max range, similar to how it is now for the network/shuffle memory check after FLINK-15300.
>>
>> Also +1 for this.
>>
>> Thank you~
>>
>> Xintong Song
>>
>> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <azagre...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> Stephan, Till and I had another offline discussion today. Here is the outcome of our brainstorm.
>>>
>>> *managed fraction 0.4*
>>> just confirmed what we already discussed here.
>>>
>>> *process.size = 1536Mb (1,5Gb)*
>>> We agreed to have process.size in the default settings, with an explanation of the flink.size alternative in the comment.
>>> The suggestion is to increase it from 1024 to 1536mb.
>>> As you can see in the earlier provided calculation spreadsheet, it will result in a bigger JVM Heap
>>> and managed memory (both ~0.5Gb) for all new setups.
>>> This should provide a good enough experience for trying out Flink.
>>>
>>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> small correction for better power 2 alignment of sizes
>>>
>>> *meta space at least 96Mb?*
>>> There is still a concern about JVM metaspace being just 64Mb.
>>> We should confirm that it is not a problem by trying to test it also with the SQL jobs, Blink planner.
>>> Also, by running tpc-ds e2e Flink tests with this setting. Basically, where more classes are generated/loaded.
>>> We can look into this tomorrow.
>>>
>>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are verified with the JVM meta space and overhead,
>>> JVM overhead does not have to be the exact fraction.
>>> It can be just within its min/max range, similar to how it is now for the network/shuffle memory check after FLINK-15300.
>>>
>>> Best,
>>> Andrey
>>>
>>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <se...@apache.org> wrote:
>>>
>>> > I like the idea of having a larger default "flink.size" in the config.yaml.
>>> > Maybe we don't need to double it, but something like 1280m would be okay?
>>> >
>>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <azagre...@apache.org> wrote:
>>> >
>>> > > Hi all!
>>> > >
>>> > > Great that we have already tried out the new FLIP-49 with bigger jobs.
>>> > >
>>> > > I am also +1 for the JVM metaspace and overhead changes.
>>> > >
>>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed memory for the RocksDB limiting case.
>>> > >
>>> > > In general, this looks mostly to be about memory distribution between the JVM heap and managed off-heap memory.
>>> > > Compared to the previous default setup, the JVM heap dropped (especially for standalone),
>>> > > mostly due to moving managed memory from heap to off-heap and then also adding framework off-heap memory.
>>> > > In general, this can be the most important consequence for beginners and those who rely on the default configuration.
>>> > > Especially the legacy default configuration in standalone, with heap.size falling back to flink.size,
>>> > > but there it seems we cannot do too much now.
>>> > >
>>> > > I prepared a spreadsheet
>>> > > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
>>> > > to play with the numbers for the setups mentioned in the report.
>>> > >
>>> > > One idea would be to set the process size (or the smaller flink size, respectively) to a bigger default number, like 2048.
>>> > > In this case, the absolute derived default JVM heap and managed memory are close to the previous defaults,
>>> > > especially for managed fraction 0.3.
>>> > > This should align the defaults with the previous standalone try-out experience,
>>> > > where the increased off-heap memory is not strictly controlled by the environment anyway.
>>> > > The consequence for container users who relied on and updated the default configuration is that
>>> > > the containers will be requested with double the size.
>>> > >
>>> > > Best,
>>> > > Andrey
>>> > >
>>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>> > >
>>> > > > +1 for the JVM metaspace and overhead changes.
>>> > > >
>>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>> > > >
>>> > > >> I guess one of the most important results of this experiment is to have a good tuning guide
>>> > > >> available for users who are past the initial try-out phase, because the default settings will be
>>> > > >> kind of a compromise. I assume that this is part of the outstanding FLIP-49 documentation task.
>>> > > >>
>>> > > >> If we limit RocksDB's memory consumption by default, then I believe that 0.4 would give the better
>>> > > >> all-round experience as it leaves a bit more memory for RocksDB. However, I'm a bit sceptical whether
>>> > > >> we should optimize the default settings for a configuration where the user still needs to activate
>>> > > >> the strict memory limiting for RocksDB. In this case, I would expect that the user could also adapt
>>> > > >> the managed memory fraction.
>>> > > >>
>>> > > >> Cheers,
>>> > > >> Till
>>> > > >>
>>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <tonysong...@gmail.com> wrote:
>>> > > >>
>>> > > >>> Thanks for the feedback, Stephan and Kurt.
>>> > > >>>
>>> > > >>> @Stephan
>>> > > >>>
>>> > > >>> Regarding the managed memory fraction,
>>> > > >>> - It makes sense to keep the default value 0.4, if we assume rocksdb memory is limited by default.
>>> > > >>> - AFAIK, currently rocksdb by default does not limit its memory usage. And I'm positive about changing that.
>>> > > >>> - Personally, I don't like the idea that the out-of-box experience (for which we set the default fraction)
>>> > > >>> relies on users manually turning another switch on.
>>> > > >>>
>>> > > >>> Regarding the framework heap memory,
>>> > > >>> - The major reason we set it by default is, as you mentioned, to have a safety net of minimal JVM heap size.
>>> > > >>> - Also, considering the in-progress FLIP-56 (dynamic slot allocation), we want to reserve some heap memory
>>> > > >>> that will not go into the slot profiles. That's why we decided the default value according to the heap
>>> > > >>> memory usage of an empty task executor.
>>> > > >>>
>>> > > >>> @Kurt
>>> > > >>> Regarding metaspace,
>>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only takes effect on TMs. Currently we do not
>>> > > >>> set the metaspace size for the JM.
>>> > > >>> - If we have the same metaspace problem on TMs, then yes, changing it from 128M to 64M will make it worse.
>>> > > >>> However, IMO a 10T tpc-ds benchmark should not be considered an out-of-box experience, and it makes sense
>>> > > >>> to tune the configurations for it. I think the smaller metaspace size would be a better choice for the
>>> > > >>> first try-out, where a job should not be too complicated and the TM size could be relatively small (e.g. 1g).
>>> > > >>>
>>> > > >>> Thank you~
>>> > > >>>
>>> > > >>> Xintong Song
>>> > > >>>
>>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <ykt...@gmail.com> wrote:
>>> > > >>>
>>> > > >>>> Hi Xintong,
>>> > > >>>>
>>> > > >>>> IIRC, during our tpc-ds 10T benchmark, we suffered from the JM's metaspace size and full GCs,
>>> > > >>>> which were caused by lots of class loading for source input splits.
>>> > > >>>> Could you check whether changing the default value from 128MB to 64MB will make it worse?
>>> > > >>>>
>>> > > >>>> Correct me if I misunderstood anything; also cc @Jingsong.
>>> > > >>>>
>>> > > >>>> Best,
>>> > > >>>> Kurt
>>> > > >>>>
>>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <se...@apache.org> wrote:
>>> > > >>>>
>>> > > >>>>> Hi all!
>>> > > >>>>>
>>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on your analysis, here are some thoughts:
>>> > > >>>>>
>>> > > >>>>> +1 to change the default JVM metaspace size from 128MB to 64MB
>>> > > >>>>> +1 to change the default JVM overhead min size from 128MB to 196MB
>>> > > >>>>>
>>> > > >>>>> Concerning the managed memory fraction, I am not sure I would change it, for the following reasons:
>>> > > >>>>>
>>> > > >>>>>   - We should assume RocksDB will be limited to managed memory by default.
>>> > > >>>>>   This will either be active by default, or we would encourage everyone to use this by default,
>>> > > >>>>>   because otherwise it is super hard to reason about the RocksDB footprint.
>>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less than half of the managed memory from 1.9.
>>> > > >>>>>   - I am not sure if the managed memory fraction is a value that all users adjust immediately when
>>> > > >>>>>   scaling up the memory during their first try-out phase. I would assume that most users initially
>>> > > >>>>>   only adjust "memory.flink.size" or "memory.process.size". A value of 0.3 will lead to having too
>>> > > >>>>>   large heaps and very little RocksDB / batch memory even when scaling up during the initial exploration.
>>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your benchmarks. So maybe keeping it at 0.4 could work?
>>> > > >>>>>
>>> > > >>>>> And one question: Why do we set the Framework Heap by default? Is that so we reduce the managed memory
>>> > > >>>>> further if less than the framework heap would be left from the JVM heap?
>>> > > >>>>>
>>> > > >>>>> Best,
>>> > > >>>>> Stephan
>>> > > >>>>>
>>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <tonysong...@gmail.com> wrote:
>>> > > >>>>>
>>> > > >>>>> > Hi all,
>>> > > >>>>> >
>>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the default configuration values of FLIP-49
>>> > > >>>>> > with more jobs and cases.
>>> > > >>>>> >
>>> > > >>>>> > After spending time analyzing and tuning the configurations, I've come up with several findings.
>>> > > >>>>> > To be brief, I would suggest the following changes; for more details please take a look at my tuning report [2].
>>> > > >>>>> >
>>> > > >>>>> > - Change the default managed memory fraction from 0.4 to 0.3.
>>> > > >>>>> > - Change the default JVM metaspace size from 128MB to 64MB.
>>> > > >>>>> > - Change the default JVM overhead min size from 128MB to 196MB.
>>> > > >>>>> >
>>> > > >>>>> > Looking forward to your feedback.
>>> > > >>>>> >
>>> > > >>>>> > Thank you~
>>> > > >>>>> >
>>> > > >>>>> > Xintong Song
>>> > > >>>>> >
>>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
>>> > > >>>>> > [2] https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
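As a closing note for readers who are past the initial try-out phase: the knobs discussed throughout this thread map onto a handful of configuration options. The sketch below uses the option names as they exist in Flink 1.10; the values are purely illustrative examples, not the defaults agreed above:

```yaml
# Standalone setups can size the total Flink memory directly instead of the process size.
taskmanager.memory.flink.size: 1280m

# Scale RocksDB / batch (managed) memory by adjusting the fraction rather than only the total size.
taskmanager.memory.managed.fraction: 0.4

# Keep RocksDB within managed memory so its footprint stays predictable
# (this is the assumption behind keeping the fraction at 0.4).
state.backend.rocksdb.memory.managed: true
```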