Ivan B., Good news, thank you!
2020-07-27 10:28 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
> Hi Ivan P.,
>
> I configured it for both PDS (Indexing) and PDS 4 (as Nikita Tolstunov
> asked). It worked completely: not a single 137 since then.
> The occasional 130 will be fixed in [1]; it has a different cause.
>
> Now I'm trying to find someone who knows the TC configuration better and
> can propagate the setting to all suites. Also, I don't have access to the
> agents, so "jemalloc" is definitely not an option for me specifically.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13266
>
> Sun, 26 Jul 2020 at 17:36, Ivan Pavlukhin <vololo...@gmail.com>:
>
>> Ivan B.,
>>
>> I noticed that you were able to configure environment variables for
>> PDS (Indexing). Do field experiments show that the suggested approach
>> fixes the problem?
>>
>> Interesting stuff with jemalloc. It might be useful to file a ticket.
>>
>> 2020-07-23 16:07 GMT+03:00, Ivan Daschinsky <ivanda...@gmail.com>:
>> >>
>> >> About "jemalloc" - it's also an option, but it also requires
>> >> reconfiguring suites on TC, maybe in a more complicated way. It
>> >> requires additional installation, right?
>> >> Can we stick to the solution that I already tested, or should we
>> >> update TC agents? :)
>> >
>> >
>> > Yes, if you want to use jemalloc, you should install it and configure a
>> > specific env variable.
>> > This is just an option to consider, nothing more. I suppose that your
>> > approach may be the best variant right now.
>> >
>> >
>> > Thu, 23 Jul 2020 at 15:28, Ivan Bessonov <bessonov...@gmail.com>:
>> >
>> >> >
>> >> > glibc allocator uses arenas to minimize contention between threads
>> >>
>> >>
>> >> I understand it the same way. I tested this by running the Indexing
>> >> suite locally and periodically executing "pmap <pid>"; it showed that
>> >> the number of 64 MB arenas grows constantly and never shrinks.
>> >> By the middle of the suite, the amount of virtual memory was close to
>> >> 50 GB and used physical memory was at least 6-7 GB, if I recall
>> >> correctly. I have only 8 cores BTW, so it should be worse on TC.
>> >> It means that there is enough contention somewhere in the tests.
>> >>
>> >> About "jemalloc" - it's also an option, but it also requires
>> >> reconfiguring suites on TC, maybe in a more complicated way. It
>> >> requires additional installation, right?
>> >> Can we stick to the solution that I already tested, or should we
>> >> update TC agents? :)
>> >>
>> >> Thu, 23 Jul 2020 at 15:02, Ivan Daschinsky <ivanda...@gmail.com>:
>> >>
>> >> > AFAIK, the glibc allocator uses arenas to minimize contention
>> >> > between threads when they try to access or free a preallocated bit
>> >> > of memory. But it seems that we use -XX:+AlwaysPreTouch, so the heap
>> >> > is allocated and committed at start time. We allocate memory for
>> >> > durable memory in one thread.
>> >> > So I think there will not be much contention between threads for
>> >> > native memory pools.
>> >> >
>> >> > Also, there is another approach: try to use jemalloc.
>> >> > This allocator shows better results (memory consumption) than the
>> >> > default glibc malloc in our scenarios. [1]
>> >> >
>> >> > [1] http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
>> >> >
>> >> >
>> >> > Thu, 23 Jul 2020 at 14:19, Ivan Bessonov <bessonov...@gmail.com>:
>> >> >
>> >> > > Hello Ivan,
>> >> > >
>> >> > > It feels like the problem is more about newly started threads than
>> >> > > about the allocation of offheap regions. Plus, I'd like to see
>> >> > > results soon; your proposal is a major change for Ignite that
>> >> > > can't be implemented fast enough.
>> >> > >
>> >> > > Anyway, I think this makes sense, considering that one day Unsafe
>> >> > > will be removed. But I wouldn't think about it right now, maybe as
>> >> > > a separate proposal...
>> >> > >
>> >> > >
>> >> > > Thu, 23 Jul 2020 at 13:40, Ivan Daschinsky <ivanda...@gmail.com>:
>> >> > >
>> >> > > > Ivan, I think that we should use mmap/munmap to allocate huge
>> >> > > > chunks of memory.
>> >> > > >
>> >> > > > I've experimented with JNA, invoking mmap/munmap through it, and
>> >> > > > it works fine.
>> >> > > > Maybe we can create a module (similar to direct-io) that uses
>> >> > > > mmap/munmap on platforms that support them and falls back to
>> >> > > > Unsafe otherwise?
>> >> > > >
>> >> > > > Thu, 23 Jul 2020 at 13:31, Ivan Bessonov <bessonov...@gmail.com>:
>> >> > > >
>> >> > > > > Hello Igniters,
>> >> > > > >
>> >> > > > > I'd like to discuss the current issue with "out of memory"
>> >> > > > > failures on TeamCity, particularly in suites [1] and [2]; they
>> >> > > > > have quite a lot of "Exit code 137" failures.
>> >> > > > >
>> >> > > > > I investigated the "PDS (Indexing)" suite under [3]. There's
>> >> > > > > another similar issue as well: [4].
>> >> > > > > I came to the conclusion that the main problem is inside the
>> >> > > > > default memory allocator (malloc).
>> >> > > > > Let me explain the way I see it right now:
>> >> > > > >
>> >> > > > > "malloc" is allowed to allocate (for internal use) up to
>> >> > > > > 8 * (number of cores) blocks called ARENA, 64 MB each. This may
>> >> > > > > happen when a program creates/stops threads frequently and
>> >> > > > > allocates a lot of memory all the time, which is exactly what
>> >> > > > > our tests do.
>> >> > > > > Given that TC agents have 32 cores, 8 * 32 * 64 MB gives
>> >> > > > > 16 gigabytes, which is roughly the whole amount of RAM on a
>> >> > > > > single agent.
>> >> > > > >
>> >> > > > > The total number of arenas can be manually lowered by setting
>> >> > > > > the MALLOC_ARENA_MAX environment variable to 4 (or another
>> >> > > > > small value). I tried it locally and in the PDS (Indexing)
>> >> > > > > suite settings on TC; the results look very promising: [5]
>> >> > > > >
>> >> > > > > It is said that changing this variable may lead to some
>> >> > > > > performance degradation, but it's hard to tell whether we have
>> >> > > > > it or not, because the suite usually failed before it was
>> >> > > > > completed.
>> >> > > > >
>> >> > > > > So, I have two questions right now:
>> >> > > > >
>> >> > > > > - can those of you who are into hardcore Linux and C confirm
>> >> > > > > that the solution can help us? Experiments show that it
>> >> > > > > completely solves the problem.
>> >> > > > > - can you please point me to a person who usually does TC
>> >> > > > > maintenance? I'm not entirely sure that I can propagate this
>> >> > > > > environment variable to all suites by myself, which is
>> >> > > > > necessary to avoid occasional error 137 (resulting from the
>> >> > > > > same problem) in the future. I just don't know all the details
>> >> > > > > of the suite structure.
>> >> > > > >
>> >> > > > > Thank you!
>> >> > > > >
>> >> > > > > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
>> >> > > > > [2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
>> >> > > > > [3] https://issues.apache.org/jira/browse/IGNITE-13266
>> >> > > > > [4] https://issues.apache.org/jira/browse/IGNITE-13263
>> >> > > > > [5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead
>> >> > > > >
>> >> > > > > --
>> >> > > > > Sincerely yours,
>> >> > > > > Ivan Bessonov
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > Sincerely yours, Ivan Daschinskiy
>> >> > > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Sincerely yours,
>> >> > > Ivan Bessonov
>> >> > >
>> >> >
>> >> >
>> >> > --
>> >> > Sincerely yours, Ivan Daschinskiy
>> >> >
>> >>
>> >>
>> >> --
>> >> Sincerely yours,
>> >> Ivan Bessonov
>> >>
>> >
>> >
>> > --
>> > Sincerely yours, Ivan Daschinskiy
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>

--

Best regards,
Ivan Pavlukhin
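
For reference, the arena arithmetic from the original message (8 arenas per core, 64 MB each, 32-core agents) and the MALLOC_ARENA_MAX workaround can be sketched in Python. This is only an illustration under the numbers quoted in the thread: the function names are hypothetical, and the result is an upper bound on memory reserved for arenas, not a measurement of what a suite actually uses.

```python
import os

# Figures quoted in the thread, not measured here.
ARENA_SIZE_MB = 64     # size of one malloc arena
ARENAS_PER_CORE = 8    # default arena limit is 8 * (number of cores)


def max_arena_memory_mb(cores, arena_max=None):
    """Upper bound (in MB) on memory reserved for malloc arenas.

    If arena_max is given, it stands in for MALLOC_ARENA_MAX;
    otherwise the default limit of 8 arenas per core is assumed.
    """
    arenas = arena_max if arena_max is not None else ARENAS_PER_CORE * cores
    return arenas * ARENA_SIZE_MB


def env_with_arena_cap(arena_max, base_env=None):
    """Return a copy of the environment with MALLOC_ARENA_MAX set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_ARENA_MAX"] = str(arena_max)
    return env


if __name__ == "__main__":
    # 32-core agent with the default limit: 8 * 32 * 64 MB
    print(max_arena_memory_mb(32))
    # Same agent with MALLOC_ARENA_MAX=4: 4 * 64 MB
    print(max_arena_memory_mb(32, arena_max=4))
```

The dictionary returned by env_with_arena_cap can be passed as the env argument of subprocess.run when launching a test JVM locally; on TC the same variable would be set in the suite's build configuration instead.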