Ivan B.,

I noticed that you were able to configure environment variables for
PDS (Indexing). Do the experiments show that the suggested approach
fixes the problem?

Interesting stuff with jemalloc. It might be useful to file a ticket.

2020-07-23 16:07 GMT+03:00, Ivan Daschinsky <ivanda...@gmail.com>:
>>
>> About "jemalloc" - it's also an option, but it also requires
>> reconfiguring
>> suites on
>> TC, maybe in a more complicated way. It requires additional installation,
>> right?
>> Can we stick to the solution that I already tested or should we update TC
>> agents? :)
>
>
> Yes, if you want to use jemalloc, you should install it and configure a
> specific env variable.
> This is just an option to consider, nothing more. I suppose that your
> approach may be the best variant right now.
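>
> (For reference: the usual way to plug it in is to preload it for the JVM
> process, e.g. LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 - the
> exact library path depends on the distribution - plus, optionally,
> jemalloc's MALLOC_CONF variable for tuning. So yes, it needs the package
> installed on every agent.)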
>
>
> Thu, Jul 23, 2020 at 15:28, Ivan Bessonov <bessonov...@gmail.com>:
>
>> >
>> > the glibc allocator uses arenas to minimize contention between threads
>>
>>
>> I understand it the same way. I tested it by running the Indexing suite
>> locally and periodically executing "pmap <pid>": it showed that the number
>> of 64 MB arenas grows constantly and never shrinks. By the middle of the
>> suite the amount of virtual memory was close to 50 GB and the used physical
>> memory was at least 6-7 GB, if I recall correctly. I have only 8 cores BTW,
>> so it should be worse on TC.
>> It means that there is enough contention somewhere in the tests.
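>>
>> (For the record, the check itself is trivial: while the suite is running,
>> something like "pmap -x <pid> | tail -n 1" every few minutes shows the
>> total virtual size and RSS growing, and the individual ~64 MB anonymous
>> regions are visible further up in the same output.)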
>>
>> About "jemalloc" - it's also an option, but it also requires
>> reconfiguring
>> suites on
>> TC, maybe in a more complicated way. It requires additional installation,
>> right?
>> Can we stick to the solution that I already tested or should we update TC
>> agents? :)
>>
>> Thu, Jul 23, 2020 at 15:02, Ivan Daschinsky <ivanda...@gmail.com>:
>>
>> > AFAIK, the glibc allocator uses arenas to minimize contention between
>> > threads when they try to access or free preallocated chunks of memory.
>> > But it seems that we use -XX:+AlwaysPreTouch, so the heap is allocated
>> > and committed at start time, and we allocate memory for durable memory
>> > in one thread.
>> > So I think there will not be much contention between threads for native
>> > memory pools.
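>> >
>> > (I.e., the node JVMs are started with something like
>> > "java -Xms<size> -Xmx<size> -XX:+AlwaysPreTouch ...", so every heap page
>> > is committed up front during startup rather than on first use - the
>> > -Xms/-Xmx placeholders above are just for illustration.)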
>> >
>> > Also, there is another approach -- try to use jemalloc.
>> > This allocator shows better results (memory consumption) than the
>> > default glibc malloc in scenarios like ours. [1]
>> >
>> > [1]
>> > http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
>> >
>> >
>> >
>> > Thu, Jul 23, 2020 at 14:19, Ivan Bessonov <bessonov...@gmail.com>:
>> >
>> > > Hello Ivan,
>> > >
>> > > It feels like the problem is more about constantly starting new
>> > > threads rather than the allocation of offheap regions. Plus I'd like
>> > > to see results soon, and your proposal is a major change for Ignite
>> > > that can't be implemented quickly enough.
>> > >
>> > > Anyway, I think it makes sense, considering that one day Unsafe will
>> > > be removed. But I wouldn't think about it right now, maybe as a
>> > > separate proposal...
>> > >
>> > >
>> > >
>> > > Thu, Jul 23, 2020 at 13:40, Ivan Daschinsky <ivanda...@gmail.com>:
>> > >
>> > > > Ivan, I think that we should use mmap/munmap to allocate huge chunks
>> > > > of memory.
>> > > >
>> > > > I've experimented with invoking mmap/munmap through JNA and it works
>> > > > fine. Maybe we can create a module (similar to direct-io) that uses
>> > > > mmap/munmap on platforms that support them and falls back to Unsafe
>> > > > otherwise?
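>> > > >
>> > > > To illustrate the idea, a rough, untested sketch of what I mean (JNA
>> > > > direct mapping; the constants are the Linux x86_64 values and the
>> > > > class name is made up):
>> > > >
>> > > > import com.sun.jna.Native;
>> > > > import com.sun.jna.Pointer;
>> > > >
>> > > > /** Sketch: allocate/free big off-heap chunks via libc mmap/munmap. */
>> > > > public class MmapChunkAllocator {
>> > > >     // Values from <sys/mman.h> on Linux x86_64; other platforms differ.
>> > > >     private static final int PROT_READ = 0x1, PROT_WRITE = 0x2;
>> > > >     private static final int MAP_PRIVATE = 0x02, MAP_ANONYMOUS = 0x20;
>> > > >
>> > > >     static { Native.register("c"); } // bind the natives below to libc
>> > > >
>> > > >     private static native Pointer mmap(Pointer addr, long len, int prot,
>> > > >         int flags, int fd, long off);
>> > > >
>> > > >     private static native int munmap(Pointer addr, long len);
>> > > >
>> > > >     /** Reserves an anonymous region, bypassing malloc and its arenas. */
>> > > >     public static Pointer allocate(long size) {
>> > > >         Pointer p = mmap(null, size, PROT_READ | PROT_WRITE,
>> > > >             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> > > >
>> > > >         if (Pointer.nativeValue(p) == -1L) // MAP_FAILED
>> > > >             throw new OutOfMemoryError("mmap failed, errno=" + Native.getLastError());
>> > > >
>> > > >         return p;
>> > > >     }
>> > > >
>> > > >     /** Returns the whole region to the OS immediately. */
>> > > >     public static void free(Pointer p, long size) {
>> > > >         if (munmap(p, size) != 0)
>> > > >             throw new IllegalStateException("munmap failed, errno=" + Native.getLastError());
>> > > >     }
>> > > > }
>> > > >
>> > > > The platform check and the Unsafe fallback would then live in the
>> > > > module itself, the same way the direct-io module handles platform
>> > > > support.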
>> > > >
>> > > > Thu, Jul 23, 2020 at 13:31, Ivan Bessonov <bessonov...@gmail.com>:
>> > > >
>> > > > > Hello Igniters,
>> > > > >
>> > > > > I'd like to discuss the current issue with "out of memory" failures
>> > > > > on TeamCity. In particular, suites [1] and [2] have quite a lot of
>> > > > > "Exit code 137" failures.
>> > > > >
>> > > > > I investigated the "PDS (Indexing)" suite under [3]. There's another
>> > > > > similar issue as well: [4]. I came to the conclusion that the main
>> > > > > problem is inside the default memory allocator (malloc).
>> > > > > Let me explain the way I see it right now:
>> > > > >
>> > > > > "malloc" is allowed to allocate (for internal usages) up to 8 *
>> > (number
>> > > > of
>> > > > > cores) blocks called
>> > > > > ARENA, 64 mb each. This may happen when a program creates/stops
>> > threads
>> > > > > frequently and
>> > > > > allocates a lot of memory all the time, which is exactly what our
>> > tests
>> > > > do.
>> > > > > Given that TC agents
>> > > > > have 32 cores, 8 * 32 * 64 mb gives 16 gigabytes, that's like the
>> > whole
>> > > > > amount of RAM on the
>> > > > > single agent.
>> > > > >
>> > > > > The total number of arenas can be manually lowered by setting the
>> > > > > MALLOC_ARENA_MAX environment variable to 4 (or another small value).
>> > > > > I tried it locally and in the PDS (Indexing) suite settings on TC;
>> > > > > the results look very promising: [5]
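>> > > > >
>> > > > > (To be concrete, the change is just an environment variable for the
>> > > > > process that runs the tests, i.e. something like
>> > > > > "export MALLOC_ARENA_MAX=4" in the agent environment before the JVM
>> > > > > starts; where exactly to put it on TC is what I'm asking about
>> > > > > below.)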
>> > > > >
>> > > > > It is said that changing this variable may lead to some performance
>> > > > > degradation, but it's hard to tell whether we have it or not,
>> > > > > because the suite usually failed before it was completed.
>> > > > >
>> > > > > So, I have two questions right now:
>> > > > >
>> > > > > - can those of you who are into hardcore Linux and C confirm that
>> > > > > the solution can help us? Experiments show that it completely solves
>> > > > > the problem.
>> > > > > - can you please point me to a person who usually does TC
>> > > > > maintenance? I'm not entirely sure that I can propagate this
>> > > > > environment variable to all suites by myself, which is necessary to
>> > > > > avoid occasional error 137 (caused by the same problem) in the
>> > > > > future. I just don't know all the details of the suite structure.
>> > > > >
>> > > > > Thank you!
>> > > > >
>> > > > > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
>> > > > > [2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
>> > > > > [3] https://issues.apache.org/jira/browse/IGNITE-13266
>> > > > > [4] https://issues.apache.org/jira/browse/IGNITE-13263
>> > > > > [5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead
>> > > > >
>> > > > > --
>> > > > > Sincerely yours,
>> > > > > Ivan Bessonov
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sincerely yours, Ivan Daschinskiy
>> > > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours,
>> > > Ivan Bessonov
>> > >
>> >
>> >
>> > --
>> > Sincerely yours, Ivan Daschinskiy
>> >
>>
>>
>> --
>> Sincerely yours,
>> Ivan Bessonov
>>
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 

Best regards,
Ivan Pavlukhin
