Ivan B.,

Good news, thank you!

2020-07-27 10:28 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
> Hi Ivan P.,
>
> I configured it for both PDS (Indexing) and PDS 4 (as requested by Nikita
> Tolstunov). It totally worked, not a single 137 since then.
> The occasional 130 will be fixed in [1]; it has a different problem behind it.
>
> Now I'm trying to find someone who knows TC configuration better and
> will be able to propagate the setting to all suites. Also, I don't have
> access to the agents, so "jemalloc" is definitely not an option for me
> specifically.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13266
>
> Sun, 26 Jul 2020 at 17:36, Ivan Pavlukhin <vololo...@gmail.com>:
>
>> Ivan B.,
>>
>> I noticed that you were able to configure environment variables for
>> PDS (Indexing). Do field experiments show that the suggested approach
>> fixes the problem?
>>
>> Interesting stuff with jemalloc. It might be useful to file a ticket.
>>
>> 2020-07-23 16:07 GMT+03:00, Ivan Daschinsky <ivanda...@gmail.com>:
>> >>
>> >> About "jemalloc" - it's also an option, but it also requires
>> >> reconfiguring suites on TC, maybe in a more complicated way. It requires
>> >> additional installation, right?
>> >> Can we stick to the solution that I already tested or should we update TC
>> >> agents? :)
>> >
>> >
>> > Yes, if you want to use jemalloc, you should install it and configure a
>> > specific env variable.
>> > This is just an option to consider, nothing more. I suppose that your
>> > approach may be the best variant right now.
>> >
>> >
>> > Thu, 23 Jul 2020 at 15:28, Ivan Bessonov <bessonov...@gmail.com>:
>> >
>> >> >
>> >> > glibc allocator uses arenas to minimize contention between threads
>> >>
>> >>
>> >> I understand it the same way. I tested by running the Indexing suite
>> >> locally and periodically executing "pmap <pid>". It showed that the number
>> >> of 64 MB arenas grows constantly and never shrinks. By the middle of the
>> >> suite the amount of virtual memory was close to 50 GB and used physical
>> >> memory was at least 6-7 GB, if I recall correctly. I have only 8 cores
>> >> BTW, so it should be worse on TC.
>> >> It means that there is enough contention somewhere in tests.
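
The "pmap <pid>" check described above can be roughly automated. This is only a sketch under assumptions: it treats an anonymous mapping of exactly 64 MB, or two adjacent anonymous mappings that add up to 64 MB, as "arena-like". The function name, the heuristic and the sample text are illustrative, not the procedure actually used in the thread.

```python
import re

ARENA_KB = 64 * 1024  # 64 MB expressed in the kilobyte units that pmap prints

# Default pmap line layout: address, size in K, mode, mapping name.
PMAP_LINE = re.compile(r"^([0-9a-f]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def count_arena_like_regions(pmap_output):
    """Count anonymous regions that look like 64 MB malloc arenas (heuristic)."""
    regions = []
    for line in pmap_output.splitlines():
        match = PMAP_LINE.match(line.strip())
        if match and "anon" in match.group(4):
            regions.append((int(match.group(1), 16), int(match.group(2))))

    count = 0
    i = 0
    while i < len(regions):
        start, size_kb = regions[i]
        if size_kb == ARENA_KB:
            count += 1
        elif i + 1 < len(regions):
            next_start, next_kb = regions[i + 1]
            # A committed part followed by a reserved part of the same 64 MB block.
            if next_start == start + size_kb * 1024 and size_kb + next_kb == ARENA_KB:
                count += 1
                i += 1  # the second half is already accounted for
        i += 1
    return count


# Made-up sample in pmap's default format, NOT real measurements.
SAMPLE = """\
12345:   java
00007f5c00000000    132K rw---   [ anon ]
00007f5c00021000  65404K -----   [ anon ]
00007f5c04000000  65536K rw---   [ anon ]
00007f5c08000000   1024K rw---   [ anon ]
00007f5c10000000    100K r-x-- libc.so.6
 total           132196K
"""

print(count_arena_like_regions(SAMPLE))
```

Feeding it successive "pmap <pid>" snapshots and comparing the counts would show whether the number of such regions keeps growing.
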
>> >>
>> >> About "jemalloc" - it's also an option, but it also requires
>> >> reconfiguring suites on TC, maybe in a more complicated way. It requires
>> >> additional installation, right?
>> >> Can we stick to the solution that I already tested or should we update TC
>> >> agents? :)
>> >>
>> >> Thu, 23 Jul 2020 at 15:02, Ivan Daschinsky <ivanda...@gmail.com>:
>> >>
>> >> > AFAIK, the glibc allocator uses arenas to minimize contention between
>> >> > threads when they try to access or free a preallocated bit of memory.
>> >> > But it seems that we use -XX:+AlwaysPreTouch, so the heap is allocated
>> >> > and committed at start time. We allocate memory for durable memory in
>> >> > one thread.
>> >> > So I think there will not be much contention between threads for
>> >> > native memory pools.
>> >> >
>> >> > Also, there is another approach: try to use jemalloc.
>> >> > This allocator shows better results (memory consumption) than the
>> >> > default glibc malloc in our scenarios. [1]
>> >> >
>> >> > [1] http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
>> >> >
>> >> >
>> >> >
>> >> > Thu, 23 Jul 2020 at 14:19, Ivan Bessonov <bessonov...@gmail.com>:
>> >> >
>> >> > > Hello Ivan,
>> >> > >
>> >> > > It feels like the problem is more about newly started threads than
>> >> > > the allocation of offheap regions. Plus, I'd like to see results
>> >> > > soon, and your proposal is a major change for Ignite that can't be
>> >> > > implemented fast enough.
>> >> > >
>> >> > > Anyway, I think this makes sense, considering that one day Unsafe
>> >> > > will be removed. But I wouldn't think about it right now, maybe as a
>> >> > > separate proposal...
>> >> > >
>> >> > >
>> >> > >
>> >> > > Thu, 23 Jul 2020 at 13:40, Ivan Daschinsky <ivanda...@gmail.com>:
>> >> > >
>> >> > > > Ivan, I think that we should use mmap/munmap to allocate huge
>> >> > > > chunks of memory.
>> >> > > >
>> >> > > > I've experimented with JNA, invoking mmap/munmap through it, and it
>> >> > > > works fine.
>> >> > > > Maybe we can create a module (similar to direct-io) that uses
>> >> > > > mmap/munmap on platforms that support them and falls back to Unsafe
>> >> > > > if not?
>> >> > > >
>> >> > > > Thu, 23 Jul 2020 at 13:31, Ivan Bessonov <bessonov...@gmail.com>:
>> >> > > >
>> >> > > > > Hello Igniters,
>> >> > > > >
>> >> > > > > I'd like to discuss the current issue with "out of memory" failures
>> >> > > > > on TeamCity, particularly in suites [1] and [2], which have quite a
>> >> > > > > lot of "Exit code 137" failures.
>> >> > > > >
>> >> > > > > I investigated the "PDS (Indexing)" suite under [3]. There's another
>> >> > > > > similar issue as well: [4].
>> >> > > > > I came to the conclusion that the main problem is inside the default
>> >> > > > > memory allocator (malloc).
>> >> > > > > Let me explain the way I see it right now:
>> >> > > > >
>> >> > > > > "malloc" is allowed to allocate (for internal use) up to
>> >> > > > > 8 * (number of cores) blocks called arenas, 64 MB each. This may
>> >> > > > > happen when a program creates/stops threads frequently and
>> >> > > > > allocates a lot of memory all the time, which is exactly what our
>> >> > > > > tests do. Given that TC agents have 32 cores, 8 * 32 * 64 MB gives
>> >> > > > > 16 gigabytes, which is about the whole amount of RAM on a
>> >> > > > > single agent.
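
The estimate above is easy to reproduce. A minimal sketch, assuming the figures quoted in the message (8 arenas per core, 64 MB per arena); the function name and constants are illustrative only, and the result is a rough upper bound on reserved address space rather than a measurement:

```python
from typing import Optional

ARENAS_PER_CORE = 8   # multiplier quoted in the message above
ARENA_SIZE_MB = 64    # size of one arena, as quoted above


def max_arena_reservation_mb(cores: int, arena_max: Optional[int] = None) -> int:
    """Worst-case memory reserved by malloc arenas, in megabytes.

    If arena_max is given (as with MALLOC_ARENA_MAX), it replaces the
    default limit of ARENAS_PER_CORE * cores.
    """
    arenas = ARENAS_PER_CORE * cores if arena_max is None else arena_max
    return arenas * ARENA_SIZE_MB


print(max_arena_reservation_mb(32))      # 16384 MB, i.e. 16 GB on a 32-core agent
print(max_arena_reservation_mb(32, 4))   # 256 MB with the arena count capped at 4
```
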
>> >> > > > >
>> >> > > > > The total number of arenas can be manually lowered by setting the
>> >> > > > > MALLOC_ARENA_MAX environment variable to 4 (or another small value).
>> >> > > > > I tried it locally and in the PDS (Indexing) suite settings on TC;
>> >> > > > > results look very promising: [5]
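
For a local run, the variable only has to be present in the environment of the launched process. A minimal sketch, assuming the test JVM (or any other command) is started as a child process; the helper name is hypothetical and this says nothing about how the TC suite settings themselves are configured:

```python
import os
import subprocess
import sys


def run_with_arena_limit(command, arena_max=4):
    """Run `command` with MALLOC_ARENA_MAX set in the child's environment only."""
    env = dict(os.environ)                    # copy, so the parent stays untouched
    env["MALLOC_ARENA_MAX"] = str(arena_max)  # glibc reads this when the process starts
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Demonstration with a child that just echoes the variable back;
# a real run would pass the test launcher command line instead.
echo_child = [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
print(run_with_arena_limit(echo_child).stdout.strip())
```
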
>> >> > > > >
>> >> > > > > It is said that changing this variable may lead to some performance
>> >> > > > > degradation, but it's hard to tell whether we have it or not,
>> >> > > > > because the suite usually failed before it was completed.
>> >> > > > >
>> >> > > > > So, I have two questions right now:
>> >> > > > >
>> >> > > > > - can those of you who are into hardcore Linux and C confirm that
>> >> > > > >   the solution can help us? Experiments show that it completely
>> >> > > > >   solves the problem.
>> >> > > > > - can you please point me to a person who usually does TC
>> >> > > > >   maintenance? I'm not entirely sure that I can propagate this
>> >> > > > >   environment variable to all suites by myself, which is necessary
>> >> > > > >   to avoid occasional error 137 (resulting from the same problem)
>> >> > > > >   in the future. I just don't know all the details about the suite
>> >> > > > >   structure.
>> >> > > > >
>> >> > > > > Thank you!
>> >> > > > >
>> >> > > > > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
>> >> > > > > [2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
>> >> > > > > [3] https://issues.apache.org/jira/browse/IGNITE-13266
>> >> > > > > [4] https://issues.apache.org/jira/browse/IGNITE-13263
>> >> > > > > [5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead
>> >> > > > >
>> >> > > > > --
>> >> > > > > Sincerely yours,
>> >> > > > > Ivan Bessonov
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > Sincerely yours, Ivan Daschinskiy
>> >> > > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Sincerely yours,
>> >> > > Ivan Bessonov
>> >> > >
>> >> >
>> >> >
>> >> > --
>> >> > Sincerely yours, Ivan Daschinskiy
>> >> >
>> >>
>> >>
>> >> --
>> >> Sincerely yours,
>> >> Ivan Bessonov
>> >>
>> >
>> >
>> > --
>> > Sincerely yours, Ivan Daschinskiy
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


-- 

Best regards,
Ivan Pavlukhin
