Hello, I'm currently working on a data structure for storing a dynamic set of short DNA sequences plus annotations. A few details: the data structure is written in C, tests are currently run on Ubuntu 14.04 64-bit, everything is single-threaded, and Valgrind reports that the program manipulating the data structure has no memory leaks.
I've started using jemalloc in an attempt to reduce memory fragmentation (by using a single arena, disabling the thread caching system, and using a high active:dirty page ratio). On small data sets (30 million insertions), the results are very good compared to glibc: about 150 MB less with tuned jemalloc.

Now I've started tests with much bigger data sets (3 to 10 billion insertions), and I realized that jemalloc is using more memory than glibc. I generated a data set of 200 million entries and tried to insert it into the data structure; when memory usage reached 1 GB, I stopped the program and recorded the number of entries inserted. With jemalloc, no matter the tuning parameters (1 or 4 arenas, tcache enabled or not, lg_dirty_mult = 3, 8, or 16, lg_chunk = 14, 22, or 30), the number of entries inserted varies between 120 million and 172 million. Whereas with standard glibc, I am able to insert 187 million entries. And on billions of entries, glibc (I don't have precise numbers, unfortunately) uses a few gigabytes less than jemalloc.

So I would like to know whether there is an explanation for this, and whether I can do something to make jemalloc at least as memory-efficient as glibc on my tests. Maybe I'm not using jemalloc correctly?

Thank you a lot for your help and your time. Have a nice day.

Guillaume Holley

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
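P.S. For concreteness, here is a sketch of how I've been passing one of the tuning combinations mentioned above (option names as in the jemalloc 3.x manual; the program name and input file are placeholders, not my actual invocation):

```shell
# narenas:1        -> use a single arena
# tcache:false     -> disable the per-thread cache
# lg_dirty_mult:8  -> keep the active:dirty page ratio at least 2^8
#                     (higher value = fewer dirty pages retained)
# lg_chunk:22      -> 4 MiB chunks
export MALLOC_CONF="narenas:1,tcache:false,lg_dirty_mult:8,lg_chunk:22"
./my_program data_set.txt
```

The other combinations I tested only vary the lg_dirty_mult and lg_chunk values and the narenas/tcache settings.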
