Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On Mon, Apr 23, 2018 at 9:44 AM, Jameson Miller wrote:

> I would be interested to understand how the mem_pool would fit your
> needs, and if it is sufficient or needs modification for your use
> cases.
>
>> [1] proof of concept in patches nearby
>> https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/

Currently the parsed objects are loaded into memory and never freed. See alloc.c, which implements a specialized memory allocator for this object loading. When working with submodules, their objects are also just put into this globally-namespaced object store (see struct object **obj_hash in object.c).

I want to make the object store a per-repository object, such that when working with submodules you can free up all of a submodule's objects once you are done with it. To do so, the memory allocation needs to manage the whole life cycle, while preserving the efficiency of alloc.c. See 855419f764 (Add specialized object allocator, 2006-06-19).

The mem-pool that you propose can allocate large slabs of memory into which we can put objects without alignment overhead, so we preserve the memory efficiency while being able to track all the memory. So I would think it is sufficient as-is with this series; maybe we need a little tweaking there, but nothing large IMHO.

Thanks,
Stefan
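To make the idea above concrete, here is a minimal sketch of a per-repository object pool. It assumes the mem_pool API this series fills out (mem_pool_init / mem_pool_alloc / mem_pool_discard; exact signatures may differ), and the obj_pool field plus both helper functions are hypothetical illustrations, not code from the series or the object-store refactoring:

        /*
         * Hypothetical sketch only. Assumes "struct repository" gains
         * a member:
         *
         *     struct mem_pool *obj_pool;
         *
         * and uses the mem_pool API this series fills out; exact
         * signatures may differ.
         */
        static void *alloc_repo_object(struct repository *r, size_t len)
        {
                if (!r->obj_pool)
                        mem_pool_init(&r->obj_pool, 0);
                return mem_pool_alloc(r->obj_pool, len);
        }

        static void release_repo_objects(struct repository *r)
        {
                /*
                 * Done with this (submodule) repository: free every
                 * parsed object in one step.
                 */
                if (r->obj_pool) {
                        mem_pool_discard(r->obj_pool);
                        r->obj_pool = NULL;
                }
        }

Compared to alloc.c, this keeps the big-slab efficiency but ties the allocations' life cycle to the repository, so submodule objects no longer outlive their use.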
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/20/2018 07:34 PM, Jonathan Tan wrote:

> On Tue, 17 Apr 2018 16:34:39 +0000 Jameson Miller wrote:
>
>> Jameson Miller (5):
>>   read-cache: teach refresh_cache_entry to take istate
>>   Add an API creating / discarding cache_entry structs
>>   mem-pool: fill out functionality
>>   Allocate cache entries from memory pools
>>   Add optional memory validations around cache_entry lifecyle
>
> In this patch set, there is no enforcement that the cache entry
> created by make_index_cache_entry() goes into the correct index when
> add_index_entry() is invoked. (Junio described similar things, I
> believe, in [1].) This might be an issue when we bring up and drop
> multiple indexes, and dropping one index causes a cache entry in
> another to become invalidated.

Correct - it is up to the caller here to coordinate this. The code should be set up so this is not a problem here. In the case of a split-index, the cache entries should be allocated from the memory pool associated with the "most common" / base index. If you find a place I missed or one that seems questionable, or have suggestions, I would be glad to look into it.

> One solution is to store the index for which the cache entry was
> created in the cache entry itself, but that does increase its size.

Yes, this is an option. For this initial patch series, I decided not to add extra fields to the cache_entry type, but I think incorporating this in cache_entry is a viable option, and has some positive properties. (A sketch of what this could look like follows at the end of this message.)

> Another is to change the API such that a cache entry is created and
> added in the same function, and then have some rollback if the cache
> entry turns out to be invalid (to support add-empty-entry -> fill ->
> verify), but I don't know if this is feasible. Anyway, all these
> alternatives should be at least discussed in the commit message, I
> think.

I can include a discussion of these in the commit message. Thanks.

> The make_transient_cache_entry() function might be poorly named,
> since as far as I can tell, the entries produced by that function are
> actually the longest lasting, since they are never freed.

They should always be freed (and are usually freed close to where they are allocated, or by the calling function). If you see an instance where this is not the case, please point it out, because that is not the intention.

> Along those lines, I was slightly surprised to find out in patch 4
> that cache entry freeing is a no-op. That's fine, but in that case,
> it would be better to delete all the calls to the "discard" function,
> and document in the others that the entries they create will only be
> freed when the memory pool itself is discarded.

I can add a comment inside the function body. In the next commit, I do add logic to perform extra (optional) verification in the discard function. I did wrestle with this fact, but I feel there is value in having the locations where it is appropriate to free these entries present in the code, even if this particular implementation does not utilize them right now. Hopefully the verification logic added in 5/5 is enough to justify keeping this function around.

> [1] https://public-inbox.org/git/xmqqwox5i0f7@gitster-ct.c.googlers.com/
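For illustration, here is a rough sketch of the "store the owner in the entry" option discussed above. Every name in it (ce_pool, ce_mem_pool, the helper) is a hypothetical assumption, not code from the series:

        /*
         * Hypothetical sketch of tracking the owning pool in the
         * entry; none of these names come from the series. Assumes
         * struct cache_entry gains:
         *
         *     struct mem_pool *ce_pool;  /* owning pool, or NULL */
         *
         * and that the index carries a pointer to its own pool
         * (istate->ce_mem_pool here).
         */
        static void assert_entry_matches_index(const struct index_state *istate,
                                               const struct cache_entry *ce)
        {
                /*
                 * An entry carved out of one index's pool must not be
                 * added to another index: when the owning index is
                 * dropped, its pool goes with it and the entry would
                 * dangle.
                 */
                if (ce->ce_pool && ce->ce_pool != istate->ce_mem_pool)
                        BUG("cache_entry added to an index it was not allocated for");
        }

add_index_entry() could call such a check before linking the entry in, at the cost of one pointer per cache_entry, which is the size increase mentioned above.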
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/20/2018 01:49 PM, Stefan Beller wrote:

>>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682
>>
>> I couldn't quite figure out what these five patches were based on,
>> even with this line. Basing on and referring to a commit that is
>> not part of our published history with "base-commit" is not all that
>> useful to others.
>
> I'd like to second this.
>
> In the object store refactoring, I am at a point where I'd want to
> migrate the memory management of {object, tree, commit, tag}.c, which
> currently is done in alloc.c, to a memory pool that has a dedicated
> pointer to it. So I'd either have to refactor alloc.c to take
> the_repository[1], or I'd play around with the mem_pool to manage
> memory in the object layer.
>
> I guess this playing around can happen with what is at
> origin/jm/mem-pool; however, the life cycle management part of the
> third patch[2] would allow for stopping memleaks there.
>
> So I am interested in this series as well.

Sorry about the confusion here. This patch series can be applied to the 'next' branch; it applies cleanly on top of [3]. The lifecycle functions are re-introduced in [4], which we could incorporate sooner if useful. I didn't have a consumer for the calls in the original patch series, and so deferred them until there was a caller.

I would be interested to understand how the mem_pool would fit your needs, and if it is sufficient or needs modification for your use cases.

> [1] proof of concept in patches nearby
> https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/
> [2] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/
>
> Thanks,
> Stefan

[3] b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20)
[4] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/18/2018 12:49 AM, Junio C Hamano wrote:

> Jameson Miller writes:
>
>> This patch series improves the performance of loading indexes by
>> reducing the number of malloc() calls. ...
>>
>> Jameson Miller (5):
>>   read-cache: teach refresh_cache_entry to take istate
>>   Add an API creating / discarding cache_entry structs
>>   mem-pool: fill out functionality
>>   Allocate cache entries from memory pools
>>   Add optional memory validations around cache_entry lifecyle
>>
>>  apply.c                |  26 +++---
>>  blame.c                |   5 +-
>>  builtin/checkout.c     |   8 +-
>>  builtin/difftool.c     |   8 +-
>>  builtin/reset.c        |   6 +-
>>  builtin/update-index.c |  26 +++---
>>  cache.h                |  40 ++++++-
>>  git.c                  |   3 +
>>  mem-pool.c             | 136 ++++++++++++++++++++++-
>>  mem-pool.h             |  34 ++++++
>>  merge-recursive.c      |   4 +-
>>  read-cache.c           | 229 ++++++++++++++++++++++++++++++------
>>  resolve-undo.c         |   6 +-
>>  split-index.c          |  31 +++---
>>  tree.c                 |   4 +-
>>  unpack-trees.c         |  27 +++---
>>  16 files changed, 476 insertions(+), 117 deletions(-)
>>
>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682
>
> I couldn't quite figure out what these five patches were based on,
> even with this line. Basing on and referring to a commit that is
> not part of our published history with "base-commit" is not all that
> useful to others.

My apologies - this patch series should be applied to the 'next' branch. It applies cleanly on top of b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20), which is a commit in the 'next' branch.

> Offhand, without applying these patches and viewing the changes in
> wider contexts, one thing that makes me wonder is how the two
> allocation schemes can be used with two implementations of free().
> Upon add_index_entry() that replaces an index entry for an existing
> path, we'd discard an entry that was originally read as part of
> read_cache(). If we do that again, the second add_index_entry() will
> be discarding, in its call to replace_index_entry(), the entry that
> was allocated by the caller of the first add_index_entry() call. And
> replace_index_entry() would not have a way to know from which
> allocation the entry's memory came.
>
> Perhaps it is not how you are using the "transient" stuff, and from
> the comment in 2/5, it is for "entries that are not meant to go into
> the index", but then higher stage index entries in unpack_trees seem
> to be allocated via the "transient" stuff, so I am not sure what the
> plans are for things like merge_recursive() that uses unpack_trees()
> to prepare the index and then later "fix it up" by further futzing
> around the index to adjust for renames it finds, etc.

Good points. The intention with this logic is that any entries that *could* go into an index are allocated from the memory pool. The "transient" entries only exist for a short period of time. These have a defined lifetime and we can always trace the corresponding "free" call. make_transient_cache_entry() is only used to construct a temporary cache_entry to pass to checkout_entry() / write_entry(). There is a note in checkout.c indicating that write_entry() needs to be refactored to not take a cache_entry.

The cache_entry type could gain knowledge about where it was allocated from. This would allow us to have only a single "free()" function, which could inspect the cache_entry to see whether it was allocated from a mem_pool (and possibly which mem_pool) and take the appropriate action. The downside of this approach is that cache_entry would need to gain a field to track this information, and I *think* all of the bit field spots are taken. (A sketch of this idea follows at the end of this message.)

> Let me read it fully once we know where these patches are to be
> applied, but before that I cannot say much about them X-<. Thanks.
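As a rough sketch of the single-free idea mentioned above, assuming a spare flag bit could in fact be found (CE_MEM_POOL is a made-up name; this is illustrative, not the series' code):

        /*
         * Illustrative only: if cache_entry could spare one flag bit
         * (CE_MEM_POOL here is hypothetical), a single free routine
         * could dispatch on the allocation's origin.
         */
        static void cache_entry_free(struct cache_entry *ce)
        {
                if (!ce)
                        return;
                if (ce->ce_flags & CE_MEM_POOL)
                        return; /* pool-backed; reclaimed when the pool is discarded */
                free(ce);       /* transient entry; ordinary heap allocation */
        }

With something like this, replace_index_entry() would not need to know which allocation scheme produced the entry it is discarding.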
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/17/2018 02:39 PM, Ben Peart wrote:

> On 4/17/2018 12:34 PM, Jameson Miller wrote:
>
>> 100K:
>>
>> Test                                       baseline [4]     block_allocation
>> -----------------------------------------------------------------------------
>> 0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)  0.02(0.01+0.01) -33.3%
>>
>> 1M:
>>
>> Test                                       baseline         block_allocation
>> -----------------------------------------------------------------------------
>> 0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)  0.17(0.07+0.09) -26.1%
>>
>> 2M:
>>
>> Test                                       baseline         block_allocation
>> -----------------------------------------------------------------------------
>> 0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)  0.39(0.17+0.20) -13.3%
>>
>> 100K is not a large enough sample size to show the perf impact of
>> this change, but we can see a perf improvement with 1M and 2M
>> entries.
>
> I see a 33% change with 100K files, which is a substantial improvement
> even in the 100K case. I do see that the actual wall clock savings
> aren't nearly as much with a small repo as they are with the larger
> repos, which makes sense.

You are correct - I should have been more careful in my wording. What I meant is that the wall-clock savings with 100K entries are not large, because this operation is already very fast.
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On Tue, 17 Apr 2018 16:34:39 +0000 Jameson Miller wrote:

> Jameson Miller (5):
>   read-cache: teach refresh_cache_entry to take istate
>   Add an API creating / discarding cache_entry structs
>   mem-pool: fill out functionality
>   Allocate cache entries from memory pools
>   Add optional memory validations around cache_entry lifecyle

In this patch set, there is no enforcement that the cache entry created by make_index_cache_entry() goes into the correct index when add_index_entry() is invoked. (Junio described similar things, I believe, in [1].) This might be an issue when we bring up and drop multiple indexes, and dropping one index causes a cache entry in another to become invalidated.

One solution is to store the index for which the cache entry was created in the cache entry itself, but that does increase its size. Another is to change the API such that a cache entry is created and added in the same function, and then have some rollback if the cache entry turns out to be invalid (to support add-empty-entry -> fill -> verify), but I don't know if this is feasible. Anyway, all these alternatives should be at least discussed in the commit message, I think.

The make_transient_cache_entry() function might be poorly named, since as far as I can tell, the entries produced by that function are actually the longest lasting, since they are never freed.

Along those lines, I was slightly surprised to find out in patch 4 that cache entry freeing is a no-op. That's fine, but in that case, it would be better to delete all the calls to the "discard" function, and document in the others that the entries they create will only be freed when the memory pool itself is discarded.

[1] https://public-inbox.org/git/xmqqwox5i0f7@gitster-ct.c.googlers.com/
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
>>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682
>>
>> I couldn't quite figure out what these five patches were based on,
>> even with this line. Basing on and referring to a commit that is
>> not part of our published history with "base-commit" is not all that
>> useful to others.

I'd like to second this.

In the object store refactoring, I am at a point where I'd want to migrate the memory management of {object, tree, commit, tag}.c, which currently is done in alloc.c, to a memory pool that has a dedicated pointer to it. So I'd either have to refactor alloc.c to take the_repository[1], or I'd play around with the mem_pool to manage memory in the object layer.

I guess this playing around can happen with what is at origin/jm/mem-pool; however, the life cycle management part of the third patch[2] would allow for stopping memleaks there.

So I am interested in this series as well.

[1] proof of concept in patches nearby
https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/
[2] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/

Thanks,
Stefan
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
Jameson Miller writes:

> This patch series improves the performance of loading indexes by
> reducing the number of malloc() calls. ...
>
> Jameson Miller (5):
>   read-cache: teach refresh_cache_entry to take istate
>   Add an API creating / discarding cache_entry structs
>   mem-pool: fill out functionality
>   Allocate cache entries from memory pools
>   Add optional memory validations around cache_entry lifecyle
>
>  apply.c                |  26 +++---
>  blame.c                |   5 +-
>  builtin/checkout.c     |   8 +-
>  builtin/difftool.c     |   8 +-
>  builtin/reset.c        |   6 +-
>  builtin/update-index.c |  26 +++---
>  cache.h                |  40 ++++++-
>  git.c                  |   3 +
>  mem-pool.c             | 136 ++++++++++++++++++++++-
>  mem-pool.h             |  34 ++++++
>  merge-recursive.c      |   4 +-
>  read-cache.c           | 229 ++++++++++++++++++++++++++++++------
>  resolve-undo.c         |   6 +-
>  split-index.c          |  31 +++---
>  tree.c                 |   4 +-
>  unpack-trees.c         |  27 +++---
>  16 files changed, 476 insertions(+), 117 deletions(-)
>
> base-commit: cafaccae98f749ebf33495aec42ea25060de8682

I couldn't quite figure out what these five patches were based on, even with this line. Basing on and referring to a commit that is not part of our published history with "base-commit" is not all that useful to others.

Offhand, without applying these patches and viewing the changes in wider contexts, one thing that makes me wonder is how the two allocation schemes can be used with two implementations of free(). Upon add_index_entry() that replaces an index entry for an existing path, we'd discard an entry that was originally read as part of read_cache(). If we do that again, the second add_index_entry() will be discarding, in its call to replace_index_entry(), the entry that was allocated by the caller of the first add_index_entry() call. And replace_index_entry() would not have a way to know from which allocation the entry's memory came.

Perhaps it is not how you are using the "transient" stuff, and from the comment in 2/5, it is for "entries that are not meant to go into the index", but then higher stage index entries in unpack_trees seem to be allocated via the "transient" stuff, so I am not sure what the plans are for things like merge_recursive() that uses unpack_trees() to prepare the index and then later "fix it up" by further futzing around the index to adjust for renames it finds, etc.

Let me read it fully once we know where these patches are to be applied, but before that I cannot say much about them X-<.

Thanks.
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 4/17/2018 12:34 PM, Jameson Miller wrote:

> 100K:
>
> Test                                       baseline [4]     block_allocation
> -----------------------------------------------------------------------------
> 0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)  0.02(0.01+0.01) -33.3%
>
> 1M:
>
> Test                                       baseline         block_allocation
> -----------------------------------------------------------------------------
> 0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)  0.17(0.07+0.09) -26.1%
>
> 2M:
>
> Test                                       baseline         block_allocation
> -----------------------------------------------------------------------------
> 0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)  0.39(0.17+0.20) -13.3%
>
> 100K is not a large enough sample size to show the perf impact of
> this change, but we can see a perf improvement with 1M and 2M
> entries.

I see a 33% change with 100K files, which is a substantial improvement even in the 100K case. I do see that the actual wall clock savings aren't nearly as much with a small repo as they are with the larger repos, which makes sense.
[PATCH v1 0/5] Allocate cache entries from memory pool
This patch series improves the performance of loading indexes by reducing the number of malloc() calls. Loading the index from disk is partly dominated by the time spent in malloc(), which is called once for each index entry. This patch series reduces the number of times malloc() is called as part of loading the index: instead, it allocates a block of memory upfront that is large enough to hold all of the cache entries, and chunks this memory itself. (A sketch of this allocation pattern appears at the end of this message.) This change builds on [1], which is a prerequisite.

Git previously allocated blocks of memory for the index cache entries, until [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API
  3/5      - Fill out memory pool API to include lifecycle and other
             methods used in later patches
  4/5      - Allocate cache entry structs from memory pool
  5/5      - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh test was run with several combinations of allocators (glibc default, tcmalloc, jemalloc), with and without block allocation, and across several different index sizes (100K, 1M, 2M entries). The details of how these repositories were constructed can be found in [3]. The p0002-read-cache.sh test was run with the iteration count set to 1 and $GIT_PERF_REPEAT_COUNT=10.

The tests were run with the iteration count set to 1 because this best approximates the real behavior. The read_cache/discard_cache test will load / free the index N times, and the performance of this logic is different between N = 1 and N > 1. As the production code does not read / discard the index in a loop, a better approximation is when N = 1.

100K:

Test                                       baseline [4]     block_allocation
-----------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)  0.02(0.01+0.01) -33.3%

1M:

Test                                       baseline         block_allocation
-----------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)  0.17(0.07+0.09) -26.1%

2M:

Test                                       baseline         block_allocation
-----------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)  0.39(0.17+0.20) -13.3%

100K is not a large enough sample size to show the perf impact of this change, but we can see a perf improvement with 1M and 2M entries.

For completeness, here are the p0002-read-cache results for git.git and linux.git:

git.git:

Test                                          baseline         block_allocation
--------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)  0.17(0.13+0.03) -43.3%

linux.git:

Test                                          baseline         block_allocation
--------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)  4.61(3.74+0.66) -34.6%

We also investigated the performance of just using different allocators. We can see that there is not a consistent performance gain.
100K:

Test                                       baseline [4]     tcmalloc               jemalloc
--------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)  0.03(0.01+0.01) +0.0%  0.03(0.02+0.01) +0.0%

1M:

Test                                       baseline         tcmalloc               jemalloc
--------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)  0.21(0.10+0.10) -8.7%  0.27(0.16+0.10) +17.4%

2M:

Test                                       baseline         tcmalloc               jemalloc
--------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)  0.46(0.25+0.21) +2.2%  0.57(0.36+0.21) +26.7%

[1] https://public-inbox.org/git/20180321164152.204869-1-jam...@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories: the test repositories were constructed with t/perf/repos/many-files.sh with the following parameters:

    100K: many-files.sh 4 10 9
    1M:   many-files.sh 5 10 9
    2M:   many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs
  mem-pool: fill out functionality
  Allocate cache entries from memory pools
  Add optional memory validations around cache_entry lifecyle
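For readers unfamiliar with the pattern, here is a minimal sketch of the block-allocation idea described above. It is illustrative only and much simpler than the series' actual mem-pool implementation; xmalloc(), st_add() and FLEX_ARRAY are existing git helpers:

        /*
         * Minimal sketch, not the series' mem-pool.c: one malloc()
         * per large block, with cache entries carved out of the
         * block sequentially.
         */
        struct mem_block {
                struct mem_block *next_block;
                size_t used;
                size_t capacity;
                char space[FLEX_ARRAY];
        };

        static struct mem_block *blocks;

        static void *block_alloc(size_t len)
        {
                struct mem_block *b = blocks;
                void *p;

                len = (len + 7) & ~(size_t)7; /* keep entries 8-byte aligned */
                if (!b || b->capacity - b->used < len) {
                        /* current block exhausted: allocate a fresh 1MB block */
                        size_t cap = len > (1u << 20) ? len : (1u << 20);
                        b = xmalloc(st_add(sizeof(*b), cap));
                        b->next_block = blocks;
                        b->used = 0;
                        b->capacity = cap;
                        blocks = b;
                }
                p = b->space + b->used;
                b->used += len;
                return p;
        }

Discarding everything then walks the block list with one free() per block instead of one free() per cache entry.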