Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-23 Thread Stefan Beller
On Mon, Apr 23, 2018 at 9:44 AM, Jameson Miller
 wrote:
> I would be interested to understand how the
> mem_pool would fit your needs, and if it is sufficient or needs modification
> for your use cases.
>
>> [1] proof of concept in patches nearby
>> https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/
>>

Currently the parsed objects are loaded into memory and never freed.
See alloc.c, which implements a specialized memory allocator for this
object loading.
When working with submodules, their objects are also just put into this
globally-namespaced object store. (See struct object **obj_hash in
object.c)

I want to make the object store a per-repository object, such that
when working with submodules, you can free up all submodule objects
once you are done with a given submodule.

To do so, the memory allocation needs to manage the whole life cycle,
while preserving the efficiency of alloc.c. See 855419f764
(Add specialized object allocator, 2006-06-19). The mem-pool that
you propose can allocate large slabs of memory and we can put
objects in there without alignment overhead, such that we preserve
the memory efficiency while being able to track all the memory.
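The slab idea described above might be sketched like this (a minimal illustration with invented names; Git's actual mem-pool API differs in detail, and a real pool must also handle alignment):

```c
#include <stdlib.h>

/* One slab holds many objects; no per-object malloc(). */
struct slab {
	struct slab *next;
	size_t used;		/* bytes handed out so far */
	char data[];		/* flexible array: the actual storage */
};

struct obj_pool {
	struct slab *head;
	size_t slab_size;	/* capacity of each slab's data[] */
};

/* Carve len bytes out of the current slab, opening a new one on demand.
 * (A real implementation would round len up for alignment.) */
static void *pool_alloc(struct obj_pool *p, size_t len)
{
	if (!p->head || p->head->used + len > p->slab_size) {
		struct slab *s = malloc(sizeof(*s) + p->slab_size);
		if (!s)
			return NULL;
		s->next = p->head;
		s->used = 0;
		p->head = s;
	}
	void *ret = p->head->data + p->head->used;
	p->head->used += len;
	return ret;
}

/* Discarding the pool releases every object at once - the life cycle
 * management piece that alloc.c currently lacks. */
static void pool_free(struct obj_pool *p)
{
	struct slab *s = p->head;
	while (s) {
		struct slab *next = s->next;
		free(s);
		s = next;
	}
	p->head = NULL;
}
```

Successive allocations come out of the same slab, so per-object overhead stays near zero while the pool still tracks all the memory.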

So I would think it is sufficient as-is with this series; maybe we need
a little tweaking there, but nothing large IMHO.

Thanks,
Stefan


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-23 Thread Jameson Miller



On 04/20/2018 07:34 PM, Jonathan Tan wrote:

> On Tue, 17 Apr 2018 16:34:39 +
> Jameson Miller  wrote:
>
>> Jameson Miller (5):
>>   read-cache: teach refresh_cache_entry to take istate
>>   Add an API creating / discarding cache_entry structs
>>   mem-pool: fill out functionality
>>   Allocate cache entries from memory pools
>>   Add optional memory validations around cache_entry lifecyle
>
> In this patch set, there is no enforcement that the cache entry created
> by make_index_cache_entry() goes into the correct index when
> add_index_entry() is invoked. (Junio described similar things, I
> believe, in [1].) This might be an issue when we bring up and drop
> multiple indexes, and dropping one index causes a cache entry in another
> to become invalidated.


Correct - it is up to the caller here to coordinate this. The code 
should be set up so this is not a problem here. In the case of a 
split-index, the cache entries should be allocated from the memory pool 
associated with the "most common" / base index. If you found a place I 
missed or seems questionable, or have suggestions, I would be glad to 
look into it.




> One solution is to store the index for which the cache entry was created
> in the cache entry itself, but that does increase its size. Another is


Yes, this is an option. For this initial patch series, I decided to not 
add extra fields to the cache_entry type, but I think incorporating this 
in cache_entry is a viable option, and has some positive properties.
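For illustration, that back-pointer option could look roughly like this; every type and function name here is invented, not the actual cache.h API:

```c
#include <stdlib.h>

struct index_state_x { int nr; };	/* stand-in for the real index_state */

struct cache_entry_x {
	struct index_state_x *owner;	/* index this entry was created for */
	/* ... the usual cache_entry fields would follow ... */
};

/* Create an entry bound to one index; costs one pointer per entry. */
static struct cache_entry_x *make_entry_for(struct index_state_x *istate)
{
	struct cache_entry_x *ce = calloc(1, sizeof(*ce));
	if (ce)
		ce->owner = istate;
	return ce;
}

/* Guard: refuse to add an entry to an index it was not created for. */
static int add_entry_checked(struct index_state_x *istate,
			     struct cache_entry_x *ce)
{
	if (ce->owner != istate)
		return -1;	/* caller mixed up indexes */
	istate->nr++;		/* real insertion would happen here */
	return 0;
}
```

The extra pointer is the cost; the benefit is that the wrong-index case becomes detectable at add time rather than a latent invalidation bug.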



> to change the API such that a cache entry is created and added in the
> same function, and then have some rollback if the cache entry turns out
> to be invalid (to support add-empty-entry -> fill -> verify), but I
> don't know if this is feasible. Anyway, all these alternatives should be
> at least discussed in the commit message, I think.


I can include a discussion of these in the commit message. Thanks.



> The make_transient_cache_entry() function might be poorly named, since
> as far as I can tell, the entries produced by that function are actually
> the longest lasting, since they are never freed.


They should always be freed (and are usually freed close to where they 
are allocated, or by the calling function). If you see an instance where 
this is not the case, please point it out, because that is not the 
intention.




> Along those lines, I was slightly surprised to find out in patch 4 that
> cache entry freeing is a no-op. That's fine, but in that case, it would
> be better to delete all the calls to the "discard" function, and
> document in the others that the entries they create will only be freed
> when the memory pool itself is discarded.


I can add a comment inside the function body. In the next commit, I do 
add logic to perform extra (optional) verification in the discard 
function. I did wrestle with this fact, but I feel there is value in 
having the locations where it is appropriate to free these entries in 
code, even if this particular implementation is not utilizing it right 
now. Hopefully the verification logic added in 5/5 is enough to justify 
keeping this function around.
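A rough sketch of that kind of optional verification (counters and names are invented here, not necessarily what 5/5 does): discard remains a no-op for the memory itself, but keeping the call sites lets the bookkeeping flag missing discards at pool teardown:

```c
#include <stdlib.h>

static unsigned int ce_allocated, ce_discarded;

/* Stand-in for a pool-backed allocation; only the count matters here. */
static void *make_pool_entry(size_t len)
{
	ce_allocated++;
	return malloc(len);
}

/* No-op for the memory (the pool reclaims it in bulk later), but
 * recording the call keeps the alloc/discard pairing checkable. */
static void discard_pool_entry(void *ce)
{
	if (ce)
		ce_discarded++;
}

/* At pool discard time: any imbalance means a missing discard call. */
static int entries_balanced(void)
{
	return ce_allocated == ce_discarded;
}
```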




> [1] https://public-inbox.org/git/xmqqwox5i0f7@gitster-ct.c.googlers.com/



Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-23 Thread Jameson Miller



On 04/20/2018 01:49 PM, Stefan Beller wrote:

>>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682


>> I couldn't quite figure out what these five patches were based on,
>> even with this line.  Basing on and referring to a commit that is
>> not part of our published history with "base-commit" is not all that
>> useful to others.


> I'd like to second this. In the object store refactoring, I am at a point where
> I'd want to migrate the memory management of {object, tree, commit, tag}.c,
> which currently is done in alloc.c, to a memory pool that has a dedicated
> pointer to it.
>
> So I'd either have to refactor alloc.c to take the_repository[1] or
> I'd play around with the mem_pool to manage memory in the
> object layer. I guess this playing around can happen with
> what is at origin/jm/mem-pool; however, the life cycle management
> part of the third patch[2] would allow for stopping memleaks there.
> So I am interested in this series as well.



Sorry about the confusion here. This patch series can be applied to the 
next branch. They apply cleanly on [3]. The lifecycle functions are 
re-introduced in [4], which we could incorporate sooner if useful. I 
didn't have a consumer for the calls in the original patch series, and 
so deferred them until there was a caller. I would be interested to 
understand how the mem_pool would fit your needs, and if it is 
sufficient or needs modification for your use cases.



> [1] proof of concept in patches nearby
> https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/
>
> [2] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/
>
> Thanks,
> Stefan



[3] b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20)

[4]
https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-23 Thread Jameson Miller



On 04/18/2018 12:49 AM, Junio C Hamano wrote:

> Jameson Miller  writes:
>
>> This patch series improves the performance of loading indexes by
>> reducing the number of malloc() calls. ...
>>
>> Jameson Miller (5):
>>   read-cache: teach refresh_cache_entry to take istate
>>   Add an API creating / discarding cache_entry structs
>>   mem-pool: fill out functionality
>>   Allocate cache entries from memory pools
>>   Add optional memory validations around cache_entry lifecyle
>>
>>  apply.c|  26 +++---
>>  blame.c|   5 +-
>>  builtin/checkout.c |   8 +-
>>  builtin/difftool.c |   8 +-
>>  builtin/reset.c|   6 +-
>>  builtin/update-index.c |  26 +++---
>>  cache.h|  40 -
>>  git.c  |   3 +
>>  mem-pool.c | 136 -
>>  mem-pool.h |  34 
>>  merge-recursive.c  |   4 +-
>>  read-cache.c   | 229 +++--
>>  resolve-undo.c |   6 +-
>>  split-index.c  |  31 +--
>>  tree.c |   4 +-
>>  unpack-trees.c |  27 +++---
>>  16 files changed, 476 insertions(+), 117 deletions(-)
>>
>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682

> I couldn't quite figure out what these five patches were based on,
> even with this line.  Basing on and referring to a commit that is
> not part of our published history with "base-commit" is not all that
> useful to others.

My apologies - this patch series should be applied to the 'next'
branch.  It applies cleanly on top of
b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20), which
is a commit in the 'next' branch.

> Offhand without applying these patches and viewing the changes in
> wider contexts, one thing that makes me wonder is how the two
> allocation schemes can be used with two implementations of free().
> Upon add_index_entry() that replaces an index entry for an existing
> path, we'd discard an entry that was originally read as part of
> read_cache().  If we do that again, the second add_index_entry()
> will be discarding, in its call to replace_index_entry(), the entry
> that was allocated by the caller of the first add_index_entry()
> call.  And replace_index_entry() would not have a way to know which
> allocation the entry's memory came from.
>
> Perhaps it is not how you are using the "transient" stuff, and from
> the comment in 2/5, it is for "entries that are not meant to go into
> the index", but then higher stage index entries in unpack_trees seem
> to be allocated via the "transient" stuff, so I am not sure what the
> plans are for things like merge_recursive() that uses unpack_trees()
> to prepare the index and then later "fix it up" by further futzing
> around the index to adjust for renames it finds, etc.


Good points. The intention with this logic is that any entries
that *could* go into an index are allocated from the memory
pool. The "transient" entries only exist for a short period of
time. These have a defined lifetime and we can always trace the
corresponding "free" call. make_transient_cache_entry() is only
used to construct a temporary cache_entry to pass to the
checkout_entry() / write_entry(). There is a note in checkout.c
indicating that write_entry() needs to be re-factored to not take
a cache_entry.

The cache_entry type could gain knowledge about where it was
allocated from. This would allow us to only have a
single "free()" function, which could inspect the cache_entry to
see if it was allocated from a mem_pool (and possibly which
mem_pool) and take the appropriate action. The downside of this
approach is that cache_entry would need to gain a field to track
this information, and I *think* all of the bit field spots are
taken.
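If a spare bit could be found, the single-free()-path idea might look something like this sketch (flag and struct names invented for illustration):

```c
#include <stdlib.h>

#define CE_POOL_BACKED_X (1u << 30)	/* invented bit: entry lives in a pool */

struct entry_x {
	unsigned int ce_flags;
	/* ... */
};

/* One discard path for both allocation schemes: inspect the entry
 * rather than making every caller remember where it came from. */
static void free_entry_x(struct entry_x *ce)
{
	if (!ce)
		return;
	if (ce->ce_flags & CE_POOL_BACKED_X)
		return;		/* pool owns the memory; freed with the pool */
	free(ce);		/* individually heap-allocated: release now */
}
```

With this, replace_index_entry() would not need to know which scheme allocated the entry it is dropping.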


> Let me read it fully once we know where these patches are to be
> applied, but before that I cannot say much about them X-<.
>
> Thanks.





Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-23 Thread Jameson Miller



On 04/17/2018 02:39 PM, Ben Peart wrote:



> On 4/17/2018 12:34 PM, Jameson Miller wrote:


>> 100K
>>
>> Test                                       baseline [4]      block_allocation
>> 0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.02(0.01+0.01) -33.3%
>>
>> 1M:
>>
>> Test                                       baseline          block_allocation
>> 0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.17(0.07+0.09) -26.1%
>>
>> 2M:
>>
>> Test                                       baseline          block_allocation
>> 0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.39(0.17+0.20) -13.3%
>>
>> 100K is not a large enough sample size to show the perf impact of this
>> change, but we can see a perf improvement with 1M and 2M entries.


> I see a 33% change with 100K files, which is a substantial improvement
> even in the 100K case.  I do see that the actual wall clock savings
> aren't nearly as much with a small repo as they are with the larger
> repos, which makes sense.


You are correct - I should have been more careful in my wording. What I 
meant is that the wall time savings with 100K is not large, because this 
operation is already very fast.




Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-20 Thread Jonathan Tan
On Tue, 17 Apr 2018 16:34:39 +
Jameson Miller  wrote:

> Jameson Miller (5):
>   read-cache: teach refresh_cache_entry to take istate
>   Add an API creating / discarding cache_entry structs
>   mem-pool: fill out functionality
>   Allocate cache entries from memory pools
>   Add optional memory validations around cache_entry lifecyle

In this patch set, there is no enforcement that the cache entry created
by make_index_cache_entry() goes into the correct index when
add_index_entry() is invoked. (Junio described similar things, I
believe, in [1].) This might be an issue when we bring up and drop
multiple indexes, and dropping one index causes a cache entry in another
to become invalidated.

One solution is to store the index for which the cache entry was created
in the cache entry itself, but that does increase its size. Another is
to change the API such that a cache entry is created and added in the
same function, and then have some rollback if the cache entry turns out
to be invalid (to support add-empty-entry -> fill -> verify), but I
don't know if this is feasible. Anyway, all these alternatives should be
at least discussed in the commit message, I think.

The make_transient_cache_entry() function might be poorly named, since
as far as I can tell, the entries produced by that function are actually
the longest lasting, since they are never freed.

Along those lines, I was slightly surprised to find out in patch 4 that
cache entry freeing is a no-op. That's fine, but in that case, it would
be better to delete all the calls to the "discard" function, and
document in the others that the entries they create will only be freed
when the memory pool itself is discarded.

[1] https://public-inbox.org/git/xmqqwox5i0f7@gitster-ct.c.googlers.com/


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-20 Thread Stefan Beller
>> base-commit: cafaccae98f749ebf33495aec42ea25060de8682
>
> I couldn't quite figure out what these five patches were based on,
> even with this line.  Basing on and referring to a commit that is
> not part of our published history with "base-commit" is not all that
> useful to others.

I'd like to second this. In the object store refactoring, I am at a point where
I'd want to migrate the memory management of {object, tree, commit, tag}.c,
which currently is done in alloc.c, to a memory pool that has a dedicated
pointer to it.

So I'd either have to refactor alloc.c to take the_repository[1] or
I'd play around with the mem_pool to manage memory in the
object layer. I guess this playing around can happen with
what is at origin/jm/mem-pool; however, the life cycle management
part of the third patch[2] would allow for stopping memleaks there.
So I am interested in this series as well.

[1] proof of concept in patches nearby
https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/

[2] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/

Thanks,
Stefan


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Junio C Hamano
Jameson Miller  writes:

> This patch series improves the performance of loading indexes by
> reducing the number of malloc() calls. ...
>
> Jameson Miller (5):
>   read-cache: teach refresh_cache_entry to take istate
>   Add an API creating / discarding cache_entry structs
>   mem-pool: fill out functionality
>   Allocate cache entries from memory pools
>   Add optional memory validations around cache_entry lifecyle
>
>  apply.c|  26 +++---
>  blame.c|   5 +-
>  builtin/checkout.c |   8 +-
>  builtin/difftool.c |   8 +-
>  builtin/reset.c|   6 +-
>  builtin/update-index.c |  26 +++---
>  cache.h|  40 -
>  git.c  |   3 +
>  mem-pool.c | 136 -
>  mem-pool.h |  34 
>  merge-recursive.c  |   4 +-
>  read-cache.c   | 229 
> +++--
>  resolve-undo.c |   6 +-
>  split-index.c  |  31 +--
>  tree.c |   4 +-
>  unpack-trees.c |  27 +++---
>  16 files changed, 476 insertions(+), 117 deletions(-)
>
>
> base-commit: cafaccae98f749ebf33495aec42ea25060de8682

I couldn't quite figure out what these five patches were based on,
even with this line.  Basing on and referring to a commit that is
not part of our published history with "base-commit" is not all that
useful to others.

Offhand without applying these patches and viewing the changes in
wider contexts, one thing that makes me wonder is how the two
allocation schemes can be used with two implementations of free().
Upon add_index_entry() that replaces an index entry for an existing
path, we'd discard an entry that was originally read as part of
read_cache().  If we do that again, the second add_index_entry()
will be discarding, in its call to replace_index_entry(), the entry
that was allocated by the caller of the first add_index_entry()
call.  And replace_index_entry() would not have a way to know which
allocation the entry's memory came from.

Perhaps it is not how you are using the "transient" stuff, and from
the comment in 2/5, it is for "entries that are not meant to go into
the index", but then higher stage index entries in unpack_trees seem
to be allocated via the "transient" stuff, so I am not sure what the
plans are for things like merge_recursive() that uses unpack_trees()
to prepare the index and then later "fix it up" by further futzing
around the index to adjust for renames it finds, etc.

Let me read it fully once we know where these patches are to be
applied, but before that I cannot say much about them X-<.

Thanks.



Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Ben Peart



On 4/17/2018 12:34 PM, Jameson Miller wrote:


> 100K
>
> Test                                       baseline [4]      block_allocation
> 0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.02(0.01+0.01) -33.3%
>
> 1M:
>
> Test                                       baseline          block_allocation
> 0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.17(0.07+0.09) -26.1%
>
> 2M:
>
> Test                                       baseline          block_allocation
> 0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.39(0.17+0.20) -13.3%
>
> 100K is not a large enough sample size to show the perf impact of this
> change, but we can see a perf improvement with 1M and 2M entries.


I see a 33% change with 100K files, which is a substantial improvement
even in the 100K case.  I do see that the actual wall clock savings
aren't nearly as much with a small repo as they are with the larger
repos, which makes sense.


[PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Jameson Miller
This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. Loading the index from disk is
partly dominated by the time in malloc(), which is called for each
index entry. This patch series reduces the number of times malloc() is
called as part of loading the index, and instead allocates a block of
memory upfront that is large enough to hold all of the cache entries,
and chunks this memory itself. This change builds on [1], which is a
prerequisite for this change.

Git previously allocated index cache entries from blocks of memory,
until that changed in [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API

  3/5 - Fill out memory pool API to include lifecycle and other
methods used in later patches

  4/5 - Allocate cache entry structs from memory pool

  5/5 - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh
test was run with several combinations of allocators (glibc default,
tcmalloc, jemalloc), with and without block allocation, and across
several different index sizes (100K, 1M, 2M entries). The details on
how these repositories were constructed can be found in [3]. The
p0002-read-cache.sh test was run with the iteration count set to 1 and
$GIT_PERF_REPEAT_COUNT=10.

The tests were run with iteration count set to 1 because this best
approximates the real behavior. The read_cache/discard_cache test will
load / free the index N times, and the performance of this logic is
different between N = 1 and N > 1. As the production code does not
read / discard the index in a loop, a better approximation is when N =
1.

100K

Test                                       baseline [4]      block_allocation
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.02(0.01+0.01) -33.3%

1M:

Test                                       baseline          block_allocation
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.17(0.07+0.09) -26.1%

2M:

Test                                       baseline          block_allocation
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.39(0.17+0.20) -13.3%


100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries. For
completeness, here are the p0002-read-cache results for git.git and
linux.git:

git.git:

Test                                          baseline          block_allocation
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)   0.17(0.13+0.03) -43.3%

linux.git:

Test                                          baseline          block_allocation
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)   4.61(3.74+0.66) -34.6%


We also investigated the performance of just using different
allocators. We can see that there is not a consistent performance
gain.

100K

Test                                       baseline [4]            tcmalloc                jemalloc
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.03(0.01+0.01) +0.0%   0.03(0.02+0.01) +0.0%

1M:

Test                                       baseline                tcmalloc                jemalloc
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.21(0.10+0.10) -8.7%   0.27(0.16+0.10) +17.4%

2M:

Test                                       baseline                tcmalloc                jemalloc
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.46(0.25+0.21) +2.2%   0.57(0.36+0.21) +26.7%


[1] https://public-inbox.org/git/20180321164152.204869-1-jam...@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories:

The test repositories were constructed with t/perf/repos/many_files.sh with the 
following parameters:

100K:many-files.sh 4 10 9
1M:  many-files.sh 5 10 9
2M:  many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 
'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs
  mem-pool: fill out functionality
  Allocate cache entries from memory pools
  Add optional memory validations around cache_entry lifecyle
