[PATCH 0/3] Fix gc segfault
In 9ac3f0e (pack-objects: fix performance issues on packing large deltas, 2018-07-22), a mutex was introduced that is used to guard the call to set the delta size. This commit added code to initialize it, but at an incorrect spot: in init_threaded_search(), while the call to oe_set_delta_size() (and hence to packing_data_lock()) can happen in the call chain check_object() <- get_object_details() <- prepare_pack() <- cmd_pack_objects(), which is long before the prepare_pack() function calls ll_find_deltas() (which initializes the threaded search).

Another tell-tale that the mutex was initialized in an incorrect spot is that the function to initialize it lives in builtin/, while the code that uses the mutex is defined in a libgit.a header file.

Let's use a more appropriate function: prepare_packing_data(), which not only lives in libgit.a, but also has to be called before the packing_data struct that contains the mutex is used.

Johannes Schindelin (3):
  Fix typo 'detla' -> 'delta'
  pack-objects (mingw): demonstrate a segmentation fault with large deltas
  pack-objects (mingw): initialize `packing_data` mutex in the correct spot

 builtin/pack-objects.c        |  1 -
 pack-objects.c                |  3 +++
 pack-objects.h                |  2 +-
 t/t5321-pack-large-objects.sh | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100755 t/t5321-pack-large-objects.sh

base-commit: 5a0cc8aca797dbd7d2be3b67458ff880ed45cddf
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-47%2Fjamill%2Ffix-gc-segfault-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-47/jamill/fix-gc-segfault-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/47
--
gitgitgadget
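To make the ordering problem concrete, here is a minimal, self-contained sketch (not the actual git code; the struct and both function bodies are illustrative stand-ins for the ones named above) of why the mutex must be set up in prepare_packing_data() rather than in the threaded-search setup:

#include <pthread.h>

/* Illustrative stand-in for the real packing_data; other fields elided. */
struct packing_data {
	pthread_mutex_t lock;
};

/* Correct spot: runs before any code path can touch the struct. */
static void prepare_packing_data(struct packing_data *pdata)
{
	pthread_mutex_init(&pdata->lock, NULL);
}

/* Reached via check_object() long before ll_find_deltas() runs. */
static void oe_set_delta_size(struct packing_data *pdata)
{
	/*
	 * If the mutex were first initialized in init_threaded_search(),
	 * this would lock an uninitialized mutex, which is undefined
	 * behavior and crashes (segfaults) on Windows.
	 */
	pthread_mutex_lock(&pdata->lock);
	/* ... store the delta size ... */
	pthread_mutex_unlock(&pdata->lock);
}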
RE: [PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id
> > Teach make_cache_entry function to take object_id instead of a SHA-1.
>
> This repeats the subject line?
>
> Sign off missing.

Thank you Stefan for pointing this out. The body does not add much information beyond the subject line; I could clean it up if I re-roll to fix up the missing sign-off.

Junio - would you like me to re-roll this patch series and include the correct sign-off, or would you be able to correct this? I can do whichever is easiest for you.
[PATCH v6 4/8] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but it also avoids searching memory blocks that probably will not be able to fulfill the request.

This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or that are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index 389d7af447..c80124f1fe 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b

 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
-	struct mp_block *p;
+	struct mp_block *p = NULL;
 	void *r;

 	/* round up to a 'uintmax_t' alignment */
 	if (len & (sizeof(uintmax_t) - 1))
 		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));

-	for (p = mem_pool->mp_block; p; p = p->next_block)
-		if (p->end - p->next_free >= len)
-			break;
+	if (mem_pool->mp_block &&
+	    mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
+		p = mem_pool->mp_block;

 	if (!p) {
 		if (len >= (mem_pool->block_alloc / 2)) {
--
2.17.1
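As a usage illustration, here is a hedged sketch of the allocation pattern the second paragraph describes; struct fixed_entry and fill_pool are hypothetical consumers, not part of this series. With same-sized requests, once the head block cannot satisfy one request, no older block could either, so checking only the head block loses nothing:

#include "mem-pool.h"

struct fixed_entry {
	int id;
	char payload[48];
};

static void fill_pool(struct mem_pool *pool, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		/* Every request has the same size, so only the head
		 * block can ever have room for the next one. */
		struct fixed_entry *e = mem_pool_alloc(pool, sizeof(*e));
		e->id = (int)i;
	}
}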
[PATCH v6 8/8] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

  1) Invalidate cache_entry memory when discarding cache_entry.

  2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool.

  3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected.

Signed-off-by: Jameson Miller
---
 cache.h      |  6 ++
 git.c        |  3 +++
 mem-pool.c   |  6 +-
 mem-pool.h   |  2 +-
 read-cache.c | 55 ++--
 5 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 8fd89ae51f..b3faef3efc 100644
--- a/cache.h
+++ b/cache.h
@@ -380,6 +380,12 @@ struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
  */
 void discard_cache_entry(struct cache_entry *ce);

+/*
+ * Check configuration if we should perform extra validation on cache
+ * entries.
+ */
+int should_validate_cache_entries(void);
+
 /*
  * Duplicate a cache_entry. Allocate memory for the new entry from a
  * memory_pool. Takes into account cache_entry fields that are meant

diff --git a/git.c b/git.c
index 9dbe6ffaa7..c7e4f0351a 100644
--- a/git.c
+++ b/git.c
@@ -414,7 +414,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	trace_argv_printf(argv, "trace: built-in: git");

+	validate_cache_entries(&the_index);
 	status = p->fn(argc, argv, prefix);
+	validate_cache_entries(&the_index);
+
 	if (status)
 		return status;

diff --git a/mem-pool.c b/mem-pool.c
index 139617cb23..a2841a4a9a 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -50,7 +50,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
 	*mem_pool = pool;
 }

-void mem_pool_discard(struct mem_pool *mem_pool)
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory)
 {
 	struct mp_block *block, *block_to_free;

@@ -59,6 +59,10 @@ void mem_pool_discard(struct mem_pool *mem_pool)
 	{
 		block_to_free = block;
 		block = block->next_block;
+
+		if (invalidate_memory)
+			memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space));
+
 		free(block_to_free);
 	}

diff --git a/mem-pool.h b/mem-pool.h
index adeefdcb28..999d3c3a52 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
 /*
  * Discard a memory pool and free all the memory it is responsible for.
  */
-void mem_pool_discard(struct mem_pool *mem_pool);
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory);

 /*
  * Alloc memory from the mem_pool.

diff --git a/read-cache.c b/read-cache.c
index b07369660b..fd67e2e8a4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2050,8 +2050,10 @@ int discard_index(struct index_state *istate)
 	 * Cache entries in istate->cache[] should have been allocated
 	 * from the memory pool associated with this index, or from an
 	 * associated split_index. There is no need to free individual
-	 * cache entries.
+	 * cache entries. validate_cache_entries can detect when this
+	 * assertion does not hold.
 	 */
+	validate_cache_entries(istate);

 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
@@ -2068,13 +2070,45 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;

 	if (istate->ce_mem_pool) {
-		mem_pool_discard(istate->ce_mem_pool);
+		mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries());
 		istate->ce_mem_pool = NULL;
 	}

 	return 0;
 }

+/*
+ * Validate the cache entries of this index.
+ * All cache entries associated with this index
+ * should have been allocated by the memory pool
+ * associated with this index, or by a referenced
+ * split index.
+ */
+void validate_cache_entries(const struct index_state *istate)
+{
+	int i;
+
+	if (!should_validate_cache_entries() ||!istate || !istate->initialized)
+		return;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (!istate) {
+			die("internal error: cache entry is not allocated from expected memory pool");
+		} else if (!istate->ce_mem_pool ||
+			!mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) {
+			if (!istate->split_index ||
+				!istate->split_index->base ||
+				!istate->split_index->base->ce_mem_pool ||
+
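The patch text above is cut off before showing how the environment variable is consulted. The following is a plausible sketch of such a check (the variable name is illustrative, since the message above does not show it), caching the getenv() result so the validation call sites stay cheap:

#include <stdlib.h>

static int should_validate_cache_entries(void)
{
	static int validate = -1;

	/* Consult the environment only once per process. */
	if (validate < 0)
		validate = getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES") ? 1 : 0;

	return validate;
}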
[PATCH v6 1/8] read-cache: teach refresh_cache_entry to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 9 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/cache.h b/cache.h index d49092d94d..93af25f586 100644 --- a/cache.h +++ b/cache.h @@ -751,7 +751,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. diff --git a/merge-recursive.c b/merge-recursive.c index bed4a5be02..8b3d6781c7 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -326,7 +326,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("add_cacheinfo failed to refresh for path '%s'; merge aborting."), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 372588260e..fa8366ecab 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1473,10 +1473,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, - unsigned int options) +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, + unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.17.1
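For illustration, a hypothetical caller of the new signature looks like this; where the old form implicitly operated on the_index, the index is now an explicit argument (refresh_in is an invented helper name):

#include "cache.h"

static struct cache_entry *refresh_in(struct index_state *istate,
				      struct cache_entry *ce)
{
	/* The index to refresh against is chosen by the caller. */
	return refresh_cache_entry(istate, ce,
				   CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING);
}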
[PATCH v6 0/8] Allocate cache entries from mem_pool
Changes since v5:

 - Updated commit messages for:
     7581156e8a read-cache: teach refresh_cache_entry to take istate
     2c962aa0cd block alloc: add lifecycle APIs for cache_entry structs
 - Use the oidcpy function instead of the hashcpy function

### Interdiff (v5..v6):

diff --git a/read-cache.c b/read-cache.c
index ba31e929e8..fd67e2e8a4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -814,7 +814,7 @@ struct cache_entry *make_cache_entry(struct index_state *istate,
 	len = strlen(path);
 	ce = make_empty_cache_entry(istate, len);

-	hashcpy(ce->oid.hash, oid->hash);
+	oidcpy(&ce->oid, oid);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(stage);
 	ce->ce_namelen = len;
@@ -840,7 +840,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct o
 	len = strlen(path);
 	ce = make_empty_transient_cache_entry(len);

-	hashcpy(ce->oid.hash, oid->hash);
+	oidcpy(&ce->oid, oid);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(stage);
 	ce->ce_namelen = len;

### Patches

Jameson Miller (8):
  read-cache: teach refresh_cache_entry to take istate
  read-cache: teach make_cache_entry to take object_id
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: add life cycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecycle

 apply.c                |  24 ++--
 blame.c                |   5 +-
 builtin/checkout.c     |   8 +-
 builtin/difftool.c     |   6 +-
 builtin/reset.c        |   2 +-
 builtin/update-index.c |  26 ++--
 cache.h                |  64 +-
 git.c                  |   3 +
 mem-pool.c             | 114 --
 mem-pool.h             |  23
 merge-recursive.c      |   4 +-
 read-cache.c           | 264 ++---
 resolve-undo.c         |   4 +-
 split-index.c          |  58 +++--
 tree.c                 |   4 +-
 unpack-trees.c         |  40 ---
 16 files changed, 515 insertions(+), 134 deletions(-)

base-commit: ed843436dd4924c10669820cc73daf50f0b4dabd
--
2.17.1
[PATCH v6 3/8] block alloc: add lifecycle APIs for cache_entry structs
It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made to allocate cache entries when loading an index.

Add an API to allocate and discard cache entries, abstracting the details of managing the memory backing the cache entries. This commit does not actually change how memory is managed - this will be done in a later commit in the series.

This change makes the distinction between cache entries that are associated with an index and cache entries that are not associated with an index. A main use of cache entries is with an index, and we can optimize the memory management around this. We still have other cases where a cache entry is not persisted with an index, and so we need to handle the "transient" use case as well.

To keep the cognitive overhead of managing cache entries low, there will only be a single discard function. This means there must be enough information kept with the cache entry so that we know how to discard them.

A summary of the main functions in the API is:

  make_cache_entry: create cache entry for use in an index. Uses specified parameters to populate cache_entry fields.

  make_empty_cache_entry: Create an empty cache entry for use in an index. Returns cache entry with empty fields.

  make_transient_cache_entry: create cache entry that is not used in an index. Uses specified parameters to populate cache_entry fields.

  make_empty_transient_cache_entry: create cache entry that is not used in an index. Returns cache entry with empty fields.

  discard_cache_entry: A single function that knows how to discard a cache entry regardless of how it was allocated.

Signed-off-by: Jameson Miller
---
 apply.c                | 24 +--
 blame.c                |  5 +--
 builtin/checkout.c     |  8 ++--
 builtin/difftool.c     |  6 +--
 builtin/reset.c        |  2 +-
 builtin/update-index.c | 26 +---
 cache.h                | 40 +++---
 merge-recursive.c      |  2 +-
 read-cache.c           | 93 +-
 resolve-undo.c         |  4 +-
 split-index.c          |  8 ++--
 tree.c                 |  4 +-
 unpack-trees.c         | 35 +++-
 13 files changed, 165 insertions(+), 92 deletions(-)

diff --git a/apply.c b/apply.c
index 8ef975a32d..8a4a4439bc 100644
--- a/apply.c
+++ b/apply.c
@@ -4092,12 +4092,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list)
 			return error(_("sha1 information is lacking or useless "
 				       "(%s)."), name);

-		ce = make_cache_entry(patch->old_mode, &oid, name, 0, 0);
+		ce = make_cache_entry(&result, patch->old_mode, &oid, name, 0, 0);
 		if (!ce)
 			return error(_("make_cache_entry failed for path '%s'"), name);

 		if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) {
-			free(ce);
+			discard_cache_entry(ce);
 			return error(_("could not add %s to temporary index"), name);
 		}

@@ -4265,9 +4265,8 @@ static int add_index_file(struct apply_state *state,
 	struct stat st;
 	struct cache_entry *ce;
 	int namelen = strlen(path);
-	unsigned ce_size = cache_entry_size(namelen);

-	ce = xcalloc(1, ce_size);
+	ce = make_empty_cache_entry(&the_index, namelen);
 	memcpy(ce->name, path, namelen);
 	ce->ce_mode = create_ce_mode(mode);
 	ce->ce_flags = create_ce_flags(0);
@@ -4280,13 +4279,13 @@ static int add_index_file(struct apply_state *state,

 		if (!skip_prefix(buf, "Subproject commit ", &s) ||
 		    get_oid_hex(s, &ce->oid)) {
-			free(ce);
-			return error(_("corrupt patch for submodule %s"), path);
+			discard_cache_entry(ce);
+			return error(_("corrupt patch for submodule %s"), path);
 		}
 	} else {
 		if (!state->cached) {
 			if (lstat(path, &st) < 0) {
-				free(ce);
+				discard_cache_entry(ce);
 				return error_errno(_("unable to stat newly "
						     "created file '%s'"), path);
@@ -4294,13
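Although the diff above is truncated, the API summary in the message is enough to sketch the intended call pattern. The following is an illustrative (not authoritative) example; add_entry_sketch is an invented helper, error handling is minimal, and git's error()/_() helpers from cache.h are assumed:

#include "cache.h"

static int add_entry_sketch(struct index_state *istate, unsigned int mode,
			    const struct object_id *oid, const char *path)
{
	struct cache_entry *ce;

	/* Entry is destined for this index, so allocate it for istate. */
	ce = make_cache_entry(istate, mode, oid, path, 0, 0);
	if (!ce)
		return error(_("make_cache_entry failed for path '%s'"), path);

	if (add_index_entry(istate, ce, ADD_CACHE_OK_TO_ADD)) {
		/* One discard function, however the entry was allocated. */
		discard_cache_entry(ce);
		return -1;
	}

	return 0;
}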
[PATCH v6 7/8] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory up front and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory pools.

Design:

The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.

In the case of a split index, the following rules are followed. First, some terminology is defined:

  - 'the_index': represents the logical view of the index
  - 'split_index': represents the "base" cache entries. Read from the split index file.

'the_index' can reference a single split_index, as well as cache_entries from the split_index. 'the_index' will be discarded before the 'split_index' is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the 'split_index''s memory pool. This allows us to follow the pattern that 'the_index' can reference cache_entries from the 'split_index', and that the cache_entries will not be freed while they are still being referenced.

Managing transient cache_entry structs:

Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this:

Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field.

Alternatives:

1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not.
     Pro: No extra memory overhead to track this state
     Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted.

2) cache_entry would gain knowledge about which mem_pool allocated it.
     Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry
     Con: cache_entry would grow heavier by a pointer, instead of an int. We didn't see a tangible benefit to this approach.

3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools.
     Pro: No extra memory overhead to track state
     Con: Extra computation is performed when freeing cache entries

We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this state.

Signed-off-by: Jameson Miller
---
 cache.h        |  21 +
 mem-pool.c     |   3 +-
 read-cache.c   | 119 +
 split-index.c  |  50 +
 unpack-trees.c |  13 +-
 5 files changed, 167 insertions(+), 39 deletions(-)

diff --git a/cache.h b/cache.h
index 035a627bea..8fd89ae51f 100644
--- a/cache.h
+++ b/cache.h
@@ -15,6 +15,7 @@
 #include "path.h"
 #include "sha1-array.h"
 #include "repository.h"
+#include "mem-pool.h"
 #include <zlib.h>

 typedef struct git_zstream {
@@ -156,6 +157,7 @@ struct cache_entry {
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
 	unsigned int ce_flags;
+	unsigned int mem_pool_allocated;
 	unsigned int ce_namelen;
 	unsigned int index;	/* for link extension */
 	struct object_id oid;
@@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 				    const struct cache_entry *src)
 {
 	unsigned int state = dst->ce_flags & CE_HASHED;
+	int mem_pool_allocated = dst->mem_pool_allocated;

 	/* Don't copy hash chain and name */
 	memcpy(&dst->ce_stat_data, &src->ce_stat_data,
@@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst,

 	/* Restore the hash state */
 	dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
+
+	/* Restore the mem_pool_allocated fla
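The diff above is cut off before discard_cache_entry() is shown. The following sketch illustrates the dispatch that the chosen approach (the mem_pool_allocated field) enables; it is a plausible reading of the design, and the real implementation in this series may differ in detail:

#include <stdlib.h>
#include "cache.h"

void discard_cache_entry(struct cache_entry *ce)
{
	if (!ce)
		return;

	/*
	 * Pool-allocated entries are freed in bulk when their pool is
	 * discarded; only transient entries are freed individually.
	 */
	if (!ce->mem_pool_allocated)
		free(ce);
}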
[PATCH v6 5/8] mem-pool: add life cycle management functions
Add initialization and discard functions to the mem_pool type. As the memory allocated by a mem_pool can now be freed, we also track the large allocations.

If there are existing mp_blocks in the mem_pool's linked list of mp_blocks, then the mp_block for a large allocation is inserted behind the head block. This is because only the head mp_block is considered when searching for available space. This results in the following desirable properties:

1) The mp_block allocated for the large request will not be included in the search for available space in future requests; the large mp_block is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future memory requests just because a request for a large chunk of memory came in.

These changes are in preparation for a future commit that will utilize creating and discarding memory pools.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 59 ++
 mem-pool.h | 10 +
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index c80124f1fe..1769400d2d 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -5,20 +5,65 @@
 #include "cache.h"
 #include "mem-pool.h"

-static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
+#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);
+
+/*
+ * Allocate a new mp_block and insert it after the block specified in
+ * `insert_after`. If `insert_after` is NULL, then insert block at the
+ * head of the linked list.
+ */
+static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc, struct mp_block *insert_after)
 {
 	struct mp_block *p;

 	mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
 	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
-	p->next_block = mem_pool->mp_block;
+
 	p->next_free = (char *)p->space;
 	p->end = p->next_free + block_alloc;
-	mem_pool->mp_block = p;
+
+	if (insert_after) {
+		p->next_block = insert_after->next_block;
+		insert_after->next_block = p;
+	} else {
+		p->next_block = mem_pool->mp_block;
+		mem_pool->mp_block = p;
+	}

 	return p;
 }

+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
+{
+	struct mem_pool *pool;
+
+	if (*mem_pool)
+		return;
+
+	pool = xcalloc(1, sizeof(*pool));
+
+	pool->block_alloc = BLOCK_GROWTH_SIZE;
+
+	if (initial_size > 0)
+		mem_pool_alloc_block(pool, initial_size, NULL);
+
+	*mem_pool = pool;
+}
+
+void mem_pool_discard(struct mem_pool *mem_pool)
+{
+	struct mp_block *block, *block_to_free;
+
+	while ((block = mem_pool->mp_block))
+	{
+		block_to_free = block;
+		block = block->next_block;
+		free(block_to_free);
+	}
+
+	free(mem_pool);
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct mp_block *p = NULL;
@@ -33,12 +78,10 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 		p = mem_pool->mp_block;

 	if (!p) {
-		if (len >= (mem_pool->block_alloc / 2)) {
-			mem_pool->pool_alloc += len;
-			return xmalloc(len);
-		}
+		if (len >= (mem_pool->block_alloc / 2))
+			return mem_pool_alloc_block(mem_pool, len, mem_pool->mp_block);

-		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
+		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc, NULL);
 	}

 	r = p->next_free;

diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..f75b3365d5 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -21,6 +21,16 @@ struct mem_pool {
 	size_t pool_alloc;
 };

+/*
+ * Initialize mem_pool with specified initial size.
+ */
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
+
+/*
+ * Discard a memory pool and free all the memory it is responsible for.
+ */
+void mem_pool_discard(struct mem_pool *mem_pool);
+
 /*
  * Alloc memory from the mem_pool.
  */
--
2.17.1
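A minimal usage sketch for the new pair of functions; the size estimate is an arbitrary illustration, and note that a later patch in this series (8/8) extends mem_pool_discard() with an extra flag:

#include "mem-pool.h"

static void pool_round_trip(void)
{
	struct mem_pool *pool = NULL;
	size_t estimated_bytes = 100 * 1024; /* illustrative estimate */

	/* No-op if *pool is already initialized. */
	mem_pool_init(&pool, estimated_bytes);

	/* ... mem_pool_alloc(pool, ...) as needed ... */

	/* Frees every block and the pool itself in one call. */
	mem_pool_discard(pool);
	pool = NULL;
}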
[PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id
Teach make_cache_entry function to take object_id instead of a SHA-1. --- apply.c| 2 +- builtin/checkout.c | 2 +- builtin/difftool.c | 4 ++-- builtin/reset.c| 2 +- cache.h| 7 ++- merge-recursive.c | 2 +- read-cache.c | 8 +--- resolve-undo.c | 2 +- 8 files changed, 18 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index 959c457910..8ef975a32d 100644 --- a/apply.c +++ b/apply.c @@ -4092,7 +4092,7 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_cache_entry(patch->old_mode, &oid, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); diff --git a/builtin/checkout.c b/builtin/checkout.c index 2e1d2376d2..548bf40f25 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -230,7 +230,7 @@ static int checkout_merged(int pos, const struct checkout *state) if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid)) die(_("Unable to add merge result for '%s'"), path); free(result_buf.ptr); - ce = make_cache_entry(mode, oid.hash, path, 2, 0); + ce = make_cache_entry(mode, &oid, path, 2, 0); if (!ce) die(_("make_cache_entry failed for path '%s'"), path); status = checkout_entry(ce, state, NULL); diff --git a/builtin/difftool.c b/builtin/difftool.c index bc97d4aef2..873a06f0d9 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id *oid, struct cache_entry *ce; int ret; - ce = make_cache_entry(mode, oid->hash, path, 0, 0); + ce = make_cache_entry(mode, oid, path, 0, 0); ret = checkout_entry(ce, state, NULL); free(ce); @@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, * index. 
*/ struct cache_entry *ce2 = - make_cache_entry(rmode, roid.hash, + make_cache_entry(rmode, &roid, dst_path, 0, 0); add_index_entry(&wtindex, ce2, diff --git a/builtin/reset.c b/builtin/reset.c index a862c70fab..00109b041f 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -134,7 +134,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, continue; } - ce = make_cache_entry(one->mode, one->oid.hash, one->path, + ce = make_cache_entry(one->mode, &one->oid, one->path, 0, 0); if (!ce) die(_("make_cache_entry failed for path '%s'"), diff --git a/cache.h b/cache.h index 93af25f586..3fbf24771a 100644 --- a/cache.h +++ b/cache.h @@ -698,7 +698,12 @@ extern int remove_file_from_index(struct index_state *, const char *path); extern int add_to_index(struct index_state *, const char *path, struct stat *, int flags); extern int add_file_to_index(struct index_state *, const char *path, int flags); -extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, unsigned int refresh_options); +extern struct cache_entry *make_cache_entry(unsigned int mode, + const struct object_id *oid, + const char *path, + int stage, + unsigned int refresh_options); + extern int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip); extern int ce_same_name(const struct cache_entry *a, const struct cache_entry *b); extern void set_object_name_for_intent_to_add_entry(struct cache_entry *ce); diff --git a/merge-recursive.c b/merge-recursive.c index 8b3d6781c7..873321e5c2 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -318,7 +318,7 @@ static int add_cacheinfo(struct merge_options *o, struct cache_entry *ce; int ret; - ce = make_cache_entry(mode, oid ? oid->hash : null_sha1, path, stage, 0); + ce = make_cache_entry(mode, oid ? oid : &null_oid, path, stage, 0); if (!ce) return err(o, _("add_cacheinfo failed for path '%s'; merge aborting."), path); diff --git a/read-cache.c b/read-cache.c index fa8366ecab..c12664c789 100644 --- a/read-cache.c +++ b/read-cache.c @@ -746,8 +746,10 @@ int add_file_to_index(struct index_state *istate
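A hypothetical call site after this conversion looks like the following; note that at this point in the series make_cache_entry() does not yet take an index_state (that happens in patch 3/8), and entry_for is an invented helper name:

#include "cache.h"

static struct cache_entry *entry_for(unsigned int mode,
				     const struct object_id *oid,
				     const char *path)
{
	/* The caller now passes a struct object_id pointer instead of a
	 * raw hash buffer such as oid->hash. */
	return make_cache_entry(mode, oid, path, 0, 0);
}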
[PATCH v6 6/8] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 42 ++ mem-pool.h | 13 + 2 files changed, 55 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 1769400d2d..b250a5fe40 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -96,3 +96,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block *p; + + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + p = dst->mp_block; + while (p->next_block) + p = p->next_block; + + p->next_block = src->mp_block; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty. +*/ + dst->mp_block = src->mp_block; + } else { + /* src is empty, nothing to do. */ + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index f75b3365d5..adeefdcb28 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. + */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.17.1
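To show how the two helpers compose, here is a hedged usage sketch, loosely modeled on the split-index scenario this series uses them for; absorb_pool and its arguments are illustrative, and `entry` is assumed to have been allocated from `src`:

#include <assert.h>
#include "mem-pool.h"

static void absorb_pool(struct mem_pool *dst, struct mem_pool *src,
			void *entry)
{
	/* Move all of src's blocks into dst; src keeps no memory. */
	mem_pool_combine(dst, src);

	/* Memory formerly in src now tests as contained in dst. */
	assert(mem_pool_contains(dst, entry));

	/* src is empty, but its owner must still mem_pool_discard() it. */
}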
RE: [PATCH v5 6/8] mem-pool: fill out functionality
> > +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src)
> > +{
> > +	struct mp_block *p;
> > +
> > +	/* Append the blocks from src to dst */
> > +	if (dst->mp_block && src->mp_block) {
> > +		/*
> > +		 * src and dst have blocks, append
> > +		 * blocks from src to dst.
> > +		 */
> > +		p = dst->mp_block;
> > +		while (p->next_block)
> > +			p = p->next_block;
> > +
> > +		p->next_block = src->mp_block;
>
> Just being curious, but does this interact with the "we carve out only
> from the first block" done in step 4/8? The remaining unused space in
> the first block in the src pool would be wasted, which may not be such
> a big deal and may not even be worth comparing the available space in
> two blocks and picking a larger one. But we do want to decide _after_
> thinking things through nevertheless.

Good question - and this is something I had thought about. In the context of current usage, this function is not used frequently. It is only used by the split-index code, in move_cache_to_base_index and remove_split_index. In either case, the cache_entries should already be loaded into memory, and I don't expect a big enough difference in the amount of left-over space to make a meaningful difference. In the general case I could see this being a bigger issue.

For now, I thought the logic as-is was an appropriate tradeoff. I don't think it would be complicated to do something smarter here, but I also didn't see much benefit in current usage. If we prefer something else here, I can update this logic.
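For completeness, a sketch of the "smarter" variant discussed above - keeping whichever head block has the larger remainder at the front of dst's list, so the patch 4/8 head-only search sees it - could look like this. This is not part of the series, only an illustration of the tradeoff being discussed; it relies on mem_pool_combine() transferring all of src's accounting regardless of which list its blocks sit on:

#include "mem-pool.h"

static size_t block_remaining(struct mp_block *b)
{
	return b ? (size_t)(b->end - b->next_free) : 0;
}

static void combine_keep_roomier_head(struct mem_pool *dst, struct mem_pool *src)
{
	struct mp_block *src_head = src->mp_block;

	if (block_remaining(src_head) > block_remaining(dst->mp_block)) {
		/* Splice src's head in front so it is searched first. */
		src->mp_block = src_head->next_block;
		src_head->next_block = dst->mp_block;
		dst->mp_block = src_head;
	}

	/* Append whatever remains of src and transfer the accounting. */
	mem_pool_combine(dst, src);
}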
RE: [PATCH v4 3/8] mem-pool: only search head block for available space
> Do we have any numbers on performance or memory pressure here?
> (I would think benchmarking fast-import would suffice as that is where
> the mem pool originated)

I ran fast-import on the git.git and linux.git repositories - this command reports the overall memory allocated to pools. The overall memory used by pools did not change in my tests (git.git used 2 blocks, and linux.git used 12 blocks). I included this information in the corresponding commit message.

> > This pattern will benefit consumers that are able to generate a good
> > estimate for how much memory will be needed, or if they are performing
> > fixed sized allocations, so that once a block is exhausted it will
> > never be able to fulfill a future request.
>
> Would this be a good candidate to contain parts of

I included these details in the commit message that adds the field indicating the cache_entry was allocated from a mem-pool.
RE: [PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs
> How much work is it to convert these to object_id while at it instead
> of char *sha1? (I think we really dislike introducing new sha1 based
> code; If it doesn't sound easy maybe we can have a static inline
> wrapper here that just converts oid to sha1 and then calls this
> function?)

I have made this change in my latest patch series with 0ac48f3af7 ("read-cache: make_cache_entry should take object_id struct", 2018-06-22).

> Is it possible to line break these functions (c.f. refs.h which I
> think has one of the best styles in the git project. It has long
> parameter lists, but still manages to stay below a reasonable line
> length)?
>
> > +extern struct cache_entry *make_empty_cache_entry(struct index_state
> > +*istate, size_t name_len);

I have formatted the function declarations to break them up.

> > +
> > +/*
> > + * Create a cache_entry that is not intended to be added to an index.
> > + * Caller is responsible for discarding the cache_entry
> > + * with `discard_cache_entry`.
> > + */
> > +extern struct cache_entry *make_transient_cache_entry(unsigned int
> > +mode, const unsigned char *sha1, const char *path, int stage); extern
> > +struct cache_entry *make_empty_transient_cache_entry(size_t
> > +name_len);
> > +
> > +/*
> > + * Discard cache entry.
> > + */
> > +void discard_cache_entry(struct cache_entry *ce);
>
> Please be consistent in the use of the extern keyword and omit them at
> all times here and above.

The header file is a bit inconsistent overall, but I switched my declarations to not use the extern keyword.
[PATCH v5 6/8] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 42 ++ mem-pool.h | 13 + 2 files changed, 55 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 1769400d2d..b250a5fe40 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -96,3 +96,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block *p; + + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + p = dst->mp_block; + while (p->next_block) + p = p->next_block; + + p->next_block = src->mp_block; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty. +*/ + dst->mp_block = src->mp_block; + } else { + /* src is empty, nothing to do. */ + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index f75b3365d5..adeefdcb28 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. + */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.17.1
[PATCH v5 8/8] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

  1) Invalidate cache_entry memory when discarding cache_entry.

  2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool.

  3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected.

Signed-off-by: Jameson Miller
---
 cache.h      |  6 ++
 git.c        |  3 +++
 mem-pool.c   |  6 +-
 mem-pool.h   |  2 +-
 read-cache.c | 55 ++--
 5 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 8fd89ae51f..b3faef3efc 100644
--- a/cache.h
+++ b/cache.h
@@ -380,6 +380,12 @@ struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
  */
 void discard_cache_entry(struct cache_entry *ce);

+/*
+ * Check configuration if we should perform extra validation on cache
+ * entries.
+ */
+int should_validate_cache_entries(void);
+
 /*
  * Duplicate a cache_entry. Allocate memory for the new entry from a
  * memory_pool. Takes into account cache_entry fields that are meant

diff --git a/git.c b/git.c
index 9dbe6ffaa7..c7e4f0351a 100644
--- a/git.c
+++ b/git.c
@@ -414,7 +414,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	trace_argv_printf(argv, "trace: built-in: git");

+	validate_cache_entries(&the_index);
 	status = p->fn(argc, argv, prefix);
+	validate_cache_entries(&the_index);
+
 	if (status)
 		return status;

diff --git a/mem-pool.c b/mem-pool.c
index 139617cb23..a2841a4a9a 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -50,7 +50,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
 	*mem_pool = pool;
 }

-void mem_pool_discard(struct mem_pool *mem_pool)
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory)
 {
 	struct mp_block *block, *block_to_free;

@@ -59,6 +59,10 @@ void mem_pool_discard(struct mem_pool *mem_pool)
 	{
 		block_to_free = block;
 		block = block->next_block;
+
+		if (invalidate_memory)
+			memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space));
+
 		free(block_to_free);
 	}

diff --git a/mem-pool.h b/mem-pool.h
index adeefdcb28..999d3c3a52 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
 /*
  * Discard a memory pool and free all the memory it is responsible for.
  */
-void mem_pool_discard(struct mem_pool *mem_pool);
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory);

 /*
  * Alloc memory from the mem_pool.

diff --git a/read-cache.c b/read-cache.c
index f346437800..ba31e929e8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2050,8 +2050,10 @@ int discard_index(struct index_state *istate)
 	 * Cache entries in istate->cache[] should have been allocated
 	 * from the memory pool associated with this index, or from an
 	 * associated split_index. There is no need to free individual
-	 * cache entries.
+	 * cache entries. validate_cache_entries can detect when this
+	 * assertion does not hold.
 	 */
+	validate_cache_entries(istate);

 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
@@ -2068,13 +2070,45 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;

 	if (istate->ce_mem_pool) {
-		mem_pool_discard(istate->ce_mem_pool);
+		mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries());
 		istate->ce_mem_pool = NULL;
 	}

 	return 0;
 }

+/*
+ * Validate the cache entries of this index.
+ * All cache entries associated with this index
+ * should have been allocated by the memory pool
+ * associated with this index, or by a referenced
+ * split index.
+ */
+void validate_cache_entries(const struct index_state *istate)
+{
+	int i;
+
+	if (!should_validate_cache_entries() ||!istate || !istate->initialized)
+		return;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (!istate) {
+			die("internal error: cache entry is not allocated from expected memory pool");
+		} else if (!istate->ce_mem_pool ||
+			!mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) {
+			if (!istate->split_index ||
+				!istate->split_index->base ||
+				!istate->split_index->base->ce_mem_pool ||
+
[PATCH v5 7/8] block alloc: allocate cache entries from mem-pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory up front and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory pools.

Design:

The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.

In the case of a split index, the following rules are followed. First, some terminology is defined:

  - 'the_index': represents the logical view of the index
  - 'split_index': represents the "base" cache entries. Read from the split index file.

'the_index' can reference a single split_index, as well as cache_entries from the split_index. 'the_index' will be discarded before the 'split_index' is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the 'split_index''s memory pool. This allows us to follow the pattern that 'the_index' can reference cache_entries from the 'split_index', and that the cache_entries will not be freed while they are still being referenced.

Managing transient cache_entry structs:

Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this:

Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field.

Alternatives:

1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not.
     Pro: No extra memory overhead to track this state
     Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted.

2) cache_entry would gain knowledge about which mem_pool allocated it.
     Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry
     Con: cache_entry would grow heavier by a pointer, instead of an int. We didn't see a tangible benefit to this approach.

3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools.
     Pro: No extra memory overhead to track state
     Con: Extra computation is performed when freeing cache entries

We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this state.

Signed-off-by: Jameson Miller
---
 cache.h        |  21 +
 mem-pool.c     |   3 +-
 read-cache.c   | 119 +
 split-index.c  |  50 +
 unpack-trees.c |  13 +-
 5 files changed, 167 insertions(+), 39 deletions(-)

diff --git a/cache.h b/cache.h
index 035a627bea..8fd89ae51f 100644
--- a/cache.h
+++ b/cache.h
@@ -15,6 +15,7 @@
 #include "path.h"
 #include "sha1-array.h"
 #include "repository.h"
+#include "mem-pool.h"
 #include <zlib.h>

 typedef struct git_zstream {
@@ -156,6 +157,7 @@ struct cache_entry {
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
 	unsigned int ce_flags;
+	unsigned int mem_pool_allocated;
 	unsigned int ce_namelen;
 	unsigned int index;	/* for link extension */
 	struct object_id oid;
@@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 				    const struct cache_entry *src)
 {
 	unsigned int state = dst->ce_flags & CE_HASHED;
+	int mem_pool_allocated = dst->mem_pool_allocated;

 	/* Don't copy hash chain and name */
 	memcpy(&dst->ce_stat_data, &src->ce_stat_data,
@@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst,

 	/* Restore the hash state */
 	dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
+
+	/* Restore the mem_pool_allocated fla
[PATCH v5 1/8] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 9 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/cache.h b/cache.h index d49092d94d..93af25f586 100644 --- a/cache.h +++ b/cache.h @@ -751,7 +751,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. diff --git a/merge-recursive.c b/merge-recursive.c index bed4a5be02..8b3d6781c7 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -326,7 +326,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("add_cacheinfo failed to refresh for path '%s'; merge aborting."), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 372588260e..fa8366ecab 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1473,10 +1473,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, - unsigned int options) +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, + unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.17.1
[PATCH v5 4/8] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but it also avoids searching memory blocks that probably will not be able to fulfill the request.

This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or that are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request.

The impact of this change on memory was tested by running fast-import on the git.git and linux.git repositories. The total amount of memory allocated to the mem-pools did not change with this approach, indicating that there is not much memory benefit to searching all nodes in the linked list.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index 389d7af447..c80124f1fe 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b

 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
-	struct mp_block *p;
+	struct mp_block *p = NULL;
 	void *r;

 	/* round up to a 'uintmax_t' alignment */
 	if (len & (sizeof(uintmax_t) - 1))
 		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));

-	for (p = mem_pool->mp_block; p; p = p->next_block)
-		if (p->end - p->next_free >= len)
-			break;
+	if (mem_pool->mp_block &&
+	    mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
+		p = mem_pool->mp_block;

 	if (!p) {
 		if (len >= (mem_pool->block_alloc / 2)) {
--
2.17.1
[PATCH v5 0/8] Allocate cache entries from mem_pool
Changes from v4 are minor code review feedback items:

 - Remove the extern keyword from new function declarations in cache.h
 - Make the make_cache_entry() functions work with object_id instead of raw SHA-1 hashes
 - Add details to the commit message for "block alloc: allocate cache entries from mem_pool"
 - Add details to the commit message for "mem-pool: only search head block for available space"

The largest change is the new commit "read-cache: make_cache_entry should take object_id struct".

Base Ref: master
Web-Diff: https://github.com/jamill/git/compare/ed843436dd...55c9b9008f

Jameson Miller (8):
  read-cache: teach refresh_cache_entry() to take istate
  read-cache: make_cache_entry should take object_id struct
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: add life cycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem-pool
  block alloc: add validations around cache_entry lifecycle

 apply.c                |  24 ++--
 blame.c                |   5 +-
 builtin/checkout.c     |   8 +-
 builtin/difftool.c     |   6 +-
 builtin/reset.c        |   2 +-
 builtin/update-index.c |  26 ++--
 cache.h                |  64 +-
 git.c                  |   3 +
 mem-pool.c             | 114 --
 mem-pool.h             |  23
 merge-recursive.c      |   4 +-
 read-cache.c           | 264 ++---
 resolve-undo.c         |   4 +-
 split-index.c          |  58 +++--
 tree.c                 |   4 +-
 unpack-trees.c         |  40 ---
 16 files changed, 515 insertions(+), 134 deletions(-)

base-commit: ed843436dd4924c10669820cc73daf50f0b4dabd
--
2.17.1
[PATCH v5 5/8] mem-pool: add life cycle management functions
Add initialization and discard functions to the mem_pool type. As the memory allocated by a mem_pool can now be freed, we also track the large allocations.

If there are existing mp_blocks in the mem_pool's linked list of mp_blocks, then the mp_block for a large allocation is inserted behind the head block. This is because only the head mp_block is considered when searching for available space. This results in the following desirable properties:

1) The mp_block allocated for the large request will not be included in the search for available space in future requests; the large mp_block is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future memory requests just because a request for a large chunk of memory came in.

These changes are in preparation for a future commit that will utilize creating and discarding memory pools.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 59 ++
 mem-pool.h | 10 +
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index c80124f1fe..1769400d2d 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -5,20 +5,65 @@
 #include "cache.h"
 #include "mem-pool.h"

-static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
+#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);
+
+/*
+ * Allocate a new mp_block and insert it after the block specified in
+ * `insert_after`. If `insert_after` is NULL, then insert block at the
+ * head of the linked list.
+ */
+static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc, struct mp_block *insert_after)
 {
 	struct mp_block *p;

 	mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
 	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
-	p->next_block = mem_pool->mp_block;
+
 	p->next_free = (char *)p->space;
 	p->end = p->next_free + block_alloc;
-	mem_pool->mp_block = p;
+
+	if (insert_after) {
+		p->next_block = insert_after->next_block;
+		insert_after->next_block = p;
+	} else {
+		p->next_block = mem_pool->mp_block;
+		mem_pool->mp_block = p;
+	}

 	return p;
 }

+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
+{
+	struct mem_pool *pool;
+
+	if (*mem_pool)
+		return;
+
+	pool = xcalloc(1, sizeof(*pool));
+
+	pool->block_alloc = BLOCK_GROWTH_SIZE;
+
+	if (initial_size > 0)
+		mem_pool_alloc_block(pool, initial_size, NULL);
+
+	*mem_pool = pool;
+}
+
+void mem_pool_discard(struct mem_pool *mem_pool)
+{
+	struct mp_block *block, *block_to_free;
+
+	while ((block = mem_pool->mp_block))
+	{
+		block_to_free = block;
+		block = block->next_block;
+		free(block_to_free);
+	}
+
+	free(mem_pool);
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct mp_block *p = NULL;
@@ -33,12 +78,10 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 		p = mem_pool->mp_block;

 	if (!p) {
-		if (len >= (mem_pool->block_alloc / 2)) {
-			mem_pool->pool_alloc += len;
-			return xmalloc(len);
-		}
+		if (len >= (mem_pool->block_alloc / 2))
+			return mem_pool_alloc_block(mem_pool, len, mem_pool->mp_block);

-		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
+		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc, NULL);
 	}

 	r = p->next_free;

diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..f75b3365d5 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -21,6 +21,16 @@ struct mem_pool {
 	size_t pool_alloc;
 };

+/*
+ * Initialize mem_pool with specified initial size.
+ */
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
+
+/*
+ * Discard a memory pool and free all the memory it is responsible for.
+ */
+void mem_pool_discard(struct mem_pool *mem_pool);
+
 /*
  * Alloc memory from the mem_pool.
  */
--
2.17.1
[PATCH v5 2/8] read-cache: make_cache_entry should take object_id struct
The make_cache_entry function should take an object_id struct instead of sha. Signed-off-by: Jameson Miller --- apply.c| 2 +- builtin/checkout.c | 2 +- builtin/difftool.c | 4 ++-- builtin/reset.c| 2 +- cache.h| 7 ++- merge-recursive.c | 2 +- read-cache.c | 8 +--- resolve-undo.c | 2 +- 8 files changed, 18 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index 959c457910..8ef975a32d 100644 --- a/apply.c +++ b/apply.c @@ -4092,7 +4092,7 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_cache_entry(patch->old_mode, &oid, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); diff --git a/builtin/checkout.c b/builtin/checkout.c index 2e1d2376d2..548bf40f25 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -230,7 +230,7 @@ static int checkout_merged(int pos, const struct checkout *state) if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid)) die(_("Unable to add merge result for '%s'"), path); free(result_buf.ptr); - ce = make_cache_entry(mode, oid.hash, path, 2, 0); + ce = make_cache_entry(mode, &oid, path, 2, 0); if (!ce) die(_("make_cache_entry failed for path '%s'"), path); status = checkout_entry(ce, state, NULL); diff --git a/builtin/difftool.c b/builtin/difftool.c index bc97d4aef2..873a06f0d9 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id *oid, struct cache_entry *ce; int ret; - ce = make_cache_entry(mode, oid->hash, path, 0, 0); + ce = make_cache_entry(mode, oid, path, 0, 0); ret = checkout_entry(ce, state, NULL); free(ce); @@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, * index. 
*/ struct cache_entry *ce2 = - make_cache_entry(rmode, roid.hash, + make_cache_entry(rmode, &roid, dst_path, 0, 0); add_index_entry(&wtindex, ce2, diff --git a/builtin/reset.c b/builtin/reset.c index a862c70fab..00109b041f 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -134,7 +134,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, continue; } - ce = make_cache_entry(one->mode, one->oid.hash, one->path, + ce = make_cache_entry(one->mode, &one->oid, one->path, 0, 0); if (!ce) die(_("make_cache_entry failed for path '%s'"), diff --git a/cache.h b/cache.h index 93af25f586..3fbf24771a 100644 --- a/cache.h +++ b/cache.h @@ -698,7 +698,12 @@ extern int remove_file_from_index(struct index_state *, const char *path); extern int add_to_index(struct index_state *, const char *path, struct stat *, int flags); extern int add_file_to_index(struct index_state *, const char *path, int flags); -extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, unsigned int refresh_options); +extern struct cache_entry *make_cache_entry(unsigned int mode, + const struct object_id *oid, + const char *path, + int stage, + unsigned int refresh_options); + extern int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip); extern int ce_same_name(const struct cache_entry *a, const struct cache_entry *b); extern void set_object_name_for_intent_to_add_entry(struct cache_entry *ce); diff --git a/merge-recursive.c b/merge-recursive.c index 8b3d6781c7..873321e5c2 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -318,7 +318,7 @@ static int add_cacheinfo(struct merge_options *o, struct cache_entry *ce; int ret; - ce = make_cache_entry(mode, oid ? oid->hash : null_sha1, path, stage, 0); + ce = make_cache_entry(mode, oid ? oid : &null_oid, path, stage, 0); if (!ce) return err(o, _("add_cacheinfo failed for path '%s
[PATCH v5 3/8] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind this API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated or freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 24 +-- blame.c| 5 +-- builtin/checkout.c | 8 ++-- builtin/difftool.c | 6 +-- builtin/reset.c| 2 +- builtin/update-index.c | 26 +--- cache.h| 40 +++--- merge-recursive.c | 2 +- read-cache.c | 93 +- resolve-undo.c | 4 +- split-index.c | 8 ++-- tree.c | 4 +- unpack-trees.c | 35 +++- 13 files changed, 165 insertions(+), 92 deletions(-) diff --git a/apply.c b/apply.c index 8ef975a32d..8a4a4439bc 100644 --- a/apply.c +++ b/apply.c @@ -4092,12 +4092,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, &oid, name, 0, 0); + ce = make_cache_entry(&result, patch->old_mode, &oid, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4265,9 +4265,8 @@ static int add_index_file(struct apply_state *state, struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); - ce = xcalloc(1, ce_size); + ce = make_empty_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4280,13 +4279,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4294,13 +4293,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4424,27 +4423,26 @@ static int add_conflicted_stages_file(struct apply_state *state, struct patch *patch) { int stage, namelen; - unsigned ce_size, mode; + unsigned mode; struct cache_entry *ce; if (!state->update_index)
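To illustrate the distinction the commit message draws between index-bound and transient entries, here is a schematic pair of call sites using only functions named in this series (error handling trimmed; a sketch, not code from the patch):

    #include "cache.h"

    static void lifecycle_sketch(const char *path)
    {
            int namelen = strlen(path);
            struct cache_entry *ce, *tmp;

            /* Destined for the_index: allocate via the index-aware API. */
            ce = make_empty_cache_entry(&the_index, namelen);
            memcpy(ce->name, path, namelen);
            ce->ce_namelen = namelen;

            if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0)
                    discard_cache_entry(ce);        /* never a bare free() */

            /* Never added to an index: the transient variant. */
            tmp = make_empty_transient_cache_entry(namelen);
            /* ... use tmp ... */
            discard_cache_entry(tmp);       /* same discard call for both */
    }

The point of the API is that the caller always releases entries through discard_cache_entry(), so the allocation strategy can change underneath without touching call sites.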
RE: [PATCH v3 0/7] allocate cache entries from memory pool
> > > We debated several approaches for what to do here
>
> it would be awesome if the list could participate in the discussion even if
> only read-only.

A bit of delay in my response here, but I like the suggestion. Here is a summary of some approaches I considered:

1) Do not include any information about where the cache_entry was allocated.

   Pros: No extra memory overhead
   Cons: Caller is responsible for freeing the cache entry correctly

   This was our initial approach. We were hoping that all cache entries could be allocated from a memory pool, and we would not have to worry about non-pool allocated entries. There are still a couple of places that need "temporary" cache entries at the moment, so we couldn't move completely to only memory pool allocated cache_entries. This would have resulted in the code allocating many "temporary" cache_entries from a pool, and the memory would not be eligible to be reclaimed until the entire memory pool was freed - a tradeoff we didn't want to make.

2) Include an extra field encoding whether the cache_entry was allocated from a memory pool

   Pro: the discard function can now make a decision regarding how to free the cache_entry
   Con: each cache entry grows by an extra int field

   The bits in the existing ce_flags field are all claimed, so we need an extra field to track this bit of information. We could claim just a bit in the new field now, which would result in the cache_entry still growing by the same amount, but any future bits would not require extra space. This pushes the work for an actual bit field off into future work.

3) Encode whether the cache_entry was allocated from a memory pool as a bit in a field.

   Pro: only a single bit is required to encode this information.
   Con: All the existing bits in the existing ce_flags field are claimed. We need to add an extra bit field and take the same memory growth.

   I considered this approach (and am still open to it), but I went for a simpler initial approach to make sure the overall change is acceptable. There is no difference in the memory footprint compared to option #2, so it would only serve to enable future changes more easily.

4) Include a pointer to the memory pool from which this cache_entry was allocated

   Pro: Could (potentially) do some extra bookkeeping, such as automatically cleaning up the memory_pool when all allocated cache_entries are freed
   Con: extra complexity, and larger growth of the cache_entry struct to accommodate the mem_pool pointer

   In the end, we didn't see a tangible benefit to this option at this point.

Given the tradeoffs, I went with option #2 for now.
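For concreteness, option #2 boils down to a shape like the following (names simplified here; in the series the flag lives in struct cache_entry as mem_pool_allocated):

    #include <stdlib.h>

    struct entry {
            unsigned int pool_allocated; /* set once, at allocation time */
            /* ... payload fields ... */
    };

    static void discard_entry(struct entry *e)
    {
            if (!e)
                    return;
            /*
             * Pool-allocated entries are reclaimed in bulk when the owning
             * pool is discarded; only transient entries are individually
             * malloc'd and therefore individually freed.
             */
            if (!e->pool_allocated)
                    free(e);
    }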
[PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind this API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated or freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 24 ++--- blame.c| 5 ++- builtin/checkout.c | 8 ++--- builtin/difftool.c | 6 ++-- builtin/reset.c| 2 +- builtin/update-index.c | 26 ++ cache.h| 24 - merge-recursive.c | 2 +- read-cache.c | 96 ++ resolve-undo.c | 4 ++- split-index.c | 8 ++--- tree.c | 4 +-- unpack-trees.c | 35 -- 13 files changed, 155 insertions(+), 89 deletions(-) diff --git a/apply.c b/apply.c index d79e61591b..1b5d923f4e 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct apply_state *state, struct patch *patch) { int stage, namelen; - unsigned ce_size, mode; + unsigned
[PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size
The amount of memory to allocate for mp_blocks was off by a sizeof(uintmax_t) on platforms that do not support flexible arrays. To account for this, use the offset of the space field when calculating the total amount of memory to allocate for a mp_block.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index c80124f1fe..a5d5eed923 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -8,9 +8,11 @@
 static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
 {
 	struct mp_block *p;
+	size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc);
+
+	mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc);
+	p = xmalloc(total_alloc);
 
-	mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
-	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
 	p->next_block = mem_pool->mp_block;
 	p->next_free = (char *)p->space;
 	p->end = p->next_free + block_alloc;
-- 
2.14.3
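The discrepancy is easy to demonstrate in a standalone program (illustrative code, not part of the patch; it mimics mp_block with the one-element-array fallback Git uses when FLEX_ARRAY is not supported):

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    /* What mp_block looks like when FLEX_ARRAY falls back to 1. */
    struct blk {
            struct blk *next_block;
            char *next_free;
            char *end;
            uintmax_t space[1];
    };

    int main(void)
    {
            size_t block_alloc = 1024;

            /* sizeof() counts the placeholder element: over-allocates. */
            printf("sizeof-based:   %zu\n", sizeof(struct blk) + block_alloc);
            /* offsetof() stops where the payload starts: exact. */
            printf("offsetof-based: %zu\n", offsetof(struct blk, space) + block_alloc);
            /* On common LP64 ABIs the two differ by sizeof(uintmax_t). */
            return 0;
    }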
[PATCH v4 6/8] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 42 ++ mem-pool.h | 13 + 2 files changed, 55 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 4e544459e9..81fda969d3 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -94,3 +94,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block *p; + + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + p = dst->mp_block; + while (p->next_block) + p = p->next_block; + + p->next_block = src->mp_block; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty. +*/ + dst->mp_block = src->mp_block; + } else { + /* src is empty, nothing to do. */ + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index f75b3365d5..adeefdcb28 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. + */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.14.3
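A hypothetical caller showing the ownership semantics of the two new functions (assert() used for brevity; the mem-pool calls are exactly the ones declared above):

    #include <assert.h>
    #include "cache.h"
    #include "mem-pool.h"

    static void combine_example(void)
    {
            struct mem_pool *dst = NULL, *src = NULL;
            void *p;

            mem_pool_init(&dst, 1024);
            mem_pool_init(&src, 1024);

            p = mem_pool_alloc(src, 64);

            mem_pool_combine(dst, src);

            /* The blocks, and hence 'p', now belong to 'dst'... */
            assert(mem_pool_contains(dst, p));
            assert(!mem_pool_contains(src, p));

            /* ...but 'src' itself must still be discarded. */
            mem_pool_discard(src);
            mem_pool_discard(dst);
    }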
[PATCH v4 7/8] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory pools.

Design:

The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.

In the case of a Split Index, the following rules are followed. First, some terminology is defined:

Terminology:
 - 'the_index': represents the logical view of the index
 - 'split_index': represents the "base" cache entries. Read from the split index file.

'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced.

Signed-off-by: Jameson Miller
---
 cache.h        | 21 ++
 mem-pool.c     | 3 +-
 read-cache.c   | 119 -
 split-index.c  | 50
 unpack-trees.c | 13 +--
 5 files changed, 167 insertions(+), 39 deletions(-)

diff --git a/cache.h b/cache.h
index abcc27ff87..74d3ebebc2 100644
--- a/cache.h
+++ b/cache.h
@@ -15,6 +15,7 @@
 #include "path.h"
 #include "sha1-array.h"
 #include "repository.h"
+#include "mem-pool.h"
 
 #include <zlib.h>
 typedef struct git_zstream {
@@ -156,6 +157,7 @@ struct cache_entry {
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
 	unsigned int ce_flags;
+	unsigned int mem_pool_allocated;
 	unsigned int ce_namelen;
 	unsigned int index;	/* for link extension */
 	struct object_id oid;
@@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 				    const struct cache_entry *src)
 {
 	unsigned int state = dst->ce_flags & CE_HASHED;
+	int mem_pool_allocated = dst->mem_pool_allocated;
 
 	/* Don't copy hash chain and name */
 	memcpy(&dst->ce_stat_data, &src->ce_stat_data,
@@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 
 	/* Restore the hash state */
 	dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
+
+	/* Restore the mem_pool_allocated flag */
+	dst->mem_pool_allocated = mem_pool_allocated;
 }
 
 static inline unsigned create_ce_flags(unsigned stage)
@@ -328,6 +334,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	uint64_t fsmonitor_last_update;
 	struct ewah_bitmap *fsmonitor_dirty;
+	struct mem_pool *ce_mem_pool;
 };
 
 extern struct index_state the_index;
@@ -362,6 +369,20 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
  */
 void discard_cache_entry(struct cache_entry *ce);
 
+/*
+ * Duplicate a cache_entry. Allocate memory for the new entry from a
+ * memory_pool. Takes into account cache_entry fields that are meant
+ * for managing the underlying memory allocation of the cache_entry.
+ */
+struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate);
+
+/*
+ * Validate the cache entries in the index. This is an internal
+ * consistency check that the cache_entry structs are allocated from
+ * the expected memory pool.
+ */
+void validate_cache_entries(const struct index_state *istate);
+
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
 #define active_cache (the_index.cache)
 #define active_nr (the_index.cache_nr)
diff --git a/mem-pool.c b/mem-pool.c
index 81fda969d3..0f19cc01a9 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -55,7 +55,8 @@ void mem_pool_discard(struct mem_pool *mem_pool)
 {
 	struct mp_block *block, *block_to_free;
 
-	while ((block = mem_pool->mp_block))
+	block = mem_pool->mp_block;
+	while (block)
 	{
 		block_to_free = block;
 		block = block->next_block;
diff --git a/read-cache.c b/read-cache.c
index 6396e04e45..a8932ce2a6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -46,6 +46,48 @@
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
 		 SPLIT_INDEX_ORDER
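The allocation rule in the commit message amounts to a small pool-selection step. Sketched below with a hypothetical helper name; the series implements the equivalent logic inside read-cache.c rather than as a separate function:

    #include "cache.h"
    #include "split-index.h"

    /*
     * Sketch: pick the pool a new cache_entry for 'istate' must come
     * from. Entries may outlive the_index but not the split index's
     * base, so with a split index present we carve entries out of the
     * base index's pool.
     */
    static struct mem_pool *pool_for_index(struct index_state *istate)
    {
            if (istate->split_index && istate->split_index->base)
                    return istate->split_index->base->ce_mem_pool;
            return istate->ce_mem_pool;
    }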
[PATCH v4 8/8] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

1) Invalidate cache_entry memory when discarding cache_entry.

2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool.

3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected.

Signed-off-by: Jameson Miller
---
 cache.h      | 6 ++
 git.c        | 3 +++
 mem-pool.c   | 6 +-
 mem-pool.h   | 2 +-
 read-cache.c | 55 +--
 5 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 74d3ebebc2..c0d6b976f5 100644
--- a/cache.h
+++ b/cache.h
@@ -369,6 +369,12 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
  */
 void discard_cache_entry(struct cache_entry *ce);
 
+/*
+ * Check configuration if we should perform extra validation on cache
+ * entries.
+ */
+int should_validate_cache_entries(void);
+
 /*
  * Duplicate a cache_entry. Allocate memory for the new entry from a
  * memory_pool. Takes into account cache_entry fields that are meant
diff --git a/git.c b/git.c
index c2f48d53dd..010898ba6d 100644
--- a/git.c
+++ b/git.c
@@ -414,7 +414,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 
 	trace_argv_printf(argv, "trace: built-in: git");
 
+	validate_cache_entries(&the_index);
 	status = p->fn(argc, argv, prefix);
+	validate_cache_entries(&the_index);
+
 	if (status)
 		return status;
 
diff --git a/mem-pool.c b/mem-pool.c
index 0f19cc01a9..92d106a637 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -51,7 +51,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
 	*mem_pool = pool;
 }
 
-void mem_pool_discard(struct mem_pool *mem_pool)
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory)
 {
 	struct mp_block *block, *block_to_free;
 
@@ -60,6 +60,10 @@ void mem_pool_discard(struct mem_pool *mem_pool)
 	{
 		block_to_free = block;
 		block = block->next_block;
+
+		if (invalidate_memory)
+			memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space));
+
 		free(block_to_free);
 	}
 
diff --git a/mem-pool.h b/mem-pool.h
index adeefdcb28..999d3c3a52 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
 /*
  * Discard a memory pool and free all the memory it is responsible for.
  */
-void mem_pool_discard(struct mem_pool *mem_pool);
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory);
 
 /*
  * Alloc memory from the mem_pool.
diff --git a/read-cache.c b/read-cache.c
index a8932ce2a6..2652f2aeb0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2047,8 +2047,10 @@ int discard_index(struct index_state *istate)
 	 * Cache entries in istate->cache[] should have been allocated
 	 * from the memory pool associated with this index, or from an
 	 * associated split_index. There is no need to free individual
-	 * cache entries.
+	 * cache entries. validate_cache_entries can detect when this
+	 * assertion does not hold.
 	 */
+	validate_cache_entries(istate);
 
 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
@@ -2065,13 +2067,45 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 
 	if (istate->ce_mem_pool) {
-		mem_pool_discard(istate->ce_mem_pool);
+		mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries());
 		istate->ce_mem_pool = NULL;
 	}
 
 	return 0;
 }
 
+/*
+ * Validate the cache entries of this index.
+ * All cache entries associated with this index
+ * should have been allocated by the memory pool
+ * associated with this index, or by a referenced
+ * split index.
+ */
+void validate_cache_entries(const struct index_state *istate)
+{
+	int i;
+
+	if (!should_validate_cache_entries() || !istate || !istate->initialized)
+		return;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (!istate) {
+			die("internal error: cache entry is not allocated from expected memory pool");
+		} else if (!istate->ce_mem_pool ||
+			!mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) {
+			if (!istate->split_index ||
+				!istate->split_index->base ||
+				!istate->split_index->base->ce_mem_pool ||
+
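The body of should_validate_cache_entries() is not visible in the excerpt above; one plausible shape, reading a test switch from the environment (the variable name here is an assumption, not taken from this patch), would be:

    /*
     * Sketch: look the switch up once and cache the answer so the
     * check stays cheap on every builtin invocation.
     */
    int should_validate_cache_entries(void)
    {
            static int validate = -1;

            if (validate < 0)
                    validate = git_env_bool("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES", 0);

            return validate;
    }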
[PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 9 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/cache.h b/cache.h index 89a107a7f7..9538511d9f 100644 --- a/cache.h +++ b/cache.h @@ -751,7 +751,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. diff --git a/merge-recursive.c b/merge-recursive.c index f110e1c5ec..11a767cc72 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -323,7 +323,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("add_cacheinfo failed to refresh for path '%s'; merge aborting."), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 372588260e..fa8366ecab 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1473,10 +1473,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, - unsigned int options) +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, + unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.14.3
[PATCH v4 0/8] Allocate cache entries from mem_pool
Changes from V3:

Mainly changes from last round of feedback:

 - Rename make_index_cache_entry -> make_cache_entry

 - Rename make_empty_index_cache_entry -> make_empty_cache_entry

 - Remove tail pointer in mem_pool

 - Small code tweaks

 - More accurately calculate mp_block size for platforms that do not
   support flexible arrays

One thing that came up with my testing is that the current automated tests do not fully cover the code path of "large" allocations from a memory pool. I was able to force this condition by manually tweaking some variables and then running the automated tests, but this is not ideal for preventing regressions in the future. One way I can think of testing this is to add a test-helper and directly test the memory pool struct. This will allow me to control the parameters and different conditions. I was hoping for some guidance before I actually implemented these tests. Either way, I would like to do the additional tests in a separate patch series to have a more focused discussion. I am not sure if these tests would prevent inclusion of this patch series - I am open to guidance here.

Base Ref: master
Web-Diff: https://github.com/jamill/git/compare/242ba98e44...667b8de06c

Jameson Miller (8):
  read-cache: teach refresh_cache_entry() to take istate
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: tweak math on mp_block allocation size
  mem-pool: add lifecycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecycle

 apply.c                | 24 +++--
 blame.c                | 5 +-
 builtin/checkout.c     | 8 +-
 builtin/difftool.c     | 6 +-
 builtin/reset.c        | 2 +-
 builtin/update-index.c | 26 +++--
 cache.h                | 53 +-
 git.c                  | 3 +
 mem-pool.c             | 124 +++
 mem-pool.h             | 23 +
 merge-recursive.c      | 4 +-
 read-cache.c           | 259 -
 resolve-undo.c         | 4 +-
 split-index.c          | 58 ---
 tree.c                 | 4 +-
 unpack-trees.c         | 40
 16 files changed, 504 insertions(+), 139 deletions(-)

base-commit: 242ba98e44d8314fb184d240939614a3c9b424db
-- 
2.14.3
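For the testing question raised above, a helper along the following lines could exercise the "large allocation" path deterministically. This is entirely hypothetical (no such helper exists in this series); it only uses mem-pool functions introduced by the patches, and relies on BLOCK_GROWTH_SIZE being roughly 1MB:

    /* t/helper/test-mem-pool.c (hypothetical) */
    #include "cache.h"
    #include "mem-pool.h"

    int cmd_main(int argc, const char **argv)
    {
            struct mem_pool *pool = NULL;
            int i;

            mem_pool_init(&pool, 1024);

            for (i = 0; i < 16; i++) {
                    /*
                     * Larger than half of the ~1MB growth size, so each
                     * request takes the large-block path.
                     */
                    void *p = mem_pool_alloc(pool, 600 * 1024);

                    if (!mem_pool_contains(pool, p))
                            die("large allocation not tracked by pool");
            }

            mem_pool_discard(pool);
            return 0;
    }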
[PATCH v4 5/8] mem-pool: add lifecycle management functions
Add initialization and discard functions to the mem_pool type. As the memory allocated by a mem_pool can now be freed, we also track the large memory allocations so they can be freed as well.

If there are existing mp_blocks in the mem_pool's linked list of mp_blocks, then the mp_block for a large allocation is inserted after the head block. This is because only the head mp_block is considered when searching for available space. This results in the following desirable properties:

1) The mp_block allocated for the large request will not be included in the search for available space in future requests, as the large mp_block is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future memory requests just because a request for a large chunk of memory came in.

These changes are in preparation for a future commit that will create and discard memory pools.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 63 ++
 mem-pool.h | 10 ++
 2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index a5d5eed923..4e544459e9 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -5,7 +5,14 @@
 #include "cache.h"
 #include "mem-pool.h"
 
-static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
+#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);
+
+/*
+ * Allocate a new mp_block and insert it after the block specified in
+ * `insert_after`. If `insert_after` is NULL, then insert block at the
+ * head of the linked list.
+ */
+static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc, struct mp_block *insert_after)
 {
 	struct mp_block *p;
 	size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc);
@@ -13,14 +20,51 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b
 	mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc);
 	p = xmalloc(total_alloc);
 
-	p->next_block = mem_pool->mp_block;
 	p->next_free = (char *)p->space;
 	p->end = p->next_free + block_alloc;
-	mem_pool->mp_block = p;
+
+	if (insert_after) {
+		p->next_block = insert_after->next_block;
+		insert_after->next_block = p;
+	} else {
+		p->next_block = mem_pool->mp_block;
+		mem_pool->mp_block = p;
+	}
 
 	return p;
 }
 
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
+{
+	struct mem_pool *pool;
+
+	if (*mem_pool)
+		return;
+
+	pool = xcalloc(1, sizeof(*pool));
+
+	pool->block_alloc = BLOCK_GROWTH_SIZE;
+
+	if (initial_size > 0)
+		mem_pool_alloc_block(pool, initial_size, NULL);
+
+	*mem_pool = pool;
+}
+
+void mem_pool_discard(struct mem_pool *mem_pool)
+{
+	struct mp_block *block, *block_to_free;
+
+	while ((block = mem_pool->mp_block))
+	{
+		block_to_free = block;
+		block = block->next_block;
+		free(block_to_free);
+	}
+
+	free(mem_pool);
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct mp_block *p = NULL;
@@ -33,15 +77,10 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 	if (mem_pool->mp_block &&
 	    mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
 		p = mem_pool->mp_block;
-
-	if (!p) {
-		if (len >= (mem_pool->block_alloc / 2)) {
-			mem_pool->pool_alloc += len;
-			return xmalloc(len);
-		}
-
-		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
-	}
+	else if (len >= (mem_pool->block_alloc / 2))
+		p = mem_pool_alloc_block(mem_pool, len, mem_pool->mp_block);
+	else
+		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc, NULL);
 
 	r = p->next_free;
 	p->next_free += len;
diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..f75b3365d5 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -21,6 +21,16 @@ struct mem_pool {
 	size_t pool_alloc;
 };
 
+/*
+ * Initialize mem_pool with specified initial size.
+ */
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
+
+/*
+ * Discard a memory pool and free all the memory it is responsible for.
+ */
+void mem_pool_discard(struct mem_pool *mem_pool);
+
 /*
  * Alloc memory from the mem_pool.
  */
-- 
2.14.3
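The two properties claimed in the commit message can be observed directly through the fields declared in mem-pool.h (an illustrative snippet against the v4 code above; assert() for brevity):

    #include <assert.h>
    #include "cache.h"
    #include "mem-pool.h"

    static void large_block_placement_example(void)
    {
            struct mem_pool *pool = NULL;
            struct mp_block *head;

            mem_pool_init(&pool, 1024);
            head = pool->mp_block;

            /* >= block_alloc / 2, so this takes the large-block path. */
            mem_pool_alloc(pool, 900 * 1024);

            /* Property 2: the head block keeps its position... */
            assert(pool->mp_block == head);
            /* Property 1: ...and the large block behind it is exactly full. */
            assert(head->next_block->next_free == head->next_block->end);

            mem_pool_discard(pool);
    }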
[PATCH v4 3/8] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but it also avoids searching memory blocks that probably will not be able to fulfill the request.

This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or that are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index 389d7af447..c80124f1fe 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b
 
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
-	struct mp_block *p;
+	struct mp_block *p = NULL;
 	void *r;
 
 	/* round up to a 'uintmax_t' alignment */
 	if (len & (sizeof(uintmax_t) - 1))
 		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
 
-	for (p = mem_pool->mp_block; p; p = p->next_block)
-		if (p->end - p->next_free >= len)
-			break;
+	if (mem_pool->mp_block &&
+	    mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
+		p = mem_pool->mp_block;
 
 	if (!p) {
 		if (len >= (mem_pool->block_alloc / 2)) {
-- 
2.14.3
RE: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs
> -Original Message- > From: Junio C Hamano On Behalf Of Junio C Hamano > Sent: Thursday, May 24, 2018 12:52 AM > To: Jameson Miller > Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com; > sbel...@google.com; peart...@gmail.com > Subject: Re: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry > structs > > Jameson Miller writes: > > > Add an API around managing the lifetime of cache_entry structs. > > Abstracting memory management details behind an API will allow for > > alternative memory management strategies without affecting all the > > call sites. This commit does not change how memory is allocated / > > freed. A later commit in this series will allocate cache entries from > > memory pools as appropriate. > > > > Motivation: > > It has been observed that the time spent loading an index with a large > > number of entries is partly dominated by malloc() calls. This change > > is in preparation for using memory pools to reduce the number of > > malloc() calls made when loading an index. > > > > This API makes a distinction between cache entries that are intended > > for use with a particular index and cache entries that are not. This > > enables us to use the knowledge about how a cache entry will be used > > to make informed decisions about how to handle the corresponding > > memory. > > Yuck. make_index_cache_entry()? > > Generally we use "cache" when working on the_index without passing istate, > and otherwise "index", which means that readers can assume that > distim_cache_entry(...)" is a shorter and more limited way to say > "distim_index_entry(&the_index, ...)". Having both index and cache in the > same > name smells crazy. > > If most of the alocations are for permanent kind, give it a shorter name call > it > make_cache_entry(&the_index, ...), and call the other non-permanent one with > a longer and more cumbersome name, perhaps > make_transient_cache_entry(...). Avoid saying "index" in the former name, as > the design decision this series is making to allocate memory for a cache-entry > from a pool associated to an index_state is already seen by what its first > parameter is. I like this suggestion - I will make this change in the next version of this series. > > > diff --git a/cache.h b/cache.h > > index f0a407602c..204f788438 100644 > > --- a/cache.h > > +++ b/cache.h > > @@ -339,6 +339,29 @@ extern void remove_name_hash(struct index_state > > *istate, struct cache_entry *ce) extern void free_name_hash(struct > > index_state *istate); > > > > > > +/* Cache entry creation and freeing */ > > + > > +/* > > + * Create cache_entry intended for use in the specified index. Caller > > + * is responsible for discarding the cache_entry with > > + * `discard_cache_entry`. > > + */ > > +extern struct cache_entry *make_index_cache_entry(struct index_state > > +*istate, unsigned int mode, const unsigned char *sha1, const char > > +*path, int stage, unsigned int refresh_options); extern struct > > +cache_entry *make_empty_index_cache_entry(struct index_state *istate, > > +size_t name_len); > > + > > +/* > > + * Create a cache_entry that is not intended to be added to an index. > > + * Caller is responsible for discarding the cache_entry > > + * with `discard_cache_entry`. > > + */ > > +extern struct cache_entry *make_transient_cache_entry(unsigned int > > +mode, const unsigned char *sha1, const char *path, int stage); extern > > +struct cache_entry *make_empty_transient_cache_entry(size_t > > +name_len); > > + > > +/* > > + * Discard cache entry. 
> > + */
> > +void discard_cache_entry(struct cache_entry *ce);
>
> I am not yet convinced that it is a good idea to require each istate to hold
> a separate pool. Anything that uses unpack_trees() can do "starting from
> this src_index, perform various mergy operations and deposit the result in
> dst_index". Sometimes the two logical indices point at the same istate,
> sometimes different. When src and dst are different istates, the code that
> used to simply add another pointer to the same ce to the dst index now needs
> to duplicate it out of the pool associated with dst?

I did not see any instances in unpack_trees() where it copied just the cache_entry pointer from src to dst, but I will check again. You are correct, all the cache_entries need to be duplicated before being added to the destination index, which is what I think the code already does. We tried to make this more explicit by convertin
RE: [PATCH v3 0/7] allocate cache entries from memory pool
> -----Original Message-----
> From: Junio C Hamano On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:55 AM
> To: Jameson Miller
> Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com;
> sbel...@google.com; peart...@gmail.com
> Subject: Re: [PATCH v3 0/7] allocate cache entries from memory pool
>
> Jameson Miller writes:
>
> > Changes from V2:
> >
> > - Tweak logic of finding available memory block for memory
> >   allocation
> >
> >   - Only search head block
>
> Hmph. Is that because we generally do not free() a lot so once a block is
> filled, there is not much chance that we have reclaimed space in the block
> later?

The design of the memory pool is that once the memory is claimed from the pool, it is not reused until the containing pool is discarded. Individual entries are not freed; only the entire memory pool is freed, and only after we are sure that there are no references to any of the entries in the pool.

The memory pool design makes some tradeoffs. It is not meant to completely replace malloc / free as a general purpose allocator, but rather to be used in scenarios where the benefit (faster allocations, lower bookkeeping overhead) is worth the tradeoffs (not being able to free individual allocations). The access patterns around cache entries are well matched with the memory pool to get the benefits - the majority of cache entries are allocated up front when reading the index from disk, and are then discarded in bulk when the index is freed (if the index is freed at all, rather than the process just exiting). See the sketch after this message for the pattern in miniature.

> > - Tweaked handling of large memory allocations.
> >
> >   - Large blocks now tracked in same manner as "regular"
> >     blocks
> >
> >   - Large blocks are placed at end of linked list of memory
> >     blocks
>
> If we are only carving out of the most recently allocated block, it seems
> that there is no point looking for "the end", no?

Right. If we are not searching the list, then there isn't any point in appending large items to the end vs. sticking them immediately past the head block. The tail pointer is not really leveraged, so I will remove its usage in the next version.

> > - Cache_entry type gains notion of whether it was allocated
> >   from memory pool or not
> >
> > - Collapsed cache_entry discard logic into single
> >   function. This should make code easier to maintain
>
> That certainly should be safer to have a back-pointer pointing to which pool
> each entry came from, but doesn't it result in memory bloat?

Currently, entries claimed from a memory pool are not freed, so we only need to know whether the entry came from a memory pool or not. This has less memory impact than a full pointer but is also a bit more restrictive. We debated several approaches for what to do here and landed on using a simple bit for this rather than the full pointer. In the current code we use a full integer field for this, but we can convert this into a bit or bit field. The current flags word is full, so this would require a second flags field.
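The access pattern described above, in miniature (a hypothetical caller; the up-front sizing mirrors what the series does when reading an index from disk):

    #include "cache.h"
    #include "mem-pool.h"

    static void bulk_lifetime_sketch(size_t nr)
    {
            struct mem_pool *pool = NULL;
            size_t i;

            /* One up-front block, sized from what the on-disk index reports. */
            mem_pool_init(&pool, nr * (sizeof(struct cache_entry) + 32));

            for (i = 0; i < nr; i++) {
                    struct cache_entry *ce =
                            mem_pool_calloc(pool, 1, cache_entry_size(32));
                    /* ... fill in 'ce' and hand it to the index ... */
                    (void)ce;
            }

            /*
             * No per-entry free(): the entries die together when the
             * index that owns the pool is discarded.
             */
            mem_pool_discard(pool);
    }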
[PATCH v3 7/7] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

1) Invalidate cache_entry memory when discarding cache_entry.

2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool.

3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected.

Signed-off-by: Jameson Miller
---
 cache.h      | 6 ++
 git.c        | 3 +++
 mem-pool.c   | 7 ++-
 mem-pool.h   | 2 +-
 read-cache.c | 55 +--
 5 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 7aae9c8db0..2916e953ad 100644
--- a/cache.h
+++ b/cache.h
@@ -369,6 +369,12 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
  */
 void discard_cache_entry(struct cache_entry *ce);
 
+/*
+ * Check configuration if we should perform extra validation on cache
+ * entries.
+ */
+int should_validate_cache_entries(void);
+
 /*
  * Duplicate a cache_entry. Allocate memory for the new entry from a
  * memory_pool. Takes into account cache_entry fields that are meant
diff --git a/git.c b/git.c
index bab6bbfded..7ce65eab78 100644
--- a/git.c
+++ b/git.c
@@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 
 	trace_argv_printf(argv, "trace: built-in: git");
 
+	validate_cache_entries(&the_index);
 	status = p->fn(argc, argv, prefix);
+	validate_cache_entries(&the_index);
+
 	if (status)
 		return status;
 
diff --git a/mem-pool.c b/mem-pool.c
index cc7d3a7ab1..6770b4f740 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -63,13 +63,18 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
 	*mem_pool = pool;
 }
 
-void mem_pool_discard(struct mem_pool *mem_pool)
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory)
 {
 	struct mp_block *block, *block_to_free;
+
 	for (block = mem_pool->mp_block; block;)
 	{
 		block_to_free = block;
 		block = block->next_block;
+
+		if (invalidate_memory)
+			memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space));
+
 		free(block_to_free);
 	}
 
diff --git a/mem-pool.h b/mem-pool.h
index 5c892d3bdb..68d8428902 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
 /*
  * Discard a memory pool and free all the memory it is responsible for.
  */
-void mem_pool_discard(struct mem_pool *mem_pool);
+void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory);
 
 /*
  * Alloc memory from the mem_pool.
diff --git a/read-cache.c b/read-cache.c
index 02fe5d333c..fb2cec6ac6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2023,8 +2023,10 @@ int discard_index(struct index_state *istate)
 	 * Cache entries in istate->cache[] should have been allocated
 	 * from the memory pool associated with this index, or from an
 	 * associated split_index. There is no need to free individual
-	 * cache entries.
+	 * cache entries. validate_cache_entries can detect when this
+	 * assertion does not hold.
 	 */
+	validate_cache_entries(istate);
 
 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
@@ -2041,13 +2043,45 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 
 	if (istate->ce_mem_pool) {
-		mem_pool_discard(istate->ce_mem_pool);
+		mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries());
 		istate->ce_mem_pool = NULL;
 	}
 
 	return 0;
 }
 
+/*
+ * Validate the cache entries of this index.
+ * All cache entries associated with this index
+ * should have been allocated by the memory pool
+ * associated with this index, or by a referenced
+ * split index.
+ */
+void validate_cache_entries(const struct index_state *istate)
+{
+	int i;
+
+	if (!should_validate_cache_entries() || !istate || !istate->initialized)
+		return;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (!istate) {
+			die("internal error: cache entry is not allocated from expected memory pool");
+		} else if (!istate->ce_mem_pool ||
+			!mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) {
+			if (!istate->split_index ||
+				!istate->split_index->base ||
+				!istate->split_index->base->ce_mem_pool ||
+				!mem_pool_co
[PATCH v3 4/7] mem-pool: add lifecycle management functions
Add initialization and discard functions to the mem-pool type. As part of this, we now also track "large" allocations of memory so that these can also be cleaned up when discarding the memory pool.

These changes are in preparation for a future commit that will create and discard memory pools.

Signed-off-by: Jameson Miller
---
 fast-import.c | 2 +-
 mem-pool.c    | 63 +++
 mem-pool.h    | 12 +++-
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 34edf3fb8f..571898e5db 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -300,7 +300,7 @@ static int global_argc;
 static const char **global_argv;
 
 /* Memory pools */
-static struct mem_pool fi_mem_pool = {NULL, 2*1024*1024 -
+static struct mem_pool fi_mem_pool = {NULL, NULL, 2*1024*1024 -
 					sizeof(struct mp_block), 0 };
 
 /* Atom management */
diff --git a/mem-pool.c b/mem-pool.c
index c80124f1fe..01595bcca5 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -5,20 +5,77 @@
 #include "cache.h"
 #include "mem-pool.h"
 
+#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);
+
 static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
 {
 	struct mp_block *p;
 
 	mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
 	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
+
 	p->next_block = mem_pool->mp_block;
 	p->next_free = (char *)p->space;
 	p->end = p->next_free + block_alloc;
+
 	mem_pool->mp_block = p;
+	if (!mem_pool->mp_block_tail)
+		mem_pool->mp_block_tail = p;
+
+	return p;
+}
+
+static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t block_alloc)
+{
+	struct mp_block *p;
+
+	mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
+	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
+
+	p->next_block = NULL;
+	p->next_free = (char *)p->space;
+	p->end = p->next_free + block_alloc;
+
+	if (mem_pool->mp_block_tail)
+		mem_pool->mp_block_tail->next_block = p;
+	else
+		mem_pool->mp_block = p;
+
+	mem_pool->mp_block_tail = p;
 
 	return p;
 }
 
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
+{
+	struct mem_pool *pool;
+
+	if (*mem_pool)
+		return;
+
+	pool = xcalloc(1, sizeof(*pool));
+
+	pool->block_alloc = BLOCK_GROWTH_SIZE;
+
+	if (initial_size > 0)
+		mem_pool_alloc_block(pool, initial_size);
+
+	*mem_pool = pool;
+}
+
+void mem_pool_discard(struct mem_pool *mem_pool)
+{
+	struct mp_block *block, *block_to_free;
+	for (block = mem_pool->mp_block; block;)
+	{
+		block_to_free = block;
+		block = block->next_block;
+		free(block_to_free);
+	}
+
+	free(mem_pool);
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct mp_block *p = NULL;
@@ -33,10 +90,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 	p = mem_pool->mp_block;
 
 	if (!p) {
-		if (len >= (mem_pool->block_alloc / 2)) {
-			mem_pool->pool_alloc += len;
-			return xmalloc(len);
-		}
+		if (len >= (mem_pool->block_alloc / 2))
+			return mem_pool_alloc_custom(mem_pool, len);
 
 		p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
 	}
diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..5d3e6a367a 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -9,7 +9,7 @@ struct mp_block {
 };
 
 struct mem_pool {
-	struct mp_block *mp_block;
+	struct mp_block *mp_block, *mp_block_tail;
 
 	/*
 	 * The amount of available memory to grow the pool by.
@@ -21,6 +21,16 @@ struct mem_pool {
 	size_t pool_alloc;
 };
 
+/*
+ * Initialize mem_pool with specified initial size.
+ */
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
+
+/*
+ * Discard a memory pool and free all the memory it is responsible for.
+ */
+void mem_pool_discard(struct mem_pool *mem_pool);
+
 /*
  * Alloc memory from the mem_pool.
  */
-- 
2.14.3
[PATCH v3 5/7] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 40 mem-pool.h | 13 + 2 files changed, 53 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 01595bcca5..cc7d3a7ab1 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -108,3 +108,43 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + dst->mp_block_tail->next_block = src->mp_block; + dst->mp_block_tail = src->mp_block_tail; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty +* use pointers from src to set up dst. +*/ + dst->mp_block = src->mp_block; + dst->mp_block_tail = src->mp_block_tail; + } else { + // src is empty, nothing to do. + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; + src->mp_block_tail = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index 5d3e6a367a..5c892d3bdb 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. + */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.14.3
[PATCH v3 3/7] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but it also avoids searching memory blocks that probably will not be able to fulfill the request.

This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or that are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request.

Signed-off-by: Jameson Miller
---
 mem-pool.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index 389d7af447..c80124f1fe 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b
 
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
-	struct mp_block *p;
+	struct mp_block *p = NULL;
 	void *r;
 
 	/* round up to a 'uintmax_t' alignment */
 	if (len & (sizeof(uintmax_t) - 1))
 		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
 
-	for (p = mem_pool->mp_block; p; p = p->next_block)
-		if (p->end - p->next_free >= len)
-			break;
+	if (mem_pool->mp_block &&
+	    mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
+		p = mem_pool->mp_block;
 
 	if (!p) {
 		if (len >= (mem_pool->block_alloc / 2)) {
-- 
2.14.3
[PATCH v3 6/7] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Signed-off-by: Jameson Miller --- cache.h| 21 ++ read-cache.c | 119 - split-index.c | 50 unpack-trees.c | 13 +-- 4 files changed, 165 insertions(+), 38 deletions(-) diff --git a/cache.h b/cache.h index 204f788438..7aae9c8db0 100644 --- a/cache.h +++ b/cache.h @@ -15,6 +15,7 @@ #include "path.h" #include "sha1-array.h" #include "repository.h" +#include "mem-pool.h" #include typedef struct git_zstream { @@ -156,6 +157,7 @@ struct cache_entry { struct stat_data ce_stat_data; unsigned int ce_mode; unsigned int ce_flags; + unsigned int mem_pool_allocated; unsigned int ce_namelen; unsigned int index; /* for link extension */ struct object_id oid; @@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst, const struct cache_entry *src) { unsigned int state = dst->ce_flags & CE_HASHED; + int mem_pool_allocated = dst->mem_pool_allocated; /* Don't copy hash chain and name */ memcpy(&dst->ce_stat_data, &src->ce_stat_data, @@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst, /* Restore the hash state */ dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state; + + /* Restore the mem_pool_allocated flag */ + dst->mem_pool_allocated = mem_pool_allocated; } static inline unsigned create_ce_flags(unsigned stage) @@ -328,6 +334,7 @@ struct index_state { struct untracked_cache *untracked; uint64_t fsmonitor_last_update; struct ewah_bitmap *fsmonitor_dirty; + struct mem_pool *ce_mem_pool; }; extern struct index_state the_index; @@ -362,6 +369,20 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len); */ void discard_cache_entry(struct cache_entry *ce); +/* + * Duplicate a cache_entry. Allocate memory for the new entry from a + * memory_pool. Takes into account cache_entry fields that are meant + * for managing the underlying memory allocation of the cache_entry. + */ +struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate); + +/* + * Validate the cache entries in the index. 
This is an internal + * consistency check that the cache_entry structs are allocated from + * the expected memory pool. + */ +void validate_cache_entries(const struct index_state *istate); + #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS #define active_cache (the_index.cache) #define active_nr (the_index.cache_nr) diff --git a/read-cache.c b/read-cache.c index d51cc83312..02fe5d333c 100644 --- a/read-cache.c +++ b/read-cache.c @@ -46,6 +46,48 @@ CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \ SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED) + +/* + * This is an estimate of the pathname length in the index. We use + * this for V4 index files to guess the un-deltafied size of the index + * in memory because of pathname deltafication. This is not required + * for V2/V3 index formats because their pathnames are not compressed. + * If the initial amount of memory set aside is not sufficient, the + * mem pool will allocate extra memory. + */ +#d
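To make the sizing logic described above concrete, here is a minimal sketch of how an initial pool request could be derived from the on-disk entry count. The helper name is hypothetical and the 80-byte path guess mirrors the CACHE_ENTRY_PATH_LENGTH idea this patch introduces; this is an illustration, not the patch's literal code:

	#include <stddef.h>

	#define AVG_PATH_GUESS 80	/* assumed average V4 pathname length */

	/*
	 * Estimate how much memory nr_entries cache entries will need, so
	 * the pool's first block can satisfy most allocation requests.
	 */
	static size_t estimate_pool_size(size_t nr_entries, size_t entry_size,
					 int index_version)
	{
		size_t per_entry = entry_size;

		/*
		 * V4 delta-compresses pathnames on disk, so the in-memory
		 * size must be guessed; V2/V3 store pathnames whole.
		 */
		if (index_version >= 4)
			per_entry += AVG_PATH_GUESS;
		return nr_entries * per_entry;
	}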
[PATCH v3 1/7] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function work on a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 7 --- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/cache.h b/cache.h index 0c1fb9fbcc..f0a407602c 100644 --- a/cache.h +++ b/cache.h @@ -744,7 +744,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. diff --git a/merge-recursive.c b/merge-recursive.c index 0c0d48624d..693f60e0a3 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -260,7 +260,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("addinfo_cache failed for path '%s'"), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 10f1c6bb8a..2cb4f53b57 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1448,10 +1448,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.14.3
[PATCH v3 0/7] allocate cache entries from memory pool
Changes from V2: - Tweak logic of finding available memory block for memory allocation - Only search head block - Tweaked handling of large memory allocations. - Large blocks now tracked in same manner as "regular" blocks - Large blocks are placed at end of linked list of memory blocks - Cache_entry type gains notion of whether it was allocated from memory pool or not - Collapsed cache_entry discard logic into single function. This should make code easier to maintain - Small tweaks based on V1 feedback Base Ref: master Web-Diff: g...@github.com:jamill/git.git/commit/d608515f9e Checkout: git fetch g...@github.com:jamill/git.git users/jamill/block_allocation-v3 && git checkout d608515f9e Jameson Miller (7): read-cache: teach refresh_cache_entry() to take istate block alloc: add lifecycle APIs for cache_entry structs mem-pool: only search head block for available space mem-pool: add lifecycle management functions mem-pool: fill out functionality block alloc: allocate cache entries from mem_pool block alloc: add validations around cache_entry lifecyle apply.c| 26 +++-- blame.c| 5 +- builtin/checkout.c | 8 +- builtin/difftool.c | 8 +- builtin/reset.c| 6 +- builtin/update-index.c | 26 +++-- cache.h| 53 +- fast-import.c | 2 +- git.c | 3 + mem-pool.c | 116 -- mem-pool.h | 25 - merge-recursive.c | 4 +- read-cache.c | 261 - resolve-undo.c | 6 +- split-index.c | 58 --- tree.c | 4 +- unpack-trees.c | 38 +++ 17 files changed, 514 insertions(+), 135 deletions(-) base-commit: ccdcbd54c4475c2238b310f7113ab3075b5abc9c -- 2.14.3
[PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind an API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated / freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 26 ++--- blame.c| 5 +-- builtin/checkout.c | 8 ++-- builtin/difftool.c | 8 ++-- builtin/reset.c| 6 +-- builtin/update-index.c | 26 ++--- cache.h| 24 +++- merge-recursive.c | 2 +- read-cache.c | 100 ++--- resolve-undo.c | 6 ++- split-index.c | 8 ++-- tree.c | 4 +- unpack-trees.c | 33 +++- 13 files changed, 162 insertions(+), 94 deletions(-) diff --git a/apply.c b/apply.c index 7e5792c996..b769fe0d15 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_index_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) - return error(_("make_cache_entry failed for path '%s'"), + return error(_("make_index_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_index_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct apply_state *state,
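In sketch form, the lifecycle API this patch introduces splits along the index/transient line roughly as follows (signatures abridged from the diff above; the full versions carry mode/oid/name/stage parameters):

	/* Entries destined for a specific index, allocated on its behalf: */
	struct cache_entry *make_index_cache_entry(struct index_state *istate, ...);
	struct cache_entry *make_empty_index_cache_entry(struct index_state *istate,
							 size_t name_len);

	/* Entries that will never be added to an index: */
	struct cache_entry *make_empty_transient_cache_entry(size_t name_len);

	/* Single discard entry point, replacing bare free(ce) at call sites: */
	void discard_cache_entry(struct cache_entry *ce);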
RE: [PATCH v2 0/5] Allocate cache entries from memory pool
> -Original Message- > From: git-ow...@vger.kernel.org On Behalf Of > Stefan Beller > Sent: Thursday, May 3, 2018 4:59 PM > To: Duy Nguyen > Cc: Jameson Miller ; git@vger.kernel.org; > gits...@pobox.com; jonathanta...@google.com > Subject: Re: [PATCH v2 0/5] Allocate cache entries from memory pool > > On Thu, May 3, 2018 at 12:17 PM, Duy Nguyen wrote: > > > >> To me it is also a clear yes when it comes to combining these two > >> memory pools. > > > > I also did not notice that jm/mem-pool already landed in master. > > Oh, thanks for telling! Now that I look at it, I am doubting it; > > The reason for my doubt is the potential quadratic behavior for new > allocations, > in mem_pool_alloc() we walk all mp_blocks to see if we can fit the requested > allocation in one of the later blocks. > So if we call mem_pool_alloc a million times, we get a O(n) mp_blocks which > we'd have to walk in each call. With the current design, when a new mp_block is allocated, it is placed at the head of the linked list. This means that the most recently allocated mp_block is the first block searched. The *vast* majority of allocations should be fulfilled from this first block. It is only when the block is full that we search other mp_blocks in the list. If this is a concern, I think we have a couple of low-cost options to mitigate it (maybe a flag to control whether we search past the first mp_block for space, or logic to move blocks out of the search queue when they are full or fall below a threshold for available space). If this is of interest, I could contribute a patch to enable one of these behaviors. > > However in alloc.c we do know that a slab is full as soon as we look take the > next slab. That is the beauty of knowing 'len' at construction time of the > allocator. > > So I guess I'll just re-use the mp_block and introduce another struct > fixed_sized_mem_pool, which will not look into other mp_blocks but the > current. > > > > Have > > you tried measure (both memory usage and allocation speed) of it and > > alloc.c? > > No, I was about to, but then started reading the code in an attempt to replace > alloc.c by a mempool and saw the quadratic behavior. > > > Just take some big repo as an example and do count-objects -v to see > > how many blobs/trees/commits it has, then allocate the same amount > > with both alloc.c and mem-pool.c and measure both speed/mem. > > I'm pretty sure you're right that mem-pool.c is a clear yes. I was > > just being more conservative because we do (slightly) change > > allocator's behavior when we make the switch. But it's also very > > likely that any performance difference will be insignificant. > > > > I'm asking this because if mem-pool.c is a clear winner, you can start > > to update you series to use it now and kill alloc.c in the process. > > I'll implement the fixed_sized_mem_pool and take some measurements. > > > > > PS. Is Jeff back yet? > > His last email on the public list is Apr 10th, stating that he'll be offline > for "a few > weeks", in <20180406175349.gb32...@sigill.intra.peff.net> he said the > vacation part is 3 weeks. So I think he is done with vacation and is just > hiding to > figure out a nice comeback. ;-) > > > I'm sure Junio is listening and all but I'm afraid he's too busy being > > a maintainer so Jeff's opinion in this area is really valuable. He has > > all the fun and weird use cases to play with at github. > > ok. I'll cc him for these patches. > > Thanks, > Stefan
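For illustration, the second mitigation mentioned in that reply - taking blocks out of the search queue once their free space falls below a threshold - could look roughly like this. The mp_block layout is abbreviated and the list names are hypothetical, not code from this series:

	struct mp_block {
		struct mp_block *next_block;
		char *next_free;
		char *end;
	};

	/*
	 * Hypothetical: retire the head block to a separate list once it
	 * cannot satisfy a minimum-sized request, so mem_pool_alloc()
	 * never rescans it.
	 */
	static void maybe_retire_head(struct mp_block **search_list,
				      struct mp_block **retired_list,
				      size_t threshold)
	{
		struct mp_block *p = *search_list;

		if (p && (size_t)(p->end - p->next_free) < threshold) {
			*search_list = p->next_block;
			p->next_block = *retired_list;
			*retired_list = p;
		}
	}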
RE: [PATCH 00/13] object store: alloc
> -Original Message- > From: Duy Nguyen > Sent: Wednesday, May 2, 2018 2:23 PM > To: Jameson Miller > Cc: Stefan Beller ; Git Mailing List > Subject: Re: [PATCH 00/13] object store: alloc > > On Wed, May 2, 2018 at 8:07 PM, Jameson Miller > wrote: > > > > > >> -Original Message- > >> From: Duy Nguyen > >> Sent: Wednesday, May 2, 2018 1:02 PM > >> To: Stefan Beller > >> Cc: Git Mailing List ; Jameson Miller > >> > >> Subject: Re: [PATCH 00/13] object store: alloc > >> > >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller > wrote: > >> > I also debated if it is worth converting alloc.c via this patch > >> > series or if it might make more sense to use the new mem-pool by > Jameson[1]. > >> > > >> > I vaguely wonder about the performance impact, as the object > >> > allocation code seemed to be relevant in the past. > >> > >> If I remember correctly, alloc.c was added because malloc() has too > >> high overhead per allocation (and we create like millions of them). > >> As long as you keep allocation overhead low, it should be ok. Note > >> that we allocate a lot more than the mem-pool's main target (cache > >> entries if I remember correctly). We may have a couple thousands > >> cache entries. We already deal with a couple million of struct object. > > > > The work to move cache entry allocation onto a memory pool was > > motivated by the fact that we are looking at indexes with millions of > > entries. If there is scaling concern with the current version of > > mem-pool, we would like to address it there as well. Or if there is > improvements that can be shared, that would be nice as well. > > I think the two have quite different characteristics. alloc.c code is driven > by > overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, > and > each malloc has 16 bytes overhead or so if I remember correctly. struct > cache_entry at minimum in 88 bytes so relative overhead is not that a big deal > (but sure reducing it is still very nice). > > mem-pool is about allocation speed, but I think that's not a concern for > alloc.c > because when we do full rev walk, I think I/O is always the bottleneck (maybe > object lookup as well). I don't see a good way to have the one memory > allocator > that satisfyies both to be honest. If you could allocate fixed-size cache > entries > most of the time (e.g. > larger entries will be allocated using malloc or something, and should be a > small > number), then perhaps we can unify. Or if mem-pool can have an option to > allocated fixed size pieces with no overhead, that would be great (sorry I > still > have not read that mem-pool series yet) > -- > Duy Thank you for the extra details - the added context was helpful, especially the motivations for each of the areas. I agree with your general analysis, with the extra point that the memory pool does allocate variable-sized memory without any per-allocation overhead, apart from possible alignment padding and the bookkeeping for the larger "blocks" of memory from which small allocations are made - but I don't think this is enough to have a meaningful overall impact. The mem-pool only tracks the pointer to the next available bit of memory, and the end of the available memory range. It has a similar constraint in that individual allocations cannot be freed - you have to free the whole block.
It may be that the requirements are different enough (or the gains worth it) to have another dedicated pooling allocator, but I think the current design of the memory pool would satisfy both consumers, even if the memory considerations are a bigger motivation for blob structs. I would be interested in your thoughts if you get the opportunity to read the mem-pool series.
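As a sketch of the fixed-size variant discussed in this exchange (hypothetical type and names; Stefan's actual fixed_sized_mem_pool may differ), knowing the element size at construction time means allocation never has to search past the current block:

	#include <stddef.h>
	#include <stdlib.h>

	struct fixed_pool {
		char *next_free;
		char *end;
		size_t elem_size;	/* fixed at construction, like alloc.c's nodes */
		size_t block_size;
	};

	/*
	 * Allocate one fixed-size element; only the current block is ever
	 * consulted, because a full block can never fit another element.
	 * (Error handling elided; old blocks are intentionally forgotten
	 * until program exit, mirroring alloc.c's behavior.)
	 */
	static void *fixed_pool_alloc(struct fixed_pool *fp)
	{
		void *r;

		if ((size_t)(fp->end - fp->next_free) < fp->elem_size) {
			fp->next_free = malloc(fp->block_size);
			fp->end = fp->next_free + fp->block_size;
		}
		r = fp->next_free;
		fp->next_free += fp->elem_size;
		return r;
	}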
RE: [PATCH 00/13] object store: alloc
> -Original Message- > From: Duy Nguyen > Sent: Wednesday, May 2, 2018 1:02 PM > To: Stefan Beller > Cc: Git Mailing List ; Jameson Miller > > Subject: Re: [PATCH 00/13] object store: alloc > > On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: > > I also debated if it is worth converting alloc.c via this patch series > > or if it might make more sense to use the new mem-pool by Jameson[1]. > > > > I vaguely wonder about the performance impact, as the object > > allocation code seemed to be relevant in the past. > > If I remember correctly, alloc.c was added because malloc() has too high > overhead per allocation (and we create like millions of them). As long as you > keep allocation overhead low, it should be ok. Note that we allocate a lot > more > than the mem-pool's main target (cache entries if I remember correctly). We > may have a couple thousands cache entries. We already deal with a couple > million of struct object. The work to move cache entry allocation onto a memory pool was motivated by the fact that we are looking at indexes with millions of entries. If there is a scaling concern with the current version of mem-pool, we would like to address it there as well. Or if there are improvements that can be shared, that would be nice as well. > -- > Duy
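(To put rough numbers on the overhead argument from this exchange, using the figures quoted in it: with ~16 bytes of malloc() overhead per allocation, a 24-byte struct blob pays about 16/(24+16) = 40% overhead, while an 88-byte cache_entry pays about 16/(88+16) ≈ 15% - which is why the overhead concern weighs so much more heavily for alloc.c's objects than for cache entries.)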
RE: [PATCH v2 3/5] mem-pool: fill out functionality
Thank you for taking a look - I think these are good questions. Please let me know if you have further questions. > -Original Message- > From: Stefan Beller > Sent: Monday, April 30, 2018 5:42 PM > To: Jameson Miller > Cc: git@vger.kernel.org; gits...@pobox.com; pclo...@gmail.com; > jonathanta...@google.com > Subject: Re: [PATCH v2 3/5] mem-pool: fill out functionality > > On Mon, Apr 30, 2018 at 8:31 AM, Jameson Miller > wrote: > > Adds the following functionality to memory pools: > > > > - Lifecycle management functions (init, discard) > > > > - Test whether a memory location is part of the managed pool > > > > - Function to combine 2 pools > > > > This also adds logic to track all memory allocations made by a memory > > pool. > > > > These functions will be used in a future commit in this commit series. > > > > Signed-off-by: Jameson Miller > > --- > > mem-pool.c | 114 > > ++--- > > mem-pool.h | 32 + > > > diff --git a/mem-pool.h b/mem-pool.h > > index 829ad58ecf..34df4fa709 100644 > > --- a/mem-pool.h > > +++ b/mem-pool.h > > @@ -19,8 +19,27 @@ struct mem_pool { > > > > /* The total amount of memory allocated by the pool. */ > > size_t pool_alloc; > > + > > + /* > > +* Array of pointers to "custom size" memory allocations. > > +* This is used for "large" memory allocations. > > +* The *_end variables are used to track the range of memory > > +* allocated. > > +*/ > > + void **custom, **custom_end; > > + int nr, nr_end, alloc, alloc_end; > > }; > > What is the design goal of this mem pool? > What is it really good at, which patterns of use should we avoid? > This memory pool is designed to provide many small (small compared to the memory pool block size) chunks of memory from a larger block of allocated memory. This reduces the overhead of performing many small memory allocations from the heap. In the ideal case, we know the total amount of memory required, and the pool can make a single allocation to satisfy that requirement, and hand it out in chunks to consumers. We should avoid making many large memory requests (large compared to the memory pool block size), as these requests will be fulfilled from individual memory allocations (i.e. the "custom" allocations). While there is not a correctness issue here, it will not perform as well as when requests are fulfilled from the internal memory blocks. > It looks like internally the mem-pool can either use mp_blocks that are > stored as > a linked list, or it can have custom allocations stored in an array. > > Is the linked list or the custom part sorted by some key? > Does it need to be sorted? > > I am currently looking at alloc.c, which is really good for allocating memory > for > equally sized parts, i.e. it is very efficient at providing memory for fixed > sized > structs. And on top of that it is not tracking any memory as it relies on > program > termination for cleanup. The linked list is ordered in the order the blocks were allocated in. The last allocated block will be the head of the linked list. This means that most memory requests should be fulfilled by the head block, reducing the need to iterate through the list to find available memory. The custom memory blocks are in their own list because they will never contain any free memory. There is no need to include these allocations in the list of blocks that could potentially have free memory. I expect this code would be efficient at allocating many equal sized parts, as long as those parts are comparatively small compared to the block size.
In this case, you would never allocate "custom" blocks, and the overwhelming majority of allocations would come from the head block. If you know the total amount of memory you will need, then you can size the memory pool so all allocations come from the head block. > > This memory pool seems to be optimized for allocations of varying sizes, some > of them huge (to be stored in the custom > part) and most of them rather small as they go into the mp_blocks? I would say this memory pool is optimized for allocations of varying sizes (although it should be pretty efficient when the allocations are of the same size), but can handle the edge case when there happens to be a need for a "huge allocation". > > Thanks, > Stefan
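A short usage sketch of that last point, using the init/alloc signatures from this series (the entry count, element size, and use_entry() helper are made up for illustration):

	#include "mem-pool.h"

	static void demo(void)
	{
		struct mem_pool *pool = NULL;
		size_t i, nr = 1000000, elem = 96;

		/*
		 * Size the pool for the expected total up front; every
		 * allocation below is then served from the head block,
		 * with no list walking and no "custom" allocations.
		 */
		mem_pool_init(&pool, nr * elem);
		for (i = 0; i < nr; i++)
			use_entry(mem_pool_alloc(pool, elem));
		mem_pool_discard(pool);
	}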
[PATCH v2 1/5] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function work on a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 7 --- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/cache.h b/cache.h index 77b7acebb6..31f8f0420a 100644 --- a/cache.h +++ b/cache.h @@ -743,7 +743,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. diff --git a/merge-recursive.c b/merge-recursive.c index 0c0d48624d..693f60e0a3 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -260,7 +260,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("addinfo_cache failed for path '%s'"), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 10f1c6bb8a..2cb4f53b57 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1448,10 +1448,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.14.3
[PATCH v2 3/5] mem-pool: fill out functionality
Adds the following functionality to memory pools: - Lifecycle management functions (init, discard) - Test whether a memory location is part of the managed pool - Function to combine two pools This also adds logic to track all memory allocations made by a memory pool. These functions will be used in a later commit in this series. Signed-off-by: Jameson Miller --- mem-pool.c | 114 ++--- mem-pool.h | 32 + 2 files changed, 142 insertions(+), 4 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index 389d7af447..a495885c4b 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -5,6 +5,8 @@ #include "cache.h" #include "mem-pool.h" +#define BLOCK_GROWTH_SIZE (1024*1024 - sizeof(struct mp_block)) + static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) { struct mp_block *p; @@ -19,6 +21,60 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b return p; } +static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t block_alloc) +{ + char *p; + ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc); + ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, mem_pool->alloc_end); + + p = xmalloc(block_alloc); + mem_pool->custom[mem_pool->nr++] = p; + mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc; + + mem_pool->pool_alloc += block_alloc; + + return p; +} + +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) +{ + if (!(*mem_pool)) + { + *mem_pool = xmalloc(sizeof(struct mem_pool)); + (*mem_pool)->pool_alloc = 0; + (*mem_pool)->mp_block = NULL; + (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE; + (*mem_pool)->custom = NULL; + (*mem_pool)->nr = 0; + (*mem_pool)->alloc = 0; + (*mem_pool)->custom_end = NULL; + (*mem_pool)->nr_end = 0; + (*mem_pool)->alloc_end = 0; + + if (initial_size > 0) + mem_pool_alloc_block(*mem_pool, initial_size); + } +} + +void mem_pool_discard(struct mem_pool *mem_pool) +{ + int i; + struct mp_block *block, *block_to_free; + for (block = mem_pool->mp_block; block;) + { + block_to_free = block; + block = block->next_block; + free(block_to_free); + } + + for (i = 0; i < mem_pool->nr; i++) + free(mem_pool->custom[i]); + + free(mem_pool->custom); + free(mem_pool->custom_end); + free(mem_pool); +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; @@ -33,10 +89,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) break; if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } + if (len >= (mem_pool->block_alloc / 2)) + return mem_pool_alloc_custom(mem_pool, len); p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); } @@ -53,3 +107,55 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + int i; + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + /* Check if memory is allocated in custom block */ + for (i = 0; i < mem_pool->nr; i++) + if ((mem >= mem_pool->custom[i]) && + (mem < mem_pool->custom_end[i])) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + int i; + struct mp_block **tail = &dst->mp_block; + + /* Find pointer of dst's last block (if any) */ + while (*tail) + tail = &(*tail)->next_block; + + /* Append the blocks from
src to dst */ + *tail = src->mp_block; + + /* Combine custom allocations */ + ALLOC_GROW(dst->custom, dst->nr + src->nr, dst->alloc); + ALLOC_GROW(dst->custom_end, dst->nr_end + src->nr_end, dst->alloc_end); + + for (i = 0; i < src->nr; i++) { + dst->custom[dst->nr++] = src->custom[i]; +
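A hedged usage sketch of the two new queries (the tail of mem_pool_combine() is truncated above, so whether src may still be used afterwards or must merely be discarded depends on the full patch):

	#include <assert.h>
	#include "mem-pool.h"

	static void demo_combine(void)
	{
		struct mem_pool *a = NULL, *b = NULL;
		void *p;

		mem_pool_init(&a, 4096);
		mem_pool_init(&b, 4096);
		p = mem_pool_alloc(b, 128);

		mem_pool_combine(a, b);			/* a takes over b's blocks */
		assert(mem_pool_contains(a, p));	/* memory stays valid under a */
	}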
[PATCH v2 5/5] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set: 1) Invalidate cache_entry memory when discarding cache_entry. 2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool. 3) When discarding mem_pools, invalidate mem_pool memory. This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected. Signed-off-by: Jameson Miller --- cache.h | 7 +++ git.c| 3 +++ mem-pool.c | 24 +++- mem-pool.h | 2 ++ read-cache.c | 47 +++ 5 files changed, 82 insertions(+), 1 deletion(-) diff --git a/cache.h b/cache.h index 7ed68f28e0..8f10f0649b 100644 --- a/cache.h +++ b/cache.h @@ -369,6 +369,13 @@ void discard_index_cache_entry(struct cache_entry *ce); */ void discard_transient_cache_entry(struct cache_entry *ce); +/* + * Validate the cache entries in the index. This is an internal + * consistency check that the cache_entry structs are allocated from + * the expected memory pool. + */ +void validate_cache_entries(const struct index_state *istate); + #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS #define active_cache (the_index.cache) #define active_nr (the_index.cache_nr) diff --git a/git.c b/git.c index f598fae7b7..28ec7a6c4f 100644 --- a/git.c +++ b/git.c @@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) trace_argv_printf(argv, "trace: built-in: git"); + validate_cache_entries(&the_index); status = p->fn(argc, argv, prefix); + validate_cache_entries(&the_index); + if (status) return status; diff --git a/mem-pool.c b/mem-pool.c index a495885c4b..77adb5d5b9 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -60,21 +60,43 @@ void mem_pool_discard(struct mem_pool *mem_pool) { int i; struct mp_block *block, *block_to_free; + int invalidate_memory = should_validate_cache_entries(); + for (block = mem_pool->mp_block; block;) { block_to_free = block; block = block->next_block; + + if (invalidate_memory) + memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space)); + free(block_to_free); } - for (i = 0; i < mem_pool->nr; i++) + for (i = 0; i < mem_pool->nr; i++) { + if (invalidate_memory) + memset(mem_pool->custom[i], 0xDD, ((char *)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i])); free(mem_pool->custom[i]); + } free(mem_pool->custom); free(mem_pool->custom_end); free(mem_pool); } +int should_validate_cache_entries(void) +{ + static int validate_index_cache_entries = -1; + + if (validate_index_cache_entries < 0) { + if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES")) + validate_index_cache_entries = 1; + else + validate_index_cache_entries = 0; + } + + return validate_index_cache_entries; +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; diff --git a/mem-pool.h b/mem-pool.h index 34df4fa709..b1f9a920ba 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); */ int mem_pool_contains(struct mem_pool *mem_pool, void *mem); +int should_validate_cache_entries(void); + #endif diff --git a/read-cache.c b/read-cache.c index 01cd7fea41..e1dc9f7f33 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1270,6 +1270,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti { int pos; + validate_cache_entries(istate); + if (option & ADD_CACHE_JUST_APPEND) pos = istate->cache_nr; else { @@ -1290,6 +1292,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce,
int opti istate->cache_nr - pos - 1); set_index_entry(istate, pos, ce); istate->cache_changed |= CE_ENTRY_ADDED; + + validate_cache_entries(istate); return 0; } @@ -2013,6 +2017,8 @@ int is_index_unborn(struct index_state *istate) int discard_index(struct index_state *istate) { + validate_cache_entries(istate); + resolve_undo_clear_index(istate); istate->cache_nr = 0; istate->cache_changed = 0; @@ -2035,6 +2041,43 @@ int discard_index(struct index_state *istate) return 0; } + +/* + * Validate the cache entries of this index. + * All cache entries associated with this index + * should have been allocated by the memory pool + * associated with this index, or by a referenced + *
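In practice these checks are opt-in: for example, the test suite can be run with them enabled by exporting GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES=1 in the environment, and the 0xDD fill makes any read through a dangling cache_entry pointer show up as an obvious 0xdddddddd... pattern in a debugger.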
[PATCH v2 4/5] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory and managing it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a split index, the following rules are followed. First, some terminology is defined: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Alternatives: The current design does not track whether a cache_entry is allocated from a pool or not. Instead, it relies on the caller to know how the cache_entry will be used and to handle its lifecycle appropriately, including calling the correct free() method. Instead, we could include a bit of information in the cache_entry (either a bit indicating whether cache_entry was allocated from a pool or not, or possibly even a pointer back to the allocating pool), which can then be used to make informed decisions about these objects. The downside of this approach is that the cache_entry type would need to grow to incorporate this information. Signed-off-by: Jameson Miller --- cache.h | 2 ++ read-cache.c | 95 +-- split-index.c | 23 +-- 3 files changed, 95 insertions(+), 25 deletions(-) diff --git a/cache.h b/cache.h index 3760adbe25..7ed68f28e0 100644 --- a/cache.h +++ b/cache.h @@ -15,6 +15,7 @@ #include "path.h" #include "sha1-array.h" #include "repository.h" +#include "mem-pool.h" #include typedef struct git_zstream { @@ -328,6 +329,7 @@ struct index_state { struct untracked_cache *untracked; uint64_t fsmonitor_last_update; struct ewah_bitmap *fsmonitor_dirty; + struct mem_pool *ce_mem_pool; }; extern struct index_state the_index; diff --git a/read-cache.c b/read-cache.c index 2a61cee130..01cd7fea41 100644 --- a/read-cache.c +++ b/read-cache.c @@ -46,6 +46,42 @@ CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \ SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED) + +/* + * This is a guess of the average pathname length in the index. We use + * this for V4 index files to guess the un-deltafied size of the index in memory + * because of pathname deltafication. This is not required for V2/V3 + * index formats because their pathnames are not compressed. If the + * initial amount of memory set aside is not sufficient, the mem pool + * will allocate extra memory.
+ */ +#define CACHE_ENTRY_PATH_LENGTH 80 + +static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len) +{ + return mem_pool_alloc(mem_pool, cache_entry_size(len)); +} + +static inline struct cache_entry *mem_pool__ce_calloc(struct mem_pool *mem_pool, size_t len) +{ + return mem_pool_calloc(mem_pool, 1, cache_entry_size(len)); +} + +static struct mem_pool *find_mem_pool(struct index_state *istate) +{ + struct mem_pool **pool_ptr; + + if (istate->split_index && istate->split_index->base) + pool_ptr = &istate->split_index->base->ce_mem_pool; + else + pool_ptr = &istate->ce_mem_pool; + + if (!*pool_ptr) + mem_pool_init(pool_ptr, 0); + + return *pool_ptr; +} + struct index_state the_index; static const char *alternate_index_output; @@ -746,7 +782,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int flags) struct cache_entry *make_empty_index_cache_entry(struct index_state *istate, size_t len) { - return xcalloc(1, cache_entry_size(len)); + return mem_pool__ce_calloc(find_mem_pool(istate), len); } struct cache_entry *make_empty_transient_cache_entry(size_t len) @@ -1641,13 +1677,13 @@ int read_index(struct ind
[PATCH v2 2/5] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind an API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated / freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 26 ++-- blame.c| 5 +-- builtin/checkout.c | 8 ++-- builtin/difftool.c | 8 ++-- builtin/reset.c| 6 +-- builtin/update-index.c | 26 ++-- cache.h| 29 +- merge-recursive.c | 2 +- read-cache.c | 105 +++-- resolve-undo.c | 6 ++- split-index.c | 8 ++-- tree.c | 4 +- unpack-trees.c | 27 - 13 files changed, 166 insertions(+), 94 deletions(-) diff --git a/apply.c b/apply.c index 7e5792c996..123646e1aa 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_index_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) - return error(_("make_cache_entry failed for path '%s'"), + return error(_("make_index_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_index_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_index_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_index_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_index_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_index_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_index_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct appl
[PATCH v2 0/5] Allocate cache entries from memory pool
: g...@github.com:jamill/git.git/commit/2152d28016 Checkout: git fetch g...@github.com:jamill/git.git users/jamill/block_allocation-v2 && git checkout 2152d28016 Jameson Miller (5): read-cache: teach refresh_cache_entry() to take istate block alloc: add lifecycle APIs for cache_entry structs mem-pool: fill out functionality block alloc: allocate cache entries from mem_pool block alloc: add validations around cache_entry lifecycle apply.c| 26 +++--- blame.c| 5 +- builtin/checkout.c | 8 +- builtin/difftool.c | 8 +- builtin/reset.c| 6 +- builtin/update-index.c | 26 +++--- cache.h| 40 - git.c | 3 + mem-pool.c | 136 - mem-pool.h | 34 merge-recursive.c | 4 +- read-cache.c | 232 +++-- resolve-undo.c | 6 +- split-index.c | 31 +-- tree.c | 4 +- unpack-trees.c | 27 +++--- 16 files changed, 479 insertions(+), 117 deletions(-) base-commit: 1f1cddd558b54bb0ce19c8ace353fd07b758510d -- 2.14.3
Re: [PATCH v1 3/5] mem-pool: fill out functionality
On 04/23/2018 01:49 PM, Jonathan Tan wrote: On Mon, 23 Apr 2018 13:27:09 -0400 Jameson Miller wrote: This seems overly complicated - the struct mem_pool already has a linked list of pages, so couldn't you create a custom page and insert it behind the current front page instead whenever you needed a large-size page? Yes - that is another option. However, the linked list of pages includes memory that *could* have space for an allocation, while the "custom" region will never have left over memory that can be used for other allocations. When searching pages for memory to satisfy a request, there is no reason to search through the "custom" pages. There is a trade-off between complexity and implementation, so I am open to suggestions. This was discussed in [1], where it originally was implemented closer to what you describe here. Also, when combining, there could be some wasted space on one of the pages. I'm not sure if that's worth calling out, though. Yes, we bring over the whole page. However, these pages are now available for new allocations. [1] https://public-inbox.org/git/xmqqk1u2k91l@gitster-ct.c.googlers.com/ Ah, I didn't realize that the plan was to search over all pages when allocating memory from the pool, instead of only searching the last page. This seems like a departure from the fast-import.c way, where as far as I can tell, new_object() searches only one page. If we do plan to do this, searching all pages doesn't seem like a good idea to me, especially since the objects we're storing in the pool are of similar size. I see. However, the new_object() logic in fast-import is different from the logic mem_pool was abstracting, and is not covered by the mem_pool functionality. The behavior of searching over all pages for one to satisfy the request existed previously and was not changed in the mem_pool implementation. If we decide to go ahead with searching all the pages, though, the "custom" pages should probably be another linked list instead of an array. This is an option - I went with the current design because we only need pointers to a block of memory (as well as tracking how large the allocation is for verification purposes). We don't necessarily need the extra overhead of a structure to track the linked list nodes, when it is provided by the existing array manipulation functions. I am open to feedback on this point, however.
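(Concretely, the array approach costs two pointers of bookkeeping per custom allocation - one slot in custom[] and one in custom_end[] - grown amortized via ALLOC_GROW, whereas a linked list would embed a next pointer plus a size field in front of every custom block; the totals are roughly comparable, so the choice is mostly about code simplicity.)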
Re: [PATCH v1 3/5] mem-pool: fill out functionality
On 04/20/2018 07:21 PM, Jonathan Tan wrote: On Tue, 17 Apr 2018 16:34:42 + Jameson Miller wrote: @@ -19,8 +19,27 @@ struct mem_pool { /* The total amount of memory allocated by the pool. */ size_t pool_alloc; + + /* +* Array of pointers to "custom size" memory allocations. +* This is used for "large" memory allocations. +* The *_end variables are used to track the range of memory +* allocated. +*/ + void **custom, **custom_end; + int nr, nr_end, alloc, alloc_end; This seems overly complicated - the struct mem_pool already has a linked list of pages, so couldn't you create a custom page and insert it behind the current front page instead whenever you needed a large-size page? Yes - that is another option. However, the linked list of pages includes memory that *could* have space for an allocation, while the "custom" region will never have left over memory that can be used for other allocations. When searching pages for memory to satisfy a request, there is no reason to search through the "custom" pages. There is a trade-off between complexity and implementation, so I am open to suggestions. This was discussed in [1], where it originally was implemented closer to what you describe here. Also, when combining, there could be some wasted space on one of the pages. I'm not sure if that's worth calling out, though. Yes, we bring over the whole page. However, these pages are now available for new allocations. [1] https://public-inbox.org/git/xmqqk1u2k91l@gitster-ct.c.googlers.com/
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/20/2018 07:34 PM, Jonathan Tan wrote: On Tue, 17 Apr 2018 16:34:39 + Jameson Miller wrote: Jameson Miller (5): read-cache: teach refresh_cache_entry to take istate Add an API creating / discarding cache_entry structs mem-pool: fill out functionality Allocate cache entries from memory pools Add optional memory validations around cache_entry lifecycle In this patch set, there is no enforcement that the cache entry created by make_index_cache_entry() goes into the correct index when add_index_entry() is invoked. (Junio described similar things, I believe, in [1].) This might be an issue when we bring up and drop multiple indexes, and dropping one index causes a cache entry in another to become invalidated. Correct - it is up to the caller here to coordinate this. The code should be set up so that this is not a problem here. In the case of a split-index, the cache entries should be allocated from the memory pool associated with the "most common" / base index. If you find a place I missed or anything that seems questionable, or have suggestions, I would be glad to look into it. One solution is to store the index for which the cache entry was created in the cache entry itself, but that does increase its size. Another is Yes, this is an option. For this initial patch series, I decided to not add extra fields to the cache_entry type, but I think incorporating this in cache_entry is a viable option, and has some positive properties. to change the API such that a cache entry is created and added in the same function, and then have some rollback if the cache entry turns out to be invalid (to support add-empty-entry -> fill -> verify), but I don't know if this is feasible. Anyway, all these alternatives should be at least discussed in the commit message, I think. I can include a discussion of these in the commit message. Thanks. The make_transient_cache_entry() function might be poorly named, since as far as I can tell, the entries produced by that function are actually the longest lasting, since they are never freed. They should always be freed (and are usually freed close to where they are allocated, or by the calling function). If you see an instance where this is not the case, please point it out, because that is not the intention. Along those lines, I was slightly surprised to find out in patch 4 that cache entry freeing is a no-op. That's fine, but in that case, it would be better to delete all the calls to the "discard" function, and document in the others that the entries they create will only be freed when the memory pool itself is discarded. I can add a comment inside the function body. In the next commit, I do add logic to perform extra (optional) verification in the discard function. I did wrestle with this fact, but I feel there is value in having the locations where it is appropriate to free these entries reflected in the code, even if this particular implementation is not utilizing it right now. Hopefully the verification logic added in 5/5 is enough to justify keeping this function around. [1] https://public-inbox.org/git/xmqqwox5i0f7@gitster-ct.c.googlers.com/
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/20/2018 01:49 PM, Stefan Beller wrote: base-commit: cafaccae98f749ebf33495aec42ea25060de8682 I couldn't quite figure out what these five patches were based on, even with this line. Basing on and referring to a commit that is not part of our published history with "base-commit" is not all that useful to others. I'd like to second this. In the object store refactoring, I am at a point where I'd want to migrate the memory management of {object, tree, commit, tag}.c which currently is done in alloc.c to a memory pool, that has a dedicated pointer to it. So I'd either have to refactor alloc.c to take the_repository[1] or I'd play around with the mem_pool to manage memory in the object layer. I guess this playing around can happen with what is at origin/jm/mem-pool, however the life cycle management part of the third patch[2] would allow for stopping memleaks there. So I am interested in this series as well. Sorry about the confusion here. This patch series can be applied to the next branch. They apply cleanly on [3]. The lifecycle functions are re-introduced in [4], which we could incorporate sooner if useful. I didn't have a consumer for the calls in the original patch series, and so deferred them until there was a caller. I would be interested to understand how the mem_pool would fit your needs, and if it is sufficient or needs modification for your use cases. [1] proof of concept in patches nearby https://public-inbox.org/git/20180206001749.218943-31-sbel...@google.com/ [2] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/ Thanks, Stefan [3] b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20) [4] https://public-inbox.org/git/20180417163400.3875-5-jam...@microsoft.com/
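As a sketch of what reusing the pool in the object layer could look like (purely illustrative - alloc.c's real node types, batching, and naming differ, and the pool pointer here stands in for whatever owner the object-store refactoring settles on):

	#include "mem-pool.h"

	/* Allocate one zeroed blob node from a repository-owned pool. */
	static struct blob *pool_alloc_blob(struct mem_pool *pool)
	{
		return mem_pool_calloc(pool, 1, sizeof(struct blob));
	}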
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/18/2018 12:49 AM, Junio C Hamano wrote: Jameson Miller writes: This patch series improves the performance of loading indexes by reducing the number of malloc() calls. ... Jameson Miller (5): read-cache: teach refresh_cache_entry to take istate Add an API creating / discarding cache_entry structs mem-pool: fill out functionality Allocate cache entries from memory pools Add optional memory validations around cache_entry lifecycle apply.c| 26 +++--- blame.c| 5 +- builtin/checkout.c | 8 +- builtin/difftool.c | 8 +- builtin/reset.c| 6 +- builtin/update-index.c | 26 +++--- cache.h| 40 - git.c | 3 + mem-pool.c | 136 - mem-pool.h | 34 merge-recursive.c | 4 +- read-cache.c | 229 +++-- resolve-undo.c | 6 +- split-index.c | 31 +-- tree.c | 4 +- unpack-trees.c | 27 +++--- 16 files changed, 476 insertions(+), 117 deletions(-) base-commit: cafaccae98f749ebf33495aec42ea25060de8682 I couldn't quite figure out what these five patches were based on, even with this line. Basing on and referring to a commit that is not part of our published history with "base-commit" is not all that useful to others. My apologies - this patch series should be applied to the 'next' branch. It applies cleanly on top of b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20), which is a commit in the 'next' branch. Offhand without applying these patches and viewing the changes in wider contexts, one thing that makes me wonder is how the two allocation schemes can be used with two implementations of free(). Upon add_index_entry() that replaces an index entry for an existing path, we'd discard an entry that was originally read as part of read_cache(). If we do that again, the second add_index_entry() will be discarding, in its call to replace_index_entry(), the entry that was allocated by the caller of the first add_index_entry() call. And replace_index_entry() would not have a way to know from which allocation the entry's memory came from. Perhaps it is not how you are using the "transient" stuff, and from the comment in 2/5, it is for "entries that are not meant to go into the index", but then higher stage index entries in unpack_trees seem to be allocated via the "transient" stuff, so I am not sure what the plans are for things like merge_recursive() that uses unpack_trees() to prepare the index and then later "fix it up" by further futzing around the index to adjust for renames it finds, etc. Good points. The intention with this logic is that any entries that *could* go into an index are allocated from the memory pool. The "transient" entries only exist for a short period of time. These have a defined lifetime and we can always trace the corresponding "free" call. make_transient_cache_entry() is only used to construct a temporary cache_entry to pass to the checkout_entry() / write_entry(). There is a note in checkout.c indicating that write_entry() needs to be re-factored to not take a cache_entry. The cache_entry type could gain knowledge about where it was allocated from. This would allow us to only have a single "free()" function, which could inspect the cache_entry to see if it was allocated from a mem_pool (and possibly which mem_pool) and take the appropriate action. The downside of this approach is that cache_entry would need to gain a field to track this information, and I *think* all of the bit field spots are taken. Let me read it fully once we know where these patches are to be applied, but before that I cannot say much about them X-<. Thanks.
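For what it's worth, the flag-based variant discussed at the end of this reply is essentially what v3 of the series (elsewhere in this thread) ends up doing with its mem_pool_allocated field; in outline, assuming the cache.h context from that series:

	/*
	 * Single free() entry point: pool-allocated entries are reclaimed
	 * only when their pool is discarded. (Sketch; v3's actual
	 * discard_cache_entry() may differ in detail.)
	 */
	void discard_cache_entry(struct cache_entry *ce)
	{
		if (ce && !ce->mem_pool_allocated)
			free(ce);
	}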
Re: [PATCH v1 0/5] Allocate cache entries from memory pool
On 04/17/2018 02:39 PM, Ben Peart wrote: On 4/17/2018 12:34 PM, Jameson Miller wrote: 100K Test baseline [4] block_allocation 0002.1: read_cache/discard_cache 1 times 0.03(0.01+0.01) 0.02(0.01+0.01) -33.3% 1M: Test baseline block_allocation 0002.1: read_cache/discard_cache 1 times 0.23(0.12+0.11) 0.17(0.07+0.09) -26.1% 2M: Test baseline block_allocation 0002.1: read_cache/discard_cache 1 times 0.45(0.26+0.19) 0.39(0.17+0.20) -13.3% 100K is not a large enough sample size to show the perf impact of this change, but we can see a perf improvement with 1M and 2M entries. I see a 33% change with 100K files which is a substantial improvement even in the 100K case. I do see that the actual wall clock savings aren't nearly as much with a small repo as they are with the larger repos, which makes sense. You are correct - I should have been more careful in my wording. What I meant is that the wall time savings with 100K is not large, because this operation is already very fast.
RE: [PATCH v4 0/3] Extract memory pool logic into reusable component
I think this version (V4) should reflect the latest round of feedback. Please let me know if there are any other questions or outstanding work to finish here. I have submitted a patch series to have a second component use this memory pool [1]. Thank you [1] https://public-inbox.org/git/20180417163400.3875-2-jam...@microsoft.com/ -Original Message- From: Jameson Miller Sent: Wednesday, April 11, 2018 2:38 PM To: git@vger.kernel.org Cc: gits...@pobox.com; p...@peff.net; sunsh...@sunshineco.com; ram...@ramsayjones.plus.com; Jameson Miller Subject: [PATCH v4 0/3] Extract memory pool logic into reusable component Thank you everyone for taking the time review and provide feedback on this patch series. Changes from v3: - Based patch off of new commit, to resolve merge conflict. - Updated log message in 2/3 based on feedback. - Squashed patch from Ramsay Jones into 2/3 to fix warning from sparse. - Updated variable names in 2/3 to reflect updated usage of variable. Jameson Miller (3): fast-import: rename mem_pool type to mp_block fast-import: introduce mem_pool type Move reusable parts of memory pool into its own file Makefile | 1 + fast-import.c | 77 +-- mem-pool.c| 55 ++ mem-pool.h| 34 ++ 4 files changed, 106 insertions(+), 61 deletions(-) create mode 100644 mem-pool.c create mode 100644 mem-pool.h -- 2.14.3
[PATCH v1 4/5] Allocate cache entries from memory pools
Improve performance of reading a large index by reducing the number of malloc() calls. When reading an index with a large number of entries, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a single large block of memory up front into a memory pool and having git hand out chunks of it as needed. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entry structs will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool and cache entries from the memory pool will be discarded as well. This means the lifetime of cache_entry structs is tied to the lifetime of the index_state they were allocated for. In the case of a split index, the following rules are followed. First, some terminology is defined: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. --- cache.h | 2 ++ read-cache.c | 95 +-- split-index.c | 23 +-- 3 files changed, 95 insertions(+), 25 deletions(-) diff --git a/cache.h b/cache.h index eedf154815..7c0d2343c3 100644 --- a/cache.h +++ b/cache.h @@ -15,6 +15,7 @@ #include "path.h" #include "sha1-array.h" #include "repository.h" +#include "mem-pool.h" #include typedef struct git_zstream { @@ -328,6 +329,7 @@ struct index_state { struct untracked_cache *untracked; uint64_t fsmonitor_last_update; struct ewah_bitmap *fsmonitor_dirty; + struct mem_pool *ce_mem_pool; }; extern struct index_state the_index; diff --git a/read-cache.c b/read-cache.c index 04fa7e1bd0..67438bf375 100644 --- a/read-cache.c +++ b/read-cache.c @@ -46,6 +46,42 @@ CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \ SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED) + +/* + * This is an estimate of the average pathname length in the index. We use + * this for V4 index files to guess the un-deltafied size of the index in + * memory because of pathname deltafication. This is not required for V2/V3 + * index formats because their pathnames are not compressed. If the initial + * amount of memory set aside is not sufficient, the mem pool will allocate + * extra memory.
+ */ +#define CACHE_ENTRY_AVG_PATH_LENGTH_ESTIMATE 80 + +static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len) +{ + return mem_pool_alloc(mem_pool, cache_entry_size(len)); +} + +static inline struct cache_entry *mem_pool__ce_calloc(struct mem_pool *mem_pool, size_t len) +{ + return mem_pool_calloc(mem_pool, 1, cache_entry_size(len)); +} + +static struct mem_pool *find_mem_pool(struct index_state *istate) +{ + struct mem_pool **pool_ptr; + + if (istate->split_index && istate->split_index->base) + pool_ptr = &istate->split_index->base->ce_mem_pool; + else + pool_ptr = &istate->ce_mem_pool; + + if (!*pool_ptr) + mem_pool_init(pool_ptr, 0); + + return *pool_ptr; +} + struct index_state the_index; static const char *alternate_index_output; @@ -746,7 +782,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int flags) struct cache_entry *make_empty_index_cache_entry(struct index_state *istate, size_t len) { - return xcalloc(1, cache_entry_size(len)); + return mem_pool__ce_calloc(find_mem_pool(istate), len); } struct cache_entry *make_empty_transient_cache_entry(size_t len) @@ -1641,13 +1677,13 @@ int read_index(struct index_state *istate) return read_index_from(istate, get_index_file(), get_git_dir()); } -static struct cache_entry *cache_entry_from_ondisk(struct index_state *istate, +static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool, struct ondisk_cache_entry *ondisk, unsigned int flags, const char *name, size_t len) { - struct cache_entry *ce =
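To make the ownership rule above concrete, here is a minimal sketch of a caller (an illustration only, not part of the patch; it uses the make_empty_index_cache_entry() helper introduced here together with the existing read_index()/discard_index() entry points):

	struct index_state istate = { NULL };
	struct cache_entry *ce;

	read_index(&istate);	/* pool is sized from the on-disk entry count */
	ce = make_empty_index_cache_entry(&istate, strlen("foo"));
	memcpy(ce->name, "foo", strlen("foo"));	/* calloc'd entry, already NUL-terminated */

	/* ... ce may be used for as long as istate is alive ... */

	discard_index(&istate);	/* frees the pool, and ce along with it */

The key point is the last line: a cache_entry handed out this way must not be used after the index_state that owns it (or, with a split index, the base index) has been discarded.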
[PATCH v1 5/5] Add optional memory validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool-allocated cache entries. When set:

1) Invalidate cache_entry memory when discarding a cache_entry.

2) When discarding an index_state struct, verify that all cache_entries were allocated from the expected mem_pool.

3) When discarding mem_pools, invalidate the mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected.

---
 cache.h | 7 +++ git.c| 3 +++ mem-pool.c | 35 ++- mem-pool.h | 2 ++ read-cache.c | 44 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h index 7c0d2343c3..f8934d8113 100644 --- a/cache.h +++ b/cache.h @@ -369,6 +369,13 @@ void index_cache_entry_discard(struct cache_entry *ce); */ void transient_cache_entry_discard(struct cache_entry *ce); +/* + * Validate the cache entries in the index. This is an internal + * consistency check that the cache_entry structs are allocated from + * the expected memory pool. + */ +void validate_cache_entries(const struct index_state *istate); + #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS #define active_cache (the_index.cache) #define active_nr (the_index.cache_nr)

diff --git a/git.c b/git.c index 3a89893712..16b6c1685b 100644 --- a/git.c +++ b/git.c @@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) trace_argv_printf(argv, "trace: built-in: git"); + validate_cache_entries(&the_index); status = p->fn(argc, argv, prefix); + validate_cache_entries(&the_index); + if (status) return status;

diff --git a/mem-pool.c b/mem-pool.c index 09fb78d093..a7e28934b0 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -60,20 +60,44 @@ void mem_pool_discard(struct mem_pool *mem_pool) { int i; struct mp_block *block, *block_to_free; + int invalidate_memory = should_validate_cache_entries(); + for (block = mem_pool->mp_block; block;) { block_to_free = block; block = block->next_block; + + if (invalidate_memory) + memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space)); + free(block_to_free); } - for (i = 0; i < mem_pool->nr; i++) + for (i = 0; i < mem_pool->nr; i++) { + if (invalidate_memory) + memset(mem_pool->custom[i], 0xDD, ((char *)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i])); + free(mem_pool->custom[i]); + } free(mem_pool->custom); free(mem_pool->custom_end); free(mem_pool); } +int should_validate_cache_entries(void) +{ + static int validate_index_cache_entries = -1; + + if (validate_index_cache_entries < 0) { + if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES")) + validate_index_cache_entries = 1; + else + validate_index_cache_entries = 0; + } + + return validate_index_cache_entries; +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; @@ -110,11 +134,20 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) int mem_pool_contains(struct mem_pool *mem_pool, void *mem) { struct mp_block *p; + int i; + + /* Check if memory is allocated in a block */ for (p = mem_pool->mp_block; p; p = p->next_block) if ((mem >= ((void *)p->space)) && (mem < ((void *)p->end))) return 1; + /* Check custom memory allocations */ + for (i = 0; i < mem_pool->nr; i++) + if (mem >= mem_pool->custom[i] && + mem < mem_pool->custom_end[i]) + return 1; + return 0; }

diff --git a/mem-pool.h b/mem-pool.h index 34df4fa709..b1f9a920ba 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); */ int
mem_pool_contains(struct mem_pool *mem_pool, void *mem); +int should_validate_cache_entries(void); + #endif diff --git a/read-cache.c b/read-cache.c index 67438bf375..d2181a0334 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1290,6 +1290,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti istate->cache_nr - pos - 1); set_index_entry(istate, pos, ce); istate->cache_changed |= CE_ENTRY_ADDED; + return 0; } @@ -2013,6 +2014,8 @@ int is_index_unborn(struct index_state *istate) int discard_index(struct index_state *istate) { + validate_cache_entries(istate); + resolve_undo_clear_index(istate); istate->cache_nr = 0; istate->cach
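Since the checks are gated on should_validate_cache_entries(), which reads the environment only once, the extra validation can be switched on for an entire test run with, for example:

	GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES=1 make test

(an illustrative invocation; any mechanism that exports the variable before running git will do).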
[PATCH v1 3/5] mem-pool: fill out functionality
Add the following functionality to memory pools:

- Lifecycle management functions (init, discard)
- A test for whether a memory location is part of the managed pool
- A function to combine two pools

This also adds logic to track all memory allocations made by a memory pool. These functions will be used in a later commit in this series.

Signed-off-by: Jameson Miller

---
 mem-pool.c | 103 ++--- mem-pool.h | 32 +++ 2 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c index 389d7af447..09fb78d093 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -5,6 +5,8 @@ #include "cache.h" #include "mem-pool.h" +#define BLOCK_GROWTH_SIZE (1024*1024 - sizeof(struct mp_block)) + static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) { struct mp_block *p; @@ -19,6 +21,59 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b return p; } +static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t block_alloc) +{ + char *p; + ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc); + ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, mem_pool->alloc_end); + + p = xmalloc(block_alloc); + mem_pool->custom[mem_pool->nr++] = p; + mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc; + + mem_pool->pool_alloc += block_alloc; + + return mem_pool->custom[mem_pool->nr - 1]; +} + +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) +{ + if (!(*mem_pool)) + { + *mem_pool = xmalloc(sizeof(struct mem_pool)); + (*mem_pool)->pool_alloc = 0; + (*mem_pool)->mp_block = NULL; + (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE; + (*mem_pool)->custom = NULL; + (*mem_pool)->nr = 0; + (*mem_pool)->alloc = 0; + (*mem_pool)->custom_end = NULL; + (*mem_pool)->nr_end = 0; + (*mem_pool)->alloc_end = 0; + + if (initial_size > 0) + mem_pool_alloc_block(*mem_pool, initial_size); + } +} + +void mem_pool_discard(struct mem_pool *mem_pool) +{ + int i; + struct mp_block *block, *block_to_free; + for (block = mem_pool->mp_block; block;) { + block_to_free = block; + block = block->next_block; + free(block_to_free); + } + + for (i = 0; i < mem_pool->nr; i++) + free(mem_pool->custom[i]); + + free(mem_pool->custom); + free(mem_pool->custom_end); + free(mem_pool); +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; @@ -33,10 +88,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) break; if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } + if (len >= (mem_pool->block_alloc / 2)) + return mem_pool_alloc_custom(mem_pool, len); p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); } @@ -53,3 +106,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + int i; + struct mp_block **tail = &dst->mp_block; + + /* Find pointer of dst's last block (if any) */ + while (*tail) + tail = &(*tail)->next_block; + + /* Append the blocks from src to dst */ + *tail = src->mp_block; + + ALLOC_GROW(dst->custom, dst->nr + src->nr, dst->alloc); + ALLOC_GROW(dst->custom_end, dst->nr_end + src->nr_end, dst->alloc_end); + + for (i = 0; i < src->nr; i++) {
dst->custom[dst->nr++] = src->custom[i]; + dst->custom_end[dst->nr_end++] = src->custom_end[i]; + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; + src->custom = NULL; + src->nr = 0; + src->alloc = 0; + src->custom_end = NULL; + src->nr_end = 0; + src->alloc_end = 0; +} diff --git a
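For illustration, the intended lifecycle of a pool under this API looks roughly like the following sketch (a hypothetical caller, using only the functions declared in mem-pool.h by this patch):

	struct mem_pool *pool = NULL;
	void *p;

	mem_pool_init(&pool, 0);	/* no initial block; one is grown on first alloc */
	p = mem_pool_alloc(pool, 100);	/* rounded up and carved out of a block */

	if (!mem_pool_contains(pool, p))
		BUG("pool should know about its own block allocations");

	mem_pool_discard(pool);		/* frees every block and the pool; p is now invalid */

Note that oversized requests are recorded in the custom[] / custom_end[] arrays, and mem_pool_discard() frees those as well; mem_pool_contains() only learns to look at them in patch 5/5.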
[PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate
Refactor the dependencies of make_cache_entry to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function itself work on a specific index.

---
 cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 7 --- 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h index bbaf5c349a..e50a847aea 100644 --- a/cache.h +++ b/cache.h @@ -743,7 +743,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't.

diff --git a/merge-recursive.c b/merge-recursive.c index 0c0d48624d..693f60e0a3 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -260,7 +260,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("addinfo_cache failed for path '%s'"), path); if (nce != ce)

diff --git a/read-cache.c b/read-cache.c index 10f1c6bb8a..2cb4f53b57 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1448,10 +1448,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); }

--
2.14.3
[PATCH v1 2/5] Add an API creating / discarding cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind an API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated / freed. A later commit in this series will allocate cache entries from memory pools as appropriate.

Motivation:

It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index.

This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory.

---
 apply.c| 26 ++-- blame.c| 5 +-- builtin/checkout.c | 8 ++-- builtin/difftool.c | 8 ++-- builtin/reset.c| 6 +-- builtin/update-index.c | 26 ++-- cache.h| 29 +- merge-recursive.c | 2 +- read-cache.c | 105 +++-- resolve-undo.c | 6 ++- split-index.c | 8 ++-- tree.c | 4 +- unpack-trees.c | 27 - 13 files changed, 166 insertions(+), 94 deletions(-)

diff --git a/apply.c b/apply.c index 7e5792c996..47903f427b 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_index_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) - return error(_("make_cache_entry failed for path '%s'"), + return error(_("make_index_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + index_cache_entry_discard(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_index_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + index_cache_entry_discard(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + index_cache_entry_discard(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + index_cache_entry_discard(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + index_cache_entry_discard(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct apply_state *state, struct patch *patch) { int stage, namelen; - unsigned ce_size, mode; + unsigned mode; struct cache_entry
*ce; if (!state->update_index) return 0; na
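As a sketch of the resulting call patterns (an illustration only, using names from this series):

	/* tied to an index: converted to a pool allocation in patch 4/5 */
	struct cache_entry *ce = make_empty_index_cache_entry(&the_index, namelen);
	/* ... fill in name, mode, oid ... */
	index_cache_entry_discard(ce);

	/* not tied to any index: always allocated and freed individually */
	struct cache_entry *tce = make_empty_transient_cache_entry(namelen);
	/* ... */
	transient_cache_entry_discard(tce);

With this commit both discard functions still boil down to free(); the split simply lets a later patch change the index-bound variant without touching the transient call sites.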
[PATCH v1 0/5] Allocate cache entries from memory pool
This patch series improves the performance of loading indexes by reducing the number of malloc() calls. Loading the index from disk is partly dominated by the time in malloc(), which is called for each index entry. This patch series reduces the number of times malloc() is called as part of loading the index, and instead allocates a block of memory upfront that is large enough to hold all of the cache entries, and chunks this memory itself. This change builds on [1], which is a prerequisite for it. Git previously allocated a block of memory for the index cache entries, until [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API
  3/5 - Fill out memory pool API to include lifecycle and other methods used in later patches
  4/5 - Allocate cache entry structs from memory pool
  5/5 - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh test was run with several combinations of allocators (glibc default, tcmalloc, jemalloc), with and without block allocation, and across several different index sizes (100K, 1M, 2M entries). The details on how these repositories were constructed can be found in [3]. The p0002-read-cache.sh test was run with the iteration count set to 1 and $GIT_PERF_REPEAT_COUNT=10.

The tests were run with the iteration count set to 1 because this best approximates the real behavior. The read_cache/discard_cache test will load / free the index N times, and the performance of this logic is different between N = 1 and N > 1. As the production code does not read / discard the index in a loop, a better approximation is when N = 1.

100K:

Test                                       baseline [4]       block_allocation
-------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)    0.02(0.01+0.01) -33.3%

1M:

Test                                       baseline           block_allocation
-------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)    0.17(0.07+0.09) -26.1%

2M:

Test                                       baseline           block_allocation
-------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)    0.39(0.17+0.20) -13.3%

100K is not a large enough sample size to show the perf impact of this change, but we can see a perf improvement with 1M and 2M entries.

For completeness, here are the p0002-read-cache results for git.git and linux.git:

git.git:

Test                                          baseline           block_allocation
----------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)    0.17(0.13+0.03) -43.3%

linux.git:

Test                                          baseline           block_allocation
----------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)    4.61(3.74+0.66) -34.6%

We also investigated the performance of just using different allocators. We can see that there is not a consistent performance gain:

100K:

Test                                       baseline [4]       tcmalloc                 jemalloc
------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)    0.03(0.01+0.01) +0.0%    0.03(0.02+0.01) +0.0%

1M:

Test                                       baseline           tcmalloc                 jemalloc
------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)    0.21(0.10+0.10) -8.7%    0.27(0.16+0.10) +17.4%

2M:

Test                                       baseline           tcmalloc                 jemalloc
------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)    0.46(0.25+0.21) +2.2%    0.57(0.36+0.21) +26.7%

[1] https://public-inbox.org/git/20180321164152.204869-1-jam...@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories: The test repositories were constructed with t/perf/repos/many-files.sh with the following parameters:

    100K: many-files.sh 4 10 9
    1M:   many-files.sh 5 10 9
    2M:   many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discard
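For anyone who wants to reproduce these numbers, the setup in [3] plus the run described above amounts to roughly the following (illustrative commands; the exact perf-test plumbing may differ in your checkout):

	# generate e.g. the 1M-entry test repository
	t/perf/repos/many-files.sh 5 10 9

	# run the read-cache perf test with 10 repetitions
	cd t/perf && GIT_PERF_REPEAT_COUNT=10 ./p0002-read-cache.sh

with the iteration count inside p0002-read-cache.sh set to 1, as described above.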
[PATCH v4 1/3] fast-import: rename mem_pool type to mp_block
This is part of a patch series to extract the memory pool logic in fast-import into a more generalized version. The existing mem_pool type maps more closely to a "block of memory" (mp_block) in the more generalized memory pool. This commit renames the mem_pool to mp_block to reduce churn in future patches. Signed-off-by: Jameson Miller --- fast-import.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fast-import.c b/fast-import.c index 99f8f56e8c..38af0a294b 100644 --- a/fast-import.c +++ b/fast-import.c @@ -211,8 +211,8 @@ struct last_object { unsigned no_swap : 1; }; -struct mem_pool { - struct mem_pool *next_pool; +struct mp_block { + struct mp_block *next_block; char *next_free; char *end; uintmax_t space[FLEX_ARRAY]; /* more */ @@ -306,9 +306,9 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mem_pool); +static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); static size_t total_allocd; -static struct mem_pool *mem_pool; +static struct mp_block *mp_block_head; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -638,14 +638,14 @@ static unsigned int hc_str(const char *s, size_t len) static void *pool_alloc(size_t len) { - struct mem_pool *p; + struct mp_block *p; void *r; /* round up to a 'uintmax_t' alignment */ if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mem_pool; p; p = p->next_pool) + for (p = mp_block_head; p; p = p->next_block) if ((p->end - p->next_free >= len)) break; @@ -654,12 +654,12 @@ static void *pool_alloc(size_t len) total_allocd += len; return xmalloc(len); } - total_allocd += sizeof(struct mem_pool) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mem_pool), mem_pool_alloc)); - p->next_pool = mem_pool; + total_allocd += sizeof(struct mp_block) + mem_pool_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); + p->next_block = mp_block_head; p->next_free = (char *) p->space; p->end = p->next_free + mem_pool_alloc; - mem_pool = p; + mp_block_head = p; } r = p->next_free; -- 2.14.3
[PATCH v4 2/3] fast-import: introduce mem_pool type
Introduce the mem_pool type which encapsulates all the information necessary to manage a pool of memory. This change moves the existing variables in fast-import used to support the global memory pool to use this structure. It also renames variables that are no longer used by memory pools to reflect their more scoped usage. These changes allow multiple instances of a memory pool to exist and be reused outside of fast-import. In a future commit the mem_pool type will be moved to its own file.

Signed-off-by: Jameson Miller

---
 fast-import.c | 81 ++- 1 file changed, 52 insertions(+), 29 deletions(-)

diff --git a/fast-import.c b/fast-import.c index 38af0a294b..48d4797ab5 100644 --- a/fast-import.c +++ b/fast-import.c @@ -218,6 +218,19 @@ struct mp_block { uintmax_t space[FLEX_ARRAY]; /* more */ }; +struct mem_pool { + struct mp_block *mp_block; + + /* +* The amount of available memory to grow the pool by. +* This size does not include the overhead for the mp_block. +*/ + size_t block_alloc; + + /* The total amount of memory allocated by the pool. */ + size_t pool_alloc; +}; + struct atom_str { struct atom_str *next_atom; unsigned short str_len; @@ -306,9 +319,8 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); -static size_t total_allocd; -static struct mp_block *mp_block_head; +static struct mem_pool fi_mem_pool = {NULL, 2*1024*1024 - + sizeof(struct mp_block), 0 }; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -343,6 +355,7 @@ static unsigned int tree_entry_alloc = 1000; static void *avail_tree_entry; static unsigned int avail_tree_table_sz = 100; static struct avail_tree_content **avail_tree_table; +static size_t tree_entry_allocd; static struct strbuf old_tree = STRBUF_INIT; static struct strbuf new_tree = STRBUF_INIT; @@ -636,7 +649,21 @@ static unsigned int hc_str(const char *s, size_t len) return r; } -static void *pool_alloc(size_t len) +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +{ + struct mp_block *p; + + mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); + p->next_block = mem_pool->mp_block; + p->next_free = (char *)p->space; + p->end = p->next_free + block_alloc; + mem_pool->mp_block = p; + + return p; +} + +static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; void *r; @@ -645,21 +672,17 @@ static void *pool_alloc(size_t len) if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mp_block_head; p; p = p->next_block) - if ((p->end - p->next_free >= len)) + for (p = mem_pool->mp_block; p; p = p->next_block) + if (p->end - p->next_free >= len) break; if (!p) { - if (len >= (mem_pool_alloc/2)) { - total_allocd += len; + if (len >= (mem_pool->block_alloc / 2)) { + mem_pool->pool_alloc += len; return xmalloc(len); } - total_allocd += sizeof(struct mp_block) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); - p->next_block = mp_block_head; - p->next_free = (char *) p->space; - p->end = p->next_free + mem_pool_alloc; - mp_block_head = p; + + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); } r = p->next_free; @@ -667,10 +690,10 @@ static void *pool_alloc(size_t len) return r; } -static void *pool_calloc(size_t count, size_t size) +static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) { -
size_t len = count * size; - void *r = pool_alloc(len); + size_t len = st_mult(count, size); + void *r = mem_pool_alloc(mem_pool, len); memset(r, 0, len); return r; } @@ -678,7 +701,7 @@ static void *pool_calloc(size_t count, size_t size) static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; - char *r = pool_alloc(len); + char *r = mem_pool_alloc(&fi_mem_pool, len); memcpy(r, s, len); return r; } @@ -687,7 +710,7 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe) { struct mark_set *s = marks; while ((idnum >> s->shift) >= 1024) { - s = pool_calloc(1, sizeof(struct
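Because all of the pool state now travels in a struct, a second pool can exist independently of fi_mem_pool. A hypothetical example (illustration only; scratch_pool is not part of this patch):

	static struct mem_pool scratch_pool = { NULL, 2*1024*1024 -
						sizeof(struct mp_block), 0 };

	void *p = mem_pool_alloc(&scratch_pool, 512);	/* does not touch fi_mem_pool */

This is exactly the property the index code will later rely on to give each index_state its own pool.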
[PATCH v4 3/3] Move reusable parts of memory pool into its own file
This moves the reusable parts of the memory pool logic used by fast-import.c into its own file for use by other components. Signed-off-by: Jameson Miller --- Makefile | 1 + fast-import.c | 70 +-- mem-pool.c| 55 ++ mem-pool.h| 34 + 4 files changed, 91 insertions(+), 69 deletions(-) create mode 100644 mem-pool.c create mode 100644 mem-pool.h diff --git a/Makefile b/Makefile index f181687250..5aa4f8064f 100644 --- a/Makefile +++ b/Makefile @@ -844,6 +844,7 @@ LIB_OBJS += log-tree.o LIB_OBJS += mailinfo.o LIB_OBJS += mailmap.o LIB_OBJS += match-trees.o +LIB_OBJS += mem-pool.o LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o diff --git a/fast-import.c b/fast-import.c index 48d4797ab5..05d1079d23 100644 --- a/fast-import.c +++ b/fast-import.c @@ -170,6 +170,7 @@ Format of STDIN stream: #include "run-command.h" #include "packfile.h" #include "object-store.h" +#include "mem-pool.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1<pool_alloc += sizeof(struct mp_block) + block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); - p->next_block = mem_pool->mp_block; - p->next_free = (char *)p->space; - p->end = p->next_free + block_alloc; - mem_pool->mp_block = p; - - return p; -} - -static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) -{ - struct mp_block *p; - void *r; - - /* round up to a 'uintmax_t' alignment */ - if (len & (sizeof(uintmax_t) - 1)) - len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - - for (p = mem_pool->mp_block; p; p = p->next_block) - if (p->end - p->next_free >= len) - break; - - if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } - - p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); - } - - r = p->next_free; - p->next_free += len; - return r; -} - -static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) -{ - size_t len = st_mult(count, size); - void *r = mem_pool_alloc(mem_pool, len); - memset(r, 0, len); - return r; -} - static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; diff --git a/mem-pool.c b/mem-pool.c new file mode 100644 index 00..389d7af447 --- /dev/null +++ b/mem-pool.c @@ -0,0 +1,55 @@ +/* + * Memory Pool implementation logic. 
+ */ + +#include "cache.h" +#include "mem-pool.h" + +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +{ + struct mp_block *p; + + mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); + p->next_block = mem_pool->mp_block; + p->next_free = (char *)p->space; + p->end = p->next_free + block_alloc; + mem_pool->mp_block = p; + + return p; +} + +void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) +{ + struct mp_block *p; + void *r; + + /* round up to a 'uintmax_t' alignment */ + if (len & (sizeof(uintmax_t) - 1)) + len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); + + for (p = mem_pool->mp_block; p; p = p->next_block) + if (p->end - p->next_free >= len) + break; + + if (!p) { + if (len >= (mem_pool->block_alloc / 2)) { + mem_pool->pool_alloc += len; + return xmalloc(len); + } + + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); + } + + r = p->next_free; + p->next_free += len; + return r; +} + +void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) +{ + size_t len = st_mult(count, size); + void *r = mem_pool_alloc(mem_pool, len); + memset(r, 0, len); + return r; +} diff --git a/mem-pool.h b/mem-pool.h new file mode 100644 index 00..829ad58ecf --- /dev/null +++ b/mem-pool.h @@ -0,0 +1,34 @@ +#ifndef MEM_POOL_H +#define MEM_POOL_H + +struct mp_block { + struct mp_block *next_block; + char *next_free; + char *end; + uintmax_t space[FLEX_ARRAY]; /* more */ +}; + +struct mem_pool { + struct mp_block *mp_block; + + /* +* The amount of available memory to grow the pool by. +* This size does not include the overhead for the mp_block. +*/ + size_t bl
[PATCH v4 0/3] Extract memory pool logic into reusable component
Thank you everyone for taking the time to review and provide feedback on this patch series.

Changes from v3:

- Based the patches off of a new commit, to resolve a merge conflict.
- Updated the log message of 2/3 based on feedback.
- Squashed a patch from Ramsay Jones into 2/3 to fix a warning from sparse.
- Updated variable names in 2/3 to reflect their updated usage.

Jameson Miller (3):
  fast-import: rename mem_pool type to mp_block
  fast-import: introduce mem_pool type
  Move reusable parts of memory pool into its own file

 Makefile      | 1 +
 fast-import.c | 77 +--
 mem-pool.c    | 55 ++
 mem-pool.h    | 34 ++
 4 files changed, 106 insertions(+), 61 deletions(-)
 create mode 100644 mem-pool.c
 create mode 100644 mem-pool.h

--
2.14.3
Re: [PATCH v3 3/3] Move reusable parts of memory pool into its own file
Thank you for the thorough and detailed review - I very much appreciate it.

> I am OK if the structure of this series is to make that change after
> these three steps we see here.

I will add (back) the accounting of large memory blocks in a future patch series, taking into account your existing feedback. As this patch series did not have any consumers that "discarded" a memory pool, I decided to wait until I had a consumer that used this functionality.

> Will queue; the proposed log for step 2/3 may want to be a bit
> polished, though.

I am away the rest of this week and next week, but will update this patch series when I get back (week of April 9th).

Thank you

Jameson

From: Junio C Hamano on behalf of Junio C Hamano
Sent: Tuesday, March 27, 2018 12:43 PM
To: Jameson Miller
Cc: git@vger.kernel.org; p...@peff.net
Subject: Re: [PATCH v3 3/3] Move reusable parts of memory pool into its own file

Jameson Miller writes:

> This moves the reusable parts of the memory pool logic used by
> fast-import.c into its own file for use by other components.
>
> Signed-off-by: Jameson Miller
> ---
>  Makefile      | 1 +
>  fast-import.c | 70 +--
>  mem-pool.c    | 55 ++
>  mem-pool.h    | 34 +
>  4 files changed, 91 insertions(+), 69 deletions(-)
>  create mode 100644 mem-pool.c
>  create mode 100644 mem-pool.h

OK. This is indeed straight-forward line movements and nothing else, other than obviously a few static helpers are now extern.

I said I'd anticipate that the allocation that bypasses the pool subsystem would want to become traceable by the pool subsystem, which would allow us to free() the pieces of memory allocated directly with xmalloc() in mem_pool_alloc() instead of leaking. I am OK if the structure of this series is to make that change after these three steps we see here. When that happens, it will start to make sense to bill the "this is too big so do not attempt to carve it out from block, and do not overallocate and make the remainder available for later requests" to the pool instance mem_pool_alloc() is working on, as that piece of memory is also known to a specific pool instance.

After I wrote review for 2/3, I found out that you changed the meaning of total_allocd (which should probably be described in its log message). Unlike the original that counted "total", it now is used only for memory that is allocated directly by fast-import.c and does not account for memory obtained by calling mem-pool.

The output routine is changed in 2/3 to add fi_mem_pool's pool_alloc to it, so billing oversized allocation that does *not* belong to any specific pool to _some_ pool and ignoring total_allocd would cancel things out. It still feels a bit fishy, but I think it is OK.

So all in all, I think we are in no worse shape than the original (we call it "bug-to-bug compatible" ;-)), and successfully extracted a reusable piece in a separate file in a clean way so that we can refine and extend it further. Nicely done.

Will queue; the proposed log for step 2/3 may want to be a bit polished, though.

Thanks.
[PATCH v3 1/3] fast-import: rename mem_pool type to mp_block
This is part of a patch series to extract the memory pool logic in fast-import into a more generalized version. The existing mem_pool type maps more closely to a "block of memory" (mp_block) in the more generalized memory pool. This commit renames the mem_pool to mp_block to reduce churn in future patches. Signed-off-by: Jameson Miller --- fast-import.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fast-import.c b/fast-import.c index 58ef360da4..6c3215d7c3 100644 --- a/fast-import.c +++ b/fast-import.c @@ -209,8 +209,8 @@ struct last_object { unsigned no_swap : 1; }; -struct mem_pool { - struct mem_pool *next_pool; +struct mp_block { + struct mp_block *next_block; char *next_free; char *end; uintmax_t space[FLEX_ARRAY]; /* more */ @@ -304,9 +304,9 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mem_pool); +static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); static size_t total_allocd; -static struct mem_pool *mem_pool; +static struct mp_block *mp_block_head; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -636,14 +636,14 @@ static unsigned int hc_str(const char *s, size_t len) static void *pool_alloc(size_t len) { - struct mem_pool *p; + struct mp_block *p; void *r; /* round up to a 'uintmax_t' alignment */ if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mem_pool; p; p = p->next_pool) + for (p = mp_block_head; p; p = p->next_block) if ((p->end - p->next_free >= len)) break; @@ -652,12 +652,12 @@ static void *pool_alloc(size_t len) total_allocd += len; return xmalloc(len); } - total_allocd += sizeof(struct mem_pool) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mem_pool), mem_pool_alloc)); - p->next_pool = mem_pool; + total_allocd += sizeof(struct mp_block) + mem_pool_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); + p->next_block = mp_block_head; p->next_free = (char *) p->space; p->end = p->next_free + mem_pool_alloc; - mem_pool = p; + mp_block_head = p; } r = p->next_free; -- 2.14.3
[PATCH v3 2/3] fast-import: introduce mem_pool type
Introduce the mem_pool type which encapsulates all the information necessary to manage a pool of memory. This change moves the existing variables in fast-import used to support the global memory pool to use this structure. These changes allow multiple instances of a memory pool to exist and be reused outside of fast-import. In a future commit the mem_pool type will be moved to its own file.

Signed-off-by: Jameson Miller

---
 fast-import.c | 80 +-- 1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/fast-import.c b/fast-import.c index 6c3215d7c3..e85512191c 100644 --- a/fast-import.c +++ b/fast-import.c @@ -216,6 +216,19 @@ struct mp_block { uintmax_t space[FLEX_ARRAY]; /* more */ }; +struct mem_pool { + struct mp_block *mp_block; + + /* +* The amount of available memory to grow the pool by. +* This size does not include the overhead for the mp_block. +*/ + size_t block_alloc; + + /* The total amount of memory allocated by the pool. */ + size_t pool_alloc; +}; + struct atom_str { struct atom_str *next_atom; unsigned short str_len; @@ -304,9 +317,7 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); -static size_t total_allocd; -static struct mp_block *mp_block_head; +static struct mem_pool fi_mem_pool = {0, 2*1024*1024 - sizeof(struct mp_block), 0 }; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -324,6 +335,7 @@ static off_t pack_size; /* Table of objects we've written. */ static unsigned int object_entry_alloc = 5000; static struct object_entry_pool *blocks; +static size_t total_allocd; static struct object_entry *object_table[1 << 16]; static struct mark_set *marks; static const char *export_marks_file; @@ -634,7 +646,21 @@ static unsigned int hc_str(const char *s, size_t len) return r; } -static void *pool_alloc(size_t len) +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +{ + struct mp_block *p; + + mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); + p->next_block = mem_pool->mp_block; + p->next_free = (char *)p->space; + p->end = p->next_free + block_alloc; + mem_pool->mp_block = p; + + return p; +} + +static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; void *r; @@ -643,21 +669,17 @@ static void *pool_alloc(size_t len) if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mp_block_head; p; p = p->next_block) - if ((p->end - p->next_free >= len)) - break; + for (p = mem_pool->mp_block; p; p = p->next_block) + if (p->end - p->next_free >= len) + break; if (!p) { - if (len >= (mem_pool_alloc/2)) { - total_allocd += len; + if (len >= (mem_pool->block_alloc / 2)) { + mem_pool->pool_alloc += len; return xmalloc(len); } - total_allocd += sizeof(struct mp_block) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); - p->next_block = mp_block_head; - p->next_free = (char *) p->space; - p->end = p->next_free + mem_pool_alloc; - mp_block_head = p; + + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); } r = p->next_free; @@ -665,10 +687,10 @@ static void *pool_alloc(size_t len) return r; } -static void *pool_calloc(size_t count, size_t size) +static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) { - size_t len = count * size; - void *r = pool_alloc(len); + size_t len = st_mult(count, size); +
void *r = mem_pool_alloc(mem_pool, len); memset(r, 0, len); return r; } @@ -676,7 +698,7 @@ static void *pool_calloc(size_t count, size_t size) static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; - char *r = pool_alloc(len); + char *r = mem_pool_alloc(&fi_mem_pool, len); memcpy(r, s, len); return r; } @@ -685,7 +707,7 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe) { struct mark_set *s = marks; while ((idnum >> s->shift) >= 1024) { - s = pool_calloc(1, sizeof(struct mark_set)); + s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
[PATCH v3 0/3] Extract memory pool logic into reusable component
Changes from v2:

- Code review reactions
- Lifetime management functions for mem_pool will be included in a future patch series

This patch series extracts the memory pool implementation, currently used by fast-import, into a generalized component. This memory pool can then be used by any component that needs a pool of memory. This patch series is in preparation for a change to speed up the loading of indexes with a large number of cache entries by reducing the number of malloc() calls. For a rough example of how this component will be used in this capacity, please see [1].

[1] https://github.com/jamill/git/compare/block_allocation_base...jamill:block_allocation

Jameson Miller (3):
  fast-import: rename mem_pool type to mp_block
  fast-import: introduce mem_pool type
  Move reusable parts of memory pool into its own file

 Makefile      | 1 +
 fast-import.c | 74 +++
 mem-pool.c    | 55
 mem-pool.h    | 34 +++
 4 files changed, 104 insertions(+), 60 deletions(-)
 create mode 100644 mem-pool.c
 create mode 100644 mem-pool.h

--
2.14.3
[PATCH v3 3/3] Move reusable parts of memory pool into its own file
This moves the reusable parts of the memory pool logic used by fast-import.c into its own file for use by other components. Signed-off-by: Jameson Miller --- Makefile | 1 + fast-import.c | 70 +-- mem-pool.c| 55 ++ mem-pool.h| 34 + 4 files changed, 91 insertions(+), 69 deletions(-) create mode 100644 mem-pool.c create mode 100644 mem-pool.h diff --git a/Makefile b/Makefile index a1d8775adb..1e142b1dd9 100644 --- a/Makefile +++ b/Makefile @@ -832,6 +832,7 @@ LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o LIB_OBJS += mailinfo.o LIB_OBJS += mailmap.o +LIB_OBJS += mem-pool.o LIB_OBJS += match-trees.o LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o diff --git a/fast-import.c b/fast-import.c index e85512191c..12c0058c06 100644 --- a/fast-import.c +++ b/fast-import.c @@ -168,6 +168,7 @@ Format of STDIN stream: #include "dir.h" #include "run-command.h" #include "packfile.h" +#include "mem-pool.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1<pool_alloc += sizeof(struct mp_block) + block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); - p->next_block = mem_pool->mp_block; - p->next_free = (char *)p->space; - p->end = p->next_free + block_alloc; - mem_pool->mp_block = p; - - return p; -} - -static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) -{ - struct mp_block *p; - void *r; - - /* round up to a 'uintmax_t' alignment */ - if (len & (sizeof(uintmax_t) - 1)) - len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - - for (p = mem_pool->mp_block; p; p = p->next_block) - if (p->end - p->next_free >= len) - break; - - if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } - - p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); - } - - r = p->next_free; - p->next_free += len; - return r; -} - -static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) -{ - size_t len = st_mult(count, size); - void *r = mem_pool_alloc(mem_pool, len); - memset(r, 0, len); - return r; -} - static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; diff --git a/mem-pool.c b/mem-pool.c new file mode 100644 index 00..389d7af447 --- /dev/null +++ b/mem-pool.c @@ -0,0 +1,55 @@ +/* + * Memory Pool implementation logic. 
+ */ + +#include "cache.h" +#include "mem-pool.h" + +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +{ + struct mp_block *p; + + mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); + p->next_block = mem_pool->mp_block; + p->next_free = (char *)p->space; + p->end = p->next_free + block_alloc; + mem_pool->mp_block = p; + + return p; +} + +void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) +{ + struct mp_block *p; + void *r; + + /* round up to a 'uintmax_t' alignment */ + if (len & (sizeof(uintmax_t) - 1)) + len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); + + for (p = mem_pool->mp_block; p; p = p->next_block) + if (p->end - p->next_free >= len) + break; + + if (!p) { + if (len >= (mem_pool->block_alloc / 2)) { + mem_pool->pool_alloc += len; + return xmalloc(len); + } + + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); + } + + r = p->next_free; + p->next_free += len; + return r; +} + +void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) +{ + size_t len = st_mult(count, size); + void *r = mem_pool_alloc(mem_pool, len); + memset(r, 0, len); + return r; +} diff --git a/mem-pool.h b/mem-pool.h new file mode 100644 index 00..829ad58ecf --- /dev/null +++ b/mem-pool.h @@ -0,0 +1,34 @@ +#ifndef MEM_POOL_H +#define MEM_POOL_H + +struct mp_block { + struct mp_block *next_block; + char *next_free; + char *end; + uintmax_t space[FLEX_ARRAY]; /* more */ +}; + +struct mem_pool { + struct mp_block *mp_block; + + /* +* The amount of available memory to grow the pool by. +* This size does not include the overhead for the mp_block. +*/ + size_t block_alloc; + + /* T
[PATCH v2 1/5] fast-import: rename mem_pool type to mp_block
This is part of a patch series to extract the memory pool logic in fast-import into a more generalized version. The existing mem_pool type maps more closely to a "block of memory" (mp_block) in the more generalized memory pool. This commit renames the mem_pool to mp_block to reduce churn in future patches. Signed-off-by: Jameson Miller --- fast-import.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fast-import.c b/fast-import.c index 58ef360da4..6c3215d7c3 100644 --- a/fast-import.c +++ b/fast-import.c @@ -209,8 +209,8 @@ struct last_object { unsigned no_swap : 1; }; -struct mem_pool { - struct mem_pool *next_pool; +struct mp_block { + struct mp_block *next_block; char *next_free; char *end; uintmax_t space[FLEX_ARRAY]; /* more */ @@ -304,9 +304,9 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mem_pool); +static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); static size_t total_allocd; -static struct mem_pool *mem_pool; +static struct mp_block *mp_block_head; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -636,14 +636,14 @@ static unsigned int hc_str(const char *s, size_t len) static void *pool_alloc(size_t len) { - struct mem_pool *p; + struct mp_block *p; void *r; /* round up to a 'uintmax_t' alignment */ if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mem_pool; p; p = p->next_pool) + for (p = mp_block_head; p; p = p->next_block) if ((p->end - p->next_free >= len)) break; @@ -652,12 +652,12 @@ static void *pool_alloc(size_t len) total_allocd += len; return xmalloc(len); } - total_allocd += sizeof(struct mem_pool) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mem_pool), mem_pool_alloc)); - p->next_pool = mem_pool; + total_allocd += sizeof(struct mp_block) + mem_pool_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); + p->next_block = mp_block_head; p->next_free = (char *) p->space; p->end = p->next_free + mem_pool_alloc; - mem_pool = p; + mp_block_head = p; } r = p->next_free; -- 2.14.3
[PATCH v2 4/5] Move the reusable parts of memory pool into its own file
This moves the reusable parts of the memory pool logic used by fast-import.c into its own file for use by other components. Signed-off-by: Jameson Miller --- Makefile | 1 + fast-import.c | 118 +- mem-pool.c| 103 ++ mem-pool.h| 34 + 4 files changed, 139 insertions(+), 117 deletions(-) create mode 100644 mem-pool.c create mode 100644 mem-pool.h diff --git a/Makefile b/Makefile index a1d8775adb..1e142b1dd9 100644 --- a/Makefile +++ b/Makefile @@ -832,6 +832,7 @@ LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o LIB_OBJS += mailinfo.o LIB_OBJS += mailmap.o +LIB_OBJS += mem-pool.o LIB_OBJS += match-trees.o LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o diff --git a/fast-import.c b/fast-import.c index 519e1cbd7f..36ac5760d6 100644 --- a/fast-import.c +++ b/fast-import.c @@ -168,6 +168,7 @@ Format of STDIN stream: #include "dir.h" #include "run-command.h" #include "packfile.h" +#include "mem-pool.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1<pool_alloc += sizeof(struct mp_block) + mem_pool->block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), mem_pool->block_alloc)); - p->next_block = mem_pool->mp_block; - p->next_free = (char *)p->space; - p->end = p->next_free + mem_pool->block_alloc; - mem_pool->mp_block = p; - - return p; -} - -/* - * Allocates a block of memory with a specific size and - * appends it to the memory pool's list of blocks. - * - * This function is used to allocate blocks with sizes - * different than the default "block_alloc" size for the mem_pool. - * - * There are two use cases: - * 1) The initial block allocation for a memory pool. - * - * 2) large" blocks of a specific size, where the entire memory block - * is going to be used. This means the block will not have any - * available memory for future allocations. The block is placed at - * the end of the list so that it will be the last block searched - * for available space. - */ -static struct mp_block *mem_pool_alloc_block_with_size(struct mem_pool *mem_pool, size_t block_alloc) -{ - struct mp_block *p, *block; - - mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); - - block = mem_pool->mp_block; - if (block) { - while (block->next_block) - block = block->next_block; - - block->next_block = p; - } else { - mem_pool->mp_block = p; - } - - p->next_block = NULL; - p->next_free = (char *)p->space; - p->end = p->next_free + block_alloc; - - return p; -} - -static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) -{ - struct mp_block *p; - void *r; - - /* round up to a 'uintmax_t' alignment */ - if (len & (sizeof(uintmax_t) - 1)) - len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - - p = mem_pool->mp_block; - - /* -* In performance profiling, there was a minor perf benefit to -* check for available memory in the head block via the if -* statement, and only going through the loop when needed. 
-*/ - if (p && - (p->end - p->next_free < len)) { - for (p = p->next_block; p; p = p->next_block) - if (p->end - p->next_free >= len) - break; - } - - if (!p) { - if (len >= (mem_pool->block_alloc / 2)) - p = mem_pool_alloc_block_with_size(mem_pool, len); - else - p = mem_pool_alloc_block(mem_pool); - } - - r = p->next_free; - p->next_free += len; - return r; -} - -static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) -{ - size_t len = st_mult(count, size); - void *r = mem_pool_alloc(mem_pool, len); - memset(r, 0, len); - return r; -} - static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; diff --git a/mem-pool.c b/mem-pool.c new file mode 100644 index 00..992e354e12 --- /dev/null +++ b/mem-pool.c @@ -0,0 +1,103 @@ +/* + * Memory Pool implementation logic. + */ + +#include "cache.h" +#include "mem-pool.h" + +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool) +{ + struct mp_block *p; + + mem_pool->pool_alloc += sizeof(struct mp_block) + mem_pool->block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), mem_pool->block_alloc)); + p->next_block = mem_pool->mp_block; + p->next_free = (char *)p-&g
[PATCH v2 2/5] fast-import: introduce mem_pool type
Introduce the mem_pool type and wrap the existing mp_block in this new type. The new mem_pool type will allow for the memory pool logic to be reused outside of fast-import. This type will be moved into its own file in a future commit. Signed-off-by: Jameson Miller --- fast-import.c | 108 +++--- 1 file changed, 89 insertions(+), 19 deletions(-) diff --git a/fast-import.c b/fast-import.c index 6c3215d7c3..1262d9e6be 100644 --- a/fast-import.c +++ b/fast-import.c @@ -216,6 +216,19 @@ struct mp_block { uintmax_t space[FLEX_ARRAY]; /* more */ }; +struct mem_pool { + struct mp_block *mp_block; + + /* +* The amount of available memory to grow the pool by. +* This size does not include the overhead for the mp_block. +*/ + size_t block_alloc; + + /* The total amount of memory allocated by the pool. */ + size_t pool_alloc; +}; + struct atom_str { struct atom_str *next_atom; unsigned short str_len; @@ -304,9 +317,7 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block); -static size_t total_allocd; -static struct mp_block *mp_block_head; +static struct mem_pool fi_mem_pool = {0, 2*1024*1024 - sizeof(struct mp_block), 0 }; /* Atom management */ static unsigned int atom_table_sz = 4451; @@ -324,6 +335,7 @@ static off_t pack_size; /* Table of objects we've written. */ static unsigned int object_entry_alloc = 5000; static struct object_entry_pool *blocks; +static size_t total_allocd = 0; static struct object_entry *object_table[1 << 16]; static struct mark_set *marks; static const char *export_marks_file; @@ -634,6 +646,60 @@ static unsigned int hc_str(const char *s, size_t len) return r; } +static struct mp_block *pool_alloc_block() +{ + struct mp_block *p; + + fi_mem_pool.pool_alloc += sizeof(struct mp_block) + fi_mem_pool.block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), fi_mem_pool.block_alloc)); + p->next_block = fi_mem_pool.mp_block; + p->next_free = (char *)p->space; + p->end = p->next_free + fi_mem_pool.block_alloc; + fi_mem_pool.mp_block = p; + + return p; +} + +/* + * Allocates a block of memory with a specific size and + * appends it to the memory pool's list of blocks. + * + * This function is used to allocate blocks with sizes + * different than the default "block_alloc" size for the mem_pool. + * + * There are two use cases: + * 1) The initial block allocation for a memory pool. + * + * 2) large" blocks of a specific size, where the entire memory block + * is going to be used. This means the block will not have any + * available memory for future allocations. The block is placed at + * the end of the list so that it will be the last block searched + * for available space. 
+ */ +static struct mp_block *pool_alloc_block_with_size(size_t block_alloc) +{ + struct mp_block *p, *block; + + fi_mem_pool.pool_alloc += sizeof(struct mp_block) + block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); + + block = fi_mem_pool.mp_block; + if (block) { + while (block->next_block) + block = block->next_block; + + block->next_block = p; + } else { + fi_mem_pool.mp_block = p; + } + + p->next_block = NULL; + p->next_free = (char *)p->space; + p->end = p->next_free + block_alloc; + + return p; +} + static void *pool_alloc(size_t len) { struct mp_block *p; @@ -643,21 +709,25 @@ static void *pool_alloc(size_t len) if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mp_block_head; p; p = p->next_block) - if ((p->end - p->next_free >= len)) - break; + p = fi_mem_pool.mp_block; + + /* +* In performance profiling, there was a minor perf benefit to +* check for available memory in the head block via the if +* statement, and only going through the loop when needed. +*/ + if (p && + (p->end - p->next_free < len)) { + for (p = p->next_block; p; p = p->next_block) + if (p->end - p->next_free >= len) + break; + } if (!p) { - if (len >= (mem_pool_alloc/2)) { - total_allocd += len; - return xmalloc(len); - } - total_allocd += sizeof(struct mp_block) + mem_pool_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc)); - p->next
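The alignment round-up at the top of pool_alloc() is compact enough to misread. Isolated, the idiom is (an illustrative sketch, identical in effect to the expression in the patch):

static size_t align_up(size_t len)
{
	size_t mask = sizeof(uintmax_t) - 1;

	/* Bump len to the next multiple of sizeof(uintmax_t). */
	if (len & mask)
		len += sizeof(uintmax_t) - (len & mask);
	return len;
}

With an 8-byte uintmax_t, align_up(13) yields 16 and align_up(16) stays 16, so every pointer the pool returns is suitably aligned for anything its consumers store.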
[PATCH v2 5/5] Expand implementation of mem-pool type
This commit adds functionality to the mem-pool type that can be generally useful. This functionality will be used in a future commit. Signed-off-by: Jameson Miller --- mem-pool.c | 58 ++ mem-pool.h | 24 2 files changed, 82 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 992e354e12..7d21a7e035 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -5,6 +5,8 @@ #include "cache.h" #include "mem-pool.h" +#define MIN_ALLOC_GROWTH_SIZE 1024 * 1024 + static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool) { struct mp_block *p; @@ -59,6 +61,36 @@ static struct mp_block *mem_pool_alloc_block_with_size(struct mem_pool *mem_pool return p; } +void mem_pool_init(struct mem_pool **mem_pool, size_t block_alloc, size_t initial_size) +{ + if (!(*mem_pool)) + { + if (block_alloc < (MIN_ALLOC_GROWTH_SIZE - sizeof(struct mp_block))) + block_alloc = (MIN_ALLOC_GROWTH_SIZE - sizeof(struct mp_block)); + + *mem_pool = xmalloc(sizeof(struct mem_pool)); + (*mem_pool)->pool_alloc = 0; + (*mem_pool)->mp_block = 0; + (*mem_pool)->block_alloc = block_alloc; + + if (initial_size > 0) + mem_pool_alloc_block_with_size(*mem_pool, initial_size); + } +} + +void mem_pool_discard(struct mem_pool *mem_pool) +{ + struct mp_block *block, *block_to_free; + for (block = mem_pool->mp_block; block;) + { + block_to_free = block; + block = block->next_block; + free(block_to_free); + } + + free(mem_pool); +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; @@ -101,3 +133,29 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block **tail = &dst->mp_block; + /* find pointer of dst's last block (if any) */ + while (*tail) + tail = &(*tail)->next_block; + + /* append the blocks from src to dst */ + *tail = src->mp_block; + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index 829ad58ecf..d9e7f21541 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -21,6 +21,17 @@ struct mem_pool { size_t pool_alloc; }; +/* + * Initialize mem_pool with specified parameters for initial size and + * how much to grow when a larger memory block is required. + */ +void mem_pool_init(struct mem_pool **mem_pool, size_t alloc_growth_size, size_t initial_size); + +/* + * Discard a memory pool and free all the memory it is responsible for. + */ +void mem_pool_discard(struct mem_pool *mem_pool); + /* * Alloc memory from the mem_pool. */ @@ -31,4 +42,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. + */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.14.3
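Putting the new entry points together, a hypothetical consumer of this API (the function and constants below are invented for illustration) would follow a lifecycle like:

#include "cache.h"
#include "mem-pool.h"

static void mem_pool_example(void)
{
	struct mem_pool *pool = NULL, *other = NULL;
	void *a;

	/* Lazily allocate both pools; grow in ~4MB steps, no initial block. */
	mem_pool_init(&pool, 4 * 1024 * 1024, 0);
	mem_pool_init(&other, 4 * 1024 * 1024, 0);

	a = mem_pool_alloc(pool, 100);
	mem_pool_calloc(pool, 16, sizeof(uintmax_t));

	if (!mem_pool_contains(pool, a))
		die("BUG: allocation not tracked by its pool");

	/* Move every block from 'pool' into 'other'... */
	mem_pool_combine(other, pool);

	/* ...after which 'pool' is empty but must still be discarded. */
	mem_pool_discard(pool);
	mem_pool_discard(other);
}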
[PATCH v2 3/5] fast-import: update pool_* functions to work on local pool
Update memory pool functions to work on a passed in mem_pool instead of the global mem_pool type. This is in preparation for making the memory pool logic reusable. Signed-off-by: Jameson Miller --- fast-import.c | 52 ++-- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/fast-import.c b/fast-import.c index 1262d9e6be..519e1cbd7f 100644 --- a/fast-import.c +++ b/fast-import.c @@ -646,16 +646,16 @@ static unsigned int hc_str(const char *s, size_t len) return r; } -static struct mp_block *pool_alloc_block() +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool) { struct mp_block *p; - fi_mem_pool.pool_alloc += sizeof(struct mp_block) + fi_mem_pool.block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), fi_mem_pool.block_alloc)); - p->next_block = fi_mem_pool.mp_block; + mem_pool->pool_alloc += sizeof(struct mp_block) + mem_pool->block_alloc; + p = xmalloc(st_add(sizeof(struct mp_block), mem_pool->block_alloc)); + p->next_block = mem_pool->mp_block; p->next_free = (char *)p->space; - p->end = p->next_free + fi_mem_pool.block_alloc; - fi_mem_pool.mp_block = p; + p->end = p->next_free + mem_pool->block_alloc; + mem_pool->mp_block = p; return p; } @@ -676,21 +676,21 @@ static struct mp_block *pool_alloc_block() * the end of the list so that it will be the last block searched * for available space. */ -static struct mp_block *pool_alloc_block_with_size(size_t block_alloc) +static struct mp_block *mem_pool_alloc_block_with_size(struct mem_pool *mem_pool, size_t block_alloc) { struct mp_block *p, *block; - fi_mem_pool.pool_alloc += sizeof(struct mp_block) + block_alloc; + mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); - block = fi_mem_pool.mp_block; + block = mem_pool->mp_block; if (block) { while (block->next_block) block = block->next_block; block->next_block = p; } else { - fi_mem_pool.mp_block = p; + mem_pool->mp_block = p; } p->next_block = NULL; @@ -700,7 +700,7 @@ static struct mp_block *pool_alloc_block_with_size(size_t block_alloc) return p; } -static void *pool_alloc(size_t len) +static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p; void *r; @@ -709,7 +709,7 @@ static void *pool_alloc(size_t len) if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - p = fi_mem_pool.mp_block; + p = mem_pool->mp_block; /* * In performance profiling, there was a minor perf benefit to @@ -724,10 +724,10 @@ static void *pool_alloc(size_t len) } if (!p) { - if (len >= (fi_mem_pool.block_alloc / 2)) - p = pool_alloc_block_with_size(len); + if (len >= (mem_pool->block_alloc / 2)) + p = mem_pool_alloc_block_with_size(mem_pool, len); else - p = pool_alloc_block(); + p = mem_pool_alloc_block(mem_pool); } r = p->next_free; @@ -735,10 +735,10 @@ static void *pool_alloc(size_t len) return r; } -static void *pool_calloc(size_t count, size_t size) +static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) { size_t len = st_mult(count, size); - void *r = pool_alloc(len); + void *r = mem_pool_alloc(mem_pool, len); memset(r, 0, len); return r; } @@ -746,7 +746,7 @@ static void *pool_calloc(size_t count, size_t size) static char *pool_strdup(const char *s) { size_t len = strlen(s) + 1; - char *r = pool_alloc(len); + char *r = mem_pool_alloc(&fi_mem_pool, len); memcpy(r, s, len); return r; } @@ -755,7 +755,7 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe) { struct mark_set *s = marks; while ((idnum 
>> s->shift) >= 1024) { - s = pool_calloc(1, sizeof(struct mark_set)); + s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set)); s->shift = marks->shift + 10; s->data.sets[0] = marks; marks = s; @@ -764,7 +764,7 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe) uintmax_t i = idnum >> s->shift; idnum -= i << s->shift; if (!s->data.sets[i]) { - s->data.sets[i] = pool_calloc(1, sizeof(struct mark_set)); + s
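Nothing in the reworked helpers refers to fi_mem_pool except at the call sites, which is the point: a second, independent pool becomes possible. A sketch of what the parameterization buys (scratch_pool and scratch_strdup are hypothetical, not part of the series):

static struct mem_pool scratch_pool = { NULL, 64 * 1024 - sizeof(struct mp_block), 0 };

static char *scratch_strdup(const char *s)
{
	/* Same shape as pool_strdup() above, different backing pool. */
	size_t len = strlen(s) + 1;
	char *r = mem_pool_alloc(&scratch_pool, len);
	memcpy(r, s, len);
	return r;
}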
[PATCH v2 0/5] Extract memory pool logic into reusable component
Changes from v1: - Rework the structure of commits to be more reviewer-friendly. This patch series extracts the memory pool implementation, currently used by fast-import, into a generalized component. This memory pool can then be used by any component that needs a pool of memory. This series is in preparation for a change to speed up the loading of indexes with a large number of cache entries by reducing the number of malloc() calls. For a rough example of how this component will be used in this capacity, please see [1]. The last commit in this series includes changes to fill out the mem_pool type into a more complete implementation. These functions are not used in this change, but they will be used in an upcoming patch series. I can remove this commit if it is desired. [1] https://github.com/jamill/git/compare/block_allocation_base...jamill:block_allocation Jameson Miller (5): fast-import: rename mem_pool type to mp_block fast-import: introduce mem_pool type fast-import: update pool_* functions to work on local pool Move the reusable parts of memory pool into its own file Expand implementation of mem-pool type Makefile | 1 + fast-import.c | 74 +-- mem-pool.c| 161 ++ mem-pool.h| 58 + 4 files changed, 234 insertions(+), 60 deletions(-) create mode 100644 mem-pool.c create mode 100644 mem-pool.h -- 2.14.3
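The index-loading use case referenced in [1] boils down to trading many small malloc() calls for bulk pool carve-outs. A condensed sketch of the intended pattern (the function and its parameters are placeholders; real code would size things with st_mult()):

static void *load_fixed_size_entries(struct mem_pool *pool,
				     size_t nr, size_t entry_size)
{
	void *entry = NULL;
	size_t i;

	for (i = 0; i < nr; i++)
		entry = mem_pool_calloc(pool, 1, entry_size); /* no per-entry malloc() */
	return entry; /* last entry; real code would store each one */
}

Initializing the pool beforehand with mem_pool_init(&pool, nr * entry_size, 0) means all nr entries typically land in a single block.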
[PATCH v4 4/4] status: test ignored modes
Add tests around status reporting ignored files that match an exclude pattern for both --untracked-files=normal and --untracked-files=all. Signed-off-by: Jameson Miller --- t/t7521-ignored-mode.sh | 233 1 file changed, 233 insertions(+) create mode 100755 t/t7521-ignored-mode.sh diff --git a/t/t7521-ignored-mode.sh b/t/t7521-ignored-mode.sh new file mode 100755 index 00..91790943c3 --- /dev/null +++ b/t/t7521-ignored-mode.sh @@ -0,0 +1,233 @@ +#!/bin/sh + +test_description='git status ignored modes' + +. ./test-lib.sh + +test_expect_success 'setup initial commit and ignore file' ' + cat >.gitignore <<-\EOF && + *.ign + ignored_dir/ + !*.unignore + EOF + git add . && + git commit -m "Initial commit" +' + +test_expect_success 'Verify behavior of status on directories with ignored files' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ! dir/ignored/ignored_1.ign + ! dir/ignored/ignored_2.ign + ! ignored/ignored_1.ign + ! ignored/ignored_2.ign + EOF + + mkdir -p ignored dir/ignored && + touch ignored/ignored_1.ign ignored/ignored_2.ign \ + dir/ignored/ignored_1.ign dir/ignored/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on directory with tracked & ignored files' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! dir/tracked_ignored/ignored_1.ign + ! dir/tracked_ignored/ignored_2.ign + ! tracked_ignored/ignored_1.ign + ! tracked_ignored/ignored_2.ign + EOF + + mkdir -p tracked_ignored dir/tracked_ignored && + touch tracked_ignored/tracked_1 tracked_ignored/tracked_2 \ + tracked_ignored/ignored_1.ign tracked_ignored/ignored_2.ign \ + dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 \ + dir/tracked_ignored/ignored_1.ign dir/tracked_ignored/ignored_2.ign && + + git add tracked_ignored/tracked_1 tracked_ignored/tracked_2 \ + dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 && + git commit -m "commit tracked files" && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on directory with untracked and ignored files' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? dir/untracked_ignored/untracked_1 + ? dir/untracked_ignored/untracked_2 + ? expect + ? output + ? untracked_ignored/untracked_1 + ? untracked_ignored/untracked_2 + ! dir/untracked_ignored/ignored_1.ign + ! dir/untracked_ignored/ignored_2.ign + ! untracked_ignored/ignored_1.ign + ! untracked_ignored/ignored_2.ign + EOF + + mkdir -p untracked_ignored dir/untracked_ignored && + touch untracked_ignored/untracked_1 untracked_ignored/untracked_2 \ + untracked_ignored/ignored_1.ign untracked_ignored/ignored_2.ign \ + dir/untracked_ignored/untracked_1 dir/untracked_ignored/untracked_2 \ + dir/untracked_ignored/ignored_1.ign dir/untracked_ignored/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status matching ignored files on ignored directory' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ! 
ignored_dir/ + EOF + + mkdir ignored_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on ignored directory containing tracked file' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! ignored_dir/ignored_1 + ! ignored_dir/ignored_1.ign + ! ignored
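Like the rest of the suite, this script can be run standalone from the t/ directory of a built tree, e.g. `sh t7521-ignored-mode.sh -v` (add `-i` to stop at the first failure).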
[PATCH v4 2/4] status: report matching ignored and normal untracked
Teach status command to handle `--ignored=matching` with `--untracked-files=normal` Signed-off-by: Jameson Miller --- dir.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/dir.c b/dir.c index b9af87eca9..20457724c0 100644 --- a/dir.c +++ b/dir.c @@ -1585,6 +1585,7 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, { int exclude; int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); + enum path_treatment path_treatment; if (dtype == DT_UNKNOWN) dtype = get_dtype(de, istate, path->buf, path->len); @@ -1631,8 +1632,23 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, return path_none; case DT_DIR: strbuf_addch(path, '/'); - return treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + path_treatment = treat_directory(dir, istate, untracked, +path->buf, path->len, +baselen, exclude, pathspec); + /* +* If 1) we only want to return directories that +* match an exclude pattern and 2) this directory does +* not match an exclude pattern but all of its +* contents are excluded, then indicate that we should +* recurse into this directory (instead of marking the +* directory itself as an ignored path). +*/ + if (!exclude && + path_treatment == path_excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_recurse; + return path_treatment; case DT_REG: case DT_LNK: return exclude ? path_excluded : path_untracked; -- 2.13.6
[PATCH v4 1/4] status: add option to show ignored files differently
Teach the status command more flexibility in how ignored files are reported. Currently, the reporting of ignored files and untracked files is linked. You cannot control how ignored files are reported independently of how untracked files are reported (i.e. `all` vs `normal`). This makes it impossible to show untracked files with the `all` option, but show ignored files with the `normal` option. This work 1) adds the ability to control the reporting of ignored files independently of untracked files and 2) introduces the concept of status reporting ignored paths that explicitly match an ignored pattern. There are 2 benefits to these changes: 1) if a consumer needs all untracked files but not all ignored files, there is a performance benefit to not scanning all contents of an ignored directory and 2) returning ignored files that explicitly match a path allows a consumer to make more informed decisions about when a status result might be stale. This commit implements --ignored=matching with --untracked-files=all. The following commit will implement --ignored=matching with --untracked-files=normal. As an example of where this flexibility could be useful, our application (Visual Studio) runs the status command and presents the output. It shows all untracked files individually (e.g. using the '--untracked-files=all' option), and would like to know which paths are ignored. It uses information about ignored paths to make decisions about when the status result might have changed. Additionally, many projects place build output into directories inside a repository's working directory (e.g. in "bin/" and "obj/" directories). Normal usage is to explicitly ignore these 2 directory names in the .gitignore file (rather than or in addition to the *.obj pattern). If an application could know that these directories are explicitly ignored, it could infer that all contents are ignored as well and make better-informed decisions about files in these directories. It could infer that any changes under these paths would not affect the output of status. Additionally, there can be a significant performance benefit by avoiding scanning through ignored directories. When status is set to report matching ignored files, it has the following behavior. Ignored files and directories that explicitly match an exclude pattern are reported. If an ignored directory matches an exclude pattern, then the path of the directory is returned. If a directory does not match an exclude pattern, but all of its contents are ignored, then the contained files are reported instead of the directory. 
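To make the 'matching' behavior concrete, suppose .gitignore lists both 'bin/' and '*.obj', and the worktree contains bin/a.obj and src/out.obj. With --ignored=matching, status reports the single entry 'bin/' (the directory itself matches a pattern, so its contents are not listed) plus the entry 'src/out.obj' (the file matches '*.obj', while 'src/' matches nothing and is therefore not reported).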
Signed-off-by: Jameson Miller --- builtin/commit.c | 31 +-- dir.c| 24 dir.h| 3 ++- wt-status.c | 11 --- wt-status.h | 8 +++- 5 files changed, 66 insertions(+), 11 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index d75b3805ea..98d84d0277 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -118,7 +118,7 @@ static int edit_flag = -1; /* unspecified */ static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship; static int config_commit_verbose = -1; /* unspecified */ static int no_post_rewrite, allow_empty_message; -static char *untracked_files_arg, *force_date, *ignore_submodule_arg; +static char *untracked_files_arg, *force_date, *ignore_submodule_arg, *ignored_arg; static char *sign_commit; /* @@ -139,7 +139,7 @@ static const char *cleanup_arg; static enum commit_whence whence; static int sequencer_in_use; static int use_editor = 1, include_status = 1; -static int show_ignored_in_status, have_option_m; +static int have_option_m; static struct strbuf message = STRBUF_INIT; static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED; @@ -1075,6 +1075,19 @@ static const char *find_author_by_nickname(const char *name) die(_("--author '%s' is not 'Name ' and matches no existing author"), name); } +static void handle_ignored_arg(struct wt_status *s) +{ + if (!ignored_arg) + ; /* default already initialized */ + else if (!strcmp(ignored_arg, "traditional")) + s->show_ignored_mode = SHOW_TRADITIONAL_IGNORED; + else if (!strcmp(ignored_arg, "no")) + s->show_ignored_mode = SHOW_NO_IGNORED; + else if (!strcmp(ignored_arg, "matching")) + s->show_ignored_mode = SHOW_MATCHING_IGNORED; + else + die(_("Invalid ignored mode '%s'"), ignored_arg); +} static void handle_untracked_files_arg(struct wt_status *s) { @@ -1363,8 +1376,10 @@ int cmd_status(int argc, const char **argv, const char *prefix) N_("mode"), N_("show untracked files, optional modes: all, normal, no. (Default: all)"), PARS
[PATCH v4 3/4] status: document options to show matching ignored files
Signed-off-by: Jameson Miller --- Documentation/git-status.txt | 21 +- Documentation/technical/api-directory-listing.txt | 27 +++ 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 9f3a78a36c..fc282e0a92 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -97,8 +97,27 @@ configuration variable documented in linkgit:git-config[1]. (and suppresses the output of submodule summaries when the config option `status.submoduleSummary` is set). ---ignored:: +--ignored[=]:: Show ignored files as well. ++ +The mode parameter is used to specify the handling of ignored files. +It is optional: it defaults to 'traditional'. ++ +The possible options are: ++ + - 'traditional' - Shows ignored files and directories, unless + --untracked-files=all is specifed, in which case + individual files in ignored directories are + displayed. + - 'no' - Show no ignored files. + - 'matching'- Shows ignored files and directories matching an + ignore pattern. ++ +When 'matching' mode is specified, paths that explicity match an +ignored pattern are shown. If a directory matches an ignore pattern, +then it is shown, but not paths contained in the ignored directory. If +a directory does not match an ignore pattern, but all contents are +ignored, then the directory is not shown, but all contents are shown. -z:: Terminate entries with NUL, instead of LF. This implies diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt index 6c77b4920c..7fae00f44f 100644 --- a/Documentation/technical/api-directory-listing.txt +++ b/Documentation/technical/api-directory-listing.txt @@ -22,16 +22,20 @@ The notable options are: `flags`:: - A bit-field of options (the `*IGNORED*` flags are mutually exclusive): + A bit-field of options: `DIR_SHOW_IGNORED`::: - Return just ignored files in `entries[]`, not untracked files. + Return just ignored files in `entries[]`, not untracked + files. This flag is mutually exclusive with + `DIR_SHOW_IGNORED_TOO`. `DIR_SHOW_IGNORED_TOO`::: - Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]` - in addition to untracked files in `entries[]`. + Similar to `DIR_SHOW_IGNORED`, but return ignored files in + `ignored[]` in addition to untracked files in + `entries[]`. This flag is mutually exclusive with + `DIR_SHOW_IGNORED`. `DIR_KEEP_UNTRACKED_CONTENTS`::: @@ -39,6 +43,21 @@ The notable options are: untracked contents of untracked directories are also returned in `entries[]`. +`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`::: + + Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if + this is set, returns ignored files and directories that match + an exclude pattern. If a directory matches an exclude pattern, + then the directory is returned and the contained paths are + not. A directory that does not match an exclude pattern will + not be returned even if all of its contents are ignored. In + this case, the contents are returned as individual entries. ++ +If this is set, files and directories that explicity match an ignore +pattern are reported. Implicity ignored directories (directories that +do not match an ignore pattern, but whose contents are all ignored) +are not reported, instead all of the contents are reported. + `DIR_COLLECT_IGNORED`::: Special mode for git-add. Return ignored files in `ignored[]` and -- 2.13.6
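For in-tree consumers of the directory-listing API, the new mode is opt-in via flags. Roughly (a sketch against the documented interface; the exact fill_directory() signature has shifted over time, so treat this as illustrative):

static void list_matching_ignored(struct index_state *istate,
				  const struct pathspec *pathspec)
{
	struct dir_struct dir;

	memset(&dir, 0, sizeof(dir));
	dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_SHOW_IGNORED_TOO_MODE_MATCHING;

	/*
	 * Walk the worktree; untracked paths end up in dir.entries[],
	 * and only the ignored paths that explicitly match a pattern
	 * end up in dir.ignored[].
	 */
	fill_directory(&dir, istate, pathspec);
}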
[PATCH v4 0/4] status: add option to show ignored files differently
The previous iteration can be found here: https://public-inbox.org/git/20171019160601.9382-1-jam...@microsoft.com/ Changes from V3 patch series: - Added extra test for --ignored=no and --ignored=traditional - Updated wording from "folders" -> "directories" - Renamed test to a name not taken by topics in flight Jameson Miller (4): status: add option to show ignored files differently status: report matching ignored and normal untracked status: document options to show matching ignored files status: test ignored modes Documentation/git-status.txt | 21 +- Documentation/technical/api-directory-listing.txt | 27 ++- builtin/commit.c | 31 ++- dir.c | 44 +++- dir.h | 3 +- t/t7521-ignored-mode.sh | 233 ++ wt-status.c | 11 +- wt-status.h | 8 +- 8 files changed, 360 insertions(+), 18 deletions(-) create mode 100755 t/t7521-ignored-mode.sh -- 2.13.6
[PATCH v3 4/4] status: test --ignored=matching
Add tests around status reporting ignored files that match an exclude pattern for both --untracked-files=normal and --untracked-files=all. Signed-off-by: Jameson Miller --- t/t7519-ignored-mode.sh | 183 1 file changed, 183 insertions(+) create mode 100755 t/t7519-ignored-mode.sh diff --git a/t/t7519-ignored-mode.sh b/t/t7519-ignored-mode.sh new file mode 100755 index 00..6be7701d79 --- /dev/null +++ b/t/t7519-ignored-mode.sh @@ -0,0 +1,183 @@ +#!/bin/sh + +test_description='git status collapse ignored' + +. ./test-lib.sh + +# commit initial ignore file +test_expect_success 'setup initial commit and ignore file' ' + cat >.gitignore <<-\EOF && + *.ign + ignored_dir/ + !*.unignore + EOF + git add . && + git commit -m "Initial commit" +' + +test_expect_success 'Verify behavior of status on folders with ignored files' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ! dir/ignored/ignored_1.ign + ! dir/ignored/ignored_2.ign + ! ignored/ignored_1.ign + ! ignored/ignored_2.ign + EOF + + mkdir -p ignored dir/ignored && + touch ignored/ignored_1.ign ignored/ignored_2.ign \ + dir/ignored/ignored_1.ign dir/ignored/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on folder with tracked & ignored files' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! dir/tracked_ignored/ignored_1.ign + ! dir/tracked_ignored/ignored_2.ign + ! tracked_ignored/ignored_1.ign + ! tracked_ignored/ignored_2.ign + EOF + + mkdir -p tracked_ignored dir/tracked_ignored && + touch tracked_ignored/tracked_1 tracked_ignored/tracked_2 \ + tracked_ignored/ignored_1.ign tracked_ignored/ignored_2.ign \ + dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 \ + dir/tracked_ignored/ignored_1.ign dir/tracked_ignored/ignored_2.ign && + + git add tracked_ignored/tracked_1 tracked_ignored/tracked_2 \ + dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 && + git commit -m "commit tracked files" && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on folder with untracked and ignored files' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? dir/untracked_ignored/untracked_1 + ? dir/untracked_ignored/untracked_2 + ? expect + ? output + ? untracked_ignored/untracked_1 + ? untracked_ignored/untracked_2 + ! dir/untracked_ignored/ignored_1.ign + ! dir/untracked_ignored/ignored_2.ign + ! untracked_ignored/ignored_1.ign + ! untracked_ignored/ignored_2.ign + EOF + + mkdir -p untracked_ignored dir/untracked_ignored && + touch untracked_ignored/untracked_1 untracked_ignored/untracked_2 \ + untracked_ignored/ignored_1.ign untracked_ignored/ignored_2.ign \ + dir/untracked_ignored/untracked_1 dir/untracked_ignored/untracked_2 \ + dir/untracked_ignored/ignored_1.ign dir/untracked_ignored/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status matching ignored files on ignored folder' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ! 
ignored_dir/ + EOF + + mkdir ignored_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign && + + git status --porcelain=v2 --ignored=matching --untracked-files=all >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on ignored folder containing tracked file' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! ignored_dir/ignored_1 + ! ignored_dir/ignored_1.ign +
[PATCH v3 2/4] status: report matching ignored and normal untracked
Teach status command to handle `--ignored=matching` with `--untracked-files=normal` Signed-off-by: Jameson Miller --- dir.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/dir.c b/dir.c index b9af87eca9..20457724c0 100644 --- a/dir.c +++ b/dir.c @@ -1585,6 +1585,7 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, { int exclude; int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); + enum path_treatment path_treatment; if (dtype == DT_UNKNOWN) dtype = get_dtype(de, istate, path->buf, path->len); @@ -1631,8 +1632,23 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, return path_none; case DT_DIR: strbuf_addch(path, '/'); - return treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + path_treatment = treat_directory(dir, istate, untracked, +path->buf, path->len, +baselen, exclude, pathspec); + /* +* If 1) we only want to return directories that +* match an exclude pattern and 2) this directory does +* not match an exclude pattern but all of its +* contents are excluded, then indicate that we should +* recurse into this directory (instead of marking the +* directory itself as an ignored path). +*/ + if (!exclude && + path_treatment == path_excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_recurse; + return path_treatment; case DT_REG: case DT_LNK: return exclude ? path_excluded : path_untracked; -- 2.13.6
[PATCH v3 3/4] status: document options to show matching ignored files
Signed-off-by: Jameson Miller --- Documentation/git-status.txt | 21 +- Documentation/technical/api-directory-listing.txt | 27 +++ 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 9f3a78a36c..fc282e0a92 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -97,8 +97,27 @@ configuration variable documented in linkgit:git-config[1]. (and suppresses the output of submodule summaries when the config option `status.submoduleSummary` is set). ---ignored:: +--ignored[=]:: Show ignored files as well. ++ +The mode parameter is used to specify the handling of ignored files. +It is optional: it defaults to 'traditional'. ++ +The possible options are: ++ + - 'traditional' - Shows ignored files and directories, unless + --untracked-files=all is specifed, in which case + individual files in ignored directories are + displayed. + - 'no' - Show no ignored files. + - 'matching'- Shows ignored files and directories matching an + ignore pattern. ++ +When 'matching' mode is specified, paths that explicity match an +ignored pattern are shown. If a directory matches an ignore pattern, +then it is shown, but not paths contained in the ignored directory. If +a directory does not match an ignore pattern, but all contents are +ignored, then the directory is not shown, but all contents are shown. -z:: Terminate entries with NUL, instead of LF. This implies diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt index 6c77b4920c..7fae00f44f 100644 --- a/Documentation/technical/api-directory-listing.txt +++ b/Documentation/technical/api-directory-listing.txt @@ -22,16 +22,20 @@ The notable options are: `flags`:: - A bit-field of options (the `*IGNORED*` flags are mutually exclusive): + A bit-field of options: `DIR_SHOW_IGNORED`::: - Return just ignored files in `entries[]`, not untracked files. + Return just ignored files in `entries[]`, not untracked + files. This flag is mutually exclusive with + `DIR_SHOW_IGNORED_TOO`. `DIR_SHOW_IGNORED_TOO`::: - Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]` - in addition to untracked files in `entries[]`. + Similar to `DIR_SHOW_IGNORED`, but return ignored files in + `ignored[]` in addition to untracked files in + `entries[]`. This flag is mutually exclusive with + `DIR_SHOW_IGNORED`. `DIR_KEEP_UNTRACKED_CONTENTS`::: @@ -39,6 +43,21 @@ The notable options are: untracked contents of untracked directories are also returned in `entries[]`. +`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`::: + + Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if + this is set, returns ignored files and directories that match + an exclude pattern. If a directory matches an exclude pattern, + then the directory is returned and the contained paths are + not. A directory that does not match an exclude pattern will + not be returned even if all of its contents are ignored. In + this case, the contents are returned as individual entries. ++ +If this is set, files and directories that explicity match an ignore +pattern are reported. Implicity ignored directories (directories that +do not match an ignore pattern, but whose contents are all ignored) +are not reported, instead all of the contents are reported. + `DIR_COLLECT_IGNORED`::: Special mode for git-add. Return ignored files in `ignored[]` and -- 2.13.6
[PATCH v3 1/4] status: add option to show ignored files differently
Teach the status command more flexibility in how ignored files are reported. Currently, the reporting of ignored files and untracked files is linked. You cannot control how ignored files are reported independently of how untracked files are reported (i.e. `all` vs `normal`). This makes it impossible to show untracked files with the `all` option, but show ignored files with the `normal` option. This work 1) adds the ability to control the reporting of ignored files independently of untracked files and 2) introduces the concept of status reporting ignored paths that explicitly match an ignored pattern. There are 2 benefits to these changes: 1) if a consumer needs all untracked files but not all ignored files, there is a performance benefit to not scanning all contents of an ignored directory and 2) returning ignored files that explicitly match a path allows a consumer to make more informed decisions about when a status result might be stale. This commit implements --ignored=matching with --untracked-files=all. The following commit will implement --ignored=matching with --untracked-files=normal. As an example of where this flexibility could be useful, our application (Visual Studio) runs the status command and presents the output. It shows all untracked files individually (e.g. using the '--untracked-files=all' option), and would like to know which paths are ignored. It uses information about ignored paths to make decisions about when the status result might have changed. Additionally, many projects place build output into directories inside a repository's working directory (e.g. in "bin/" and "obj/" directories). Normal usage is to explicitly ignore these 2 directory names in the .gitignore file (rather than or in addition to the *.obj pattern). If an application could know that these directories are explicitly ignored, it could infer that all contents are ignored as well and make better-informed decisions about files in these directories. It could infer that any changes under these paths would not affect the output of status. Additionally, there can be a significant performance benefit by avoiding scanning through ignored directories. When status is set to report matching ignored files, it has the following behavior. Ignored files and directories that explicitly match an exclude pattern are reported. If an ignored directory matches an exclude pattern, then the path of the directory is returned. If a directory does not match an exclude pattern, but all of its contents are ignored, then the contained files are reported instead of the directory. 
Signed-off-by: Jameson Miller --- builtin/commit.c | 31 +-- dir.c| 24 dir.h| 3 ++- wt-status.c | 11 --- wt-status.h | 8 +++- 5 files changed, 66 insertions(+), 11 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index d75b3805ea..98d84d0277 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -118,7 +118,7 @@ static int edit_flag = -1; /* unspecified */ static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship; static int config_commit_verbose = -1; /* unspecified */ static int no_post_rewrite, allow_empty_message; -static char *untracked_files_arg, *force_date, *ignore_submodule_arg; +static char *untracked_files_arg, *force_date, *ignore_submodule_arg, *ignored_arg; static char *sign_commit; /* @@ -139,7 +139,7 @@ static const char *cleanup_arg; static enum commit_whence whence; static int sequencer_in_use; static int use_editor = 1, include_status = 1; -static int show_ignored_in_status, have_option_m; +static int have_option_m; static struct strbuf message = STRBUF_INIT; static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED; @@ -1075,6 +1075,19 @@ static const char *find_author_by_nickname(const char *name) die(_("--author '%s' is not 'Name ' and matches no existing author"), name); } +static void handle_ignored_arg(struct wt_status *s) +{ + if (!ignored_arg) + ; /* default already initialized */ + else if (!strcmp(ignored_arg, "traditional")) + s->show_ignored_mode = SHOW_TRADITIONAL_IGNORED; + else if (!strcmp(ignored_arg, "no")) + s->show_ignored_mode = SHOW_NO_IGNORED; + else if (!strcmp(ignored_arg, "matching")) + s->show_ignored_mode = SHOW_MATCHING_IGNORED; + else + die(_("Invalid ignored mode '%s'"), ignored_arg); +} static void handle_untracked_files_arg(struct wt_status *s) { @@ -1363,8 +1376,10 @@ int cmd_status(int argc, const char **argv, const char *prefix) N_("mode"), N_("show untracked files, optional modes: all, normal, no. (Default: all)"), PARS
[PATCH v3 0/4] status: add option to show ignored files differently
The previous iteration can be found here: https://public-inbox.org/git/20171011133504.15049-1-jam...@microsoft.com/ The main difference is to address feedback around commit organization and wording. Jameson Miller (4): status: add option to show ignored files differently status: report matching ignored and normal untracked status: document options to show matching ignored files status: test --ignored=matching Documentation/git-status.txt | 21 ++- Documentation/technical/api-directory-listing.txt | 27 +++- builtin/commit.c | 31 +++- dir.c | 44 +- dir.h | 3 +- t/t7519-ignored-mode.sh | 183 ++ wt-status.c | 11 +- wt-status.h | 8 +- 8 files changed, 310 insertions(+), 18 deletions(-) create mode 100755 t/t7519-ignored-mode.sh -- 2.13.6
Re: [PATCH v2 2/5] Update documentation for new directory and status logic
On 10/11/2017 10:55 PM, Junio C Hamano wrote: Jameson Miller writes: Signed-off-by: Jameson Miller --- Documentation/git-status.txt | 21 +- Documentation/technical/api-directory-listing.txt | 27 +++ 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 9f3a78a36c..fc282e0a92 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -97,8 +97,27 @@ configuration variable documented in linkgit:git-config[1]. (and suppresses the output of submodule summaries when the config option `status.submoduleSummary` is set). ---ignored:: +--ignored[=]:: Show ignored files as well. ++ +The mode parameter is used to specify the handling of ignored files. +It is optional: it defaults to 'traditional'. ++ +The possible options are: ++ + - 'traditional' - Shows ignored files and directories, unless + --untracked-files=all is specifed, in which case + individual files in ignored directories are + displayed. + - 'no' - Show no ignored files. + - 'matching'- Shows ignored files and directories matching an + ignore pattern. ++ +When 'matching' mode is specified, paths that explicity match an +ignored pattern are shown. If a directory matches an ignore pattern, +then it is shown, but not paths contained in the ignored directory. If +a directory does not match an ignore pattern, but all contents are +ignored, then the directory is not shown, but all contents are shown. Well explained. diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt index 6c77b4920c..7fae00f44f 100644 --- a/Documentation/technical/api-directory-listing.txt +++ b/Documentation/technical/api-directory-listing.txt @@ -22,16 +22,20 @@ The notable options are: `flags`:: - A bit-field of options (the `*IGNORED*` flags are mutually exclusive): + A bit-field of options: `DIR_SHOW_IGNORED`::: - Return just ignored files in `entries[]`, not untracked files. + Return just ignored files in `entries[]`, not untracked + files. This flag is mutually exclusive with + `DIR_SHOW_IGNORED_TOO`. `DIR_SHOW_IGNORED_TOO`::: - Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]` - in addition to untracked files in `entries[]`. + Similar to `DIR_SHOW_IGNORED`, but return ignored files in + `ignored[]` in addition to untracked files in + `entries[]`. This flag is mutually exclusive with + `DIR_SHOW_IGNORED`. `DIR_KEEP_UNTRACKED_CONTENTS`::: @@ -39,6 +43,21 @@ The notable options are: untracked contents of untracked directories are also returned in `entries[]`. +`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`::: + + Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if + this is set, returns ignored files and directories that match + an exclude pattern. If a directory matches an exclude pattern, + then the directory is returned and the contained paths are + not. A directory that does not match an exclude pattern will + not be returned even if all of its contents are ignored. In + this case, the contents are returned as individual entries. ++ +If this is set, files and directories that explicity match an ignore +pattern are reported. Implicity ignored directories (directories that +do not match an ignore pattern, but whose contents are all ignored) +are not reported, instead all of the contents are reported. Makes me wonder if DIR_SHOW_IGNORED* should be split out into a short enum. 
We have: - Do not show ignored ones (0) - Collect ignored ones (DIR_SHOW_IGNORED) - Collect ignored and untracked ones separately (DIR_SHOW_IGNORED_TOO) - Collect ignored and untracked ones separately, but limit them to those that match exclude patterns explicitly (DIR_SHOW_IGNORED_TOO|...MODE_MATCHING) so we need two bits to fit a 4-possibility enum. Then we do not have to worry about saying quirky things like A and B are incompatible, and C makes sense only when B is set, etc. I could see a potential for other values for the "show ignored mode" flags - for example: "NORMAL", "MATCHING", "ALL"... Instead of making more changes at this point in time, how would you feel about waiting until the next change in this area? If you would prefer for me to change these enums now, I can do that. `DIR_COLLECT_IGNORED`::: Special mode for git-add. Return ignored files in `ignored[]` and
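For the record, the two-bit enum being discussed could look something like this (purely illustrative; the series as posted keeps the individual flags):

enum ignored_listing_mode {
	LIST_IGNORED_NONE = 0,     /* do not show ignored ones */
	LIST_IGNORED_ONLY,         /* DIR_SHOW_IGNORED */
	LIST_IGNORED_TOO,          /* DIR_SHOW_IGNORED_TOO */
	LIST_IGNORED_TOO_MATCHING  /* DIR_SHOW_IGNORED_TOO | DIR_SHOW_IGNORED_TOO_MODE_MATCHING */
};

Encoding the four states as one value would make the invalid flag combination unrepresentable instead of merely documented.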
Re: [PATCH v2 0/5] Teach Status options around showing ignored files
On 10/11/2017 10:33 PM, Junio C Hamano wrote: Jameson Miller writes: Changes in v2: Addressed review comments from the original series: * Made fixes / simplifications suggested for documentation * Removed test_tick from test scripts * Merged 2 commits (as suggested) Jameson Miller (5): Teach status options around showing ignored files Update documentation for new directory and status logic Add tests for git status `--ignored=matching` Expand support for ignored arguments on status Add tests around status handling of ignored arguments Please make sure these titles mix well when they appear together with changes from other people that are completely unrelated to them. Keep in mind that your "git status" is *not* the most important thing in the world (of course, it is no less important than others, either). Perhaps status: add new options to show ignored files differently status: document logic to show new directory status: test --ignored=matching Thank you for the suggestions - I will update the commit titles in my next round. etc. Rules of thumb: (1) choose ": " prefix appropriately (2) keep them short and to the point (3) word that follows ": " prefix is not capitalized (4) no need for full-stop at the end of the title
Re: [PATCH v2 5/5] Add tests around status handling of ignored arguments
On 10/12/2017 12:06 AM, Junio C Hamano wrote: Jameson Miller writes: Add tests for status handling of '--ignored=matching` and `--untracked-files=normal`. Signed-off-by: Jameson Miller --- Hmph, having some tests in 3/5, changes in 4/5 and even more tests in 5/5 makes me as a reader a bit confused, as the description for these two test patches does not make it clear how they are related and how they are different. Is it that changes in 1/5 alone does not fulfill the promise made by documentation added at 2/5 so 3/5 only has tests for behaviour that works with 1/5 alone but is broken with respect to what 2/5 claims until 4/5 is applied, and these "not working with 1/5 alone, but works after 4/5" are added in this step? Correct. The changes in 1/5 are to implement "--ignored=matching" with "--untracked-files=all" with corresponding tests in 3/5. The changes in 4/5 are to implement "--ignored=matching" with "--untracked-files=normal" and the corresponding tests are in 5/5. Do you have a preference on how I organized this work? I see several possible ways to split up this work. Maybe it would be less confusing to have the implementation in the first two commits, then 1 commit with all the tests, and then a commit with the documentation? I think it makes sense to have the logic for the different flag combinations split into their own commits, but I am open to any suggestions. t/t7519-ignored-mode.sh | 60 - 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/t/t7519-ignored-mode.sh b/t/t7519-ignored-mode.sh index 76e91427b0..6be7701d79 100755 --- a/t/t7519-ignored-mode.sh +++ b/t/t7519-ignored-mode.sh @@ -116,10 +116,68 @@ test_expect_success 'Verify status behavior on ignored folder containing tracked ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \ ignored_dir/tracked && git add -f ignored_dir/tracked && - test_tick && git commit -m "Force add file in ignored directory" && git status --porcelain=v2 --ignored=matching --untracked-files=all >output && test_i18ncmp expect output ' +test_expect_success 'Verify matching ignored files with --untracked-files=normal' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ? untracked_dir/ + ! ignored_dir/ + ! ignored_files/ignored_1.ign + ! ignored_files/ignored_2.ign + EOF + + mkdir ignored_dir ignored_files untracked_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_files/ignored_1.ign ignored_files/ignored_2.ign \ + untracked_dir/untracked && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify matching ignored files with --untracked-files=normal' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ? untracked_dir/ + ! ignored_dir/ + ! ignored_files/ignored_1.ign + ! ignored_files/ignored_2.ign + EOF + + mkdir ignored_dir ignored_files untracked_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_files/ignored_1.ign ignored_files/ignored_2.ign \ + untracked_dir/untracked && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on ignored folder containing tracked file' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! ignored_dir/ignored_1 + ! ignored_dir/ignored_1.ign + ! ignored_dir/ignored_2 + ! 
ignored_dir/ignored_2.ign + EOF + + mkdir ignored_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \ + ignored_dir/tracked && + git add -f ignored_dir/tracked && + git commit -m "Force add file in ignored directory" && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + test_done
Re: [PATCH v2 1/5] Teach status options around showing ignored files
On 10/11/2017 10:49 PM, Junio C Hamano wrote: Jameson Miller writes: This change teaches the status command more options to control which ignored files are reported independently of which untracked files are reported by allowing the `--ignored` option to take additional arguments. Currently, the shown ignored files are linked to how untracked files are reported. Both ignored and untracked files are controlled by arguments to the `--untracked-files` option. This makes it impossible to show all untracked files individually, but show ignored "files and directories" (that is, ignored directories are shown as 1 entry, instead of having all contents shown). The description makes sense. And it makes sense to show a directory known to contain only ignored paths as just one entry, instead of exploding that to individual files. Our application (Visual Studio) has a specific set of requirements about how it wants untracked / ignored files reported by git status. This sentence does not read well. VS has no obligation to read from "git status", so there is no "specific set of requirements" that make us care. If the output from "status" is insufficient you could be reading from "ls-files --ignored", for example, if you want more details than "git status" gives you. The sentence, and the paragraph that immediately follows it, need a serious attitude adjustment ;-), even though it is good as read as an explanation of what the proposed output wants to show. It was not my intention to have this paragraph come across this way. I meant to express the ideal format of data that our application would like (from whatever source) as motivation for why we are proposing these changes. I will reword this paragraph to remove any unintended implication otherwise. The reason for controlling these behaviors separately is that there can be a significant performance impact to scanning the contents of excluded directories. Additionally, knowing whether a directory explicitly matches an exclude pattern can help the application make decisions about how to handle the directory. If an ignored directory itself matches an exclude pattern, then the application will know that any files underneath the directory must be ignored as well. While the above description taken standalone makes sense, doesn't the "we want all paths listed, without abbreviated to the directory and requiring the reader to infer from the fact that a directory is shown that everything in it are ignored" the log message stated earlier contradict another change you earlier sent, that avoids scanning a directory that is known to be completely untracked (i.e. no path under it in the index) and return early once an untracked file is found in it? My first set of changes introduced a perf optimization without any functional changes. The perf optimization was to avoid scanning a directory that is known to be ignored (i.e. no path under it in the index and the directory matches an exclude pattern). It returns early once any file is found in it. Any file found must be ignored, as it is contained in an ignored directory. This second set of changes is to allow optional decoupling of how ignored and untracked items are reported. 
Signed-off-by: Jameson Miller --- builtin/commit.c | 31 +-- dir.c| 24 dir.h| 3 ++- wt-status.c | 11 --- wt-status.h | 8 +++- 5 files changed, 66 insertions(+), 11 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index d75b3805ea..98d84d0277 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -118,7 +118,7 @@ static int edit_flag = -1; /* unspecified */ static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship; static int config_commit_verbose = -1; /* unspecified */ static int no_post_rewrite, allow_empty_message; -static char *untracked_files_arg, *force_date, *ignore_submodule_arg; +static char *untracked_files_arg, *force_date, *ignore_submodule_arg, *ignored_arg; static char *sign_commit; /* @@ -139,7 +139,7 @@ static const char *cleanup_arg; static enum commit_whence whence; static int sequencer_in_use; static int use_editor = 1, include_status = 1; -static int show_ignored_in_status, have_option_m; +static int have_option_m; static struct strbuf message = STRBUF_INIT; static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED; @@ -1075,6 +1075,19 @@ static const char *find_author_by_nickname(const char *name) die(_("--author '%s' is not 'Name ' and matches no existing author"), name); } +static void handle_ignored_arg(struct wt_status *s) +{ + if (!ignored_arg) + ; /* default already initialized */ + else if (!strcmp(ignored_arg, "traditional")) + s->show_igno
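The earlier optimization described above — stop scanning an ignored directory as soon as it is known to be non-empty — can be sketched as follows (a condensed illustration of the behavior Jameson describes, not the code as merged):

static enum path_treatment treat_ignored_directory(const char *path)
{
	DIR *dirp = opendir(path);
	struct dirent *de;
	enum path_treatment ret = path_none; /* empty dir: nothing to show */

	if (!dirp)
		return path_none;
	while ((de = readdir(dirp)) != NULL) {
		if (is_dot_or_dotdot(de->d_name))
			continue;
		/* Anything inside an ignored directory is ignored too. */
		ret = path_excluded;
		break;
	}
	closedir(dirp);
	return ret;
}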
[PATCH v2 5/5] Add tests around status handling of ignored arguments
Add tests for status handling of '--ignored=matching` and `--untracked-files=normal`. Signed-off-by: Jameson Miller --- t/t7519-ignored-mode.sh | 60 - 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/t/t7519-ignored-mode.sh b/t/t7519-ignored-mode.sh index 76e91427b0..6be7701d79 100755 --- a/t/t7519-ignored-mode.sh +++ b/t/t7519-ignored-mode.sh @@ -116,10 +116,68 @@ test_expect_success 'Verify status behavior on ignored folder containing tracked ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \ ignored_dir/tracked && git add -f ignored_dir/tracked && - test_tick && git commit -m "Force add file in ignored directory" && git status --porcelain=v2 --ignored=matching --untracked-files=all >output && test_i18ncmp expect output ' +test_expect_success 'Verify matching ignored files with --untracked-files=normal' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ? untracked_dir/ + ! ignored_dir/ + ! ignored_files/ignored_1.ign + ! ignored_files/ignored_2.ign + EOF + + mkdir ignored_dir ignored_files untracked_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_files/ignored_1.ign ignored_files/ignored_2.ign \ + untracked_dir/untracked && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify matching ignored files with --untracked-files=normal' ' + test_when_finished "git clean -fdx" && + cat >expect <<-\EOF && + ? expect + ? output + ? untracked_dir/ + ! ignored_dir/ + ! ignored_files/ignored_1.ign + ! ignored_files/ignored_2.ign + EOF + + mkdir ignored_dir ignored_files untracked_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_files/ignored_1.ign ignored_files/ignored_2.ign \ + untracked_dir/untracked && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + +test_expect_success 'Verify status behavior on ignored folder containing tracked file' ' + test_when_finished "git clean -fdx && git reset HEAD~1 --hard" && + cat >expect <<-\EOF && + ? expect + ? output + ! ignored_dir/ignored_1 + ! ignored_dir/ignored_1.ign + ! ignored_dir/ignored_2 + ! ignored_dir/ignored_2.ign + EOF + + mkdir ignored_dir && + touch ignored_dir/ignored_1 ignored_dir/ignored_2 \ + ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \ + ignored_dir/tracked && + git add -f ignored_dir/tracked && + git commit -m "Force add file in ignored directory" && + git status --porcelain=v2 --ignored=matching --untracked-files=normal >output && + test_i18ncmp expect output +' + test_done -- 2.13.6
[PATCH v2 3/5] Add tests for git status `--ignored=matching`
Add tests for new ignored mode (matching) when showing all untracked
files.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
---
 t/t7519-ignored-mode.sh | 125 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100755 t/t7519-ignored-mode.sh

diff --git a/t/t7519-ignored-mode.sh b/t/t7519-ignored-mode.sh
new file mode 100755
index 0000000000..76e91427b0
--- /dev/null
+++ b/t/t7519-ignored-mode.sh
@@ -0,0 +1,125 @@
+#!/bin/sh
+
+test_description='git status collapse ignored'
+
+. ./test-lib.sh
+
+# commit initial ignore file
+test_expect_success 'setup initial commit and ignore file' '
+	cat >.gitignore <<-\EOF &&
+	*.ign
+	ignored_dir/
+	!*.unignore
+	EOF
+	git add . &&
+	git commit -m "Initial commit"
+'
+
+test_expect_success 'Verify behavior of status on folders with ignored files' '
+	test_when_finished "git clean -fdx" &&
+	cat >expect <<-\EOF &&
+	? expect
+	? output
+	! dir/ignored/ignored_1.ign
+	! dir/ignored/ignored_2.ign
+	! ignored/ignored_1.ign
+	! ignored/ignored_2.ign
+	EOF
+
+	mkdir -p ignored dir/ignored &&
+	touch ignored/ignored_1.ign ignored/ignored_2.ign \
+		dir/ignored/ignored_1.ign dir/ignored/ignored_2.ign &&
+
+	git status --porcelain=v2 --ignored=matching --untracked-files=all >output &&
+	test_i18ncmp expect output
+'
+
+test_expect_success 'Verify status behavior on folder with tracked & ignored files' '
+	test_when_finished "git clean -fdx && git reset HEAD~1 --hard" &&
+	cat >expect <<-\EOF &&
+	? expect
+	? output
+	! dir/tracked_ignored/ignored_1.ign
+	! dir/tracked_ignored/ignored_2.ign
+	! tracked_ignored/ignored_1.ign
+	! tracked_ignored/ignored_2.ign
+	EOF
+
+	mkdir -p tracked_ignored dir/tracked_ignored &&
+	touch tracked_ignored/tracked_1 tracked_ignored/tracked_2 \
+		tracked_ignored/ignored_1.ign tracked_ignored/ignored_2.ign \
+		dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 \
+		dir/tracked_ignored/ignored_1.ign dir/tracked_ignored/ignored_2.ign &&
+
+	git add tracked_ignored/tracked_1 tracked_ignored/tracked_2 \
+		dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 &&
+	git commit -m "commit tracked files" &&
+
+	git status --porcelain=v2 --ignored=matching --untracked-files=all >output &&
+	test_i18ncmp expect output
+'
+
+test_expect_success 'Verify status behavior on folder with untracked and ignored files' '
+	test_when_finished "git clean -fdx" &&
+	cat >expect <<-\EOF &&
+	? dir/untracked_ignored/untracked_1
+	? dir/untracked_ignored/untracked_2
+	? expect
+	? output
+	? untracked_ignored/untracked_1
+	? untracked_ignored/untracked_2
+	! dir/untracked_ignored/ignored_1.ign
+	! dir/untracked_ignored/ignored_2.ign
+	! untracked_ignored/ignored_1.ign
+	! untracked_ignored/ignored_2.ign
+	EOF
+
+	mkdir -p untracked_ignored dir/untracked_ignored &&
+	touch untracked_ignored/untracked_1 untracked_ignored/untracked_2 \
+		untracked_ignored/ignored_1.ign untracked_ignored/ignored_2.ign \
+		dir/untracked_ignored/untracked_1 dir/untracked_ignored/untracked_2 \
+		dir/untracked_ignored/ignored_1.ign dir/untracked_ignored/ignored_2.ign &&
+
+	git status --porcelain=v2 --ignored=matching --untracked-files=all >output &&
+	test_i18ncmp expect output
+'
+
+test_expect_success 'Verify status matching ignored files on ignored folder' '
+	test_when_finished "git clean -fdx" &&
+	cat >expect <<-\EOF &&
+	? expect
+	? output
+	! ignored_dir/
+	EOF
+
+	mkdir ignored_dir &&
+	touch ignored_dir/ignored_1 ignored_dir/ignored_2 \
+		ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign &&
+
+	git status --porcelain=v2 --ignored=matching --untracked-files=all >output &&
+	test_i18ncmp expect output
+'
+
+test_expect_success 'Verify status behavior on ignored folder containing tracked file' '
+	test_when_finished "git clean -fdx && git reset HEAD~1 --hard" &&
+	cat >expect <<-\EOF &&
+	? expect
+	? output
+	! ignored_dir/ignored_1
+	! ignored_dir/ignored_1.ign
+	! ignored_dir/ignored_2
+	! ignored_dir/ignored_2.ign
+	EOF
+
+	mkdir ignored_dir &&
+	touch ignored_dir/ignored_1 ignored_dir/ignored_2 \
+		ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \
+		ignored_dir/tracked &&
+	git add -f ignored_dir/tracked &&
+	test_tick &&
+	git commit -m "Force add file in ignored directory" &&
+	git status --porcelain=v2 --ignored=matching --untracked-files=all >output &&
+	test_i18ncmp expect output
+'
+
+test_done
-- 
2.13.6
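For contrast with the default mode, a minimal sketch (assuming the
.gitignore from the setup test above, which ignores `ignored_dir/`) of
'matching' versus 'traditional' on an ignored directory:

    $ mkdir ignored_dir && touch ignored_dir/one ignored_dir/two
    $ git status --porcelain=v2 --ignored=traditional --untracked-files=all
    ! ignored_dir/one
    ! ignored_dir/two
    $ git status --porcelain=v2 --ignored=matching --untracked-files=all
    ! ignored_dir/

Because `ignored_dir/` itself matches an ignore pattern, 'matching'
reports only the directory, while 'traditional' with
--untracked-files=all descends into it and reports each contained file.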
[PATCH v2 2/5] Update documentation for new directory and status logic
Signed-off-by: Jameson Miller <jamill@microsoft.com>
---
 Documentation/git-status.txt                      | 21 +++++++++++++++++---
 Documentation/technical/api-directory-listing.txt | 27 ++++++++++++++++-----
 2 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index 9f3a78a36c..fc282e0a92 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -97,8 +97,27 @@ configuration variable documented in linkgit:git-config[1].
 	(and suppresses the output of submodule summaries when the config option
 	`status.submoduleSummary` is set).
 
---ignored::
+--ignored[=<mode>]::
 	Show ignored files as well.
++
+The mode parameter is used to specify the handling of ignored files.
+It is optional: it defaults to 'traditional'.
++
+The possible options are:
++
+	- 'traditional' - Shows ignored files and directories, unless
+			  --untracked-files=all is specified, in which case
+			  individual files in ignored directories are
+			  displayed.
+	- 'no'          - Show no ignored files.
+	- 'matching'    - Shows ignored files and directories matching an
+			  ignore pattern.
++
+When 'matching' mode is specified, paths that explicitly match an
+ignored pattern are shown. If a directory matches an ignore pattern,
+then it is shown, but not paths contained in the ignored directory. If
+a directory does not match an ignore pattern, but all contents are
+ignored, then the directory is not shown, but all contents are shown.
 
 -z::
 	Terminate entries with NUL, instead of LF. This implies
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..7fae00f44f 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -22,16 +22,20 @@ The notable options are:
 
 `flags`::
 
-	A bit-field of options (the `*IGNORED*` flags are mutually exclusive):
+	A bit-field of options:
 
 `DIR_SHOW_IGNORED`:::
 
-	Return just ignored files in `entries[]`, not untracked files.
+	Return just ignored files in `entries[]`, not untracked
+	files. This flag is mutually exclusive with
+	`DIR_SHOW_IGNORED_TOO`.
 
 `DIR_SHOW_IGNORED_TOO`:::
 
-	Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
-	in addition to untracked files in `entries[]`.
+	Similar to `DIR_SHOW_IGNORED`, but return ignored files in
+	`ignored[]` in addition to untracked files in
+	`entries[]`. This flag is mutually exclusive with
+	`DIR_SHOW_IGNORED`.
 
 `DIR_KEEP_UNTRACKED_CONTENTS`:::
 
@@ -39,6 +43,21 @@ The notable options are:
 	untracked contents of untracked directories are also returned in
 	`entries[]`.
 
+`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`:::
+
+	Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if
+	this is set, returns ignored files and directories that match
+	an exclude pattern. If a directory matches an exclude pattern,
+	then the directory is returned and the contained paths are
+	not. A directory that does not match an exclude pattern will
+	not be returned even if all of its contents are ignored. In
+	this case, the contents are returned as individual entries.
++
+If this is set, files and directories that explicitly match an ignore
+pattern are reported. Implicitly ignored directories (directories that
+do not match an ignore pattern, but whose contents are all ignored)
+are not reported; instead, all of the contents are reported.
+
 `DIR_COLLECT_IGNORED`:::
 
 	Special mode for git-add. Return ignored files in `ignored[]` and
-- 
2.13.6
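To illustrate the last paragraph of the git-status text above, a
minimal sketch (file names invented) of an implicitly ignored
directory, one that matches no pattern itself but whose contents are
all ignored:

    $ cat .gitignore
    *.ign
    $ mkdir dir && touch dir/a.ign dir/b.ign
    $ git status --porcelain=v2 --ignored=matching --untracked-files=all
    ! dir/a.ign
    ! dir/b.ign

Since `dir/` matches no ignore pattern, the directory is not shown;
only the paths that explicitly match `*.ign` are reported.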
[PATCH v2 4/5] Expand support for ignored arguments on status
Teach the status command to handle the 'matching' ignored mode when
showing untracked files with the 'normal' option.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
---
 dir.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index b9af87eca9..8636d080b2 100644
--- a/dir.c
+++ b/dir.c
@@ -1585,6 +1585,7 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 {
 	int exclude;
 	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+	enum path_treatment path_treatment;
 
 	if (dtype == DT_UNKNOWN)
 		dtype = get_dtype(de, istate, path->buf, path->len);
@@ -1631,8 +1632,23 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 		return path_none;
 	case DT_DIR:
 		strbuf_addch(path, '/');
-		return treat_directory(dir, istate, untracked, path->buf, path->len,
-				       baselen, exclude, pathspec);
+		path_treatment = treat_directory(dir, istate, untracked,
+						 path->buf, path->len,
+						 baselen, exclude, pathspec);
+		/*
+		 * If we only want to return directories that
+		 * match an exclude pattern, and this directory does
+		 * not match an exclude pattern but all of its
+		 * contents are excluded, then indicate that we
+		 * should recurse into this directory (instead of
+		 * marking the directory itself as an ignored path).
+		 */
+		if (!exclude &&
+		    path_treatment == path_excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_recurse;
+		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
 		return exclude ? path_excluded : path_untracked;
-- 
2.13.6
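A minimal sketch (file names invented) of the case the new
path_recurse return value handles: with --untracked-files=normal and
the matching mode, a directory whose contents are all ignored must be
recursed into so the individually matching paths can be reported:

    $ cat .gitignore
    *.ign
    $ mkdir ignored_files && touch ignored_files/a.ign ignored_files/b.ign
    $ git status --porcelain=v2 --ignored=matching --untracked-files=normal
    ! ignored_files/a.ign
    ! ignored_files/b.ign

Without returning path_recurse here, treat_directory() would leave the
directory classified as path_excluded and status would print
`! ignored_files/`, even though `ignored_files/` matches no ignore
pattern.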