Re: [PATCH v8 7/7] read-cache: load cache entries on worker threads
On 10/21/2018 10:14 PM, Junio C Hamano wrote:
> Jeff King writes:
>
>> On Wed, Oct 10, 2018 at 11:59:38AM -0400, Ben Peart wrote:
>>
>>> +static unsigned long load_cache_entries_threaded(struct index_state *istate, const char *mmap, size_t mmap_size,
>>> +	unsigned long src_offset, int nr_threads, struct index_entry_offset_table *ieot)
>>
>> The src_offset parameter isn't used in this function.
>>
>> In early versions of the series, it was used to feed the
>> p->start_offset field of each load_cache_entries_thread_data. But
>> after the switch to ieot, we don't, and instead feed p->ieot_start.
>> But we always begin that at 0.
>>
>> Is that right (and we can drop the parameter), or should this logic:
>>
>>> +	offset = ieot_start = 0;
>>> +	ieot_blocks = DIV_ROUND_UP(ieot->nr, nr_threads);
>>> +	for (i = 0; i < nr_threads; i++) {
>>> [...]
>>
>> be starting at src_offset instead of 0?
>
> I think "offset" has nothing to do with the offset into the mmapped
> region of memory. It is an integer index into a (virtual) array that
> is a concatenation of ieot->entries[].entries[], and it is correct to
> count from zero. The value taken from that array using the index is
> used to compute the offset into the mmapped region.
>
> Unlike load_all_cache_entries(), which is called from the other side
> of the same if() statement in the same caller, this does not depend on
> the fact that the first index entry in the mmapped region appears
> immediately after the index-file header. It works from the offsets
> into the file that are recorded in the entry offset table, which is an
> index extension, so the sizeof(*hdr) that initializes src_offset is
> not used by this codepath.
>
> The number of bytes consumed, i.e. the function's return value, is not
> really used either, as the caller does not use src_offset for anything
> other than updating it with "+=" and passing it to this function
> (which does not use it). That is, this function is only called when
> the ieot extension exists--and by definition, when that extension
> exists, extension_offset is not 0, so we do not make the final
> load_index_extensions() call in the caller that uses src_offset.

Thanks for discovering/analyzing this. You're right: I missed removing
this when we switched from a single offset to an array of offsets via
the IEOT. I'll send a patch to fix both issues shortly.
Re: [PATCH v8 7/7] read-cache: load cache entries on worker threads
Jeff King writes:

> On Wed, Oct 10, 2018 at 11:59:38AM -0400, Ben Peart wrote:
>
>> +static unsigned long load_cache_entries_threaded(struct index_state *istate, const char *mmap, size_t mmap_size,
>> +	unsigned long src_offset, int nr_threads, struct index_entry_offset_table *ieot)
>
> The src_offset parameter isn't used in this function.
>
> In early versions of the series, it was used to feed the
> p->start_offset field of each load_cache_entries_thread_data. But
> after the switch to ieot, we don't, and instead feed p->ieot_start.
> But we always begin that at 0.
>
> Is that right (and we can drop the parameter), or should this logic:
>
>> +	offset = ieot_start = 0;
>> +	ieot_blocks = DIV_ROUND_UP(ieot->nr, nr_threads);
>> +	for (i = 0; i < nr_threads; i++) {
>> [...]
>
> be starting at src_offset instead of 0?

I think "offset" has nothing to do with the offset into the mmapped
region of memory. It is an integer index into a (virtual) array that is
a concatenation of ieot->entries[].entries[], and it is correct to
count from zero. The value taken from that array using the index is
used to compute the offset into the mmapped region.

Unlike load_all_cache_entries(), which is called from the other side of
the same if() statement in the same caller, this does not depend on the
fact that the first index entry in the mmapped region appears
immediately after the index-file header. It works from the offsets into
the file that are recorded in the entry offset table, which is an index
extension, so the sizeof(*hdr) that initializes src_offset is not used
by this codepath.

The number of bytes consumed, i.e. the function's return value, is not
really used either, as the caller does not use src_offset for anything
other than updating it with "+=" and passing it to this function (which
does not use it). That is, this function is only called when the ieot
extension exists--and by definition, when that extension exists,
extension_offset is not 0, so we do not make the final
load_index_extensions() call in the caller that uses src_offset.
Re: [PATCH v8 7/7] read-cache: load cache entries on worker threads
On Wed, Oct 10, 2018 at 11:59:38AM -0400, Ben Peart wrote:

> +static unsigned long load_cache_entries_threaded(struct index_state *istate, const char *mmap, size_t mmap_size,
> +	unsigned long src_offset, int nr_threads, struct index_entry_offset_table *ieot)

The src_offset parameter isn't used in this function.

In early versions of the series, it was used to feed the
p->start_offset field of each load_cache_entries_thread_data. But after
the switch to ieot, we don't, and instead feed p->ieot_start. But we
always begin that at 0.

Is that right (and we can drop the parameter), or should this logic:

> +	offset = ieot_start = 0;
> +	ieot_blocks = DIV_ROUND_UP(ieot->nr, nr_threads);
> +	for (i = 0; i < nr_threads; i++) {
> [...]

be starting at src_offset instead of 0?

-Peff
[PATCH v8 7/7] read-cache: load cache entries on worker threads
From: Ben Peart

This patch helps address the CPU cost of loading the index by utilizing
the Index Entry Offset Table (IEOT) to divide loading and conversion of
the cache entries across multiple threads in parallel.

I used p0002-read-cache.sh to generate some performance data:

    Test w/100,000 files reduced the time by 32.24%
    Test w/1,000,000 files reduced the time by -4.77%

Note that on the 1,000,000 files case, multi-threading the cache entry
parsing does not yield a performance win. This is because the cost to
parse the index extensions in this repo far outweighs the cost of
loading the cache entries.

The high cost of parsing the index extensions is driven by the cache
tree and the untracked cache extensions. As this is currently the
longest pole, any reduction in this time will reduce the overall index
load times, so it is worth further investigation in another patch
series.

Signed-off-by: Ben Peart
---
 read-cache.c | 230 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 193 insertions(+), 37 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 3ace29d58f..7acc2c86f4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,7 +1720,8 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *create_from_disk(struct index_state *istate,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+					    unsigned int version,
 					    struct ondisk_cache_entry *ondisk,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
@@ -1737,7 +1738,7 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 	 * number of bytes to be stripped from the end of the previous name,
 	 * and the bytes to append to the result, to come up with its name.
 	 */
-	int expand_name_field = istate->version == 4;
+	int expand_name_field = version == 4;
 
 	/* On-disk flags are just 16 bits */
 	flags = get_be16(&ondisk->flags);
@@ -1761,16 +1762,17 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 		const unsigned char *cp = (const unsigned char *)name;
 		size_t strip_len, previous_len;
 
-		previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+		/* If we're at the beginning of a block, ignore the previous name */
 		strip_len = decode_varint(&cp);
-		if (previous_len < strip_len) {
-			if (previous_ce)
+		if (previous_ce) {
+			previous_len = previous_ce->ce_namelen;
+			if (previous_len < strip_len)
 				die(_("malformed name field in the index, near path '%s'"),
-				    previous_ce->name);
-			else
-				die(_("malformed name field in the index in the first path"));
+				    previous_ce->name);
+			copy_len = previous_len - strip_len;
+		} else {
+			copy_len = 0;
 		}
-		copy_len = previous_len - strip_len;
 		name = (const char *)cp;
 	}
 
@@ -1780,7 +1782,7 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 		len += copy_len;
 	}
 
-	ce = mem_pool__ce_alloc(istate->ce_mem_pool, len);
+	ce = mem_pool__ce_alloc(ce_mem_pool, len);
 
 	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
 	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
@@ -1948,6 +1950,52 @@ static void *load_index_extensions(void *_data)
 	return NULL;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+			struct mem_pool *ce_mem_pool, int offset, int nr,
+			const char *mmap, unsigned long start_offset,
+			const struct cache_entry *previous_ce)
+{
+	int i;
+	unsigned long src_offset = start_offset;
+
+	for (i = offset; i < offset + nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
+		struct cache_entry *ce;
+		unsigned long consumed;
+
+		disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
+		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
+		set_index_entry(istate, i, ce);
+
+		src_offset += consumed;
+		previous_ce = ce;
+	}
+	return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state