Re: [PATCH] alloc.c: replace alloc by mempool

2018-05-07 Thread Junio C Hamano
Stefan Beller  writes:

> The replacement with mem-pool might be easier than making sure
> that alloc.c has no globals and handles allocations per repository
> correctly. It would make the sb/object-store-alloc series shorter than
> it currently is, and maybe make the code easier to review.
>
> However, now that sb/object-store-alloc is rerolled keeping
> the logic of alloc.c instead of replacing it with mem-pool, the only
> remaining reason would be ease of maintainability through less code.

That is a sensible consideration; the patch should be sold as such,
instead of with an empty log message as if the reasoning were obvious
to everybody without being told ;-)


Re: [PATCH] alloc.c: replace alloc by mempool

2018-05-07 Thread Stefan Beller
On Mon, May 7, 2018 at 5:37 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>>> So I think we could just replace it for now and optimize again later, if it
>>> turns out to be a problem. I think the easiest optimisation is to increase
>>> the allocation size so that we have a lot more objects per mp_block.
>>
>> Yeah. I also tested this from a different angle: memory overhead. For
>> 2M objects with one mp_block containing 1024 objects (same setting as
>> alloc.c), the overhead (not counting malloc() internal overhead) is
>> 46KB and we don't have any extra overhead due to padding between
>> objects. This is true for all struct blob, commit, tree and tag. This
>> is really good. alloc.c has zero overhead when measured this way but
>> 46KB is practically zero to me.
>
> Thanks.
>
> In short, the above sounds like arguing "replacing alloc.c internals
> with mempool incurs negligible memory overhead and performance
> degradation, but that can be optimized later".  It was unclear to me
> why such a replacement needs to happen in the first place, though.

The replacement with mem-pool might be easier than making sure
that alloc.c has no globals and handles allocations per repository
correctly. It would make the sb/object-store-alloc series shorter than
it currently is, and maybe make the code easier to review.

However, now that sb/object-store-alloc is rerolled keeping
the logic of alloc.c instead of replacing it with mem-pool, the only
remaining reason would be ease of maintainability through less code.
On the other hand, we already have different implementations of
lists, vectors and hashmaps, so having different memory
allocators doesn't hurt.

> Without such code churn, there won't be extra overhead or need to
> fix it up later, no?

The original motivation is to get the object store on a per-repository
basis. It might be cheaper to reuse existing 'objectified' code instead
of doing that work again.


Re: [PATCH] alloc.c: replace alloc by mempool

2018-05-07 Thread Junio C Hamano
Duy Nguyen  writes:

>> So I think we could just replace it for now and optimize again later, if it
>> turns out to be a problem. I think the easiest optimisation is to increase
>> the allocation size so that we have a lot more objects per mp_block.
>
> Yeah. I also tested this from a different angle: memory overhead. For
> 2M objects with one mp_block containing 1024 objects (same setting as
> alloc.c), the overhead (not counting malloc() internal overhead) is
> 46KB and we don't have any extra overhead due to padding between
> objects. This is true for all struct blob, commit, tree and tag. This
> is really good. alloc.c has zero overhead when measured this way but
> 46KB is practically zero to me.

Thanks.

In short, the above sounds like arguing "replacing alloc.c internals
with mempool incurs negligible memory overhead and performance
degradation, but that can be optimized later".  It was unclear to me
why such a replacement needs to happen in the first place, though.
Without such code churn, there won't be extra overhead or need to
fix it up later, no?


Re: [PATCH] alloc.c: replace alloc by mempool

2018-05-04 Thread Duy Nguyen
On Fri, May 4, 2018 at 12:18 AM, Stefan Beller  wrote:
> I just measured on git.git and linux.git (both of which are not *huge* by
> any standard, but should give a good indication; linux has 6M objects,
> and when allocating 1024 at a time, we hit the new-block allocation
> path ~6000 times).
>
> I could not measure any meaningful difference.
>
> linux.git $ git count-objects -v
> count: 0
> size: 0
> in-pack: 6036543
> packs: 2
> size-pack: 2492985
> prune-packable: 0
> garbage: 0
> size-garbage: 0
>
> (with this patch)
>  Performance counter stats for '/u/git/git count-objects -v' (30 runs):
>
>    2.123683  task-clock:u (msec)   #   0.831 CPUs utilized    ( +-  2.32% )
>           0  context-switches:u    #   0.000 K/sec
>           0  cpu-migrations:u      #   0.000 K/sec
>         126  page-faults:u         #   0.059 M/sec            ( +-  0.22% )
>     895,900  cycles:u              #   0.422 GHz              ( +-  1.40% )
>     976,596  instructions:u        #   1.09  insn per cycle   ( +-  0.01% )
>     218,256  branches:u            # 102.772 M/sec            ( +-  0.01% )
>       8,331  branch-misses:u       #   3.82% of all branches  ( +-  0.61% )
>
>    0.002556203 seconds time elapsed                           ( +-  2.20% )
>
>   Performance counter stats for 'git count-objects -v' (30 runs):
>
>    2.410352  task-clock:u (msec)   #   0.801 CPUs utilized    ( +-  2.79% )
>           0  context-switches:u    #   0.000 K/sec
>           0  cpu-migrations:u      #   0.000 K/sec
>         131  page-faults:u         #   0.054 M/sec            ( +-  0.16% )
>     993,301  cycles:u              #   0.412 GHz              ( +-  1.99% )
>   1,087,428  instructions:u        #   1.09  insn per cycle   ( +-  0.02% )
>     244,292  branches:u            # 101.351 M/sec            ( +-  0.02% )
>       9,264  branch-misses:u       #   3.79% of all branches  ( +-  0.57% )
>
>    0.003010854 seconds time elapsed                           ( +-  2.54% )
>
> So I think we could just replace it for now and optimize again later, if it
> turns out to be a problem. I think the easiest optimisation is to increase
> the allocation size so that we have a lot more objects per mp_block.

Yeah. I also tested this from a different angle: memory overhead. For
2M objects with one mp_block containing 1024 objects (same setting as
alloc.c), the overhead (not counting malloc() internal overhead) is
46KB and we don't have any extra overhead due to padding between
objects. This is true for all struct blob, commit, tree and tag. This
is really good. alloc.c has zero overhead when measured this way but
46KB is practically zero to me.
-- 
Duy


[PATCH] alloc.c: replace alloc by mempool

2018-05-03 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---

>> The reason for my doubt is the potential quadratic behavior for new
>> allocations: in mem_pool_alloc() we walk all mp_blocks to see if we can
>> fit the requested allocation in one of the later blocks. So if we call
>> mem_pool_alloc a million times, we get O(n) mp_blocks which we'd have
>> to walk in each call.
>
> With the current design, when a new mp_block is allocated, it is
> placed at the head of the linked list. This means that the most
> recently allocated mp_block is the 1st block that is
> searched. The *vast* majority of allocations should be fulfilled
> from this 1st block. It is only when the block is full that we
> search other mp_blocks in the list.

I just measured on git.git and linux.git (both of which are not *huge* by
any standard, but should give a good indication; linux has 6M objects,
and when allocating 1024 at a time, we hit the new-block allocation
path ~6000 times).

I could not measure any meaningful difference.

linux.git $ git count-objects -v
count: 0
size: 0
in-pack: 6036543
packs: 2
size-pack: 2492985
prune-packable: 0
garbage: 0
size-garbage: 0

(with this patch)
 Performance counter stats for '/u/git/git count-objects -v' (30 runs):

    2.123683  task-clock:u (msec)   #   0.831 CPUs utilized    ( +-  2.32% )
           0  context-switches:u    #   0.000 K/sec
           0  cpu-migrations:u      #   0.000 K/sec
         126  page-faults:u         #   0.059 M/sec            ( +-  0.22% )
     895,900  cycles:u              #   0.422 GHz              ( +-  1.40% )
     976,596  instructions:u        #   1.09  insn per cycle   ( +-  0.01% )
     218,256  branches:u            # 102.772 M/sec            ( +-  0.01% )
       8,331  branch-misses:u       #   3.82% of all branches  ( +-  0.61% )

    0.002556203 seconds time elapsed                           ( +-  2.20% )

  Performance counter stats for 'git count-objects -v' (30 runs):

    2.410352  task-clock:u (msec)   #   0.801 CPUs utilized    ( +-  2.79% )
           0  context-switches:u    #   0.000 K/sec
           0  cpu-migrations:u      #   0.000 K/sec
         131  page-faults:u         #   0.054 M/sec            ( +-  0.16% )
     993,301  cycles:u              #   0.412 GHz              ( +-  1.99% )
   1,087,428  instructions:u        #   1.09  insn per cycle   ( +-  0.02% )
     244,292  branches:u            # 101.351 M/sec            ( +-  0.02% )
       9,264  branch-misses:u       #   3.79% of all branches  ( +-  0.57% )

    0.003010854 seconds time elapsed                           ( +-  2.54% )

So I think we could just replace it for now and optimize again later, if it
turns out to be a problem. I think the easiest optimisation is to increase
the allocation size so that we have a lot more objects per mp_block.

> If this is a concern, I think
> we have a couple low cost options to mitigate it (maybe a flag to
> control whether we search past the 1st mp_block for space, or
> logic to move blocks out of the search queue when they are
> full or fall below a threshold for available space).

Instead of a flag, I thought of a dedicated struct with its own functions,
which would not just have different searching behavior, but would also
store the object size in the struct, such that you could just call
fixed_size_mem_pool_alloc(void) to get another pointer.
A flag might be more elegant, though.

>
> If this is of interest, I could contribute a patch to enable one
> of these behaviors?

I am tempted to just do away with alloc.c for now and use the mem-pool.

Thanks,
Stefan



 alloc.c | 60 +++--
 1 file changed, 11 insertions(+), 49 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..bf003e161be 100644
--- a/alloc.c
+++ b/alloc.c
@@ -15,6 +15,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "mem-pool.h"
 
 #define BLOCKING 1024
 
@@ -26,61 +27,39 @@ union any_object {
struct tag tag;
 };
 
-struct alloc_state {
-   int count; /* total number of nodes allocated */
-   int nr;/* number of nodes left in current allocation */
-   void *p;   /* first free node in current allocation */
-};
-
-static inline void *alloc_node(struct alloc_state *s, size_t node_size)
-{
-   void *ret;
-
-   if (!s->nr) {
-   s->nr = BLOCKING;
-   s->p = xmalloc(BLOCKING * node_size);
-   }
-   s->nr--;
-   s->count++;
-   ret = s->p;
-   s->p = (char *)s->p + node_size;
-