On 22/03/18 09:32, Jeff King wrote:
> On Wed, Mar 21, 2018 at 04:59:19PM +0100, Duy Nguyen wrote:
> 
>>> I hate to be a wet blanket, but am I the only one who is wondering
>>> whether the tradeoffs is worth it? 8% memory reduction doesn't seem
>>> mind-bogglingly good,
>>
>> AEvar measured RSS. If we count objects[] array alone, the saving is
>> 40% (136 bytes per entry down to 80). Some is probably eaten up by
>> mmap in rss.
> 
> Measuring actual heap usage with massif, I get before/after peak heaps
> of 1728 and 1346MB respectively when repacking linux.git. So that's ~22%
> savings overall.
> 
> Of the used heap after your patches:
> 
>  - ~40% of that is from packlist_alloc()
>  - ~17% goes to "struct object"
>  - ~10% for the object.c hash table to store all the "struct object"
>  - ~7% goes to the delta cache
>  - ~7% goes to the pack revindex (actually, there's a duplicate 7%
>        there, too; I think our peak is when we're sorting the revindex
>        and have to keep two copies in memory at once)

which begs the question, how much slower would it be if we
replaced the radix-sort with an in-place sort (e.g. heapsort).

I hacked up the patch below, just for fun. I don't have any
large repos (or enough disk space) to do any meaningful perf
tests, but I did at least compile it and it passes the test-suite.
(That is no guarantee that I haven't introduced bugs, of course!)

;-)

ATB,
Ramsay Jones

-- >8 --
Subject: [PATCH] pack-revindex: replace radix-sort with in-place heapsort

Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com>
---
 pack-revindex.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index ff5f62c03..16f17eac1 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -15,6 +15,7 @@
  * get the object sha1 from the main index.
  */
 
+#ifdef DUMMY
 /*
  * This is a least-significant-digit radix sort.
  *
@@ -112,6 +113,65 @@ static void sort_revindex(struct revindex_entry *entries, 
unsigned n, off_t max)
 #undef BUCKETS
 #undef DIGIT_SIZE
 }
+#endif
+
+static inline void swap(struct revindex_entry *a, int i, int j)
+{
+       struct revindex_entry t;
+
+       t = a[i];
+       a[i] = a[j];
+       a[j] = t;
+}
+
+/*
+ * assume that elements first .. last (array index first-1 .. last-1) obey
+ * the partially ordered tree property, except possibly for the children of
+ * the first element. push down the first element until the partially
+ * ordered tree property is restored.
+ */
+static void push_down(struct revindex_entry *a, int first, int last)
+{
+       int parent = first;
+       int last_node = last / 2;
+
+       while (parent <= last_node) {
+
+               int left = 2 * parent;
+               int right = left + 1;
+               int biggest;
+
+               if (right > last) /* parent only has one child */
+                       biggest = left;
+               else {
+                       if (a[left-1].offset >= a[right-1].offset)
+                               biggest = left;
+                       else
+                               biggest = right;
+
+               }
+
+               if (a[parent-1].offset >= a[biggest-1].offset)
+                       break; /* partially ordered tree property, we're done */
+
+               /* push parent down */
+               swap(a, parent-1, biggest-1);
+               parent = biggest;
+       }
+}
+
+static void sort_revindex(struct revindex_entry *entries, unsigned n, off_t 
max)
+{
+       int i;
+
+       for (i = n/2; i > 0; i--)
+               push_down(entries, i, n);
+
+       for (i = n; i > 1; i--) {
+               swap(entries, 0, i-1);
+               push_down(entries, 1, i-1);
+       }
+}
 
 /*
  * Ordered list of offsets of objects in the pack.
-- 
2.16.0

Reply via email to