On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de> wrote:
> On 03.01.2011 02:14, Johan Corveleyn wrote:
>>
>> It would be interesting to see where the biggest gains
>> are coming from (I'm guessing from the "per-machine-word"
>> reading/comparing; I'd like to try that first, maybe together with the
>> allocating of the file[] on the stack; I'd like to avoid
>> special-casing file_len==2, splitting up functions in *_slow and
>> *_fast, because it makes it a lot more difficult imho (so I'd only
>> like to do that if it's a significant contributor)). But if you want
>> to leave that as an exercise for the reader, that's fine as well :-).
>
> Exercise is certainly not a bad thing ;)
>
> But I think, the stack variable is certainly helpful
> and easy to do. While "chunky" operation gives
> a *big* boost, it is much more difficult to code if
> you need to compare multiple sources. It should
> be possible, though.
Ok, I finished the exercise :-).

- File info structures on the stack instead of on the heap: committed
  in r1060625. Gives a 10% boost.

- Chunky operation for multiple sources: committed in r1062532. Gives
  a ~250% boost on my machine.

For suffix scanning, I made a couple of changes to the logic you sent
in your patch (as you remember, there was a problem with the
implementation of suffix scanning [1]). It was (in scan_backward_fast):

[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+      if (   (file[0].curp - sizeof(apr_size_t) < file[0].curp)
+          && (file[1].curp - sizeof(apr_size_t) < file[1].curp))
+        {
+          do
+            {
+              file[0].curp -= sizeof(apr_size_t);
+              file[1].curp -= sizeof(apr_size_t);
+            }
+          while (   (file[0].curp >= min_curp0)
+                 && (file[1].curp >= min_curp1)
+                 && (   *(const apr_size_t*)(file[0].curp + 1)
+                     == *(const apr_size_t*)(file[1].curp + 1)));
+
+          file[0].curp += sizeof(apr_size_t);
+          file[1].curp += sizeof(apr_size_t);
+        }
]]]

I changed this to (the N-file equivalent of):

[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+      if (   (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+          && (file[1].curp - sizeof(apr_size_t) >= min_curp1))
+        {
+          do
+            {
+              file[0].curp -= sizeof(apr_size_t);
+              file[1].curp -= sizeof(apr_size_t);
+            }
+          while (   (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+                 && (file[1].curp - sizeof(apr_size_t) >= min_curp1)
+                 && (   *(const apr_size_t*)(file[0].curp + 1)
+                     == *(const apr_size_t*)(file[1].curp + 1)));
+
+          file[0].curp += sizeof(apr_size_t);
+          file[1].curp += sizeof(apr_size_t);
+        }
]]]

That is: I changed the if-comparisons before the do-while loop to make
sure there is room to subtract a sizeof(apr_size_t) (the original
comparison didn't really make sense to me), and I changed the
comparisons with min_curpX in the while condition to the same check
(is there still room to subtract a sizeof(apr_size_t)?). After that,
all tests passed.

Now, you suggested in a previous post in this thread that hardcoding
file_len to 2 should speed it up by up to 50%.
And you are correct: on my machine, it's sped up by ~40% if I replace
all occurrences of file_len in find_identical_* with 2. So that leads
me to conclude that I should probably throw away the generic code and
replace it with three variants: find_identical_*2, find_identical_*3
and find_identical_*4 (once I write *2, the rest should be mostly
copy-and-paste). As you said, this should also vastly simplify the
code (drop the for loops, simpler conditionals, ...).

Cheers,
-- 
Johan

[1] http://svn.haxx.se/dev/archive-2011-01/0022.shtml