On 03.01.2011 02:14, Johan Corveleyn wrote:
On Mon, Jan 3, 2011 at 12:13 AM, Johan Corveleyn <jcor...@gmail.com> wrote:
On Sun, Jan 2, 2011 at 10:04 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
On Sat, Jan 1, 2011 at 11:25 PM, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de> wrote:
Hi Johan,

Thursday night I did something stupid and had a look at how
svn blame could be made faster based on the HEAD code in
your branch.

One night and most of the following day later, I think I've made it
a few percent faster. Some of the code I committed directly
to /trunk, so you need to pull those changes into your branch
to compile the attached patch.

Feel free to test it, take it apart and morph it any way you like --
I know the patch isn't very pretty in the first place. I tested the
patch on x64 Linux and would like to hear whether it at least
somewhat improved performance on your system for your
svn blame config.xml use-case.

-- Stefan^2.

[[[
Speed-up of datasource_open() and its sub-routines
by a series of optimizations:

* allocate the file[] array on stack instead of heap
  (all members become directly addressable through
  the stack pointer because all static sub-functions
  will actually be in-lined)
* minor re-arrangements in arithmetic expressions to
  maximize reuse of results (e.g. in INCREMENT_POINTERS)
* move hot spots to separate functions and provide a
  specialized version for file_len == 2
  (many loops collapse to a single execution, other
  values can be hard-coded, etc.)
* use separate loops to process data within a chunk
  so we don't need to call INCREMENT_POINTERS & friends
  that frequently
* scan / compare strings in machine-word granularity
  instead of bytes
]]]
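
To illustrate that last point: stripped of chunk handling and EOL
counting, the word-wise scan boils down to something like the sketch
below. This is a simplified stand-alone illustration, not the patch
code itself (the function name and signature are made up), and it
assumes unaligned apr_size_t loads are permitted, i.e. the
SVN_UNALIGNED_ACCESS_IS_OK case:

[[[
#include <apr.h>  /* for apr_size_t */

/* Return the length of the common prefix of P and Q (both LEN bytes),
 * comparing whole machine words instead of single bytes where possible. */
static apr_size_t
common_prefix_len(const char *p, const char *q, apr_size_t len)
{
  apr_size_t i = 0;

  /* Compare machine words while they are equal. */
  while (i + sizeof(apr_size_t) <= len
         && *(const apr_size_t *)(p + i) == *(const apr_size_t *)(q + i))
    i += sizeof(apr_size_t);

  /* Locate the first differing byte within the final word (or tail). */
  while (i < len && p[i] == q[i])
    i++;

  return i;
}
]]]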
Hi Stefan,

Thanks for taking a look. I really appreciate it.

When I saw your first couple of commits to the adler32 stuff, I
already thought: hmmm, he's up to something :-). And after I saw your
change to eol.c#svn_eol__find_eol_start, reading per machine-word, I
was thinking: hey, I could really use something like that in the
prefix/suffix scanning. Nice ... :-)

(I had some trouble applying your patch: it doesn't apply cleanly
anymore to HEAD of the branch. Most hunks were applied correctly,
though, and I could fix up the rest manually, so no real problem.)

However, first tests are not so great. In fact, it's about 25% slower
(when blaming my settings.xml file with 2200 revisions, it's spending
about 90 seconds in diff, vs. around 72 with HEAD of
diff-optimizations-bytes).

Analyzing further, I can see it's spending significantly less time in
prefix/suffix scanning, but more time in token.c#svn_diff__get_tokens
(which is where the original algorithm gets the tokens/lines and
inserts them into the "token tree"). This tells me it's not working
correctly: either prefix/suffix scanning fails too soon, so there's
much more left for the regular algorithm. Or it's just not functioning
correctly.

Looking at the result of the blame operation, it seems it's the
latter. The final result of the blame is not correct anymore.

I'll try to look some more into it, to see what's going wrong. Maybe
you can also see it with a simple diff of a large file ... (for a good
example in svn's own codebase, you could try
subversion/tests/cmdline/merge-tests.py (pretty large, and almost 700
revisions)).
Ok, by eliminating parts of the new stuff, I found out the problem is
in scan_backward_fast (part of suffix scanning). If I bypass that, and
always take scan_backward_slow, it works.

And it's fast too! It's taking only 58 seconds in "diff", vs. 72 for
the normal version. Splitting that up further, it's taking only 28
seconds in prefix/suffix scanning, vs. ~47 for the normal version
(for some reason, the LCS processing took a little longer, but that's
probably unrelated; I only did a single run). And that's with
scan_backward_slow. That's pretty amazing.
On x64, datasource_open() got about 10x faster,
but my test-case was the TSVN changelog, where
we (mainly) insert at the beginning instead of
close to the end of the file. So, one broken scan
direction seems completely plausible ;)
The code of scan_backward_fast is pretty difficult for me to
understand, so I'm not sure I can spot the problem. I'll continue
staring at it for a little while ...
Ok, I can't find it right now. There must be some functional
difference between scan_backward_fast and scan_backward_slow.
One way to get closer to the problem:

* create a backup of (i.e. rename) the current scan_backward_fast code
* copy scan_backward_slow to scan_backward_fast
* replace all file_len with a hard-coded "2"
  -> can give a 50% boost depending on optimizer cleverness
* step-by-step apply changes from the old scan_backward_fast code
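
To make the hard-coding step concrete, here is a rough before/after,
using the is_match loop that actually appears in
find_identical_prefix_slow (the collapsed form is just what the
specialization amounts to; the optimizer may get there on its own):

[[[
/* Generic: one iteration per file, bounded by the run-time file_len. */
for (i = 1, is_match = TRUE; i < file_len; i++)
  is_match = is_match && *file[0].curp == *file[i].curp;

/* With file_len hard-coded to 2, the loop collapses to a single
 * comparison that the compiler can keep entirely in registers: */
is_match = *file[0].curp == *file[1].curp;
]]]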

For now, some feedback on the rest of the patch:

[[[
Index: subversion/libsvn_diff/diff_file.c
===================================================================
--- subversion/libsvn_diff/diff_file.c  (revision 1054044)
+++ subversion/libsvn_diff/diff_file.c  (working copy)
<... snip ...>
@@ -363,53 +364,152 @@
  /* Check whether one of the FILEs has its pointers at EOF (this is the case if
   * one of them has curp == endp (this can only happen at the last chunk)) */
  static svn_boolean_t
-is_one_at_eof(struct file_info *file[], apr_size_t file_len)
+is_one_at_eof(struct file_info file[], apr_size_t file_len)
  {
    apr_size_t i;

    for (i = 0; i < file_len; i++)
-    if (file[i]->curp == file[i]->endp)
+    if (file[i].curp == file[i].endp)
        return TRUE;

    return FALSE;
  }

-/* Find the prefix which is identical between all elements of the FILE array.
- * Return the number of prefix lines in PREFIX_LINES.  REACHED_ONE_EOF will be
- * set to TRUE if one of the FILEs reached its end while scanning prefix,
- * i.e. at least one file consisted entirely of prefix.  Otherwise,
- * REACHED_ONE_EOF is set to FALSE.
- *
- * After this function is finished, the buffers, chunks, curp's and endp's
- * of the FILEs are set to point at the first byte after the prefix. */
+/* Quickly determine whether there is a new-line char in CHUNK.
+ * (mainly copy-n-paste from find_eol_start).
+ */
+#if APR_SIZEOF_VOIDP == 8
+#  define LOWER_7BITS_SET 0x7f7f7f7f7f7f7f7f
+#  define BIT_7_SET       0x8080808080808080
+#  define R_MASK          0x0d0d0d0d0d0d0d0d /* '\r' in every byte */
+#  define N_MASK          0x0a0a0a0a0a0a0a0a /* '\n' in every byte */
+#else
+#  define LOWER_7BITS_SET 0x7f7f7f7f
+#  define BIT_7_SET       0x80808080
+#  define R_MASK          0x0d0d0d0d /* '\r' in every byte */
+#  define N_MASK          0x0a0a0a0a /* '\n' in every byte */
+#endif
+
+static svn_boolean_t contains_nl(apr_size_t chunk)
+{
+  apr_size_t r_test = chunk ^ R_MASK;
+  apr_size_t n_test = chunk ^ N_MASK;
+
+  r_test |= (r_test & LOWER_7BITS_SET) + LOWER_7BITS_SET;
+  n_test |= (n_test & LOWER_7BITS_SET) + LOWER_7BITS_SET;
+
+  return (r_test & n_test & BIT_7_SET) != BIT_7_SET;
+}
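
An aside on the bit trick, for readers who haven't seen it before:
XOR-ing the chunk with a mask turns every byte that equals the mask
byte into 0x00, and the add/or steps afterwards set bit 7 of every
non-zero byte, so a clear bit 7 survives only where a byte matched.
A stand-alone demonstration of just the zero-byte test, with made-up
names and 32-bit words for brevity:

[[[
#include <stdio.h>
#include <stdint.h>

/* Return nonzero iff WORD contains a 0x00 byte.  This is the test
 * contains_nl() applies to chunk ^ R_MASK and chunk ^ N_MASK. */
static int
has_zero_byte(uint32_t word)
{
  /* (word & 0x7f7f7f7f) + 0x7f7f7f7f carries into bit 7 of every byte
   * that has a low bit set; OR-ing word back in covers bytes >= 0x80.
   * Only a 0x00 byte is left with bit 7 clear. */
  uint32_t t = ((word & 0x7f7f7f7fU) + 0x7f7f7f7fU) | word;
  return (t & 0x80808080U) != 0x80808080U;
}

int main(void)
{
  /* The first word contains a 0x0a ('\n') byte, the second does not. */
  printf("%d\n", has_zero_byte(0x61620a61U ^ 0x0a0a0a0aU)); /* prints 1 */
  printf("%d\n", has_zero_byte(0x61626364U ^ 0x0a0a0a0aU)); /* prints 0 */
  return 0;
}
]]]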
+
+/* Optimized variant of the guts of find_identical_prefix for
+ * file_len == 2.
+ */
  static svn_error_t *
-find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
-                      struct file_info *file[], apr_size_t file_len,
-                      apr_pool_t *pool)
+find_identical_prefix_fast(svn_boolean_t *reached_one_eof,
+                           apr_off_t *prefix_lines,
+                           struct file_info file[],
+                           apr_pool_t *pool)
  {
    svn_boolean_t had_cr = FALSE;
+  apr_off_t lines = 0;
+  apr_size_t i = 0;
+
+  *reached_one_eof = FALSE;
+
+  while (*file[0].curp == *file[1].curp)
+    {
+      if (*file[0].curp == '\r')
+        {
+          lines++;
+          had_cr = TRUE;
+        }
+      else if (*file[0].curp == '\n' && !had_cr)
+        {
+          lines++;
+        }
+      else
+        {
+          had_cr = FALSE;
+        }
+
+      INCREMENT_POINTERS(file, 2, pool);
+
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+      /* Skip quickly over the stuff between EOLs. */
+
+      while (   (file[0].curp + sizeof(apr_size_t) < file[0].endp)
+             && (file[1].curp + sizeof(apr_size_t) < file[1].endp)
+             && (   *(const apr_size_t *)file[0].curp
+                 == *(const apr_size_t *)file[1].curp)
+             && !contains_nl(*(const apr_size_t *)file[0].curp))
+        {
+          file[0].curp += sizeof(apr_size_t);
+          file[1].curp += sizeof(apr_size_t);
+        }
+
+#endif
+
+      /* curp == endp indicates EOF (this can only happen with last chunk) */
+      if (is_one_at_eof(file, 2))
+        {
+          *reached_one_eof = TRUE;
+          break;
+        }
+    }
+
+  *prefix_lines = lines;
+
+  /* If all files reached their end (i.e. are fully identical), we're done */
+  if (file[0].curp == file[0].endp && file[1].curp == file[1].endp)
+    return SVN_NO_ERROR;
+
+  if (had_cr)
+    {
+      /* Check if we ended in the middle of a \r\n for one file, but \r for
+         another. If so, back up one byte, so the next loop will back up
+         the entire line. Also decrement *prefix_lines, since we counted one
+         too many for the \r. */
+      if ((*file[0].curp == '\n') || (*file[1].curp == '\n'))
Here (or below the DECREMENT, but within this same if condition) you need:

          (*prefix_lines)--;

just like in the original (and the *_slow) case. As the comment above
explains: if we get here, we counted one prefix line too many, because
we already counted a line for the matching '\r', but the character
after it was '\n' in one file and a different character in the other.
The eol-termination differs, so this is a non-matching line that needs
to be fully rolled back (hence also the extra DECREMENT here).
Concretely: with "x\r\n" in one file and "x\rz" in the other, the
matching '\r' bumps the counter, but the line as a whole differs, so
both the pointers and the counter must be rolled back.

I'm not even sure the original code is correct at all.
For instance, it does not handle '\r' (Mac style) EOLs
properly. Not really sure how to fix that.
+        DECREMENT_POINTERS(file, 2, pool);
+    }
+
+  return SVN_NO_ERROR;
+}
+
+/* Non-optimized variant of the guts of find_identical_prefix for
+ * file_len != 2.
+ */
+static svn_error_t *
+find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
+                           apr_off_t *prefix_lines,
+                           struct file_info file[], apr_size_t file_len,
+                           apr_pool_t *pool)
+{
+  svn_boolean_t had_cr = FALSE;
    svn_boolean_t is_match, reached_all_eof;
    apr_size_t i;
+  apr_off_t lines = 0;

    *prefix_lines = 0;
    for (i = 1, is_match = TRUE; i < file_len; i++)
-    is_match = is_match && *file[0]->curp == *file[i]->curp;
+    is_match = is_match && *file[0].curp == *file[i].curp;
+
    while (is_match)
      {
        /* ### TODO: see if we can take advantage of
           diff options like ignore_eol_style or ignore_space. */
        /* check for eol, and count */
-      if (*file[0]->curp == '\r')
+      if (*file[0].curp == '\r')
          {
-          (*prefix_lines)++;
+          lines++;
            had_cr = TRUE;
          }
-      else if (*file[0]->curp == '\n' && !had_cr)
+      else if (*file[0].curp == '\n' && !had_cr)
          {
-          (*prefix_lines)++;
-          had_cr = FALSE;
+          lines++;
          }
-      else
+      else
          {
            had_cr = FALSE;
          }
@@ -422,12 +522,14 @@
          break;
        else
          for (i = 1, is_match = TRUE; i < file_len; i++)
-          is_match = is_match && *file[0]->curp == *file[i]->curp;
+          is_match = is_match && *file[0].curp == *file[i].curp;
      }

+  *prefix_lines = lines;
+
    /* If all files reached their end (i.e. are fully identical), we're done */
    for (i = 0, reached_all_eof = TRUE; i < file_len; i++)
-    reached_all_eof = reached_all_eof && file[i]->curp == file[i]->endp;
+    reached_all_eof = reached_all_eof && file[i].curp == file[i].endp;
    if (reached_all_eof)
      return SVN_NO_ERROR;

@@ -440,20 +542,40 @@
        svn_boolean_t ended_at_nonmatching_newline = FALSE;
       for (i = 0; i < file_len; i++)
          ended_at_nonmatching_newline = ended_at_nonmatching_newline
-                                       || *file[i]->curp == '\n';
+                                       || *file[i].curp == '\n';
        if (ended_at_nonmatching_newline)
-        {
-          (*prefix_lines)--;
Same as above: this (*prefix_lines)-- needs to stay, I think.

-          DECREMENT_POINTERS(file, file_len, pool);
-        }
+        DECREMENT_POINTERS(file, file_len, pool);
      }

+  return SVN_NO_ERROR;
+}
+
+/* Find the prefix which is identical between all elements of the FILE array.
+ * Return the number of prefix lines in PREFIX_LINES.  REACHED_ONE_EOF will be
+ * set to TRUE if one of the FILEs reached its end while scanning prefix,
+ * i.e. at least one file consisted entirely of prefix.  Otherwise,
+ * REACHED_ONE_EOF is set to FALSE.
+ *
+ * After this function is finished, the buffers, chunks, curp's and endp's
+ * of the FILEs are set to point at the first byte after the prefix. */
+static svn_error_t *
+find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
+                      struct file_info file[], apr_size_t file_len,
+                      apr_pool_t *pool)
+{
+  if (file_len == 2)
+    SVN_ERR(find_identical_prefix_fast(reached_one_eof, prefix_lines,
+                                       file, pool));
+  else
+    SVN_ERR(find_identical_prefix_slow(reached_one_eof, prefix_lines,
+                                       file, file_len, pool));
+
Small problem here (not really visible in practice, but a potential
change in behavior nonetheless): if either of the above function
calls exited early because of reached_all_eof, this function should
also exit early.
I'll have a look at that tonight.
If both (all) files reached their end, they are completely identical,
and it makes no sense to roll back the last (incomplete) line.

So maybe the check for reached_all_eof should be pulled out of the
above functions and into this one? But then you'll also need to move
the ended_at_nonmatching_newline stuff up, and thus transfer the
had_cr information. Maybe you've considered that already ...
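
Just to make the idea concrete, a rough sketch (untested, and the
changed helper signatures are hypothetical: the helpers would report
reached_all_eof through a new output parameter instead of acting on
it themselves):

[[[
static svn_error_t *
find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
                      struct file_info file[], apr_size_t file_len,
                      apr_pool_t *pool)
{
  svn_boolean_t reached_all_eof;

  if (file_len == 2)
    SVN_ERR(find_identical_prefix_fast(reached_one_eof, &reached_all_eof,
                                       prefix_lines, file, pool));
  else
    SVN_ERR(find_identical_prefix_slow(reached_one_eof, &reached_all_eof,
                                       prefix_lines, file, file_len, pool));

  /* Fully identical files: nothing to roll back, exit early. */
  if (reached_all_eof)
    return SVN_NO_ERROR;

  /* The rollback of the last (incomplete) line would move here, which
   * is where the ended_at_nonmatching_newline / had_cr logic comes in. */

  return SVN_NO_ERROR;
}
]]]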

The rest looks fine from a correctness standpoint (and tests
correctly), except for scan_backward_fast.

Also, it would really help if you could split up this patch (though
this was certainly a good way to try it out and give me an idea of
the possibilities). It would be interesting to see where the biggest
gains are coming from. I'm guessing it's the "per-machine-word"
reading/comparing, so I'd like to try that first, maybe together with
allocating the file[] array on the stack. I'd like to avoid
special-casing file_len == 2 and splitting functions into *_slow and
*_fast variants, because that makes the code a lot harder to follow
imho, so I'd only do that if it's a significant contributor. But if
you want to leave that as an exercise for the reader, that's fine as
well :-).

Thanks.

Cheers,
Johan

-- Stefan^2.
