RE: place of svnrdump

2010-09-28 Thread Bolstridge, Andrew


-Original Message-
From: Ramkumar Ramachandra [mailto:artag...@gmail.com] 
Sent: 27 September 2010 19:58
To: Neels J Hofmeyr
Cc: dev@subversion.apache.org; Daniel Shahaf; Stefan Sperling
Subject: Re: place of svnrdump

Neels J Hofmeyr writes:

 While we're at it... svnsync's slowness is particularly painful when 
 doing 'svnsync copy-revprops'. With revprop changes enabled, any 
 revprops may be changed at any time. So to maintain an up-to-date 
 mirror, one would like to copy *all* revprops at the very least once 
 per day. With a repos of average corporate size, though, that can take

 the whole night and soon longer than the developers need to go home 
 and come back to work next morning (to find the mirror lagging). So 
 one could copy only the youngest 1000 revprops each night and do a 
 complete run every weekend. Or script a revprop-change hook that 
 propagates revprop change signals to mirrors. :(

Of course, you could put a post-revprop-change hook in place to note
which revprop was changed, and then run a script that only syncs those
revprops.

I wouldn't recommend putting the 'sync copy-revprops' command in the
post-revprop-change hook, if someone commits a revision then immediately
updates the revprop the sync will fail (as the rev may not have been
synced yet). 

If anything, changing svnsync to ignore a failed copy-revprop command if
no revision existed to sync to would fix this problem, and the
copy-revprop could then be put in the hook without worry.




Re: NODES table - conditional compilation for op_depth

2010-09-28 Thread Greg Stein
Seems like this will make things even more complicated. I'd be in
favor of *not* switching to NODES unless/until the op_depth is done
properly. If you switch early, then you're going to require another
format bump to reset all the op_depth fields to their proper values. I
don't think an early-switch to NODES before proper op_depth
computation will buy us anything.

Cheers,
-g

On Mon, Sep 27, 2010 at 10:00, Julian Foad julian.f...@wandisco.com wrote:
 Philip and I have started implementing op_depth in the NODES table, but
 we soon found there is more to do than simply calculating a value here
 and there.

 (Implementing op_depth means enabling multiply nested
 copies/replacements/adds/deletes etc., with the ability to revert at
 each level.)

 In the meantime, some tests were breaking, so we have made the full
 op_depth support conditional on SVN_WC__OP_DEPTH.

 Why?  The interim 0/1/2 op_depth values have been working OK in the
 limited context of the amount of nesting that BASE+WORKING supports.  We
 might want to make a transition from BASE_NODE/WORKING_NODE to NODES
 first, before enabling the full op_depth support.  That is probably the
 main reason why this further conditional is useful.  The alternative
 would be to complete op_depth support before switching the default build
 over to NODES.

 Any concerns about working within SVN_WC__OP_DEPTH?

 - Julian





Re: svn propchange: r1001678 - svn:log

2010-09-28 Thread Philip Martin
Julian Foad julian.f...@wandisco.com writes:

 Greg Stein wrote:
 Why are tests failing because of the different values?

 I think it's not because the op_depth value is different, but because
 creating a row with a different op_depth now in some cases means
 creating an *additional* row, and the rest of the code is not yet
 prepared for finding two or more WORKING rows for the same node.
 Something like that.

A typical failure was a delete followed by an add/replace.  The delete
creates a row with op_depth=2 (a temporary value) and the add creates
a row with op_depth=1 (a correct value).  When we select the row with
the highest op_depth we don't see the add.

We had a brief discussion on IRC about this.  The add does an INSERT
OR REPLACE and the op_depth difference makes it an insert rather than
a replace.  Should add determine whether it should insert or replace
and make the appropriate query?  The temporary op_depth values lead to
a corrupt database if the op_depth values are assummed to be correct.
Should add detect the corruption?

-- 
Philip


Re: NODES table - conditional compilation for op_depth

2010-09-28 Thread Philip Martin
Greg Stein gst...@gmail.com writes:

 Seems like this will make things even more complicated. I'd be in
 favor of *not* switching to NODES unless/until the op_depth is done
 properly. If you switch early, then you're going to require another
 format bump to reset all the op_depth fields to their proper values. I
 don't think an early-switch to NODES before proper op_depth
 computation will buy us anything.

There is an overhead to maintaining both sets of code.  It's possible
to build the wrong version, run regression tests with the wrong
version, or break the other version.

-- 
Philip


Re: migration to NODES

2010-09-28 Thread Daniel Shahaf
I'd rather have O(1) hand-crafted wc's than all 950 wc's the test suite
generates, of which likely 99% are 'normal'... (especially as it's the
end-of-run state, after any conflicts have been resolved etc)

Greg Stein wrote on Tue, Sep 28, 2010 at 05:47:36 -0400:
 I took the 1.6.x branch and ran the test suite. It produced about 950
 working copies, and the (compressed) tarball sits at about 850k. I'm
 thinking about recording 'svn status' for each working copy, and
 checking the tarball in as test data. We can then run an 'svn upgrade'
 on each working copy, then 'svn status', and verify that the two
 status results match. (caveat minor improvements in state tracking and
 status reporting)
 
 But... running upgrade on about 950 working copies and checking their
 status isn't cheap. The tarball sizes don't bother me too much... I'm
 more concerned about test suite runtime.
 
 Anybody else? Should this be normal test run? Or should we set up an
 extended set of tests and drop this into that batch?
 
 Or not even go with this approach?


Re: [WIP] Gracefully handle svn_wc_prop_set4() errors in svn patch

2010-09-28 Thread Daniel Shahaf
Daniel Näslund wrote on Mon, Sep 27, 2010 at 22:09:26 +0200:
 * What is wrong with the way I handle the error? I hit err_abort() when
   the program terminates. (I'm expecting the answer to hurt a bit - this
   is surely something that I should have understood by now). I thought
   that since the error is allocated on the heap I could just store the
   pointer to it and free it later, e.g. call svn_error_clear().

err_abort() is called when an error object hadn't been svn_error_clear()'d.
(The error creation installed err_abort() as a pool cleanup callback,
and clearing the error unregisters the callback.)

So, yeah, you can do whatever you want with the error (they get
allocated in a global pool) as long as you svn_error_clear() them
eventually.

 
 Index: subversion/svn/notify.c
 ===
 --- subversion/svn/notify.c   (revision 1001829)
 +++ subversion/svn/notify.c   (arbetskopia)
 @@ -464,6 +464,16 @@ notify(void *baton, const svn_wc_notify_t *n, apr_
  goto print_error;
break;
  
 +case svn_wc_notify_failed_prop_patch:
 +  nb-received_some_change = TRUE;
 +  err = svn_cmdline_printf(pool,
 +   _(property '%s' rejected from '%s'.\n),
 +   n-prop_name, path_local);
 +  svn_handle_warning2(stderr, n-err, svn: );
 +  if (err)
 +goto print_error;
 +  break;
 +

That's fine, print_error: clears the error.

 Index: subversion/libsvn_client/patch.c
 ===
 --- subversion/libsvn_client/patch.c  (revision 1001829)
 +++ subversion/libsvn_client/patch.c  (arbetskopia)
 @@ -130,6 +130,12 @@ typedef struct prop_patch_target_t {
  
/* ### Here we'll add flags telling if the prop was added, deleted,
 * ### had_rejects, had_local_mods prior to patching and so on. */
 +
 +  /* TRUE if the property could not be set on the path. */
 +  svn_boolean_t was_rejected;
 +
 +  /* Set if was_rejected is TRUE. */
 +  svn_error_t *err;
  } prop_patch_target_t;
  
  typedef struct patch_target_t {
 @@ -1573,6 +1579,22 @@ send_patch_notification(const patch_target_t *targ

prop_target = svn__apr_hash_index_val(hash_index);
  
 +  if (prop_target-was_rejected)
 +{
 +  svn_wc_notify_t *notify;
 +  svn_wc_notify_action_t action = 
 svn_wc_notify_failed_prop_patch;
 +
 +  notify = svn_wc_create_notify(target-local_abspath 
 +? target-local_abspath
 +: target-local_relpath,
 +action, pool);
 +  notify-prop_name = prop_target-name;
 +  notify-err = prop_target-err;
 +
 +  (*ctx-notify_func2)(ctx-notify_baton2, notify, pool);
 +  svn_error_clear(prop_target-err);
 +}
 +
for (i = 0; i  prop_target-content_info-hunks-nelts; i++)
  {
const hunk_info_t *hi;
 @@ -2189,6 +2211,7 @@ install_patched_prop_targets(patch_target_t *targe
svn_stringbuf_t *prop_content;
const char *eol_str;
svn_boolean_t eof;
 +  svn_error_t *err;
  
svn_pool_clear(iterpool);
  
 @@ -2260,14 +2283,23 @@ install_patched_prop_targets(patch_target_t *targe
 * ### stsp: I'd say reject the property hunk.
 * ###   We should verify all modified prop hunk texts using
 * ###   svn_wc_canonicalize_svn_prop() before starting the
 -   * ###   patching process, to reject them as early as possible. */
 -  SVN_ERR(svn_wc_prop_set4(ctx-wc_ctx, target-local_abspath,
 -   prop_target-name,
 -   svn_string_create_from_buf(prop_content, 
 -  iterpool),
 -   TRUE /* skip_checks */,
 -   NULL, NULL,
 -   iterpool));
 +   * ###   patching process, to reject them as early as possible. 
 +   *
 +   * ### dannas: Unfortunately we need the prop_content to run
 +   * ### svn_wc_canonicalize_svn_prop() and we don't have that 
 +   * ### until we've applied our text changes. */
 +  err = svn_wc_prop_set4(ctx-wc_ctx, target-local_abspath,
 + prop_target-name,
 + svn_string_create_from_buf(prop_content, 
 +iterpool),
 + TRUE /* skip_checks */,
 + NULL, NULL,
 + iterpool);
 +  if (err)
 +{
 +  prop_target-was_rejected = TRUE;
 +  prop_target-err = err;

Does prop_target-err always get cleared?

(The answer is probably No.)

 +}
  }
  
   

Re: place of svnrdump

2010-09-28 Thread Daniel Shahaf
Ramkumar Ramachandra wrote on Tue, Sep 28, 2010 at 00:27:38 +0530:
 Hi Neels,
 
 Neels J Hofmeyr writes:
   I just benchmarked it recently and found that it dumps 1 revisions
   of the ASF repository in 106 seconds: that's about 94 revisions per
   second.
  
  Wow! That's magnitudes better than the 5 to 10 revisions per second I'm used
  to (I think that's using svnsync).
 
 Yep :)
 
  While we're at it... svnsync's slowness is particularly painful when doing
  'svnsync copy-revprops'. With revprop changes enabled, any revprops may be
  changed at any time. So to maintain an up-to-date mirror, one would like to
  copy *all* revprops at the very least once per day. With a repos of average
  corporate size, though, that can take the whole night and soon longer than
  the developers need to go home and come back to work next morning (to find
  the mirror lagging). So one could copy only the youngest 1000 revprops each
  night and do a complete run every weekend. Or script a revprop-change hook
  that propagates revprop change signals to mirrors. :(
 
 Wow. This is quite a serious problem. I'm a very new developer, and I
 don't really use Subversion. You should probably let the other
 Subversion developers know about this on a new thread?
 
 @Daniel, @Stefan: Thoughts on this?
 

Use the commits@ list and run copy-revprops only on revisions that
actually had been revprop-edited?

  svnrdump won't help in that compartment, would it?
 
 That would be a feature request (although I'm not sure svnrdump will
 ever be extended to handle that),

How could svnrdump help here?  What we might need is an RA call that has
the server provide the N last revisions to have undergone revprop edits...

 because svnrdump is still very
 young- it just dumps/ loads dumpfiles from remote repositories quickly
 at the moment. I've decided to feature freeze until I fix the perf
 issues for the upcoming release- I'll keep this in mind though.
 
 -- Ram


Re: svnrdump tool

2010-09-28 Thread Daniel Shahaf
Ramkumar Ramachandra wrote on Tue, Sep 28, 2010 at 00:44:40 +0530:
 Hi Daniel,
 
 Daniel Shahaf writes:
  svnsync allows you to sync a subdir of a repository (i.e.,
svnsync $REPOS/trunk/A/B $MIRROR
  ), but does it also create /trunk/A/B in the mirror?
 
 Yes, it does.
 
  But for now I still think that svnrdump invocation I quoted above
  shouldn't have outputted a 'create /trunk' entry in the dumpfile :-).
  @all, what do you think?
 
 Again, it works exactly like svnsync- you might like to read the
 tests. Here's a verbose example.
 

So, I suppose we should have to leave it the way it is, to parallel
svnsync.  Okay.

 $ cat /test.dump
 Node-path: test.txt
 Node-path: trunk
 Node-path: trunk/A
 Node-path: trunk/A/B
 Node-path: trunk/A/B/E
 Node-path: trunk/A/B/E/alpha
 Node-path: trunk/A/B/E/beta
 Node-path: trunk/A/B/F
 Node-path: trunk/A/B/lambda
 Node-path: trunk/A/C
 Node-path: trunk/A/D
 Node-path: trunk/A/D/G
 Node-path: trunk/A/D/G/pi
 Node-path: trunk/A/D/G/rho
 Node-path: trunk/A/D/G/tau
 Node-path: trunk/A/D/H
 Node-path: trunk/A/D/H/chi
 Node-path: trunk/A/D/H/omega
 Node-path: trunk/A/D/H/psi
 Node-path: trunk/A/D/gamma
 Node-path: trunk/A/mu
 Node-path: trunk/iota
 Node-path: trunk/A/D/H/psi
 Node-path: trunk/A
 Node-path: trunk/iota
 Node-path: trunk/B
 Node-path: trunk/A
 Node-path: trunk/A
 Node-path: trunk/B/D
 Node-path: trunk

Why does Node-path: trunk appear twice?

 $ # svnadmin create /test-repo, /mirror and enable pre-revprop-change
 $ svnadmin load /test-repo  /test.dump
 $ svnsync init file:///mirror file:///test-repo/trunk/A/B
 $ svnsync sync file:///mirror
 $ svnadmin dump /mirror | grep Node-path:
 Node-path: trunk
 Node-path: trunk/A
 Node-path: trunk/A/B
 Node-path: trunk/A/B/E
 Node-path: trunk/A/B/E/alpha
 Node-path: trunk/A/B/E/beta
 Node-path: trunk/A/B/F
 Node-path: trunk/A/B/lambda
 Node-path: trunk/A
 Node-path: trunk/A
 Node-path: trunk/A
 Node-path: trunk/A/G
 Node-path: trunk/A/G/pi
 Node-path: trunk/A/G/rho
 Node-path: trunk/A/G/tau
 Node-path: trunk/A/H
 Node-path: trunk/A/H/chi
 Node-path: trunk/A/H/omega
 Node-path: trunk/A/H/psi
 Node-path: trunk/A/gamma
 Node-path: trunk
 $ svnrdump dump file:///test-repo/trunk/A/B
 Node-path: trunk
 Node-path: trunk/A
 Node-path: trunk/A/B
 Node-path: trunk/A/B/E
 Node-path: trunk/A/B/E/alpha
 Node-path: trunk/A/B/E/beta
 Node-path: trunk/A/B/F
 Node-path: trunk/A/B/lambda
 Node-path: trunk/A
 Node-path: trunk/A
 Node-path: trunk/A
 Node-path: trunk/A/G
 Node-path: trunk/A/G/pi
 Node-path: trunk/A/G/rho
 Node-path: trunk/A/G/tau
 Node-path: trunk/A/H
 Node-path: trunk/A/H/chi
 Node-path: trunk/A/H/omega
 Node-path: trunk/A/H/psi
 Node-path: trunk/A/gamma
 Node-path: trunk
 
 Hope that helps.
 
 -- Ram

Thanks for the example,

Daniel


Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-09-28 Thread Daniel Shahaf
 Index: subversion/include/svn_diff.h
 ===
 --- subversion/include/svn_diff.h (revision 1001548)
 +++ subversion/include/svn_diff.h (working copy)
 @@ -112,6 +112,11 @@
(personally I prefer 'svn diff -x-p' to show the function name here)
svn_error_t *(*datasource_open)(void *diff_baton,
svn_diff_datasource_e datasource);
  
 +  /** Open the datasources of type @a datasources. */
 +  svn_error_t *(*datasources_open)(void *diff_baton, apr_off_t *prefix_lines,
 +   svn_diff_datasource_e datasource0,
 +   svn_diff_datasource_e datasource1);
 +

So, you're extending the svn_diff_fns_t struct, which is defined in
a public header.

It's a public struct with no constructor function, so I believe you have
to revv it (into svn_diff_fns2_t) in order to extend it (for binary
compatibility: people allocating this struct and then using a newer
Subversion library at runtime).

If it did have a constructor function, you'd have to extend it only at
the end, and even then make sure that if the added elements are NULL (eg
because an old caller didn't know to fill them) then everything still
worked.

Probably more important to get the logic right than to revv it right
away, though; the latter is a SMOP.

/** Close the datasource of type @a datasource. */
svn_error_t *(*datasource_close)(void *diff_baton,
 svn_diff_datasource_e datasource);
 @@ -124,6 +129,9 @@
  void *diff_baton,
  svn_diff_datasource_e 
 datasource);
  
 +  /** Get the number of identical prefix lines from the @a diff_baton. */
 +  apr_off_t (*get_prefix_lines)(void *diff_baton);
 +
/** A function for ordering the tokens, resembling 'strcmp' in 
 functionality.
 * @a compare should contain the return value of the comparison:
 * If @a ltoken and @a rtoken are equal, return 0.  If @a ltoken is
 Index: subversion/libsvn_diff/diff_memory.c
 ===
 --- subversion/libsvn_diff/diff_memory.c  (revision 1001548)
 +++ subversion/libsvn_diff/diff_memory.c  (working copy)
 @@ -95,7 +95,23 @@
return SVN_NO_ERROR;
  }
  
 +/* Implements svn_diff_fns_t::datasources_open */
 +static svn_error_t *
 +datasources_open(void *baton, apr_off_t *prefix_lines,
 + svn_diff_datasource_e datasource0, 
 + svn_diff_datasource_e datasource1)
 +{
 +  /* Do nothing: everything is already there and initialized to 0 */
 +  return SVN_NO_ERROR;
 +}
  
 +/* Implements svn_diff_fns_t::datasource_get_prefix_lines */
 +static apr_off_t
 +get_prefix_lines(void *baton)
 +{
 +  return 0;
 +}
 +
  /* Implements svn_diff_fns_t::datasource_close */
  static svn_error_t *
  datasource_close(void *baton, svn_diff_datasource_e datasource)
 @@ -189,8 +205,10 @@
  static const svn_diff_fns_t svn_diff__mem_vtable =
  {
datasource_open,
 +  datasources_open,
datasource_close,
datasource_get_next_token,
 +  get_prefix_lines,
token_compare,
token_discard,
token_discard_all
 Index: subversion/libsvn_diff/diff_file.c
 ===
 --- subversion/libsvn_diff/diff_file.c(revision 1001548)
 +++ subversion/libsvn_diff/diff_file.c(working copy)
 @@ -77,6 +77,10 @@
char *curp[4];
char *endp[4];
  
 +  apr_off_t prefix_lines;
 +  int suffix_start_chunk[4];
 +  apr_off_t suffix_offset_in_chunk[4];
 +
/* List of free tokens that may be reused. */
svn_diff__file_token_t *tokens;
  
 @@ -233,7 +237,330 @@
  curp, length, 0, file_baton-pool);
  }
  
 +static svn_error_t *
 +increment_pointer_or_chunk(svn_diff__file_baton_t *file_baton,
 +   char **curp, char **endp, int *chunk_number,
 +   char *buffer, apr_off_t last_chunk_number, int 
 idx)
 +{
 +  apr_off_t length;
  
 +  if ((*curp) == (*endp) - 1)
 +{
 +  if (*chunk_number == last_chunk_number)
 +(*curp)++; /* *curp == *endp with last chunk signals end of file */
 +  else
 +{
 +  (*chunk_number)++;
 +  length = *chunk_number == last_chunk_number ?
 +offset_in_chunk(file_baton-size[idx]) : CHUNK_SIZE;
 +  SVN_ERR(read_chunk(file_baton-file[idx],
 + file_baton-path[idx],
 + buffer, length,
 + chunk_to_offset(*chunk_number),
 + file_baton-pool));
 +  *endp = buffer + length;
 +  *curp = buffer;
 +}
 +}
 +  else
 +{
 +  (*curp)++;
 +}
 +
 +  return SVN_NO_ERROR;
 +}
 +
 +static svn_error_t *
 +decrement_pointer_or_chunk(svn_diff__file_baton_t *file_baton,
 +  

Re: migration to NODES

2010-09-28 Thread Hyrum K. Wright
On Tue, Sep 28, 2010 at 4:47 AM, Greg Stein gst...@gmail.com wrote:
 On Mon, Sep 27, 2010 at 13:25, Julian Foad julian.f...@wandisco.com wrote:
 On Fri, 2010-09-24, Greg Stein wrote:
 On Fri, Sep 24, 2010 at 11:16, Julian Foad julian.f...@wandisco.com wrote:
 ...
  I think we should produce a test framework that can give us a WC
  containing all the different possible WC states.  Then we can write
  tests against this framework, some tests that test specific state, and
  other tests that apply the same operations to all the states and check
  that it works in all the states that it should.

 This requires a manual process of thinking of all states and all
 permutations. I don't trust it.

 This kind of testing is more about checking that the design is
 implemented and that the code paths are exercised ...

 If we could somehow capture working copies from during normal test
 runs, *then* I think we'd have everything. We can easily get the
 terminal state for each test, which is a great first step. It would be
 great if we could also get the intermediate steps.

 ... while this kind is more about checking for regression or unwanted
 changes of behaviour.  The two approaches are complementary.

 Fair enough.

 I took the 1.6.x branch and ran the test suite. It produced about 950
 working copies, and the (compressed) tarball sits at about 850k. I'm
 thinking about recording 'svn status' for each working copy, and
 checking the tarball in as test data. We can then run an 'svn upgrade'
 on each working copy, then 'svn status', and verify that the two
 status results match. (caveat minor improvements in state tracking and
 status reporting)

 But... running upgrade on about 950 working copies and checking their
 status isn't cheap. The tarball sizes don't bother me too much... I'm
 more concerned about test suite runtime.

 Anybody else? Should this be normal test run? Or should we set up an
 extended set of tests and drop this into that batch?

I've been advocating for a long time the need for an extended test
suite.  (I mean, really, merge_tests.py exercises all of the basic
functions in so many more ways than basic_tests.py, that the latter is
pretty much redundant when doing pre-commit checks.)  This might be an
opportunity to start that ball rolling, but it's probably also a lot
of work. :)

-Hyrum


Re: svnrdump tool

2010-09-28 Thread Ramkumar Ramachandra
Hi Daniel,

Daniel Shahaf writes:
  $ cat /test.dump
  Node-path: test.txt
  Node-path: trunk
  Node-path: trunk/A
  Node-path: trunk/A/B
  Node-path: trunk/A/B/E
  Node-path: trunk/A/B/E/alpha
  Node-path: trunk/A/B/E/beta
  Node-path: trunk/A/B/F
  Node-path: trunk/A/B/lambda
  Node-path: trunk/A/C
  Node-path: trunk/A/D
  Node-path: trunk/A/D/G
  Node-path: trunk/A/D/G/pi
  Node-path: trunk/A/D/G/rho
  Node-path: trunk/A/D/G/tau
  Node-path: trunk/A/D/H
  Node-path: trunk/A/D/H/chi
  Node-path: trunk/A/D/H/omega
  Node-path: trunk/A/D/H/psi
  Node-path: trunk/A/D/gamma
  Node-path: trunk/A/mu
  Node-path: trunk/iota
  Node-path: trunk/A/D/H/psi
  Node-path: trunk/A
  Node-path: trunk/iota
  Node-path: trunk/B
  Node-path: trunk/A
  Node-path: trunk/A
  Node-path: trunk/B/D
  Node-path: trunk
 
 Why does Node-path: trunk appear twice?

Some prop change. Unrelated anyway.

-- Ram


Re: add NODES.prior_deleted boolean column

2010-09-28 Thread Greg Stein
On Tue, Sep 28, 2010 at 08:46, Philip Martin philip.mar...@wandisco.com wrote:
 Julian Foad julian.f...@wandisco.com writes:

 I believe the answer is often. A simple 'svn status' will need to
 distinguish between 'A' and 'R', and that is done by checking
 prior_deleted.

 And no... this amount of denormalization will not hurt us.

 OK.  I know that svn status speed is a big deal.

 I'm not sure prior_deleted is an optimisation.  When I last looked at

Great analysis below. I'll take your word for it :-)

prior_deleted is effectively something like base_shadowed, and yeah...
if we're iterating over all nodes within a parent, then it can be
generated simply.

 the performance of SQLite I concluded that status would be too slow if
 it ran per-node queries, it has to run per-dir queries.  So

  SELECT ... FROM BASE_NODE WHERE wc_id = ?1 AND local_relpath = ?2

 is too slow, we need to run

  SELECT ... FROM BASE_NODE WHERE wc_id = ?1 AND parent_relpath = ?2

 iterate over the rows caching the data and then generate the status
 for each child.

To do this, it seems that we're going to need to expose (from wc_db.h)
a structure containing all the row data. Thankfully, this big ol'
structure is *private* to libsvn_wc, and we can change it at will
(unlike svn_wc_status_t). I would suggest that we use a callback --
wc_db can step through each row of the result, fill in the structure,
and execute a callback (clearing an iterpool on each step with the
cursor).

The one tricky part to a callback is eliding all but the highest
op_depth. Does an ORDER BY in the query affect its runtime at all?

...
 but that nested SELECT is expensive.  It's not as bad as doing
 per-node queries but it is still too slow, the database query alone is
 slower than 1.6 status.  I don't think this is something that can be
 fixed using an index because on my test data the runtime generally
 goes up by orders of magnitude when the indexing is wrong.

Yeah... the more indexes present, the more that need to be maintained
when rows are modified.

 I can get round this by querying for all op_depth and using the cache
 to select the highest.  The cache is a hash, keyed on local_relpath,
 that stores the data associated with the highest op_depth seen and it
 simply discards/overwrites lower op_depth.  I've prototyped this and

If we order by local_relpath, then the cache is simply one row
(hanging out in a structure, waiting to be passed to that callback).

 it's as fast as the per-dir BASE_NODE query on my test data.  This is
 not surprising since my test data consists of mostly single op_depth
 with a few double op_depth.  We have to have good performance on such
 working copies because they are the most common, I think it will be
 unusual to have a large working copy where lots of nodes have a large
 number of op_depth.

Agreed.

 Now to prior_deleted.  The fundamental status query looks like

  SELECT ... FROM NODES WHERE wc_id = ?1 AND parent_relpath = ?2

 and all op_depth are seen.  It's quite simple to cache the two highest
 op_depth so prior_deleted only provides information that is already
 available. It's not an optimisation for status, if anything it will
 make the database bigger and slower.

Again, thanks for the analysis. Okay... let's skip it!

...

Cheers,
-g


Re: Some tips on profiling

2010-09-28 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind.
 That gives me a good idea of what code gets executed too often, how
 often a jump (loop or condition) has been taken etc. It will not show the
 non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the
 execution be a factor of 100 (YMMV).

Ah, they're brilliant! Just what I was looking for :D I'm yet to
interpret the output and interpret it to opimize svnrdump though-
might need some more help there; are you available on #svn-dev or
something sometime? Will keep you updated on this either way.

 Under Windows, I'm using the VisualStudio sampling profiler. The
 measurements
 are pretty accurate and the overhead is low. It does not tell me, how often
 a certain code path has been executed. Due to the low overhead, it is
 well-suited for long running (1 min) operations.

Interesting. I didn't know Visual Studio had a profiler. Then again,
I've not used Windows for several years now.

 Also, I find a simple time command very useful to get a first impression
 whether my code is bound by user-runtime, I/O wait or system runtime.

Ah, I want to know how to interpret this correctly. For equivalent
operations
svnrdump says: 7.49s user 1.34s system 91% cpu 9.682 total
svnadmin says: 4.84s user 0.44s system 97% cpu 5.417 total

 To collect data on I/O activity, compression rates and other not readily
 available information, I use simple static counters and printf()
 their values
 at the end of each client call, for instance.

I see.

 Hope that helps.

A ton! Thanks a lot :)

-- Ram


Re: Some tips on profiling

2010-09-28 Thread Philip Martin
Stefan Fuhrmann stefanfuhrm...@alice-dsl.de writes:

  On 27.09.2010 21:44, Ramkumar Ramachandra wrote:
 Could you tell me which tools you use to profile the various
 applications in trunk? I'm looking to profile svnrdump to fix some
 perf issues, but OProfile doesn't seem to work for me.

 Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind.
 That gives me a good idea of what code gets executed too often, how
 often a jump (loop or condition) has been taken etc. It will not show the
 non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the
 execution be a factor of 100 (YMMV).

The performance of svnrdump is likely to be dominated by IO from the
repository, network or disk depending on the RA layer.  strace is a
useful tool to see opens/reads/writes.  You can see what order the
calls occur, how many there are, how big they are and how long they
take.

Valgrind/Callgrind is good and doesn't require you to instrument the
code, but it does help to build with debug information.  It does
impose a massive runtime overhead.

OProfile will give you CPU usage with far lower overhead than
valgrind/callgrind.  Like valgrind/callgrind you don't need to
instrument the code but it works better with debug information and
with modern gcc if you use -O2 then you need -fno-omit-frame-pointer
for the callgraph stuff to work.  I use it like so:

opcontrol --init
opcontrol --no-vmlinux --separate=library --callgraph=32
opcontrol --start
opcontrol --reset
subversion/svnrdump/svnrdump ...
opcontrol --stop
opcontrol --dump
opreport --merge all -l image:/path/to/lt-svnrdump

This is what I get when dumping 1000 revisions from a local mirror of
the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (Unhalted core cycles) count 10
samples  %app name symbol name
4738 41.1893  no-vmlinux   (no symbols)
1037  9.0150  libxml2.so.2.6.32(no symbols)
700   6.0854  libneon.so.27.1.2(no symbols)
238   2.0690  libc-2.7.so  _int_malloc
228   1.9821  libc-2.7.so  memcpy
221   1.9212  libc-2.7.so  memset
217   1.8865  libc-2.7.so  strlen
191   1.6604  libsvn_subr-1.so.0.0.0   decode_bytes
180   1.5648  libc-2.7.so  vfprintf
171   1.4866  libc-2.7.so  strcmp
153   1.3301  libapr-1.so.0.2.12   apr_hashfunc_default
134   1.1649  libapr-1.so.0.2.12   apr_vformatter
130   1.1301  libapr-1.so.0.2.12   apr_palloc

That's on my Debian desktop.  At the recent Apache Retreat I tried to
demonstrate OProfile on my Ubuntu laptop and could not get it to work
properly, probably because I forgot about -fno-omit-frame-pointer.

Finally there is traditional gprof.  It's a long time since I used it
so I don't remember the details.  You instrument the code at compile
time using CFLAGS=-pg.  If an instrumented function foo calls into a
library bar that is not instrumented then bar is invisible, all you
see is how long foo took to execute.

-- 
Philip


Re: svn commit: r1002271 - /subversion/trunk/subversion/include/svn_client.h

2010-09-28 Thread Daniel Shahaf
julianf...@apache.org wrote on Tue, Sep 28, 2010 at 17:16:59 -:
 Author: julianfoad
 Date: Tue Sep 28 17:16:59 2010
 New Revision: 1002271
 
 URL: http://svn.apache.org/viewvc?rev=1002271view=rev
 Log:
 * subversion/include/svn_client.h
   (svn_client_move4): Document the current rather than the historical
 behaviour of the 'force' flag.  A follow-up to r1002260.
 
 +++ subversion/trunk/subversion/include/svn_client.h Tue Sep 28 17:16:59 2010
 @@ -3892,11 +3892,10 @@ svn_client_move5(svn_commit_info_t **com
   * move_as_child set to @c FALSE, @a revprop_table passed as NULL, and
   * @a make_parents set to @c FALSE.
   *
 - * If @a src_path is a working copy path:
 - *
 - *   - If one of @a src_paths contains locally modified and/or unversioned
 - * items and @a force is not set, the move will fail. If @a force is set
 - * such items will be removed.
 + * Note: The behaviour of @a force changed in r860885 and r861421, when the

Given that's it's a public API's docstring, wouldn't it make more sense
to talk here in terms of release numbers than revision numbers?

i.e., the behaviour of @a force changed in 1.7.2 (r860885 and r861421) ...

 + * 'move' semantics were improved to just move the source including any
 + * modified and/or unversioned items in it.  Before that, @a force
 + * controlled what happened to such items, but now @a force is ignored.
   *
   * @since New in 1.4.
   *
 
 


Re: [WIP] Gracefully handle svn_wc_prop_set4() errors in svn patch

2010-09-28 Thread Daniel Näslund
On Tue, Sep 28, 2010 at 03:45:33PM +0200, Daniel Shahaf wrote:
 Daniel Näslund wrote on Mon, Sep 27, 2010 at 22:09:26 +0200:
  * What is wrong with the way I handle the error? I hit err_abort() when
the program terminates. (I'm expecting the answer to hurt a bit - this
is surely something that I should have understood by now). I thought
that since the error is allocated on the heap I could just store the
pointer to it and free it later, e.g. call svn_error_clear().
 
 err_abort() is called when an error object hadn't been svn_error_clear()'d.
 (The error creation installed err_abort() as a pool cleanup callback,
 and clearing the error unregisters the callback.)

Yes, that was how I understood it.

 So, yeah, you can do whatever you want with the error (they get
 allocated in a global pool) as long as you svn_error_clear() them
 eventually.

Ok.

  Index: subversion/svn/notify.c
  ===
  --- subversion/svn/notify.c (revision 1001829)
  +++ subversion/svn/notify.c (arbetskopia)
  @@ -464,6 +464,16 @@ notify(void *baton, const svn_wc_notify_t *n, apr_
   goto print_error;
 break;
   
  +case svn_wc_notify_failed_prop_patch:
  +  nb-received_some_change = TRUE;
  +  err = svn_cmdline_printf(pool,
  +   _(property '%s' rejected from '%s'.\n),
  +   n-prop_name, path_local);
  +  svn_handle_warning2(stderr, n-err, svn: );
  +  if (err)
  +goto print_error;
  +  break;
  +
 
 That's fine, print_error: clears the error.

  Index: subversion/libsvn_client/patch.c
  ===
  --- subversion/libsvn_client/patch.c(revision 1001829)
  +++ subversion/libsvn_client/patch.c(arbetskopia)
  @@ -130,6 +130,12 @@ typedef struct prop_patch_target_t {
   
 /* ### Here we'll add flags telling if the prop was added, deleted,
  * ### had_rejects, had_local_mods prior to patching and so on. */
  +
  +  /* TRUE if the property could not be set on the path. */
  +  svn_boolean_t was_rejected;
  +
  +  /* Set if was_rejected is TRUE. */
  +  svn_error_t *err;
   } prop_patch_target_t;
   
   typedef struct patch_target_t {
  @@ -1573,6 +1579,22 @@ send_patch_notification(const patch_target_t *targ
 
 prop_target = svn__apr_hash_index_val(hash_index);
   
  +  if (prop_target-was_rejected)
  +{
  +  svn_wc_notify_t *notify;
  +  svn_wc_notify_action_t action = 
  svn_wc_notify_failed_prop_patch;
  +
  +  notify = svn_wc_create_notify(target-local_abspath 
  +? target-local_abspath
  +: target-local_relpath,
  +action, pool);
  +  notify-prop_name = prop_target-name;
  +  notify-err = prop_target-err;
  +
  +  (*ctx-notify_func2)(ctx-notify_baton2, notify, pool);
  +  svn_error_clear(prop_target-err);

Here I'm clearing prop_target-err. Since prop_target-was_rejected is
only set if and error exists (e.g. ! prop_target-err) I was expecting
that err would always be cleared.

[...]

  @@ -2260,14 +2283,23 @@ install_patched_prop_targets(patch_target_t *targe
  +  err = svn_wc_prop_set4(ctx-wc_ctx, target-local_abspath,
  + prop_target-name,
  + svn_string_create_from_buf(prop_content, 
  +iterpool),
  + TRUE /* skip_checks */,
  + NULL, NULL,
  + iterpool);
  +  if (err)
  +{
  +  prop_target-was_rejected = TRUE;
  +  prop_target-err = err;
 
 Does prop_target-err always get cleared?
 
 (The answer is probably No.)

As I said above, my intention was to clear it in
send_patch_notification().

I'll check again and see if I can spot why the err isn't always cleared.

Thanks for your feedback,
Daniel (dannas)


Re: [WIP] Gracefully handle svn_wc_prop_set4() errors in svn patch

2010-09-28 Thread Philip Martin
Daniel Näslund dan...@longitudo.com writes:

 On Tue, Sep 28, 2010 at 03:45:33PM +0200, Daniel Shahaf wrote:
 Daniel Näslund wrote on Mon, Sep 27, 2010 at 22:09:26 +0200:
 
 err_abort() is called when an error object hadn't been svn_error_clear()'d.
 (The error creation installed err_abort() as a pool cleanup callback,
 and clearing the error unregisters the callback.)

 Yes, that was how I understood it.

If you run the program gdb, it will catch the abort.  If you step up
the stack to err_abort and print err[0] then you will see the file and
line where the error was created.  (You may well have worked this out
already).

  @@ -2260,14 +2283,23 @@ install_patched_prop_targets(patch_target_t *targe
  +  err = svn_wc_prop_set4(ctx-wc_ctx, target-local_abspath,
  + prop_target-name,
  + svn_string_create_from_buf(prop_content, 
  +iterpool),
  + TRUE /* skip_checks */,
  + NULL, NULL,
  + iterpool);
  +  if (err)
  +{
  +  prop_target-was_rejected = TRUE;
  +  prop_target-err = err;
 
 Does prop_target-err always get cleared?
 
 (The answer is probably No.)

 As I said above, my intention was to clear it in
 send_patch_notification().

You will still have a problem if there is more that one error and the
assignment above overwrites a previous err, the overwritten err will
have leaked.

-- 
Philip


Re: svn commit: r1002151 - in /subversion/trunk/subversion: libsvn_auth_kwallet/kwallet.cpp svn/main.c svnsync/main.c

2010-09-28 Thread Blair Zajac

On 9/28/2010 6:07 AM, s...@apache.org wrote:

Author: stsp
Date: Tue Sep 28 13:07:23 2010
New Revision: 1002151

URL: http://svn.apache.org/viewvc?rev=1002151view=rev
Log:
Remove some goo introduced in r878078 and follow-ups, which was related to
the Linux-specific code which has been removed in r1002144.



-  INITIALIZE_APPLICATION
+  QCoreApplication *app;
+  if (! qApp)
+{
+  int argc = 1;
+  app = new QCoreApplication(argc, (char *[1]) {(char *) svn});
+}


Out of curiosity, what does this do?

QCoreApplication *app isn't static, so does this do some setup logic 
that we don't need to keep track of?


Blair


Re: svn commit: r1002313 - /subversion/trunk/subversion/tests/cmdline/prop_tests.py

2010-09-28 Thread Daniel Shahaf
pbu...@apache.org wrote on Tue, Sep 28, 2010 at 18:38:19 -:
 +  # Run propget -vR svn:mergeinfo and collect the stdout.
 +  exit_code, pg_stdout, pg_stderr = svntest.actions.run_and_verify_svn(
 +None, None, [], 'propget', SVN_PROP_MERGEINFO, '-vR', wc_dir)
 +

exit_code and pg_stderr aren't checked anywhere.

 +  # Run propget -vR svn:mergeinfo, redirecting the stdout to a file.
 +  arglist = ['svn.exe', 'propget', SVN_PROP_MERGEINFO, '-vR', wc_dir]

s/.exe// ?

 +  redir_file = open(redirect_file, 'wb')
 +  pg_proc = subprocess.Popen(arglist, stdout=redir_file)

Shouldn't this use the svntest/ infrastructure?  Compare
svntest.actions.check_prop().

 +  pg_proc.wait()
 +  redir_file.close()
 +  pg_stdout_redir = open(redirect_file, 'r').readlines()
 +
 +  # Check if the redirected output of svn pg -vR is what we expect.
 +  #
 +  # Currently this fails because the mergeinfo for the three paths is
 +  # interleaved and the lines endings are (at least on Windows) a mix
 +  # of CRLF and LF. See
 +  # http://subversion.tigris.org/issues/show_bug.cgi?id=3721#desc1
 +  unordered_expected_output = svntest.verify.UnorderedOutput([
 +Properties on ' + B_path +  ':\n,
 +Properties on ' + C_path +  ':\n,
 +Properties on ' + D_path +  ':\n,
 +  svn:mergeinfo\n,
 +/subversion/branches/1.5.x:872364-874936\n,
 +/subversion/branches/1.5.x-34184:874657-874741\n,
 +/subversion/branches/1.5.x-34432:874744-874798\n,

So, 'unordered' also ignores repetitions?  (since the last 4 lines
appear only once each, rather than three times each)


Re: [WIP] Gracefully handle svn_wc_prop_set4() errors in svn patch

2010-09-28 Thread Daniel Näslund
On Tue, Sep 28, 2010 at 07:18:39PM +0100, Philip Martin wrote:
 Daniel Näslund dan...@longitudo.com writes:
 
  On Tue, Sep 28, 2010 at 03:45:33PM +0200, Daniel Shahaf wrote:
  Daniel Näslund wrote on Mon, Sep 27, 2010 at 22:09:26 +0200:
  
  err_abort() is called when an error object hadn't been svn_error_clear()'d.
  (The error creation installed err_abort() as a pool cleanup callback,
  and clearing the error unregisters the callback.)
 
  Yes, that was how I understood it.
 
 If you run the program gdb, it will catch the abort.  If you step up
 the stack to err_abort and print err[0] then you will see the file and
 line where the error was created.  (You may well have worked this out
 already).

Turned out that it was caused by prop_target-was_deleted (the flag that
was set when an error was detected) not beeing initialized to FALSE.

Thanks for the suggestion on checking err. Didn't think of that (but I
really should have!).

Thanks,
Daniel


Re: svn commit: r1001413 - in /subversion/branches/performance/subversion: include/svn_string.h libsvn_subr/svn_string.c

2010-09-28 Thread Ed Price
One correction (in the comments -- which are great, BTW):

s/descression/discretion

-Ed


On Mon, Sep 27, 2010 at 5:05 PM, Hyrum K. Wright
hyrum_wri...@mail.utexas.edu wrote:
 +1 to merge to trunk.

 On Sun, Sep 26, 2010 at 6:01 AM,  stef...@apache.org wrote:
 Author: stefan2
 Date: Sun Sep 26 11:01:03 2010
 New Revision: 1001413

 URL: http://svn.apache.org/viewvc?rev=1001413view=rev
 Log:
 Extensively document the benefits of svn_stringbuf_appendbyte and
 the rationals behind its implementation. To simplify the explanations,
 one statement had to be moved.

 * subversion/include/svn_string.h
  (svn_stringbuf_appendbyte): extend docstring to indicate that this method
  is cheaper to call and execute
 * subversion/libsvn_subr/svn_string.c
  (svn_stringbuf_appendbyte): reorder statements for easier description;
  add extensive description about the optimizations done

 Modified:
    subversion/branches/performance/subversion/include/svn_string.h
    subversion/branches/performance/subversion/libsvn_subr/svn_string.c

 Modified: subversion/branches/performance/subversion/include/svn_string.h
 URL: 
 http://svn.apache.org/viewvc/subversion/branches/performance/subversion/include/svn_string.h?rev=1001413r1=1001412r2=1001413view=diff
 ==
 --- subversion/branches/performance/subversion/include/svn_string.h 
 (original)
 +++ subversion/branches/performance/subversion/include/svn_string.h Sun Sep 
 26 11:01:03 2010
 @@ -259,6 +259,10 @@ void
  svn_stringbuf_fillchar(svn_stringbuf_t *str, unsigned char c);

  /** Append a single character @a byte onto @a targetstr.
 + * This is an optimized version of @ref svn_stringbuf_appendbytes
 + * that is much faster to call and execute. Gains vary with the ABI.
 + * The advantages extend beyond the actual call because the reduced
 + * register pressure allows for more optimization within the caller.
  *
  * reallocs if necessary. @a targetstr is affected, nothing else is.
  * @since New in 1.7.

 Modified: subversion/branches/performance/subversion/libsvn_subr/svn_string.c
 URL: 
 http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/svn_string.c?rev=1001413r1=1001412r2=1001413view=diff
 ==
 --- subversion/branches/performance/subversion/libsvn_subr/svn_string.c 
 (original)
 +++ subversion/branches/performance/subversion/libsvn_subr/svn_string.c Sun 
 Sep 26 11:01:03 2010
 @@ -393,28 +393,60 @@ svn_stringbuf_ensure(svn_stringbuf_t *st
  }


 +/* WARNING - Optimized code ahead!
 + * This function has been hand-tuned for performance. Please read
 + * the comments below before modifying the code.
 + */
  void
  svn_stringbuf_appendbyte(svn_stringbuf_t *str, char byte)
  {
 +  char *dest;
 +  apr_size_t old_len = str-len;
 +
   /* In most cases, there will be pre-allocated memory left
    * to just write the new byte at the end of the used section
    * and terminate the string properly.
    */
 -  apr_size_t old_len = str-len;
 -  if (str-blocksize  old_len + 1)
 +  if (str-blocksize  old_len + 1)
     {
 -      char *dest = str-data;
 +      /* The following read does not depend this write, so we
 +       * can issue the write first to minimize register pressure:
 +       * The value of old_len+1 is no longer needed; on most processors,
 +       * dest[old_len+1] will be calculated implicitly as part of
 +       * the addressing scheme.
 +       */
 +      str-len = old_len+1;
 +
 +      /* Since the compiler cannot be sure that *src-data and *src
 +       * don't overlap, we read src-data *once* before writing
 +       * to *src-data. Replacing dest with str-data would force
 +       * the compiler to read it again after the first byte.
 +       */
 +      dest = str-data;

 +      /* If not already available in a register as per ABI, load
 +       * byte into the register (e.g. the one freed from old_len+1),
 +       * then write it to the string buffer and terminate it properly.
 +       *
 +       * Including the byte fetch, all operations so far could be
 +       * issued at once and be scheduled at the CPU's descression.
 +       * Most likely, no-one will soon depend on the data that will be
 +       * written in this function. So, no stalls there, either.
 +       */
       dest[old_len] = byte;
       dest[old_len+1] = '\0';
 -
 -      str-len = old_len+1;
     }
   else
     {
       /* we need to re-allocate the string buffer
        * - let the more generic implementation take care of that part
        */
 +
 +      /* Depending on the ABI, byte is a register value. If we were
 +       * to take its address directly, the compiler might decide to
 +       * put in on the stack *unconditionally*, even if that would
 +       * only be necessary for this block.
 +       */
       char b = byte;
       svn_stringbuf_appendbytes(str, b, 1);
     }






Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-09-28 Thread Johan Corveleyn
Hi Daniel,

Thanks for the feedback.

On Tue, Sep 28, 2010 at 4:11 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 Index: subversion/include/svn_diff.h
 ===
 --- subversion/include/svn_diff.h     (revision 1001548)
 +++ subversion/include/svn_diff.h     (working copy)
 @@ -112,6 +112,11 @@
 (personally I prefer 'svn diff -x-p' to show the function name here)

Ok, will do next time.

    svn_error_t *(*datasource_open)(void *diff_baton,
                                    svn_diff_datasource_e datasource);

 +  /** Open the datasources of type @a datasources. */
 +  svn_error_t *(*datasources_open)(void *diff_baton, apr_off_t 
 *prefix_lines,
 +                                   svn_diff_datasource_e datasource0,
 +                                   svn_diff_datasource_e datasource1);
 +

 So, you're extending the svn_diff_fns_t struct, which is defined in
 a public header.

 It's a public struct with no constructor function, so I believe you have
 to revv it (into svn_diff_fns2_t) in order to extend it (for binary
 compatibility: people allocating this struct and then using a newer
 Subversion library at runtime).

 If it did have a constructor function, you'd have to extend it only at
 the end, and even then make sure that if the added elements are NULL (eg
 because an old caller didn't know to fill them) then everything still
 worked.

 Probably more important to get the logic right than to revv it right
 away, though; the latter is a SMOP.

Doh! I'm sure that observation was in the back of my head somewhere,
but I forgot about it while working on the solution. Anyway, you're
right: there is first some more work to get the algorithm right.

I've had some progress:
- The blame_tests.py now all pass (tests 2 and 7 provoked a core
dump). That makes all tests pass. However, although I fixed the
coredump, I don't fully understand the root cause (why they ended up
in the situation where they ended up). So I'm going to study that
first a bit more.
- I have now included support for files with \r eol-style.

I'll send a new version of the patch shortly.

I'm also thinking that I could try to take advantage of -x-b/-x-w or
-x--ignore-eol-style options to make it even faster (right now the
prefix/suffix scanning will stop at the first difference, regardless
if it's a whitespace or eol difference that could/should be ignored).

However, I'm not sure if I should implement this myself, as part of
the find_identical_prefix and find_identical_suffix functions, or
switch to the usage of datasource_get_next_token (which is the
function that is used by the real diff algorithm to extract the
lines, and which uses util.c#svn_diff__normalize_buffer to normalize
whitespace and eol's if needed).

Right now, I don't read entire lines (tokens) but compare each byte as
I go. But I could do it line-based as well (extract line from file1,
extract line from file2, memcmp lines). I would have to make the
calculation of the adler32 hash in datasource_get_next_token
conditional on some boolean, or factor out the part of the function
that's useful to me into a new static function.

There is one caveat to this approach: I'm not sure if it would work
backwards (for suffix scanning). Well, the normalization function
wouldn't have to be changed, but the extraction of lines would have to
go backward. Surely it's possible, but I have no idea how much I'd
have to change the code in get_next_token to get lines backwards...

I'm also not sure if one would be (significantly) faster than the
other: comparing byte-by-byte while going through both files, or
extracting entire lines and then comparing the lines.

Thoughts?

Cheers,
-- 
Johan


Re: svn commit: r1002313 - /subversion/trunk/subversion/tests/cmdline/prop_tests.py

2010-09-28 Thread Paul Burba
Hi Daniel,

Comments inline.  All fixes mentioned were done in r1002372.

On Tue, Sep 28, 2010 at 3:30 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 pbu...@apache.org wrote on Tue, Sep 28, 2010 at 18:38:19 -:
 +  # Run propget -vR svn:mergeinfo and collect the stdout.
 +  exit_code, pg_stdout, pg_stderr = svntest.actions.run_and_verify_svn(
 +    None, None, [], 'propget', SVN_PROP_MERGEINFO, '-vR', wc_dir)
 +

 exit_code and pg_stderr aren't checked anywhere.

Neither is pg_stdout...but that entire statement was cruft from an
earlier version of the test; removed it.

 +  # Run propget -vR svn:mergeinfo, redirecting the stdout to a file.
 +  arglist = ['svn.exe', 'propget', SVN_PROP_MERGEINFO, '-vR', wc_dir]

 s/.exe// ?

Actually that should ideally be 'svntest.main.svn_binary'.  Fixed that.

 +  redir_file = open(redirect_file, 'wb')
 +  pg_proc = subprocess.Popen(arglist, stdout=redir_file)

 Shouldn't this use the svntest/ infrastructure?  Compare
 svntest.actions.check_prop().

I didn't use it for two reasons:

First, svntest.actions.check_prop() only supports finding the props on
a single path (and as far as I can tell that works fine, no issue
#3721).

Second, Issue #3721 only occurs when the output of 'svn pg -vR' is
*redirected to a file* - see
http://subversion.tigris.org/issues/show_bug.cgi?id=3721#desc1.
check_prop() is (obviously) all processes and pipes underneath the
covers, so while it may also be possible to show the bug using it, I
wrote the test to hew as closely to the actual bug I witnessed as
possible.

 +  pg_proc.wait()
 +  redir_file.close()
 +  pg_stdout_redir = open(redirect_file, 'r').readlines()
 +
 +  # Check if the redirected output of svn pg -vR is what we expect.
 +  #
 +  # Currently this fails because the mergeinfo for the three paths is
 +  # interleaved and the lines endings are (at least on Windows) a mix
 +  # of CRLF and LF. See
 +  # http://subversion.tigris.org/issues/show_bug.cgi?id=3721#desc1
 +  unordered_expected_output = svntest.verify.UnorderedOutput([
 +    Properties on ' + B_path +  ':\n,
 +    Properties on ' + C_path +  ':\n,
 +    Properties on ' + D_path +  ':\n,
 +      svn:mergeinfo\n,
 +        /subversion/branches/1.5.x:872364-874936\n,
 +        /subversion/branches/1.5.x-34184:874657-874741\n,
 +        /subversion/branches/1.5.x-34432:874744-874798\n,

 So, 'unordered' also ignores repetitions?  (since the last 4 lines
 appear only once each, rather than three times each)

I think you mean the first 3 lines appear only once, all the other
lines appear 3 times each (because the test sets the same mergeinfo on
all three paths and the expected output is for svn pg -v***R***).

But yeah, this test isn't perfect as it would allow repetitions.  That
is the price we pay when using svntest.verify.UnorderedOutput(), which
is required here because there is no guarantee as to the order in
which svn pg -vR will report the three paths.  I tweaked the test to
check the expected number of lines in the actual redirected output, so
that we catch any dups.

Paul


Re: svn commit: r1002313 - /subversion/trunk/subversion/tests/cmdline/prop_tests.py

2010-09-28 Thread Daniel Shahaf
Paul Burba wrote on Tue, Sep 28, 2010 at 18:06:36 -0400:
 On Tue, Sep 28, 2010 at 3:30 PM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
  pbu...@apache.org wrote on Tue, Sep 28, 2010 at 18:38:19 -:
  +  # Run propget -vR svn:mergeinfo and collect the stdout.
  +  exit_code, pg_stdout, pg_stderr = svntest.actions.run_and_verify_svn(
  +    None, None, [], 'propget', SVN_PROP_MERGEINFO, '-vR', wc_dir)
  +
 
  exit_code and pg_stderr aren't checked anywhere.
 
 Neither is pg_stdout...but that entire statement was cruft from an
 earlier version of the test; removed it.
 

:-)

  +  redir_file = open(redirect_file, 'wb')
  +  pg_proc = subprocess.Popen(arglist, stdout=redir_file)
 
  Shouldn't this use the svntest/ infrastructure?  Compare
  svntest.actions.check_prop().
 
 I didn't use it for two reasons:

I didn't actually mean check_prop() specifically, but rather the
infrastructure it uses --- main.run_command() and main.svn_binary.

 First, svntest.actions.check_prop() only supports finding the props on
 a single path (and as far as I can tell that works fine, no issue
 #3721).
 
 Second, Issue #3721 only occurs when the output of 'svn pg -vR' is
 *redirected to a file* - see
 http://subversion.tigris.org/issues/show_bug.cgi?id=3721#desc1.
 check_prop() is (obviously) all processes and pipes underneath the
 covers, so while it may also be possible to show the bug using it, I
 wrote the test to hew as closely to the actual bug I witnessed as
 possible.
 

Aha, so you're using Popen() directly because you need to have svn's
stdout be a file and not a pipe.  I'm a bit surprised that we have a bug
that's that sensitive to trigger, but okay. :-)

  +  pg_proc.wait()
  +  redir_file.close()
  +  pg_stdout_redir = open(redirect_file, 'r').readlines()
  +
  +  # Check if the redirected output of svn pg -vR is what we expect.
  +  #
  +  # Currently this fails because the mergeinfo for the three paths is
  +  # interleaved and the lines endings are (at least on Windows) a mix
  +  # of CRLF and LF. See
  +  # http://subversion.tigris.org/issues/show_bug.cgi?id=3721#desc1
  +  unordered_expected_output = svntest.verify.UnorderedOutput([
  +    Properties on ' + B_path +  ':\n,
  +    Properties on ' + C_path +  ':\n,
  +    Properties on ' + D_path +  ':\n,
  +      svn:mergeinfo\n,
  +        /subversion/branches/1.5.x:872364-874936\n,
  +        /subversion/branches/1.5.x-34184:874657-874741\n,
  +        /subversion/branches/1.5.x-34432:874744-874798\n,
 
  So, 'unordered' also ignores repetitions?  (since the last 4 lines
  appear only once each, rather than three times each)
 
 I think you mean the first 3 lines appear only once, all the other
 lines appear 3 times each (because the test sets the same mergeinfo on
 all three paths and the expected output is for svn pg -v***R***).
 
 But yeah, this test isn't perfect as it would allow repetitions.  That
 is the price we pay when using svntest.verify.UnorderedOutput(), which
 is required here because there is no guarantee as to the order in
 which svn pg -vR will report the three paths.  I tweaked the test to
 check the expected number of lines in the actual redirected output, so
 that we catch any dups.
 

In short, Yes (UnorderedOutput() does ignore repetitions), but thanks
for the detailed-as-usual replies!

 Paul


Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-09-28 Thread Daniel Shahaf
Johan Corveleyn wrote on Tue, Sep 28, 2010 at 23:37:23 +0200:
 On Tue, Sep 28, 2010 at 4:11 PM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
  Index: subversion/include/svn_diff.h
  ===
  --- subversion/include/svn_diff.h     (revision 1001548)
  +++ subversion/include/svn_diff.h     (working copy)
  @@ -112,6 +112,11 @@
  (personally I prefer 'svn diff -x-p' to show the function name here)
 
 Ok, will do next time.
 

Thanks.

 I've had some progress:
 - The blame_tests.py now all pass (tests 2 and 7 provoked a core
 dump). That makes all tests pass. However, although I fixed the
 coredump, I don't fully understand the root cause (why they ended up
 in the situation where they ended up). So I'm going to study that
 first a bit more.
 - I have now included support for files with \r eol-style.
 
 I'll send a new version of the patch shortly.
 

Sounds good.

 I'm also thinking that I could try to take advantage of -x-b/-x-w or
 -x--ignore-eol-style options to make it even faster (right now the
 prefix/suffix scanning will stop at the first difference, regardless
 if it's a whitespace or eol difference that could/should be ignored).
 
...
 
 Thoughts?

None, actually; I'm not (yet) sufficiently familiar with diff internals. 
But let's please have this work in small, digestible (read: reviewable) 
pieces --- support for -x--foo is exactly the sort of additional 
optimization that can be done in a separate patch (on top of this 
starting patch).


Re: [PATCH] don't do autoprops on symbolic links

2010-09-28 Thread Gavin Beau Baumanis
As a result of no comments for this patch , I have logged it into the issue 
tracker so that it doesn't get lost.

http://subversion.tigris.org/issue-tracker.html
Issue Number : #3722



Gavin Beau Baumanis


On 04/09/2010, at 10:25 PM, Wei-Yin Chen wrote:

 Dear Gavin,
 
 Thanks. The attachment was in my sent box, but it's absent in the mailing 
 archive. Don't know why.
 
 Per Branko's suggestion, I'm using join this time. In case the attachment 
 gets missing again, it is also embedded in the mail body.
 
 Regards,
 Wei-Yin
 
 --- svn_apply_autoprops.py.old  2010-09-04 20:16:28.0 +0800
 +++ svn_apply_autoprops.py.nolink   2010-09-04 20:17:20.0 +0800
 @@ -122,10 +122,12 @@
for autoprops_line in autoprop_lines:
  fnmatch_str = autoprops_line[0]
  prop_list = autoprops_line[1]
 
  matching_filenames = fnmatch.filter(filenames, fnmatch_str)
 +matching_filenames = [f for f in matching_filenames \
 +  if not os.path.islink(os.path.join(dirname, f))]
  if not matching_filenames:
continue
 
  for prop in prop_list:
command = ['svn', 'propset', prop[0], prop[1]]
 
 
 On Sat, Sep 4, 2010 at 4:08 PM, Gavin Beau Baumanis ga...@thespidernet.com 
 wrote:
 Hi Wei-Yin,
 
 Just thought I would mention that you did not attach your updated patch.
 
 
 Gavin Beau Baumanis
 E: ga...@thespidernet.com
 



Re: [PATCH] Support command line options in svn_apply_autoprops.py

2010-09-28 Thread Gavin Beau Baumanis
As a result of no comments for this patch , I have logged it into the issue 
tracker so that it doesn't get lost.

http://subversion.tigris.org/issue-tracker.html
Issue Number : #3723



Gavin Beau Baumanis


On 04/09/2010, at 11:43 PM, Wei-Yin Chen wrote:

 Sorry, don't know why the attachment was removed again. Does this
 mailing list have a strict Content-Type filter? I'm trying
 text/x-patch this time from my linux box. Type
 application/octet-stream doesn't seem to pass through.
 
 From web interface of the mailing archive, the format of the inlined
 version looks strange, and the indentation seems wrong, so here's
 another way to get it:
 
 http://pastebin.com/YWhpnZvF
 
 On Sat, Sep 4, 2010 at 9:05 PM, Wei-Yin Chen wyc...@video.ee.ntu.edu.tw 
 wrote:
 
 Hi,
 
 This patch makes trunk/contrib/client-side/svn_apply_autoprops.py take
 command line options.
 The usage would be:
 svn_apply_autoprops.py [-c config_file] [paths to process...]
 
 The -c option specifies the configuration file name, and it overrides
 the setting in environment variable SVN_CONFIG_FILENAME.
 The rest of the command line arguments are treated as paths to process.
 
 [[[
 Make svn_apply_autoprops.py take command line options.
 Option -c for svn configuration filename, and the rest for paths to process.
 ]]]
 
 Regards,
 Wei-Yin
 
 svn_apply_autoprops.py.patch


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Gavin Beau Baumanis
Hi Daniel (Trebbien),

Just thought I would ask for an update, please.
I still have you on my watch - list - so there is no rush - just making sure 
for my own sanity that I haven't missed an email / thread about your patch 
submission.


Gavin Beau Baumanis


On 22/09/2010, at 2:56 AM, Daniel Shahaf wrote:

 Daniel Trebbien wrote on Tue, Sep 21, 2010 at 08:12:38 -0700:
 On Sun, Sep 19, 2010 at 9:39 PM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
 Need the word to after the semicolon for the English to be correct.
 
 Fixed.  (When I say fixed, I mean that I committed a fix in my own
 fork of Subversion:  http://github.com/dtrebbien/subversion )
 
 
 Fine, but please post patches and log messages to this list as usual.
 It's easier for me/us to process them that way.  Thanks.
 
 +svn_error_t *
 +svn_subst_translate_cstring2(const char *src,
 + const char **dst,
 + const char *eol_str,
 + svn_boolean_t repair,
 + apr_hash_t *keywords,
 + svn_boolean_t expand,
 + apr_pool_t *pool)
 +{
 +  return translate_cstring2(dst, NULL, src, eol_str, repair, keywords, 
 expand,
 +pool);
 +}
 
 So, this is revved because svn_subst_translate_string2() needs it (and
 svn_subst_translate_string() doesn't).
 
 I am unsure of what you mean.
 
 I took the original implementation of `svn_subst_translate_cstring2`
 and placed it in the new static utility function `translate_cstring2`.
 `translate_cstring2` accepts another parameter, and its definition
 (the old definition on `svn_subst_translate_cstring2`) was
 appropriately modified to handle this new parameter. This way, I don't
 have to modify the public API.
 
 
 Yes, yes.  You revved three svn_subst_* functions; one publicly and two
 privately.  So I was pointing out (to myself), on each of the latter
 two, where is the call site that needs the additional functionality.
 
 I suggest you go ahead (respond to the review, submit an updated patch,
 etc) without waiting; the svnsync part will eventually get its share of
 reviews.  Also, you may want to consider splitting this patch into two
 parts (subst work separate from svnsync work).
 
 I like the idea of splitting this into two patches.
 
 Okay.  Please post an updated patch to this list when you have it ready :-)
 
 Best,
 
 Daniel
 


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Daniel Trebbien
Hi Gavin,

There have been some things that I have needed to do recently, so I
haven't been working on the patches as much as I would like.  But, I
am still interested in finishing them.

My apologies for the delay...

Daniel


On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
gav...@thespidernet.com wrote:
 Hi Daniel (Trebbien),

 Just thought I would ask for an update, please.
 I still have you on my watch - list - so there is no rush - just making 
 sure for my own sanity that I haven't missed an email / thread about your 
 patch submission.


 Gavin Beau Baumanis


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 2)

2010-09-28 Thread Gavin Beau Baumanis
Daniel,

Absolutely no apology required.
Just about everyone here is a volunteer of the own spare time.
I was checking for my own purposes, in case I had missed an alternate message 
thread.


Gavin Beau Baumanis


On 29/09/2010, at 11:14 AM, Daniel Trebbien wrote:

 Hi Gavin,
 
 There have been some things that I have needed to do recently, so I
 haven't been working on the patches as much as I would like.  But, I
 am still interested in finishing them.
 
 My apologies for the delay...
 
 Daniel
 
 
 On Tue, Sep 28, 2010 at 6:07 PM, Gavin Beau Baumanis
 gav...@thespidernet.com wrote:
 Hi Daniel (Trebbien),
 
 Just thought I would ask for an update, please.
 I still have you on my watch - list - so there is no rush - just making 
 sure for my own sanity that I haven't missed an email / thread about your 
 patch submission.
 
 
 Gavin Beau Baumanis