Re: Inconsistent behavior in cat command

2010-12-02 Thread Noorul Islam K M
Noorul Islam K M noo...@collab.net writes:

 When I was trying to come up with a patch for issue 3713, I observed the
 following.

 For example I have two files 1.txt and 2.txt in a repository located at
 file:///tmp/testrepo

 svn cat behaves differently for local paths and URLs. See the
 illustration below.

 noo...@noorul:/tmp/wc/testrepo$ svn cat 1.txt 2.txt
 1
 2

 A) Local non-existent target followed by existing target

 noo...@noorul:/tmp/wc/testrepo$ svn cat 3.txt 1.txt
 svn: warning: '/tmp/wc/testrepo/3.txt' is not under version control
 1

 B) Non-existent URL followed by existing URL

 noo...@noorul:/tmp/wc/testrepo$ svn cat ^/3.txt ^/1.txt

 svn: File not found: revision 1, path '/3.txt'


 In case A, even though the first target was non-existent it performs cat
 operation on the second target but in the case of B, svn errors out at
 the first failure itself.

 I am not sure about behavior of other svn commands which accepts
 multiple targets. When I discussed this Julian in IRC, he said a
 discussion is needed to come up with standardized behavior across svn
 commands. Any thoughts?


Any updates?

Thanks and Regard
Noorul


Re: Can I add NOT NULL to PRISTINE table columns?

2010-12-02 Thread Julian Foad
I (Julian Foad) wrote:
  Julian Foad julian.f...@wandisco.com writes:
  
   I imagine it should be possible to add 'NOT NULL' to these columns
   without performing a format bump or writing any upgrade code.  Am I
   right?
 
 Hyrum Wright wrote:
  If we're already enforcing it in the C code, I see no problem with
  doing so in the sql schema.  Additionally, I don't really see how you
  could add the 'NOT NULL' clause in a format bump without doing table
  duplication, which doesn't seem worth the trouble for something like
  this.
 
 Right.
 
 Philip Martin wrote:
  Existing working copies would not be upgraded, although I suppose the
  client would work.  Perhaps make it a format 99 step?
 
 The software won't see any difference except it will provide a sanity
 check when we're developing and modifying Subversion code and testing
 against new WCs.  I don't think there is any need to upgrade existing
 development WCs in this respect.
 
 I'll mention it in a comment in the format 99 step, but don't think we
 need to write any SQL for it.

Committed r1041321.

- Julian




Re: svn commit: r1041230 - /subversion/trunk/subversion/include/svn_checksum.h

2010-12-02 Thread Julian Foad
On Wed, 2010-12-01, Blair Zajac wrote:
 On 12/1/10 4:38 PM, stef...@apache.org wrote:
  Author: stefan2
  Date: Thu Dec  2 00:38:17 2010
  New Revision: 1041230
 
  URL: http://svn.apache.org/viewvc?rev=1041230view=rev
  Log:
  Fix the svn_checksum_to_cstring() docstring to actually say what
  was intended. Also, make clear that the behavior is new for 1.7 and
  trying to use it in 1.6 will cause segfaults.
 
  * subversion/include/svn_checksum.h
 (svn_checksum_to_cstring): fix docstring
 
 What happens if somebody makes a svn tool that is compiled and built against 
 the 
 new 1.7 behavior and then it is backported to 1.6, it may core dump.

Any back-port effort of this kind needs to take account of all API
changes, not just newly created APIs.  If it doesn't, then yes it may
crash.  I believe that is an expected result of our compatibility rules,
and not without precedent, although I can't quickly lay my hands on a
good example.

 Should we add a svn_checksum_to_cstring2() instead with the new behavior or 
 backport this change to 1.6?  But even then we'll have 1.6 versions with 
 different behavior.  It seems making a new svn_checksum_to_cstring2() is 
 better.

We should not port this change to 1.6.x.  I don't believe we need to rev
the API for this reason; however, I haven't reviewed the change and so I
don't know if there are other reasons.

- Julian




Re: svn commit: r1040831 - in /subversion/trunk/subversion: include/svn_checksum.h libsvn_subr/checksum.c

2010-12-02 Thread Julian Foad
Hi Stefan2.

A good test for whether it's worth making an API accept NULL as an input
is: what proportion of the callers would find that useful?  I see there
are about 40 callers in the code base.  Would you mind scanning through
them and letting us know?

- Julian


On Thu, 2010-12-02 at 07:51 +0200, Daniel Shahaf wrote:
 Stefan Fuhrmann wrote on Thu, Dec 02, 2010 at 01:39:34 +0100:
  On 01.12.2010 04:25, Daniel Shahaf wrote:
  stef...@apache.org wrote on Tue, Nov 30, 2010 at 23:56:41 -:
  Author: stefan2
  Date: Tue Nov 30 23:56:40 2010
  New Revision: 1040831
 
  URL: http://svn.apache.org/viewvc?rev=1040831view=rev
  Log:
  Fix 'svnadmin verify' for format 5 repositories. During the checking 
  process,
  they yield NULL for SHA1 checksums. The most robust way to fix that is to
  allow NULL for checksum in svn_checksum_to_cstring. This seems justified
  as NULL is already a valid return value instead of e.g. an empty string. 
  So,
  callers may receive NULL as result and could therefore assume that NULL is
  a valid input, too
 
  Can you explain how to trigger the bug you are fixing?  Just running
  'svnadmin verify' on my r1040058-created Greek repository doesn't.
  Sure:
 
  $svnadmin-1.5.4 create /mnt/svnroot/test
  $add pre-revprop-change hook
  $svnsync-1.5.4 init file:///mnt/svnroot/test http://svn.apache.org/repos/asf
  $svnsync-1.5.4 sync file:///mnt/svnroot/test
  $cancel after a few revs; rev 1 will already trigger the error
  $svnadmin-trunk verify /mnt/svnroot/test
 
 And then the dump logic calls svn_fs_file_checksum(force=FALSE), which
 causes a NULL checksum to be returned.
 
 Thanks for the recipe; knowing it it's easier to review the fix.
 
 Daniel




Re: svn commit: r1041102 - /subversion/trunk/subversion/include/svn_io.h

2010-12-02 Thread Julian Foad
On Thu, 2010-12-02, Daniel Shahaf wrote:
 julianf...@apache.org wrote on Wed, Dec 01, 2010 at 17:44:50 -:
  -/** Copy file @a file from location @a src_path to location @a dest_path.
  - * Use @a pool for memory allocations.
  +/** Copy the file whose basename or relative path is @a file within
  + * directory @a src_path to the same basename or relative path within
  + * directory @a dest_path.  Overwrite the destination if it already
 
 If you allow relative paths as input for @a file, then shouldn't the
 docstring say whether or not this creates intermediate directories?
 
 Example: svn_io_dir_file_copy('A/B', 'A/C', 'E/alpha', NULL).

r1041336. Thanks.

- Julian


  + * exists.  Set the destination file's permissions to match those of
  + * the source.  Use @a pool for memory allocations.
*/
   svn_error_t *
   svn_io_dir_file_copy(const char *src_path,
const char *dest_path,
const char *file,
apr_pool_t *pool)
  
  




Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Julian Foad
On Wed, 2010-12-01, Daniel Shahaf wrote: 
 Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +:
  On Wed, 2010-12-01, stef...@apache.org wrote:
   Port (not merge) a fix for a FSFS packing race condition from the
   performance branch to /trunk: There is a slight time window
   between finding the name of a rev file and actually open it. If
   the revision in question gets packed just within this window,
   we will find the new (and final) file name with just one retry.
   
   * subversion/libsvn_fs_fs/fs_fs.c
 (open_pack_or_rev_file): retry once upon missing file error
  
  Hi Stefan.
  
  (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
  that the returned path may no longer be valid by the time you come to
  use it, and the reason for that.  Does my patch below sound right?
  
 
 Well, yes, and Stefan already added such a comment at the definition at
 my suggestion.  But +1 to move that to the declaration.
 
  (2) Doesn't the exact same race exist in *all* uses of
  svn_fs_fs__path_rev_absolute()?  There are five other calls to it,
 
 As far as I can tell, apart from open_pack_or_rev_file(), all callers
 always[1] run under the write lock.  Since pack operation take the write
 lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
 to become outdated for those callers.

OK, that's good.  No change needed then.


 [1] with the exception of set_revision_proplist(), which can be called
 either under the write lock or under write_revision_zero() --- but in
 the latter case it's called before the format file has been created.
 
  all of them using the result as a permissions reference to set the
  file permissions on some new file.  It seems to me that those uses can
  fail in the same way.
  
  The root problem is the existence of a get the path of this file API
  in the presence of semantics whereby the file might be removed from that
  path at any time.
  
  Perhaps your fix is the best fix for this caller, but for the other
  callers I think creating and using a copy file permissions from
  rev-file of rev R API would be a better solution than adding re-tries
  there.  That API can do the re-try internally.
  
  What do you think?
  
  
  Patch for (1) above:
  [[[
  Index: subversion/libsvn_fs_fs/fs_fs.h
  ===
  --- subversion/libsvn_fs_fs/fs_fs.h (revision 1040662)
  +++ subversion/libsvn_fs_fs/fs_fs.h (working copy)
  @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
  PERMS_REFERENCE.  Temporary allocations are from POOL. */
   svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
   const char *new_filename,
   const char *perms_reference,
   apr_pool_t *pool);
   
  -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
  +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
  +   The path is not guaranteed to remain correct after the function returns,
  +   because it is possible that the revision will become packed just after
  +   this call, so the caller should re-try once if the path is not found.
  Allocate in POOL. */
 
 Sounds right.
 
 Bordering on bikeshed, I would suggest some changes:
 
 * Separate describing what the function does (Sets *PATH to FOO and
   allocates in POOL) from describing the conditions the caller should
   beware of / check for (sometimes the return value will be out-of-date
   by the time you receive it).
 
 * Mention that the non-guarantee is not in effect if the caller has the
   write lock.

OK, committed in r1041056 with these improvements.

   svn_error_t *
   svn_fs_fs__path_rev_absolute(const char **path,
svn_fs_t *fs,
svn_revnum_t rev,
apr_pool_t *pool);
 
 Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
 repeat loop.  Theoretically this isn't needed any more now (since all
 callers either run under the write lock or retry)...

Can you check the attached patch for this, please?

- Julian


Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
r1040832, all its callers correctly account for the possibility of an
out-of-date result due to a concurrent packing operation.

The re-try logic was introduced in r875097 and reduced but did not eliminate
the window of opportunity for the caller to use an out-of-date result.

See the email thread http://svn.haxx.se/dev/archive-2010-12/0019.shtml,
subject Re: svn commit: r1040832 - Port a fix for a FSFS packing race.

* subversion/libsvn_fs_fs/fs_fs.c
  (svn_fs_fs__path_rev_absolute): Remove the re-try logic.

* subversion/libsvn_fs_fs/fs_fs.h
  (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
--This line, and those below, will be ignored--

Index: 

Re: Input validation observations

2010-12-02 Thread Julian Foad
On Thu, 2010-12-02 at 14:05 +0530, Noorul Islam K M wrote:
 Julian Foad julian.f...@wandisco.com writes:
 
  I tried some potentially invalid inputs to svn a week or two ago and
  made notes on what I found.  Just posting here in case anyone wants to
  do something about one or more of them.
 
  Noorul, I'm including you in the To addresses because you said you
  were looking for more small tasks to do, so feel free to pick one of
  these if you want to.
 
  Where I end with a question mark, it means I'm not sure if we want this
  change, it's just a suggestion.
 
* svn checkout ^/ ^/y - A asf/cxf, A asf/cxf/utils,   (Don't
  try this without being ready on the Ctrl-C or Ctrl-\!)  It seems to
  ignore ^/y and create ./(basename(^/)); should fail: '^/y' is not a
  WC path.
 
* svn checkout ^/subversion/trunk/build ^/y - Checked out revision
  1040465. URL 'https://svn.apache.org/repos/asf/y' doesn't exist.
  Bleach - that's just crazy.  Should fail: '^/y' is not a WC path.
 
 
 In this case code assumes that the user is trying to checkout from both
 source into current directory. i.e, If all source targets are URLs then
 the checkout target becomes .

Oh yes, you're right - there's no problem there.  I didn't notice that
svn checkout URL URL was a supported syntax.  Thanks.

- Julian




Re: Input validation observations

2010-12-02 Thread Julian Foad
On Thu, 2010-12-02 at 13:58 +0530, Noorul Islam K M wrote:
 Julian Foad julianf...@btopenworld.com writes:
 
  On Tue, 2010-11-30 at 18:42 +0530, Noorul Islam K M wrote:
 
  Julian Foad julian.f...@wandisco.com writes:
  
   I tried some potentially invalid inputs to svn a week or two ago and
   made notes on what I found.  Just posting here in case anyone wants to
   do something about one or more of them.
  
   Noorul, I'm including you in the To addresses because you said you
   were looking for more small tasks to do, so feel free to pick one of
   these if you want to.
  
  Thank you! I will start working on these one by one.
 
  Great!
 
 
   Where I end with a question mark, it means I'm not sure if we want this
   change, it's just a suggestion.
  
 * svn checkout ^/ ^/y - A asf/cxf, A asf/cxf/utils,   (Don't
   try this without being ready on the Ctrl-C or Ctrl-\!)  It seems to
   ignore ^/y and create ./(basename(^/)); should fail: '^/y' is not a
   WC path.
  
 * svn checkout ^/subversion/trunk/build ^/y - Checked out revision
   1040465. URL 'https://svn.apache.org/repos/asf/y' doesn't exist.
   Bleach - that's just crazy.  Should fail: '^/y' is not a WC path.
  
 * svn copy a ^/b c doesn't detect the mixed source types in cl, only
   in lib; should reject them at CLI level?
  
 * svn info ^/b - Not a valid URL; should be path '/b' not found
   in revision REV?
  
 * svn mkdir ^/ a - Illegal repository URL 'a'; should say can't
   mix URL and local targets?
  
 * svn mkdir a ^/ - Can't create directory
   'https:/svn.apache.org/repos/asf'; should not print the URL as if it's
   a local path.
  
 * svn mv ^/ ^/ - Cannot move path
   'https:/svn.apache.org/repos/asf' into itself; should not print the URL
   as if it's a local path.
  
 * svn update ^/a - Skipped
   'https://svn.apache.org/repos/asf/a' ...; should fail: '^/a' is not a
   WC path?
  
  
  svn help update says that the argument should be a PATH. I think it is
  right to force the user to enter local PATH.
 
  Good.  Thanks for finding that.  I agree.  That change will make some of
  my recent patch (r1040232) redundant: it will no longer need to remove
  URL targets from the array of targets, since it will just return an
  error if it finds any, like you already did in some of the other
  subcomands.
 
 
 I further looked into the code, since update accepts multiple targets,
 the program skips URL targets assuming that there might one or more
 local paths as part of targets list. The skipping part is intentional
 and I am not sure whether we should change this behavior. 

I think we should change this behaviour and make svn update throw an
error if any target is a URL.

- Julian




Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Daniel Shahaf
Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +:
 On Wed, 2010-12-01, Daniel Shahaf wrote: 
  Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +:
   (2) Doesn't the exact same race exist in *all* uses of
   svn_fs_fs__path_rev_absolute()?  There are five other calls to it,
  
  As far as I can tell, apart from open_pack_or_rev_file(), all callers
  always[1] run under the write lock.  Since pack operation take the write
  lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
  to become outdated for those callers.
 
 OK, that's good.  No change needed then.
 
 
   -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
   +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
   +   The path is not guaranteed to remain correct after the function 
   returns,
   +   because it is possible that the revision will become packed just after
   +   this call, so the caller should re-try once if the path is not found.
   Allocate in POOL. */
  
  Sounds right.
  
  Bordering on bikeshed, I would suggest some changes:
  
  * Separate describing what the function does (Sets *PATH to FOO and
allocates in POOL) from describing the conditions the caller should
beware of / check for (sometimes the return value will be out-of-date
by the time you receive it).
  
  * Mention that the non-guarantee is not in effect if the caller has the
write lock.
 
 OK, committed in r1041056 with these improvements.
 

+1, thanks.

svn_error_t *
svn_fs_fs__path_rev_absolute(const char **path,
 svn_fs_t *fs,
 svn_revnum_t rev,
 apr_pool_t *pool);
  
  Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
  repeat loop.  Theoretically this isn't needed any more now (since all
  callers either run under the write lock or retry)...
 
 Can you check the attached patch for this, please?
 
 - Julian
 
 

 Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
 r1040832, all its callers correctly account for the possibility of an
 out-of-date result due to a concurrent packing operation.
 
 The re-try logic was introduced in r875097 and reduced but did not eliminate
 the window of opportunity for the caller to use an out-of-date result.
 
 See the email thread http://svn.haxx.se/dev/archive-2010-12/0019.shtml,
 subject Re: svn commit: r1040832 - Port a fix for a FSFS packing race.
 
 * subversion/libsvn_fs_fs/fs_fs.c
   (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
 
 * subversion/libsvn_fs_fs/fs_fs.h
   (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
 --This line, and those below, will be ignored--
 
 Index: subversion/libsvn_fs_fs/fs_fs.c
 ===
 --- subversion/libsvn_fs_fs/fs_fs.c   (revision 1041339)
 +++ subversion/libsvn_fs_fs/fs_fs.c   (working copy)
 @@ -246,8 +246,6 @@ path_rev(svn_fs_t *fs, svn_revnum_t rev,
apr_psprintf(pool, %ld, rev), NULL);
  }
  
 -/* Returns the path of REV in FS, whether in a pack file or not.
 -   Allocate in POOL. */
  svn_error_t *
  svn_fs_fs__path_rev_absolute(const char **path,
   svn_fs_t *fs,
 @@ -256,45 +254,16 @@ svn_fs_fs__path_rev_absolute(const char 
  {
fs_fs_data_t *ffd = fs-fsap_data;
  
 -  if (ffd-format  SVN_FS_FS__MIN_PACKED_FORMAT)
 +  if (ffd-format  SVN_FS_FS__MIN_PACKED_FORMAT
 +  || ! is_packed_rev(fs, rev))
  {
*path = path_rev(fs, rev, pool);
 -  return SVN_NO_ERROR;
  }
 -
 -  if (! is_packed_rev(fs, rev))
 +  else
  {
 -  svn_node_kind_t kind;
 -
 -  /* Initialize the return variable. */
 -  *path = path_rev(fs, rev, pool);
 -
 -  SVN_ERR(svn_io_check_path(*path, kind, pool));
 -  if (kind == svn_node_file)
 -{
 -  /* *path is already set correctly. */
 -  return SVN_NO_ERROR;
 -}
 -  else
 -{
 -  /* Someone must have run 'svnadmin pack' while this fs object
 -   * was open. */
 -
 -  SVN_ERR(update_min_unpacked_rev(fs, pool));
 -
 -  /* The rev really should be present now. */
 -  if (! is_packed_rev(fs, rev))
 -return svn_error_createf(APR_ENOENT, NULL,
 - _(Revision file '%s' does not exist, 
 -   and r%ld is not packed),
 - svn_dirent_local_style(*path, pool),
 - rev);
 -  /* Fall through. */
 -}
 +  *path = path_rev_packed(fs, rev, pack, pool);
  }
  
 -  *path = path_rev_packed(fs, rev, pack, pool);
 -
return SVN_NO_ERROR;
  }
  

This part looks good.  I'm more concerned about a bug in the All
callers use the write lock, ... analysis than about a bug in the
above hunk.

In fact, doesn't the 

Re: 1.5.8 up for signing/testing

2010-12-02 Thread Philip Martin
Stefan Sperling s...@elego.de writes:

 On Wed, Dec 01, 2010 at 01:14:30PM -0600, Hyrum K. Wright wrote:
 1.5.8 tarballs are up for testing and signing.  The magic revision is 
 r1041089:
 http://people.apache.org/~hwright/svn/1.5.8/

 1.5.8 is missing the critial blame -g server-side memory leak crash fix.
 The trunk revisions merge cleanly into 1.5.x (apart from the property
 conflicts mentioned in the STATUS entry), so I've nominated those
 revisions for 1.5.x.

 I am running tests on 1.5.8 anyway, but would prefer cutting 1.5.9
 with the blame -g fix instead of releasing 1.5.8.

The blame fix does look like something we should release.

I've just finished running the tests on 1.5.8, so for the record:

Platform:

  Linux (Debian/stable).

Verified:

   signatures, MD5 checksums, SHA1 checksums
   only expected difference compared to tags/1.5.8

Tested:

  1.5.8 tarball with local dependencies (not deps tarball)
  (local, svn, serf, neon) x (fsfs, bdb) + swig-pl/py/rb + javahl + svn+sasl

Dependencies:

  subversion : 1.5.8
  apache2-threaded-dev : 2.2.9-10+lenny8
  libapr1-dev : 1.4.2-3~bpo50+2
  libaprutil1-dev : 1.2.12+dfsg-8+lenny5
  libdb4.6-dev : 4.6.21-11
  libneon27-dev : 0.28.2-6.1
  libsasl2-dev : 2.1.22.dfsg1-23+lenny1
  libsqlite3-dev : 3.6.21-2~bpo50+1
  swig : 1.3.36-1
  perl : 5.10.0-19lenny2
  python2.5-dev : 2.5.2-15+lenny1
  ruby1.8-dev : 1.8.7.72-3lenny1
  openjdk-6-jdk : 6b11-9.1+lenny2
  serf : tags/0.3.1

Results:

  All tests pass.

Signatures:

subversion-1.5.8.tar.bz2

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)

iQEcBAABCAAGBQJM959UAAoJEHbXiOHtGlmcRi0H/3ulIEf/O0Gdo+IAJ71f9WbG
FnNUzOt1AWRi9GXo5EqQRciuSy7atKF+N3gQOLcXLVEiox6B/xnv2kErKmSpFBOn
O9RGbd1QUxW4IKo6XtvR7L5taVQqWLeXdbhaMF9xmoVkOeMKDy22/hUsFpoizmIN
3GYt/2Clu0PJRy4JOdirM/VtJyeEDJXlpgbANvXipFXvzdwFWLR8fJzuyNOSrxTT
Pclj7czadJDOklxqmDO26DIbCp6PPqVPVpXZAaDKV+XSiqcG0uq5hXxYfxY0yY++
gGouXDEBG3JXkxPjXyQkmQnwtTfDoYqMty7hh4YIN5jVHqgdUv0mzb7fCUbiqjY=
=YvJv
-END PGP SIGNATURE-

subversion-1.5.8.tar.gz

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)

iQEcBAABCAAGBQJM959wAAoJEHbXiOHtGlmcotoH/j3nZ8+PzJcBGwrCG+6qhl3y
PxM54qe8hscV6Um7UUND0xL0SdrcKqbW10LMcBbquAXrpIPY51nvT3DumzZGf1wO
iTYNj0FiyOUM0OSMEqcG8xD3qf8B2WRivEqC+fhbHP4byLIYK6fZf7wGS7I/v2+B
86Ro/3TnQGJhvasKlyMU+HX850rvPt3mlsGzKmVBR0mBM+RCdslHYmtA5fPQoKxQ
YyFCKrbRR6r69dEqHmv4Xa/zRO+IgvoF4FVCqObtK1EXg+ZHYbD+h2xLuwx8yIfz
9yGzeH/d4OImT1t6iHvSgD+CEfnF1/dLXhTI53BZmFF0rvvSXT32fxqtmUL9VgQ=
=5mKA
-END PGP SIGNATURE-


-- 
Philip


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Johan Corveleyn
On Thu, Dec 2, 2010 at 6:43 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 [ finally getting back to this mail; having slept on it, etc. ]

 Johan Corveleyn wrote on Wed, Dec 01, 2010 at 13:34:48 +0100:
 On Wed, Dec 1, 2010 at 11:44 AM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
  Johan Corveleyn wrote on Wed, Dec 01, 2010 at 10:05:29 +0100:
  On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name 
  wrote:
   Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100:
   I am now considering to abandon the tokens-approach, for the following 
   reasons:
   ...
   So, unless someone can convince me otherwise, I'm probably going to
   stop with the token approach. Because of 2), I don't think it's worth
   it to spend the effort needed for 1), especially because the
   byte-based approach already works.
  
  
   In other words, you're saying that the token-based approach: (b) won't 
   be
   as fast as the bytes-based approach can be, and (a) requires much effort
   to be spent on implementing the reverse reading of tokens?  (i.e.,
   a backwards gets())
 
  Yes.
 
  The reverse reading is quite hard (in the diff_file.c case) because of
  the chunked reading of files. A line may be split by a chunk
  boundary (it may even be split in the middle of an eol sequence
  (between \r and \n), and it all still needs to be
  canonicalized/normalized correctly (that's why we'll also need a
  reverse normalization function). The current forward get_next_token
  does this very well, and the reverse should be analogous, but I just
  find it hard to reason about, and to get it implemented correctly. It
  will take me a lot of time, and with a high probability of introducing
  subtle bugs for special edge cases.
 
 
  OK, so a reverse get_next_token() could be tricky to implement.  But,
  worse, won't having it mean that implementors of svn_diff_fns_t can't
  have streamy access to their source?  Since they would be required to
  provide sometimes a token from the start and sometimes a token from the
  end...
 
  Well, it's not streamy in our usual sense of the word, but it's
  double-streamy (no one requests the middle byte until either all
  bytes before or all bytes after it were transmitted already)

 Oooh, I hadn't considered other implementors besides the internal
 diff_memory.c and diff_file.c.

 Just to clarify: diff_file.c and diff_memory.c are the only implementors
 of svn_diff_fns_t in the core code, correct?

Yes.

 You're right, that would impose
 additional constraints on other implementors. I don't know if being
 non-streamy (or less streamy anyway) would be problem ...


 We should have asked this before, but:

 Do we know who are the other implementors / typical use-cases of
 svn_diff_fns_t?

Yeah, was wondering about this too. Are there in fact other
implementors? Maybe plugins for IDE's or something? How could we find
out? How public is this API in fact?

For blame, I know all revisions are first converted to full-texts,
which are written out to temp files. Then diff_file.c works on those
two files.

 In my last commit on the -tokens branch, I added a flag open_at_end
 to the datasource_open function (function of svn_diff_fns_t), so the
 datasource could be opened at the end. Also, other supporting
 functions were needed: token_pushback_prefix, token_pushback_suffix
 (to push back tokens that were read too far when scanning for
 prefix/suffix) and the dreaded datasource_get_previous_token. Anyway,
 the high-level idea was:

 1) Open datasources at the end.

 2) Scan for identical suffix (getting tokens in reverse).

 3) At first non-match: pushback suffix token, and note where suffix starts.

 4) Open datasources at the beginning.

 5) Scan for identical prefix (getting tokens normally, but without hash).

 6) At first non-match: pushback prefix token.

 7) Run the old algorithm: getting tokens forward, but with hash. The
 get_next_token function should stop (return NULL for token) when it
 arrives at the suffix.


 Sidestep: I just now realized that I probably don't need to have the
 reverse normalization algorithm for implementing get_previous_token.
 The call to the normalization function in get_next_token is (I think)
 only needed to be able to calculate the hash. But since
 get_previous_token doesn't need to calculate hashes, I may be able to
 get by without normalization there. I'd only need to normalize inside
 token_compare, and I *think* I can just to that forwardly, instead
 of backwards. Just thinking out loud here ...


 Is normalization function the thing that collapses newlines and
 whitespaces if -x-w or -x--ignore-eol-style is in effect?  If so,
 another purpose of calling that might be to make suffixes that are
 not bytewise-identical, but only modulo-whitespace-identical, also
 be lumped into the identical suffix.

 (Haven't dived into the code yet; the above is based on my understanding
 of your high-level descriptions)

Yes, it's svn_diff__normalize_buffer in 

Re: 1.5.8 up for signing/testing

2010-12-02 Thread C. Michael Pilato
On 12/02/2010 01:11 AM, C. Michael Pilato wrote:
 On 12/01/2010 02:14 PM, Hyrum K. Wright wrote:
 1.5.8 tarballs are up for testing and signing.  The magic revision is 
 r1041089:
 http://people.apache.org/~hwright/svn/1.5.8/
 
 Summary:
 
+0 to release, pending review of the ra_serf situation (see below).

In light of discussion elsethread, I'm upgrading this to +1, but with the
*recommendation* that we scuttle this release and retry with a version that
includes the 'blame -g' fixes (and perhaps some 'configure'-time warning
about building against a Serf newer than 0.3.1.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand


Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Daniel Shahaf
Julian Foad wrote on Thu, Dec 02, 2010 at 14:33:19 +:
 On Thu, 2010-12-02 at 14:56 +0200, Daniel Shahaf wrote:
  Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +:
   Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
   r1040832, all its callers correctly account for the possibility of an
   out-of-date result due to a concurrent packing operation.
   
   The re-try logic was introduced in r875097 and reduced but did not 
   eliminate
   the window of opportunity for the caller to use an out-of-date result.
   
   See the email thread http://svn.haxx.se/dev/archive-2010-12/0019.shtml,
   subject Re: svn commit: r1040832 - Port a fix for a FSFS packing race.
   
   * subversion/libsvn_fs_fs/fs_fs.c
 (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
   
   * subversion/libsvn_fs_fs/fs_fs.h
 (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.

  
  In fact, doesn't the correctness of this change depend on the fact that
  svn_fs_fs__with_write_lock() calls update_min_unpacked_rev() before
  executing the body() callback?
 
 Yes, it does.  Do you think it shouldn't?
 

I'm not sure whether it should or shouldn't depend.  The change IS
correct, but it is only correct because the ffd-min_unpacked_rev is
known to be up-to-date whenever the write lock is held.

That's a bit of a spaghetti effect there.  Now, is there another such
effect that means the patch is wrong?  (I haven't come up with one yet,
but that doesn't mean there isn't any.)

(Otherwise we don't know whether there
  is a concurrent packer, or just ffd-min_unpacked_rev is stale.)
 
 You know, I'm not too clear on the design intention here.  I would have
 assumed it would be the job of all the fs_fs.c code to ensure that
 ffd-min_unpacked_rev is never stale (due to its own actions) at any
 point where it matters.  But the way it does that seems to be to re-read
 the file in several places, instead of simply updating
 ffd-min_unpacked_rev when it writes the file.  Maybe the idea was that
 that method would pick up both internal and concurrent (external)
 changes in the same way - I don't know.  Seems odd.
 

Yes, that was the idea.  If two processes access the filesystem at the
same time, and one of them calls svn_fs_pack(), then the other's
ffd-min_unpacked_rev would be stale.

It's the same story with ffd-youngest_rev, by the way; compare
ensure_revision_exists(), which I believe predates the packing logic.

 Do you have an idea whether something should be done differently?  I
 don't, not without studying it further.
 

I'm not sure.  Ultimately, ffd-min_unpacked_rev is just a cache, so we
have to update it when we're about to do something that depends on its
value.

In the meantime, let me at least attempt to define the something which
perhaps should be done differently:

Given the way packing works today, it's possible for a revision file to
stop existing where you expected it to exist.  Today the code informs
itself of that situation by re-reading min-packed-rev and retrying.
(Further up the stack, the code calculating the offset to seek to also
needs to know whether it's seeking a pack file or a revision file.)

 I note that the following comment in pack_shard() is not quite right:
 
   /* Update the min-unpacked-rev file to reflect our newly packed shard.
* (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
 
 That may or may not be true, but it doesn't seem like this function has
 any right to make that assertion.
 

We could remove the comment and have that function call 
update_min_unpacked_rev().
(This would also save us a failed disk access later.)

But is it strictly *necessary* to do so?  I think not: by not calling
update_min_unpacked_rev(), we cause ffd-min_unpacked_rev to become
stale --- a situation which the code has to account for anyway.

 And in pack_revprop_shard(), it's the same, verbatim:
 
   /* Update the min-unpacked-rev file to reflect our newly packed shard.
 
 should say 'the min-unpacked-revprop file';
 
* (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
 
 and that second line looks completely bogus in this context.
 

Yes, whoever did the copy-paste forgot to update some places.

 
   Index: subversion/libsvn_fs_fs/fs_fs.h
   ===
   --- subversion/libsvn_fs_fs/fs_fs.h   (revision 1041339)
   +++ subversion/libsvn_fs_fs/fs_fs.h   (working copy)
   @@ -411,8 +411,10 @@ svn_error_t *svn_fs_fs__move_into_place(
   Allocate *PATH in POOL.

   Note: If the caller does not have the write lock on FS, then the path 
   is
   -   not guaranteed to remain correct after the function returns, because 
   the
   -   revision might become packed just after this call. */
   +   not guaranteed to be correct or to remain correct after the function
   +   returns, because the revision might become packed before or after this
   +   call.  If a file 

Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Julian Foad
Hi Johan.

I've just read the whole of this thread.

I didn't quite understand your original point (2) that token-based
suffix scanning will not be as fast as byte-based suffix scanning.
Sure it won't, but is there any reason you mentioned suffix scanning
there specifically?  The same is true of prefix scanning, of course.
And both of them could be fast enough, I assume, if you disable the hash
calculation in the get token callbacks like you were talking about.

But I don't think that necessarily affects the main point.  It looks
like you've thoroughly investigated using a token based approach.  Thank
you for doing so.  My initial feeling that it was worth investigating
was in the hope that you might find some fairly straightforward and
self-contained modification to the existing token-handling layer.  I
think the result of this investigation, in which you needed to add
token-fetch-backwards callbacks and so on, shows that this approach is
too complex.  I don't want to see a complex implementation.  Therefore I
support your inclination to abandon that approach and use the byte-wise
approach instead.

- Julian




Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Julian Foad
Daniel Shahaf wrote:
 Julian Foad wrote on Thu, Dec 02, 2010 at 14:33:19 +:
  I note that the following comment in pack_shard() is not quite right:
  
/* Update the min-unpacked-rev file to reflect our newly packed shard.
 * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
  
  That may or may not be true, but it doesn't seem like this function has
  any right to make that assertion.
 
 We could remove the comment and have that function call 
 update_min_unpacked_rev().
 (This would also save us a failed disk access later.)
 
 But is it strictly *necessary* to do so?  I think not: by not calling
 update_min_unpacked_rev(), we cause ffd-min_unpacked_rev to become
 stale --- a situation which the code has to account for anyway.
 
  And in pack_revprop_shard(), it's the same, verbatim:
  
/* Update the min-unpacked-rev file to reflect our newly packed shard.
  
  should say 'the min-unpacked-revprop file';
  
 * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
  
  and that second line looks completely bogus in this context.
  
 
 Yes, whoever did the copy-paste forgot to update some places.

First step: this patch fixes the comments.  Good to commit?

[[[
Index: subversion/libsvn_fs_fs/fs_fs.c
===
--- subversion/libsvn_fs_fs/fs_fs.c (revision 1041350)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
   SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
 
   /* Update the min-unpacked-rev file to reflect our newly packed shard.
-   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
-   */
+   * (This doesn't update ffd-min_unpacked_rev.  That will be updated by
+   * open_pack_or_rev_file() when necessary.) */
   final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
   SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
svn_io_file_del_none, iterpool, iterpool));
@@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
   SVN_ERR(svn_sqlite__insert(NULL, stmt));
 }
 
-  /* Update the min-unpacked-rev file to reflect our newly packed shard.
-   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
-   */
+  /* Update the min-unpacked-revprop file to reflect our newly packed shard.
+   * (This doesn't update ffd-min_unpacked_revprop.) */
   final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, iterpool);
   SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
  svn_io_file_del_none, iterpool, iterpool));
]]]

- Julian




Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Daniel Shahaf
Julian Foad wrote on Thu, Dec 02, 2010 at 15:34:48 +:
 First step: this patch fixes the comments.  Good to commit?
 
 [[[
 Index: subversion/libsvn_fs_fs/fs_fs.c
 ===
 --- subversion/libsvn_fs_fs/fs_fs.c   (revision 1041350)
 +++ subversion/libsvn_fs_fs/fs_fs.c   (working copy)
 @@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
  
/* Update the min-unpacked-rev file to reflect our newly packed shard.
 -   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
 -   */
 +   * (This doesn't update ffd-min_unpacked_rev.  That will be updated by
 +   * open_pack_or_rev_file() when necessary.) */

Didn't you mean s/open_pack_or_rev_file/update_min_unpacked_rev/?

final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
 svn_io_file_del_none, iterpool, 
 iterpool));
 @@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
SVN_ERR(svn_sqlite__insert(NULL, stmt));
  }
  
 -  /* Update the min-unpacked-rev file to reflect our newly packed shard.
 -   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
 -   */
 +  /* Update the min-unpacked-revprop file to reflect our newly packed shard.
 +   * (This doesn't update ffd-min_unpacked_revprop.) */
final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, iterpool);
SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
   svn_io_file_del_none, iterpool, iterpool));
 ]]]
 

+1

 - Julian
 
 


Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-02 Thread Julian Foad
On Thu, 2010-12-02 at 17:40 +0200, Daniel Shahaf wrote:
 Julian Foad wrote on Thu, Dec 02, 2010 at 15:34:48 +:
  First step: this patch fixes the comments.  Good to commit?
  
  [[[
  Index: subversion/libsvn_fs_fs/fs_fs.c
  ===
  --- subversion/libsvn_fs_fs/fs_fs.c (revision 1041350)
  +++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
  @@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
 SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
   
 /* Update the min-unpacked-rev file to reflect our newly packed shard.
  -   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
  -   */
  +   * (This doesn't update ffd-min_unpacked_rev.  That will be updated by
  +   * open_pack_or_rev_file() when necessary.) */
 
 Didn't you mean s/open_pack_or_rev_file/update_min_unpacked_rev/?

I was just modifying the existing comment and assuming it had some truth
or meaning in mentioning open_pack_or_rev_file().  Not sure what,
exactly.

 final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
 SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
  svn_io_file_del_none, iterpool, 
  iterpool));
  @@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
 SVN_ERR(svn_sqlite__insert(NULL, stmt));
   }
   
  -  /* Update the min-unpacked-rev file to reflect our newly packed shard.
  -   * (ffd-min_unpacked_rev will be updated by open_pack_or_rev_file().)
  -   */
  +  /* Update the min-unpacked-revprop file to reflect our newly packed 
  shard.
  +   * (This doesn't update ffd-min_unpacked_revprop.) */
 final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, 
  iterpool);
 SVN_ERR(svn_stream_open_unique(tmp_stream, tmp_path, fs_path,
svn_io_file_del_none, iterpool, 
  iterpool));
  ]]]
  
 
 +1

r1041422.  If you the point above still needs changing, please do so or
let me know.

- Julian




Re: 1.5.8 up for signing/testing

2010-12-02 Thread Stefan Sperling
On Thu, Dec 02, 2010 at 08:54:35AM -0500, C. Michael Pilato wrote:
 On 12/02/2010 07:38 AM, Stefan Sperling wrote:
  You need to use serf 0.3.x with Subversion 1.5.
  Subversion 1.6.x includes changes to make it compatible with newer serf
  versions, but those haven't been merged to 1.5.x.
  1.5.x works fine with serf 0.3.x.
 
 Great, thanks.  Now, if we wind up rolling a new release anyway, then can we
 enforce this serf version cap at build time for 1.5.x?

Makes sense to me.


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Bill Tutt
Note: This email only tangentially relates to svn diff and more about
reverse token scanning in general:

As someone who has implemented suffix reverse token scanning before:

* It simply isn't possible in DBCS code pages. Stick to byte only here.
   SBCS and UTF-16 make reverse token stuff relatively
straightforward. UTF-8 is a little trickier but still tractable.
   At least UTF-8 is tractable in a way that DBCS isn't. You always
know which part of a Unicode code point you are in. (i.e. byte 4 vs.
byte 3 vs. etc...)

* I would recommend only supporting a subset of the diff options for
reverse token scanning. i.e. ignore whitespace/ignore eol but skip
ignore case (if svn has that, I forget...)
   If tokens include keyword expansion operations then stop once you
hit one. The possible source of bugs outways the perf gain in my mind
here.
* Suffix scanning does really require a seekable stream, if it isn't
seekable then don't perform the reverse scanning.  It is only an
optimization after all.

Additional ignore whitespace related comment:
* IIRC, Perforce had an interesting twist on ignoring whitespace. You
could ignore just line leading/ending whitespace instead of all
whitespace differences but pay attention to any whitespace change
after the trim operation had completed.

e.g.:
*     aaa bbb    vs aaa bbb would compare as equal
*     aaa  bbb   vs aaa  bbb would compare as equal
*     aaa  bbb   vs aaa bbb would compare as non-equal due to the
white space change in the middle of the line

Fyi,
Bill


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread C. Michael Pilato
On 12/02/2010 12:18 PM, Bill Tutt wrote:
[...]

. o O ( Who the heck is this Bill Tutt guy? )

Nice to read you again, Bill!

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Johan Corveleyn
On Thu, Dec 2, 2010 at 4:21 PM, Julian Foad julian.f...@wandisco.com wrote:
 Hi Johan.

 I've just read the whole of this thread.

 I didn't quite understand your original point (2) that token-based
 suffix scanning will not be as fast as byte-based suffix scanning.
 Sure it won't, but is there any reason you mentioned suffix scanning
 there specifically?  The same is true of prefix scanning, of course.

I mentioned suffix specifically, because for prefix scanning the bytes
approach doesn't have that same advantage. It has to count the number
of lines, to have a correct line offset for the rest of the normal
algorithm. So the byte-based prefix scanning needs to check for
eol-characters just like the token-based version. That means that
theoretically (implementation details aside) both approaches should be
equally fast for prefix scanning.

But for suffix scanning, it seems I don't need to know the number of
lines, so the bytes approach can just ignore line structure for the
suffix.

 And both of them could be fast enough, I assume, if you disable the hash
 calculation in the get token callbacks like you were talking about.

Granted, the difference could be minimal (I haven't actually tested).

However, an additional two comparisons (against \r and \n), for every
byte in the suffix of say 500Kb, for say 2000 diffs of a blame
operation, it might cost another couple of seconds :-).

 But I don't think that necessarily affects the main point.  It looks
 like you've thoroughly investigated using a token based approach.  Thank
 you for doing so.  My initial feeling that it was worth investigating
 was in the hope that you might find some fairly straightforward and
 self-contained modification to the existing token-handling layer.  I
 think the result of this investigation, in which you needed to add
 token-fetch-backwards callbacks and so on, shows that this approach is
 too complex.  I don't want to see a complex implementation.  Therefore I
 support your inclination to abandon that approach and use the byte-wise
 approach instead.

Yep, it just seems too complex (as also confirmed by Bill Tutt in the
other mail).

The pushback stuff (which is inevitable, because we will always read
one token too far to discover the non-match, and that line needs to be
reread to calculate the hash) is also rather dirty/annoying (adds
additional svn_diff_fns_t-ness).

It was definitely worth investigating, if only to give me some peace
of mind that we have considered the options (and it was a great
learning experience).

Cheers,
-- 
Johan


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Johan Corveleyn
On Thu, Dec 2, 2010 at 6:18 PM, Bill Tutt b...@tutts.org wrote:
 Note: This email only tangentially relates to svn diff and more about
 reverse token scanning in general:

 As someone who has implemented suffix reverse token scanning before:

Thanks for the input. It's nice to see other people have also
struggled with this :-).

 * It simply isn't possible in DBCS code pages. Stick to byte only here.
    SBCS and UTF-16 make reverse token stuff relatively
 straightforward. UTF-8 is a little trickier but still tractable.
   At least UTF-8 is tractable in a way that DBCS isn't. You always
 know which part of a Unicode code point you are in. (i.e. byte 4 vs.
 byte 3 vs. etc...)

Ok, this further supports the decision to focus on the byte-based
approach. We'll only consider stuff identical if all bytes are
identical. That's the simplest route, and since it's only an
optimization anyway ...

 * I would recommend only supporting a subset of the diff options for
 reverse token scanning. i.e. ignore whitespace/ignore eol but skip
 ignore case (if svn has that, I forget...)

svn diff doesn't have an ignore-case option, so that's ok :-).

    If tokens include keyword expansion operations then stop once you
 hit one. The possible source of bugs outways the perf gain in my mind
 here.

Haven't thought about keyword expansion yet. But as you suggest: I'm
not going to bother doing special stuff for (expanded) keywords. If we
find a mismatch, we'll stop with the optimized scanning, and fall back
to the default algorithm.

 * Suffix scanning does really require a seekable stream, if it isn't
 seekable then don't perform the reverse scanning.  It is only an
 optimization after all.

Hm, yes, we'll need to be careful about that. I'll start another mail
thread asking for known implementors of the svn_diff_fns_t functions,
to find out whether seeking around like that for suffix would be
supported.

 Additional ignore whitespace related comment:
 * IIRC, Perforce had an interesting twist on ignoring whitespace. You
 could ignore just line leading/ending whitespace instead of all
 whitespace differences but pay attention to any whitespace change
 after the trim operation had completed.

 e.g.:
 *     aaa bbb    vs aaa bbb would compare as equal
 *     aaa  bbb   vs aaa  bbb would compare as equal
 *     aaa  bbb   vs aaa bbb would compare as non-equal due to the
 white space change in the middle of the line

Cool (svn doesn't have that option). But I'm not sure what that would
be useful for (as a user, I can't immediately imagine an important use
case). Anyway, could still be a nice option...

Cheers,
-- 
Johan


Re: gpg-agent branch treats PGP passphrase as repository password?

2010-12-02 Thread Stefan Sperling
On Wed, Dec 01, 2010 at 06:29:06PM -0500, Dan Engel wrote:
 On Wed, 2010-12-01 at 14:08 +0100, Stefan Sperling wrote:
  However, I still see a potential risk here because the name
  gpg-agent
  is very misleading. It violates the principle of least surprise.
  How can we prevent users misunderstanding what Subversion's gpg-agent
  feature does from entering their private pgp key passphrase (which
  will
  then be sent to the server)?  Can we control the prompt printed by
  gpg-agent? (Enter your Subversion password, NOT your secret PGP
  passphrase!) 
 
 
 Yes, the agent protocol provides for customized prompts, and the patch
 itself refers to the Subversion repository server (or something like
 that) in that prompt.

If we can control the prompt, then let's just make the prompt
clear enough so that people who read it don't accidentally enter
their pgp passphrase. That will make the mistake much less likely.
If people don't read the prompt, well, then we cannot help them either.

Thanks for pointing out how gpg-agent really works.

Stefan


Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-02 Thread Daniel Shahaf
Johan Corveleyn wrote on Thu, Dec 02, 2010 at 21:21:20 +0100:
 On Thu, Dec 2, 2010 at 6:18 PM, Bill Tutt b...@tutts.org wrote:
     If tokens include keyword expansion operations then stop once you
  hit one. The possible source of bugs outways the perf gain in my mind
  here.
 
 Haven't thought about keyword expansion yet. But as you suggest: I'm
 not going to bother doing special stuff for (expanded) keywords. If we
 find a mismatch, we'll stop with the optimized scanning, and fall back
 to the default algorithm.

Only for the suffix scanning?  Or also for the prefix scanning?

(I think you meant the former.)  For prefix scanning, I think it's
common to have $Id$ or similar near the top of the file, so being able
to collect matching lines past it too would be beneficial.


Re: 1.7.x - merge now accesses all files in WC?

2010-12-02 Thread Daniel Becroft
On Wed, Dec 1, 2010 at 7:34 PM, Philip Martin philip.mar...@wandisco.comwrote:

 Daniel Becroft djcbecr...@gmail.com writes:

  Under 1.7.x, the following file(s) are accessed (merging the same
 revision
  as above):
 
   - .svn\wc.db
   - Every versioned file in the working copy

 What does accessed mean?  stat(), open(), read(), write()?


I'm not sure. I'm pretty sure it's not read() or write(), but it's most
likely stat().


 There is a known issue that involves a stat() on files when accessing
 the metadata (svn_wc__db_pdh_parse_local_abspath).  Is that what you are
 seeing?


I've just managed to build/install trunk on my ubuntu box at home (first
application I've ever compiled on it - yey!).

What debugging tools would you recommend to investigate this further? I've
seen output posted that lists function names, and time spent on each.


  I can't see any reason why all these files would need to be accessed. I
 seem
  to recall some discussion about preventing/warning merging into modified
  working copies, could this be the cause?

 The new check is for a single revision working copy, not an unmodified
 one.


Ah, that makes more sense, I guess. Checking for an unmodified WC would mean
that the ability to run consecutive 'svn merge -c' commands would be
removed.

Cheers,
Daniel B.


Re: 1.7.x - merge now accesses all files in WC?

2010-12-02 Thread Hyrum K. Wright
On Thu, Dec 2, 2010 at 3:43 PM, Daniel Becroft djcbecr...@gmail.com wrote:
 On Wed, Dec 1, 2010 at 7:34 PM, Philip Martin 
 philip.mar...@wandisco.comwrote:
...
  I can't see any reason why all these files would need to be accessed. I
 seem
  to recall some discussion about preventing/warning merging into modified
  working copies, could this be the cause?

 The new check is for a single revision working copy, not an unmodified
 one.


 Ah, that makes more sense, I guess. Checking for an unmodified WC would mean
 that the ability to run consecutive 'svn merge -c' commands would be
 removed.

You can run 'svn merge -c17,85,90,123' if you need to merge multiple
revs in the same operation.

-Hyrum


1.5.9 up for testing/signing

2010-12-02 Thread Hyrum K. Wright
1.5.9 tarballs are up for testing and signing.  The magic revision is r1041577:
http://people.apache.org/~hwright/svn/1.5.9/

To sign the release, please input your signatures using the script here:
http://work.hyrumwright.org/pub/svn/collect_sigs.py

(The script worked pretty well for 1.6.15, but if you discover any
bugs please let me know.)

Testing by enthusiastic users is also welcomed (but remember that this
is not yet a blessed release, with all that that implies).  If you are
a package maintainer, please do not included this release in your
distribution until after it has been formally released.

I'd like to collect all the signatures in time to do a release by December 6.

-Hyrum


Re: 1.7.x - merge now accesses all files in WC?

2010-12-02 Thread Daniel Becroft
On Fri, Dec 3, 2010 at 7:50 AM, Hyrum K. Wright 
hyrum_wri...@mail.utexas.edu wrote:

 On Thu, Dec 2, 2010 at 3:43 PM, Daniel Becroft djcbecr...@gmail.com
 wrote:
  On Wed, Dec 1, 2010 at 7:34 PM, Philip Martin 
 philip.mar...@wandisco.comwrote:
 ...
   I can't see any reason why all these files would need to be accessed.
 I
  seem
   to recall some discussion about preventing/warning merging into
 modified
   working copies, could this be the cause?
 
  The new check is for a single revision working copy, not an unmodified
  one.
 
 
  Ah, that makes more sense, I guess. Checking for an unmodified WC would
 mean
  that the ability to run consecutive 'svn merge -c' commands would be
  removed.

 You can run 'svn merge -c17,85,90,123' if you need to merge multiple
 revs in the same operation.


Very true, but I've had some instances where it's easier to do one merge -c
(postponing conflicts), resolve, and then do the next one. Not all the time,
but occasionally.

Cheers,
Daniel B.


Re: 1.7.x - merge now accesses all files in WC?

2010-12-02 Thread Daniel Shahaf
Daniel Becroft wrote on Fri, Dec 03, 2010 at 08:06:40 +1000:
 On Fri, Dec 3, 2010 at 7:50 AM, Hyrum K. Wright 
 hyrum_wri...@mail.utexas.edu wrote:
 
  On Thu, Dec 2, 2010 at 3:43 PM, Daniel Becroft djcbecr...@gmail.com
  wrote:
   On Wed, Dec 1, 2010 at 7:34 PM, Philip Martin 
  philip.mar...@wandisco.comwrote:
  ...
I can't see any reason why all these files would need to be accessed.
  I
   seem
to recall some discussion about preventing/warning merging into
  modified
working copies, could this be the cause?
  
   The new check is for a single revision working copy, not an unmodified
   one.
  

There is a flag that bypasses the new check:

  --allow-mixed-revisions  : Allow merge into mixed-revision working copy.
 Use of this option is not recommended!
 Please run 'svn update' instead.
  
   Ah, that makes more sense, I guess. Checking for an unmodified WC would
  mean
   that the ability to run consecutive 'svn merge -c' commands would be
   removed.
 
  You can run 'svn merge -c17,85,90,123' if you need to merge multiple
  revs in the same operation.
 
 
 Very true, but I've had some instances where it's easier to do one merge -c
 (postponing conflicts), resolve, and then do the next one. Not all the time,
 but occasionally.
 
 Cheers,
 Daniel B.


Re: 1.5.9 up for testing/signing

2010-12-02 Thread C. Michael Pilato
On 12/02/2010 04:55 PM, Hyrum K. Wright wrote:
 1.5.9 tarballs are up for testing and signing.  The magic revision is 
 r1041577:
 http://people.apache.org/~hwright/svn/1.5.9/

Summary:

   +1 to release.

Platform:

   Ubuntu 10.04 (lucid) Linux.

Tested:

   1.5.9 tarball with local dependencies (not from deps tarball)
   ((neon,local,svn) x fsfs) + ((fs,fs_base) x bdb) + py + pl + rb + javahl

   Verified that 1.5.9 deps were logically equivalent to 1.5.8 deps.

Results:

   All tests pass.

MD5 Checksums:

   973e87cd8aa64f44ed6b4e569c635abc  subversion-1.5.9.tar.gz
   1154485516cfb59db0f009b2d1a5c0cc  subversion-deps-1.5.9.tar.gz
   d8de4f33decb9e608c8cfd43288ebe89  subversion-1.5.9.tar.bz2
   50b4559ad6cef3caa810b80f1c18679c  subversion-deps-1.5.9.tar.bz2

GPG Signatures (left-aligned because Hyrum likes it that way):

::: subversion-1.5.9.tar.gz :::
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAkz4F3MACgkQokEGqRcG/W7QjgCgko2Jj6TGMydTDCIavuZn0z8e
QKEAnR/lbg2oyBW23c6r9q02qTng8zqB
=lVxF
-END PGP SIGNATURE-

::: subversion-1.5.9.tar.bz2 :::
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAkz4F3UACgkQokEGqRcG/W6s8QCeLKN0e8Y/1i+NW2QsPOIBaQeS
80AAoMCVeRcMGhxpN7kVqOBHICJsKiYu
=df9z
-END PGP SIGNATURE-

::: subversion-deps-1.5.9.tar.gz :::
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAkz4F3YACgkQokEGqRcG/W55bACePkiq3tLax9+mQ/2NifEe+Nzb
dVcAnRPJTMfZgBCV/HbdBnJERR+FBD5o
=A2u7
-END PGP SIGNATURE-

::: subversion-deps-1.5.9.tar.bz2 :::
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAkz4F3gACgkQokEGqRcG/W73awCg0IdiPcuvnPcyvpBreMScXVbz
exEAoNDVkIX+b3J7uSuDTlXwvCTneH1a
=17vb
-END PGP SIGNATURE-

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand


Re: 1.7.x - merge now accesses all files in WC?

2010-12-02 Thread Peter Samuelson

[Daniel Becroft]
 I've just managed to build/install trunk on my ubuntu box at home (first
 application I've ever compiled on it - yey!).
 
 What debugging tools would you recommend to investigate this further? I've
 seen output posted that lists function names, and time spent on each.

The obvious start is 'strace', as in 'strace svn merge ...'.  It spits
out every system call.  There's a lot of noise up front as it's loading
shared libraries and such, but it should still be obvious what we're
doing when crawling the tree (stat / lstat, open, etc.).
-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/


Implementations of svn_diff_fns_t?

2010-12-02 Thread Johan Corveleyn
Hi,

This question came up during recent discussion about the
diff-optimizations-tokens branch [1]:

What are the known implementors of svn_diff_fns_t, the vtable of
svn_diff callback functions in subversion/include/svn_diff.h? Besides
the internal diff_memory.c and diff_file.c that is.

Are there other implementations out there of these callbacks? IDE
plugins, third-party tools, ...?

Cheers,
-- 
Johan

[1] http://svn.haxx.se/dev/archive-2010-12/0057.shtml


Re: subversion cross compile (arm)

2010-12-02 Thread Takács András
Hi!

I continued the work on my issue.
It seems to be a memory allocation or over-writing problem.

There is the section (see between HEADER_TEXT and HEADER_TEXT OK)
where it calling representation_string, which has to generate the
'text: ...' string. I printed out the input parameters.
Later, there is a totally different string (at the parsing section, at
the bottom of the log).

If I'm changing the order of the debug print lines (rev, offset, size,
etc) The value of these are changing too (everything else is the
same).
Ie: If there is only the rev: ... debug line, the result is:
rev 4618626049922564096

My imagine is that, this is a memory corruption bug, but I'm unable to
compile valgrind, or any other high level debug utility to my
development board.
Can anybody help me in debugging and fixing???

Regards,
András


/ # svn mkdir file:///var/svn/testrepo/xxx -m aaa
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev
fs_fs: [LINE 2140] calling read_rep_offsets '0 0 4 4
2d2977d1c96f487abe4a1e202dd03b4e'
read_rep_offsets: [LINE 1947] '0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e'
read_rep_offsets: [LINE 1956] '0'
read_rep_offsets: [LINE 1973] '0'
read_rep_offsets: [LINE 1984] '4'
read_rep_offsets: [LINE 1995] '4'
read_rep_offsets: [LINE 2009] '2d2977d1c96f487abe4a1e202dd03b4e'
apr_file_open: '/var/svn/testrepo/db/transactions/0-0.txn/node.0.0'
Call svn_fsfs__write_noderev in svn_fs_fs__put_node_revision [LINE 2390]
svn_fsfs__write_noderev HEADER_TEXT
rev 0
offs 4618626049922564096
size 4
exp size 4
md5 2d2977d1c96f487abe4a1e202dd03b4e
svn_fsfs__write_noderev HEADER_TEXT OK
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev
fs_fs: [LINE 2140] calling read_rep_offsets '0 4 4 531704 (null)'
read_rep_offsets: [LINE 1947] '0 4 4 531704 (null)'
read_rep_offsets: [LINE 1956] '0'
read_rep_offsets: [LINE 1973] '4'
read_rep_offsets: [LINE 1984] '4'
read_rep_offsets: [LINE 1995] '531704'
subversion/libsvn_fs_fs/fs_fs.c:2239: (apr_err=160004)
svn: Corrupt node-revision '0.0.t0-0'
subversion/libsvn_fs_fs/fs_fs.c:2006: (apr_err=160004)
svn: Malformed text representation offset line in node-rev
read_rep_offsets: apr_strtok #4 last_string '' string '0' str '(null)'
strlen(str) 6 (APR_MD5_DIGESTS
IZE*2) 32 revision 0 offset 4 size 0 expsize 4
/ #


--
Takács András
Skype: wakoond
GTalk: wakoond
MSN: wako...@freestart.hu



2010/12/2 Takács András wako...@gmail.com:
 Hi All, Hi daniel,

 Daniel, thanks a lot your help, I'm getting more closer to the origin
 of the issue. (I hope.)
 I CC'ed the dev@ list as well, so I'm summarize the problem again.

 I'm cross compiling subversion for an ARM9 based development board (mini2440).
 (You'll find my configure options below.)
 This is a totaly clean environbent. There are just the libraries of
 toolchain (codesourcery).
 I built apr, apr-util, zlib, and sqlite from subversion-deps package,
 and the subversion itself.
 It is version 1.6.15.

 I'm createing a new repository with svn create, and after it I'm
 trying svn mkdir file:///...
 The result is:
 svn: Commit failed (details follow):
 svn: Corrupt node-revision '0.0.t0-0'
 svn: Malformed text representation offset line in node-rev

 Based on Daniel's help, I started to patch fs_fs.c, and adding debug
 messages to it:
 / # svnadmin create /var/svn/testrepo   === I'm creating the repository
 / # cat /var/svn/testrepo/db/revs/0/0     === Checking the revision
 file. Seems to be OK.
 PLAIN
 END
 ENDREP
 id: 0.0.r0/17
 type: dir
 count: 0
 text: 0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e
 cpath: /


 17 107
 / # svn mkdir file:///var/svn/testrepo/xxx -m m
 fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev   === It is the
 bottom of the get_node_revision_body function
 fs_fs: [LINE 2140] calling read_rep_offsets '0 0 4 4
 2d2977d1c96f487abe4a1e202dd03b4e'
 read_rep_offsets: [LINE 1947] '0 0 4 4
 2d2977d1c96f487abe4a1e202dd03b4e' == The string parameter of
 read_rep_offsets
 read_rep_offsets: [LINE 1956] '0'  === Return str of first apr_strtok
 read_rep_offsets: [LINE 1973] '0'  === Return str of second apr_strtok
 read_rep_offsets: [LINE 1984] '4'  === Return str of third apr_strtok
 read_rep_offsets: [LINE 1995] '4'  === Return str of fourth apr_strtok
 read_rep_offsets: [LINE 2009] '2d2977d1c96f487abe4a1e202dd03b4e'  ===
 Return str of fifth apr_strtok
 fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev
 apr_strtok: [LINE 35] '0.0.t0-0'
 apr_strtok: [LINE 58] '0'
 apr_strtok: [LINE 35] '0.t0-0'
 apr_strtok: [LINE 58] '0'
 apr_strtok: [LINE 35] 't0-0'
 apr_strtok: [LINE 58] 't0-0'
 fs_fs: [LINE 2140] calling read_rep_offsets '0 4 4 4621479420036127952 (null)'
 read_rep_offsets: [LINE 1947] '0 4 4 4621479420036127952 (null)'
 read_rep_offsets: [LINE 1956] '0'
 read_rep_offsets: [LINE 1973] '4'
 read_rep_offsets: [LINE 1984] '4'
 read_rep_offsets: [LINE 1995] '4621479420036127952'
 subversion/libsvn_fs_fs/fs_fs.c:2239: (apr_err=160004)
 svn: Corrupt node-revision '0.0.t0-0'
 subversion/libsvn_fs_fs/fs_fs.c:2006: (apr_err=160004)
 svn: