Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-01 Thread Johan Corveleyn
On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100:
 I am now considering to abandon the tokens-approach, for the following 
 reasons:
 ...
 So, unless someone can convince me otherwise, I'm probably going to
 stop with the token approach. Because of 2), I don't think it's worth
 it to spend the effort needed for 1), especially because the
 byte-based approach already works.


 In other words, you're saying that the token-based approach: (b) won't be
 as fast as the bytes-based approach can be, and (a) requires much effort
 to be spent on implementing the reverse reading of tokens?  (i.e.,
 a backwards gets())

Yes.

The reverse reading is quite hard (in the diff_file.c case) because of
the chunked reading of files. A line may be split by a chunk
boundary (it may even be split in the middle of an eol sequence
(between \r and \n), and it all still needs to be
canonicalized/normalized correctly (that's why we'll also need a
reverse normalization function). The current forward get_next_token
does this very well, and the reverse should be analogous, but I just
find it hard to reason about, and to get it implemented correctly. It
will take me a lot of time, and with a high probability of introducing
subtle bugs for special edge cases.

 Any thoughts?


 -tokens/BRANCH-README mentions one of the advantages of the tokens
 approach being that the comparison is done only after whitespace and
 newlines have been canonicalized (if -x-w or -x--ignore-eol-style is in
 effect).  IIRC, the -bytes approach doesn't currently take advantage of
 these -x flags?

 What is the practical effect of the fact the -bytes approach doesn't
 take advantage of these flags?  If a file (with a moderately long
 history) has had all its newlines changed in rN, then I assume that your
 -bytes optimizations will speed up all the diffs that 'blame' performs
 on that file, except for the single diff between rN and its predecessor?

Yes, I thought that would be an important advantage of the tokens
approach, but as you suggest: the practical effect will most likely be
limited. Indeed, only this single diff between rN and its predecessor
(which may be 1 in 1000 diffs) will not benefit from
prefix/suffix-optimization. Even if the rate of such changes is like
1/100th (let's say you change leading tabs to spaces, and vice versa,
every 100 revisions), it will hardly be noticeable.

The perfectionist in me still thinks: hey, you can also implement this
normalization with the byte-based approach (in a streamy way). But
that will probably be quite hard, because I'd have to take into
account that I might have to read more bytes from one file than from
the other file (right now I can always simply read a single byte from
both files). And other than that, it will at least be algorithm
duplication: it will be a (streamy) re-implementation of the
normalization algorithm that's already used in the token layer. So all
in all it's probably not worth it ...

Cheers,
-- 
Johan


Re: [PATCH] Fix Makefile.svn to build APR with threads

2010-12-01 Thread Stefan Sperling
On Wed, Dec 01, 2010 at 10:48:37AM +0530, Ramkumar Ramachandra wrote:
 Hi Stefan,
 
 Thanks for the suggestions. How about this?

+1

A single 'Approved by: stsp' line in the log message is sufficient,
you don't necessarily need the 'suggested' and 'review' lines.

Thanks!

 
 [[[
 Makefile.svn: Optionally allow building with threading support
 
 * tools/dev/unix-build/Makefile.svn: Add a new THREADING variable to
   control whether APR and sqlite should be built with threading
   support.
 
 Suggested by: stsp
 Review by: stsp
 ]]]
 
 Index: tools/dev/unix-build/Makefile.svn
 ===
 --- tools/dev/unix-build/Makefile.svn (revision 1040690)
 +++ tools/dev/unix-build/Makefile.svn (working copy)
 @@ -233,6 +233,12 @@
   fi
   touch $@
  
 +ifdef THREADING
 +THREADS_FLAG=--enable-threads
 +else
 +THREADS_FLAG=--disable-threads
 +endif
 +
  # configure apr
  $(APR_OBJDIR)/.configured: $(APR_OBJDIR)/.retrieved
   cp $(APR_SRCDIR)/build/apr_hints.m4 \
 @@ -246,7 +252,7 @@
   $(APR_SRCDIR)/configure \
   --prefix=$(PREFIX)/apr \
   --enable-maintainer-mode \
 - --disable-threads
 + $(THREADS_FLAG)
   touch $@
  
  # compile apr
 @@ -704,6 +710,12 @@
   tar -C $(SRCDIR) -zxf $(DISTDIR)/$(SQLITE_DIST)
   touch $@
  
 +ifdef THREADING
 +THREADSAFE_FLAG=--enable-threadsafe
 +else
 +THREADSAFE_FLAG=--disable-threadsafe
 +endif
 +
  # configure sqlite
  $(SQLITE_OBJDIR)/.configured: $(SQLITE_OBJDIR)/.retrieved
   cd $(SQLITE_OBJDIR) \
 @@ -711,7 +723,7 @@
   $(SQLITE_SRCDIR)/configure \
   --prefix=$(PREFIX)/sqlite \
   --disable-tcl \
 - --disable-threadsafe
 + $(THREADSAFE_FLAG)
   touch $@
  
  # compile sqlite
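The ifdef blocks in the patch select configure flags at make-parse time. A tiny self-contained demo of that selection (not the real Makefile.svn; file name and layout invented):

```shell
# Write a minimal makefile that mirrors the THREADING logic above.
cat > /tmp/threading-demo.mk <<'EOF'
ifdef THREADING
THREADS_FLAG=--enable-threads
else
THREADS_FLAG=--disable-threads
endif
$(info $(THREADS_FLAG))
all: ;
EOF

make -s -f /tmp/threading-demo.mk              # prints --disable-threads
make -s -f /tmp/threading-demo.mk THREADING=1  # prints --enable-threads
```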


Re: svn commit: r1040831 - in /subversion/trunk/subversion: include/svn_checksum.h libsvn_subr/checksum.c

2010-12-01 Thread Stefan Sperling
On Wed, Dec 01, 2010 at 05:25:28AM +0200, Daniel Shahaf wrote:
 stef...@apache.org wrote on Tue, Nov 30, 2010 at 23:56:41 -:
  Author: stefan2
  Date: Tue Nov 30 23:56:40 2010
  New Revision: 1040831
  
  URL: http://svn.apache.org/viewvc?rev=1040831&view=rev
  Log:
  Fix 'svnadmin verify' for format 5 repositories. During the checking 
  process,
  they yield NULL for SHA1 checksums. The most robust way to fix that is to
  allow NULL for checksum in svn_checksum_to_cstring. This seems justified
  as NULL is already a valid return value instead of e.g. an empty string. So,
  callers may receive NULL as result and could therefore assume that NULL is
  a valid input, too
  
 
 Can you explain how to trigger the bug you are fixing?  Just running
 'svnadmin verify' on my r1040058-created Greek repository doesn't.
 
  * subversion/include/svn_checksum.h
(svn_checksum_to_cstring): extend doc string
  * subversion/libsvn_subr/checksum.c
(svn_checksum_to_cstring): return NULL for NULL checksums as well
  
  Modified:
  subversion/trunk/subversion/include/svn_checksum.h
  subversion/trunk/subversion/libsvn_subr/checksum.c
  
  Modified: subversion/trunk/subversion/include/svn_checksum.h
  URL: 
  http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_checksum.h?rev=1040831&r1=1040830&r2=1040831&view=diff
  ==
  --- subversion/trunk/subversion/include/svn_checksum.h (original)
  +++ subversion/trunk/subversion/include/svn_checksum.h Tue Nov 30 23:56:40 
  2010
  @@ -121,7 +121,7 @@ svn_checksum_to_cstring_display(const sv
   
   /** Return the hex representation of @a checksum, allocating the
* string in @a pool.  If @a checksum->digest is all zeros (that is,
  - * 0, not '0'), then return NULL.
  + * 0, not '0') or NULL, then return NULL.
*
 
 First, this change doesn't match the code change: the docstring change
 says CHECKSUM->DIGEST may be NULL, but the code change allows CHECKSUM
 to be NULL.
 
 Second, let's have the docstring say that NULL is only allowed by 1.7+.
 (Passing NULL will segfault in 1.6.)
 
 (doxygen has a @note directive.)

Yes, this certainly violates API backwards compatibility rules.

A regression test that shows the problem would be nice.

Thanks,
Stefan

 
* @since New in 1.6.
*/
  
  Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
  URL: 
  http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/checksum.c?rev=1040831&r1=1040830&r2=1040831&view=diff
  ==
  --- subversion/trunk/subversion/libsvn_subr/checksum.c (original)
  +++ subversion/trunk/subversion/libsvn_subr/checksum.c Tue Nov 30 23:56:40 
  2010
  @@ -135,6 +135,9 @@ const char *
   svn_checksum_to_cstring(const svn_checksum_t *checksum,
   apr_pool_t *pool)
   {
  +  if (checksum == NULL)
  +    return NULL;
  +
     switch (checksum->kind)
   {
 case svn_checksum_md5:
  
  


Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-01 Thread Julian Foad
On Wed, 2010-12-01, stef...@apache.org wrote:
 Port (not merge) a fix for a FSFS packing race condition from the
 performance branch to /trunk: There is a slight time window
 between finding the name of a rev file and actually open it. If
 the revision in question gets packed just within this window,
 we will find the new (and final) file name with just one retry.
 
 * subversion/libsvn_fs_fs/fs_fs.c
   (open_pack_or_rev_file): retry once upon missing file error

Hi Stefan.

(1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
that the returned path may no longer be valid by the time you come to
use it, and the reason for that.  Does my patch below sound right?

(2) Doesn't the exact same race exist in *all* uses of
svn_fs_fs__path_rev_absolute()?  There are five other calls to it, all
of them using the result as a permissions reference to set the file
permissions on some new file.  It seems to me that those uses can fail
in the same way.

The root problem is the existence of a "get the path of this file" API
in the presence of semantics whereby the file might be removed from that
path at any time.

Perhaps your fix is the best fix for this caller, but for the other
callers I think creating and using a "copy file permissions from
rev-file of rev R" API would be a better solution than adding re-tries
there.  That API can do the re-try internally.

What do you think?


Patch for (1) above:
[[[
Index: subversion/libsvn_fs_fs/fs_fs.h
===
--- subversion/libsvn_fs_fs/fs_fs.h (revision 1040662)
+++ subversion/libsvn_fs_fs/fs_fs.h (working copy)
@@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
PERMS_REFERENCE.  Temporary allocations are from POOL. */
 svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
 const char *new_filename,
 const char *perms_reference,
 apr_pool_t *pool);
 
-/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
+/* Set *PATH to the path of REV in FS, whether in a pack file or not.
+   The path is not guaranteed to remain correct after the function returns,
+   because it is possible that the revision will become packed just after
+   this call, so the caller should re-try once if the path is not found.
Allocate in POOL. */
 svn_error_t *
 svn_fs_fs__path_rev_absolute(const char **path,
  svn_fs_t *fs,
  svn_revnum_t rev,
  apr_pool_t *pool);
]]]

- Julian


 Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
 URL: 
  http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
 ==
 --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
 +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 2010
 @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
  {
svn_error_t *err;
const char *path;
svn_boolean_t retry = FALSE;
  
do
  {
        err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
  
/* open the revision file in buffered r/o mode */
if (! err)
  err = svn_io_file_open(file, path,
APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
  
        if (err && APR_STATUS_IS_ENOENT(err->apr_err))
  {
/* Could not open the file. This may happen if the
 * file once existed but got packed later. */
svn_error_clear(err);
  
/* if that was our 2nd attempt, leave it at that. */
if (retry)
  return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
                _("No such revision %ld"), rev);
  
/* we failed for the first time. Refresh cache  retry. */
SVN_ERR(update_min_unpacked_rev(fs, pool));
  
retry = TRUE;
  }
else
  {
            /* the file exists but something prevented us from opening it */
return svn_error_return(err);
  }
  }
while (err);
  
return SVN_NO_ERROR;
  }




Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-01 Thread Johan Corveleyn
On Wed, Dec 1, 2010 at 11:44 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 Johan Corveleyn wrote on Wed, Dec 01, 2010 at 10:05:29 +0100:
 On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
  Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100:
  I am now considering to abandon the tokens-approach, for the following 
  reasons:
  ...
  So, unless someone can convince me otherwise, I'm probably going to
  stop with the token approach. Because of 2), I don't think it's worth
  it to spend the effort needed for 1), especially because the
  byte-based approach already works.
 
 
  In other words, you're saying that the token-based approach: (b) won't be
  as fast as the bytes-based approach can be, and (a) requires much effort
  to be spent on implementing the reverse reading of tokens?  (i.e.,
  a backwards gets())

 Yes.

 The reverse reading is quite hard (in the diff_file.c case) because of
 the chunked reading of files. A line may be split by a chunk
 boundary (it may even be split in the middle of an eol sequence
 (between \r and \n), and it all still needs to be
 canonicalized/normalized correctly (that's why we'll also need a
 reverse normalization function). The current forward get_next_token
 does this very well, and the reverse should be analogous, but I just
 find it hard to reason about, and to get it implemented correctly. It
 will take me a lot of time, and with a high probability of introducing
 subtle bugs for special edge cases.


 OK, so a reverse get_next_token() could be tricky to implement.  But,
 worse, won't having it mean that implementors of svn_diff_fns_t can't
 have streamy access to their source?  Since they would be required to
 provide sometimes a token from the start and sometimes a token from the
 end...

 Well, it's not streamy in our usual sense of the word, but it's
 double-streamy (no one requests the middle byte until either all
 bytes before or all bytes after it were transmitted already)

Oooh, I hadn't considered other implementors besides the internal
diff_memory.c and diff_file.c. You're right, that would impose
additional constraints on other implementors. I don't know if being
non-streamy (or less streamy anyway) would be problem ...

In my last commit on the -tokens branch, I added a flag open_at_end
to the datasource_open function (function of svn_diff_fns_t), so the
datasource could be opened at the end. Also, other supporting
functions were needed: token_pushback_prefix, token_pushback_suffix
(to push back tokens that were read too far when scanning for
prefix/suffix) and the dreaded datasource_get_previous_token. Anyway,
the high-level idea was:

1) Open datasources at the end.

2) Scan for identical suffix (getting tokens in reverse).

3) At first non-match: pushback suffix token, and note where suffix starts.

4) Open datasources at the beginning.

5) Scan for identical prefix (getting tokens normally, but without hash).

6) At first non-match: pushback prefix token.

7) Run the old algorithm: getting tokens forward, but with hash. The
get_next_token function should stop (return NULL for token) when it
arrives at the suffix.


Sidestep: I just now realized that I probably don't need to have the
reverse normalization algorithm for implementing get_previous_token.
The call to the normalization function in get_next_token is (I think)
only needed to be able to calculate the hash. But since
get_previous_token doesn't need to calculate hashes, I may be able to
get by without normalization there. I'd only need to normalize inside
token_compare, and I *think* I can just do that forwardly, instead
of backwards. Just thinking out loud here ...

So: that makes the token-approach again a little bit more possible.
But do we want it? It requires a lot more from implementors of
svn_diff_fns_t. OTOH, it does offer a generic prefix/suffix
optimization to all implementors of svn_diff_fns_t ...

  Any thoughts?
 
 
  -tokens/BRANCH-README mentions one of the advantages of the tokens
  approach being that the comparison is done only after whitespace and
  newlines have been canonicalized (if -x-w or -x--ignore-eol-style is in
  effect).  IIRC, the -bytes approach doesn't currently take advantage of
  these -x flags?
 
  What is the practical effect of the fact the -bytes approach doesn't
  take advantage of these flags?  If a file (with a moderately long
  history) has had all its newlines changed in rN, then I assume that your
  -bytes optimizations will speed up all the diffs that 'blame' performs
  on that file, except for the single diff between rN and its predecessor?

 Yes, I thought that would be an important advantage of the tokens
 approach, but as you suggest: the practical effect will most likely be
 limited. Indeed, only this single diff between rN and its predecessor
 (which may be 1 in 1000 diffs) will not benefit from
 prefix/suffix-optimization. Even if the rate of such changes is like
 1/100th (let's say you change leading tabs to spaces, and vice versa,
 every 100 revisions), it will hardly be noticeable.

Re: svn commit: r1040663 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c

2010-12-01 Thread Julian Foad
Daniel Shahaf wrote:
 So we loop over the remaining sha1's and remove each of them...
 I wonder if there is room for further optimization here?  e.g., does
 this prepare/reset the statement just once, or once per iteration?

Each iteration of this loop prepares, uses and resets a SQL statement,
and also removes a pristine file from disk.  So yes there is room for
further optimization of the SQL part of that.

The main concern I was addressing was that the previous method was
*quadratic* in the total number of pristines in the store, because for
each one in the store it would scan the NODES and ACTUAL_NODE tables
looking for a reference to it.  I had noticed that even a no-op cleanup
took a very long time on a large WC.  It will help if I show some real
timings.

Wall clock times for svn cleanup on a clean checkout of
^/subversion/branc...@1040943 on my Linux system.

  r1040662 build: first time = 15 minutes, second = 14.8 minutes.

  r1040663 build: first time = 4.4s, best of many repetitions = 0.7s.

Now the algorithm is only linear time, which is a *huge* win.  A
'cleanup' operation doesn't need to be blisteringly fast, so I don't
think it needs more optimisation.

I've edited the log message to clarify the main point, and to point out
the big-WC timing improvement.

- Julian


# r1040662 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real15m4.962s
user9m0.306s
sys 6m3.967s

# r1040663 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real0m0.708s
user0m0.436s
sys 0m0.212s





Re: gpg-agent branch treats PGP passphrase as repository password?

2010-12-01 Thread Stefan Sperling
On Wed, Dec 01, 2010 at 06:50:35AM -0500, Dan Engel wrote:
 On Sun, 2010-10-17 at 16:03 +0200, Stefan Sperling wrote:
  If my understanding is correct, it seems the current code on the gpg-agent
  branch essentially uses the gpg-agent as a glorified password prompt.
  
  Here's what the code seems to be doing:
  
  Storing a password:
Nothing happens.
  
  Retrieving a password:
  Step 1: svn contacts the gpg-agent to find out the passphrase for the 
  private
  PGP key the agent is managing
 
 I have not reviewed the current code on the gpg-agent branch, but was
 the original submitter for the gpg-agent support patch. I've wanted to
 return to it, but have not had the time. However, I would like to take a
 moment and address this concern.
 
 The name of the program (gpg-agent) may be misleading in this, but all
 gpg-agent does is to safely (securely, without swapping, etc.) cache
 secrets indexed by a domain. The domain is whatever the client provides
 and the secret is the password that the user types in. So what happens
 is:
 
 Step 1: svn contacts gpg-agent, providing the username and repository
 URL as the domain.
 
 Step 2: gpg-agent checks to see whether there is a password associated
 with that domain. If not, it prompts for one. Here, the user enters
 their subversion password, NOT their pgp private key passphrase.
 
 Step 3: gpg-agent associates the password with the provided domain and
 stores that in the memory cache in protected memory. There is typically
 a timeout (one hour, 24 hours, whatever).
 
 Step 4: svn receives the password and sends it to the server.
 
 Step 5: Later, the user invokes svn again (in a way that needs the
 server). Normally, this would be the point at which svn would prompt for
 another password. Instead...
 
 Step 6: svn contacts the gpg-agent, providing the same domain (username
 + repository URL) as before.
 
 Step 7: gpg-agent checks to see whether there is a password associated
 with that domain, and this time there is, because of step 3 before. Now,
 gpg-agent just returns this password instead of prompting for a new one.
 
 Step 8: svn receives the password and sends it to the server...
 
 So, you see, in spite of the name (gpg-agent) there's no inherent
 interaction between gpg-agent and the pgp private key store, and at no
 time is the user prompted for their private pgp key.
 
 The purpose of the gpg-agent patch is to make it so that a user who
 doesn't have one of the GUI wallets available (gnome-keyring or kwallet)
 doesn't have to type in their subversion password EVERY time they invoke
 a command against the repository. The gpg program uses gpg-agent (when
 so configured) for exactly the same purpose--to keep the user from
 having to re-type that passphrase more than once every X hours. But the
 agent itself is generic enough to serve a broad range of password
 caching needs.
 
 gpg-agent allows the user to configure how long passwords persist in
 protected memory. If the timeout has passed, then in step 7, the
 gpg-agent would request the password again since it would no longer be
 in the memory cache.
 
 Does that all make sense?

Thanks, this makes sense. I was expecting a scheme that would persist
the password on disk. Using gpg-agent as a mere runtime password cache
as you describe is certainly fine and, if used right, does not expose
the private key passphrase.

However, I still see a potential risk here because the name gpg-agent
is very misleading. It violates the principle of least surprise.
How can we prevent users who misunderstand what Subversion's gpg-agent
feature does from entering their private pgp key passphrase (which will
then be sent to the server)?  Can we control the prompt printed by
gpg-agent? ("Enter your Subversion password, NOT your secret PGP passphrase!")

Remember that people may want to use Subversion password caches even for
publicly accessible servers which are password protected with a commonly
known user/password combination. Any such server could potentially receive
PGP passphrases in clear text if the gpg-agent feature is misused.

Because of this concern, I'd rather have a new tool called 'svn-agent'
for this purpose, licensed under Apache License v2 and distributed
along with Subversion. In that case users would not make a mental
association with PGP and are very unlikely to enter their PGP passphrase.
This would also avoid introducing a dependency on an external tool.
It would not be very hard to write such a tool, using a protocol similar
to the one used in gpg-agent.

I would also support a scheme that uses PGP in the way I described,
persisting passwords on disk encrypted with a PGP private key.

Thanks,
Stefan


Re: svn commit: r1040663 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c

2010-12-01 Thread Daniel Shahaf
Julian Foad wrote on Wed, Dec 01, 2010 at 13:06:04 +:
 Daniel Shahaf wrote:
  So we loop over the remaining sha1's and remove each of them...
  I wonder if there is room for further optimization here?  e.g., does
  this prepare/reset the statement just once, or once per iteration?
 
 Each iteration of this loop prepares, uses and resets a SQL statement,
 and also removes a pristine file from disk.  So yes there is room for
 further optimization of the SQL part of that.
 

Thanks for the overview.

 The main concern I was addressing was that the previous method was
 *quadratic* in the total number of pristines in the store, because for
 each one in the store it would scan the NODES and ACTUAL_NODE tables
 looking for a reference to it.  I had noticed that even a no-op cleanup
 took a very long time on a large WC.  It will help if I show some real
 timings.
 
 Wall clock times for svn cleanup on a clean checkout of
 ^/subversion/branc...@1040943 on my Linux system.
 
   r1040662 build: first time = 15 minutes, second = 14.8 minutes.
 
   r1040663 build: first time = 4.4s, best of many repetitions = 0.7s.
 


:-)!

 Now the algorithm is only linear time, which is a *huge* win.  A
 'cleanup' operation doesn't need to be blisteringly fast, so I don't
 think it needs more optimisation.
 
 I've edited the log message to clarify the main point, and to point out
 the big-WC timing improvement.
 

Fair enough; let's table this additional optimization for now.  We can
always add it later if needed (i.e., if the time spent removing
unreferenced pristines should become an issue).

 - Julian
 
 
 # r1040662 build
 $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
 real  15m4.962s
 user  9m0.306s
 sys   6m3.967s
 
 # r1040663 build
 $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
 real  0m0.708s
 user  0m0.436s
 sys   0m0.212s
 
 
 

Thanks,

Daniel



Re: Can I add NOT NULL to PRISTINE table columns?

2010-12-01 Thread Julian Foad
 Julian Foad julian.f...@wandisco.com writes:
 
  I imagine it should be possible to add 'NOT NULL' to these columns
  without performing a format bump or writing any upgrade code.  Am I
  right?

Hyrum Wright wrote:
 If we're already enforcing it in the C code, I see no problem with
 doing so in the sql schema.  Additionally, I don't really see how you
 could add the 'NOT NULL' clause in a format bump without doing table
 duplication, which doesn't seem worth the trouble for something like
 this.

Right.

Philip Martin wrote:
 Existing working copies would not be upgraded, although I suppose the
 client would work.  Perhaps make it a format 99 step?

The software won't see any difference except it will provide a sanity
check when we're developing and modifying Subversion code and testing
against new WCs.  I don't think there is any need to upgrade existing
development WCs in this respect.

I'll mention it in a comment in the format 99 step, but don't think we
need to write any SQL for it.

Thanks.

- Julian




Re: 1.5.8 up for signing/testing

2010-12-01 Thread Mark Phippard
SUMMARY:
+1 to release

PLATFORM:
 Windows 7
 VS 2008 SP1
 Java 1.6

COMPONENTS:
 Apache2.2.15
 APR 1.4.2
 APR-UTIL1.3.9
 OpenSSL 1.0.0a
 Neon0.29.5
 ZLib  (from deps)

VERIFIED:
 md5 and sha1 sums
 gpg signature verified

TESTED:
 JavaHL
 ra_local | ra_svn | ra_neon X fsfs

RESULTS:
 All passed

SIGNATURES:

subversion-1.5.8.zip
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (Darwin)

iEYEABECAAYFAkz2oiIACgkQJl34oANalqn7OACgkXfcbCr17XRqvYSFIkwGrqst
3V0AniZnJGzvU3o5ePj4k1u/OO8dMFhI
=B+uM
-END PGP SIGNATURE-

subversion-deps-1.5.8.zip
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (Darwin)

iEYEABECAAYFAkz2oi0ACgkQJl34oANalqmpgACgmek/6rW9+JJpAfKJ7wx0H4BV
i2oAn0oTiXZea3TsnlcYpaE/TSg5AOeG
=d74O
-END PGP SIGNATURE-




-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: gpg-agent branch treats PGP passphrase as repository password?

2010-12-01 Thread Dan Engel
On Wed, 2010-12-01 at 14:08 +0100, Stefan Sperling wrote:
 However, I still see a potential risk here because the name
 gpg-agent
 is very misleading. It violates the principle of least surprise.
 How can we prevent users misunderstanding what Subversion's gpg-agent
 feature does from entering their private pgp key passphrase (which
 will
 then be sent to the server)?  Can we control the prompt printed by
 gpg-agent? (Enter your Subversion password, NOT your secret PGP
 passphrase!) 


Yes, the agent protocol provides for customized prompts, and the patch
itself refers to the Subversion repository server (or something like
that) in that prompt.

I have no emotional investment in the gpg-agent idea (aside from the
don't re-invent the wheel argument), but here's my $0.02:

I think most people who know enough to use gpg agent (it's a bit more
involved to set up, etc. than things like gnome-keyring) probably
understand what it does well enough to not make that mistake.

Also, in most corporate or enterprise environments (where the stakes are
really high) Subversion will be installed and set up by administrators
(who *better* know what they're doing) and used by users who may not
even know that gpg-agent is running in the background. All they get is a
prompt for their subversion password.

I know those lines get a little more blurred in Linux-land than in
Windows-land, but I think the point is still a valid one.

From a purely personal point of view, I'd be happy with ANY disk or
memory password cache for Subversion on Linux that is safe
(security-wise) and doesn't rely on the presence of any GUI libraries or
capabilities. The gpg-agent path was just the easiest one for me to
implement directly.

Thanks,
-Dan



Re: svn commit: r1040831 - in /subversion/trunk/subversion: include/svn_checksum.h libsvn_subr/checksum.c

2010-12-01 Thread Stefan Fuhrmann

On 01.12.2010 04:25, Daniel Shahaf wrote:

stef...@apache.org wrote on Tue, Nov 30, 2010 at 23:56:41 -:

Author: stefan2
Date: Tue Nov 30 23:56:40 2010
New Revision: 1040831

URL: http://svn.apache.org/viewvc?rev=1040831&view=rev
Log:
Fix 'svnadmin verify' for format 5 repositories. During the checking process,
they yield NULL for SHA1 checksums. The most robust way to fix that is to
allow NULL for checksum in svn_checksum_to_cstring. This seems justified
as NULL is already a valid return value instead of e.g. an empty string. So,
callers may receive NULL as result and could therefore assume that NULL is
a valid input, too


Can you explain how to trigger the bug you are fixing?  Just running
'svnadmin verify' on my r1040058-created Greek repository doesn't.

Sure:

$svnadmin-1.5.4 create /mnt/svnroot/test
$add pre-revprop-change hook
$svnsync-1.5.4 init file:///mnt/svnroot/test http://svn.apache.org/repos/asf
$svnsync-1.5.4 sync file:///mnt/svnroot/test
$cancel after a few revs; rev 1 will already trigger the error
$svnadmin-trunk verify /mnt/svnroot/test

* subversion/include/svn_checksum.h
   (svn_checksum_to_cstring): extend doc string
* subversion/libsvn_subr/checksum.c
   (svn_checksum_to_cstring): return NULL for NULL checksums as well

Modified:
 subversion/trunk/subversion/include/svn_checksum.h
 subversion/trunk/subversion/libsvn_subr/checksum.c

Modified: subversion/trunk/subversion/include/svn_checksum.h
URL: 
http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_checksum.h?rev=1040831&r1=1040830&r2=1040831&view=diff
==
--- subversion/trunk/subversion/include/svn_checksum.h (original)
+++ subversion/trunk/subversion/include/svn_checksum.h Tue Nov 30 23:56:40 2010
@@ -121,7 +121,7 @@ svn_checksum_to_cstring_display(const sv

  /** Return the hex representation of @a checksum, allocating the
   * string in @a pool.  If @a checksum->digest is all zeros (that is,
- * 0, not '0'), then return NULL.
+ * 0, not '0') or NULL, then return NULL.
   *

First, this change doesn't match the code change: the docstring change
says CHECKSUM->DIGEST may be NULL, but the code change allows CHECKSUM
to be NULL.

Second, let's have the docstring say that NULL is only allowed by 1.7+.
(Passing NULL will segfault in 1.6.)

(doxygen has a @note directive.)


Done.

-- Stefan^2.

   * @since New in 1.6.
   */

Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
URL: 
http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/checksum.c?rev=1040831&r1=1040830&r2=1040831&view=diff
==
--- subversion/trunk/subversion/libsvn_subr/checksum.c (original)
+++ subversion/trunk/subversion/libsvn_subr/checksum.c Tue Nov 30 23:56:40 2010
@@ -135,6 +135,9 @@ const char *
  svn_checksum_to_cstring(const svn_checksum_t *checksum,
  apr_pool_t *pool)
  {
+  if (checksum == NULL)
+    return NULL;
+
   switch (checksum->kind)
  {
case svn_checksum_md5:






Re: diff-optimizations-tokens branch: I think I'm going to abandon it

2010-12-01 Thread Daniel Shahaf
[ finally getting back to this mail; having slept on it, etc. ]

Johan Corveleyn wrote on Wed, Dec 01, 2010 at 13:34:48 +0100:
 On Wed, Dec 1, 2010 at 11:44 AM, Daniel Shahaf d...@daniel.shahaf.name 
 wrote:
  Johan Corveleyn wrote on Wed, Dec 01, 2010 at 10:05:29 +0100:
  On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name 
  wrote:
   Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100:
   I am now considering to abandon the tokens-approach, for the following 
   reasons:
   ...
   So, unless someone can convince me otherwise, I'm probably going to
   stop with the token approach. Because of 2), I don't think it's worth
   it to spend the effort needed for 1), especially because the
   byte-based approach already works.
  
  
   In other words, you're saying that the token-based approach: (b) won't be
   as fast as the bytes-based approach can be, and (a) requires much effort
   to be spent on implementing the reverse reading of tokens?  (i.e.,
   a backwards gets())
 
  Yes.
 
  The reverse reading is quite hard (in the diff_file.c case) because of
  the chunked reading of files. A line may be split by a chunk
  boundary (it may even be split in the middle of an eol sequence
  (between \r and \n), and it all still needs to be
  canonicalized/normalized correctly (that's why we'll also need a
  reverse normalization function). The current forward get_next_token
  does this very well, and the reverse should be analogous, but I just
  find it hard to reason about, and to get it implemented correctly. It
  will take me a lot of time, and with a high probability of introducing
  subtle bugs for special edge cases.
 
 
  OK, so a reverse get_next_token() could be tricky to implement.  But,
  worse, won't having it mean that implementors of svn_diff_fns_t can't
  have streamy access to their source?  Since they would be required to
  provide sometimes a token from the start and sometimes a token from the
  end...
 
  Well, it's not streamy in our usual sense of the word, but it's
  double-streamy (no one requests the middle byte until either all
  bytes before or all bytes after it were transmitted already)
 
 Oooh, I hadn't considered other implementors besides the internal
 diff_memory.c and diff_file.c.

Just to clarify: diff_file.c and diff_memory.c are the only implementors
of svn_diff_fns_t in the core code, correct?

 You're right, that would impose
 additional constraints on other implementors. I don't know if being
 non-streamy (or less streamy anyway) would be problem ...
 

We should have asked this before, but:

Do we know who are the other implementors / typical use-cases of
svn_diff_fns_t?

 In my last commit on the -tokens branch, I added a flag open_at_end
 to the datasource_open function (function of svn_diff_fns_t), so the
 datasource could be opened at the end. Also, other supporting
 functions were needed: token_pushback_prefix, token_pushback_suffix
 (to push back tokens that were read too far when scanning for
 prefix/suffix) and the dreaded datasource_get_previous_token. Anyway,
 the high-level idea was:
 
 1) Open datasources at the end.
 
 2) Scan for identical suffix (getting tokens in reverse).
 
 3) At first non-match: pushback suffix token, and note where suffix starts.
 
 4) Open datasources at the beginning.
 
 5) Scan for identical prefix (getting tokens normally, but without hash).
 
 6) At first non-match: pushback prefix token.
 
 7) Run the old algorithm: getting tokens forward, but with hash. The
 get_next_token function should stop (return NULL for token) when it
 arrives at the suffix.
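 
 [ Steps 1-6 above, reduced to raw bytes on in-memory buffers, look roughly
 like the sketch below.  This is an illustration of the idea, not the
 actual code on either branch; trim_identical_ends() is a hypothetical
 name.  Note the suffix scan must stop at the prefix boundary so the two
 scans never overlap: ]
 
 ```c
 #include <assert.h>
 #include <stddef.h>
 
 /* Find the identical prefix and suffix of A and B.  On return the
    differing middles are a[*prefix .. *a_end) and b[*prefix .. *b_end). */
 static void
 trim_identical_ends(const char *a, size_t a_len,
                     const char *b, size_t b_len,
                     size_t *prefix, size_t *a_end, size_t *b_end)
 {
   size_t p = 0;
   while (p < a_len && p < b_len && a[p] == b[p])
     p++;                                  /* common prefix */
 
   size_t sa = a_len, sb = b_len;
   while (sa > p && sb > p && a[sa - 1] == b[sb - 1])
     {
       sa--;                               /* common suffix, never */
       sb--;                               /* crossing into the prefix */
     }
 
   *prefix = p;
   *a_end = sa;
   *b_end = sb;
 }
 
 int main(void)
 {
   size_t p, ae, be;
   trim_identical_ends("aaaXbbb", 7, "aaaYYbbb", 8, &p, &ae, &be);
   assert(p == 3 && ae == 4 && be == 5);   /* middles: "X" vs "YY" */
   return 0;
 }
 ```
 
 The diff algorithm proper (step 7) then only has to tokenize and hash the
 middles.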
 
 
 Sidestep: I just now realized that I probably don't need to have the
 reverse normalization algorithm for implementing get_previous_token.
 The call to the normalization function in get_next_token is (I think)
 only needed to be able to calculate the hash. But since
 get_previous_token doesn't need to calculate hashes, I may be able to
 get by without normalization there. I'd only need to normalize inside
 token_compare, and I *think* I can just do that forwards, instead
 of backwards. Just thinking out loud here ...
 

Is normalization function the thing that collapses newlines and
whitespaces if -x-w or -x--ignore-eol-style is in effect?  If so,
another purpose of calling that might be to make suffixes that are
not bytewise-identical, but only modulo-whitespace-identical, also
be lumped into the identical suffix.

(Haven't dived into the code yet; the above is based on my understanding
of your high-level descriptions)

 So: that makes the token-approach again a little bit more possible.
 But do we want it? It requires a lot more from implementors of
 svn_diff_fns_t. OTOH, it does offer a generic prefix/suffix
 optimization to all implementors of svn_diff_fns_t ...
 

Another option is to add the read backwards callbacks to the API, but
designate them as optional.  Then you only do the suffix half of the
optimization if both/all sources support the 

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

2010-12-01 Thread Daniel Shahaf
 On Wed, 2010-12-01, stef...@apache.org wrote:
  Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
  URL: 
  http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
  ==============================================================================
  --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
  +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 
  2010
  @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
   {
 svn_error_t *err;
 const char *path;
 svn_boolean_t retry = FALSE;
   

I agree the code below is correct, but I found it confusing:

 do
   {
 err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
   
 /* open the revision file in buffered r/o mode */
 if (! err)
 err = svn_io_file_open(file, path,
APR_READ | APR_BUFFERED,
APR_OS_DEFAULT, pool);
   
 if (err && APR_STATUS_IS_ENOENT(err->apr_err))
   {
 /* Could not open the file. This may happen if the
  * file once existed but got packed later. */
 svn_error_clear(err);
   
 /* if that was our 2nd attempt, leave it at that. */
 if (retry)
   return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
  _("No such revision %ld"), rev);
   
 /* we failed for the first time. Refresh cache  retry. */
 SVN_ERR(update_min_unpacked_rev(fs, pool));
   

Philip noted that this call should be guarded by a format number check
(otherwise we would assert on format-3 repositories that are missing
a rev file).  I've fixed that.

 retry = TRUE;
   }
 else
   {
 /* the file exists but something prevented us from opening it */
 return svn_error_return(err);

The comment doesn't indicate that the else{} block is also entered in
the rare case where ERR is SVN_NO_ERROR.

In other words, the success case is handled not by the 'return SVN_NO_ERROR'
below (which in fact is never reached), but by this else{} block.

   }
   }
 while (err);
   
 return SVN_NO_ERROR;
   }

The error handling confused me here: it clears ERR but then checks that
it's non-NULL, and right after that check (which normally means there
is an error) it overwrites ERR.  I think the loop would be clearer if it
were just 'while (1)'.

 
 


Re: subversion cross compile (arm)

2010-12-01 Thread Takács András
Hi All, Hi Daniel,

Daniel, thanks a lot for your help; I'm getting closer to the origin
of the issue. (I hope.)
I've CC'ed the dev@ list as well, so I'll summarize the problem again.

I'm cross-compiling Subversion for an ARM9-based development board (mini2440).
(You'll find my configure options below.)
This is a totally clean environment. There are just the libraries of the
toolchain (CodeSourcery).
I built apr, apr-util, zlib, and sqlite from the subversion-deps package,
and Subversion itself.
It is version 1.6.15.

I'm creating a new repository with svnadmin create, and after that I'm
trying svn mkdir file:///...
The result is:
svn: Commit failed (details follow):
svn: Corrupt node-revision '0.0.t0-0'
svn: Malformed text representation offset line in node-rev

Based on Daniel's help, I started to patch fs_fs.c, adding debug
messages to it:
/ # svnadmin create /var/svn/testrepo   === I'm creating the repository
/ # cat /var/svn/testrepo/db/revs/0/0 === Checking the revision
file. Seems to be OK.
PLAIN
END
ENDREP
id: 0.0.r0/17
type: dir
count: 0
text: 0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e
cpath: /


17 107
/ # svn mkdir file:///var/svn/testrepo/xxx -m m
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev   === It is the
bottom of the get_node_revision_body function
fs_fs: [LINE 2140] calling read_rep_offsets '0 0 4 4
2d2977d1c96f487abe4a1e202dd03b4e'
read_rep_offsets: [LINE 1947] '0 0 4 4
2d2977d1c96f487abe4a1e202dd03b4e' === The string parameter of
read_rep_offsets
read_rep_offsets: [LINE 1956] '0'  === Return str of first apr_strtok
read_rep_offsets: [LINE 1973] '0'  === Return str of second apr_strtok
read_rep_offsets: [LINE 1984] '4'  === Return str of third apr_strtok
read_rep_offsets: [LINE 1995] '4'  === Return str of fourth apr_strtok
read_rep_offsets: [LINE 2009] '2d2977d1c96f487abe4a1e202dd03b4e'  ===
Return str of fifth apr_strtok
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev
apr_strtok: [LINE 35] '0.0.t0-0'
apr_strtok: [LINE 58] '0'
apr_strtok: [LINE 35] '0.t0-0'
apr_strtok: [LINE 58] '0'
apr_strtok: [LINE 35] 't0-0'
apr_strtok: [LINE 58] 't0-0'
fs_fs: [LINE 2140] calling read_rep_offsets '0 4 4 4621479420036127952 (null)'
read_rep_offsets: [LINE 1947] '0 4 4 4621479420036127952 (null)'
read_rep_offsets: [LINE 1956] '0'
read_rep_offsets: [LINE 1973] '4'
read_rep_offsets: [LINE 1984] '4'
read_rep_offsets: [LINE 1995] '4621479420036127952'
subversion/libsvn_fs_fs/fs_fs.c:2239: (apr_err=160004)
svn: Corrupt node-revision '0.0.t0-0'
subversion/libsvn_fs_fs/fs_fs.c:2006: (apr_err=160004)
svn: Malformed text representation offset line in node-rev
read_rep_offsets: apr_strtok #4 last_string '' string '0' str '(null)'
strlen(str) 6 (APR_MD5_DIGESTSIZE*2) 32 revision 0 offset 4 size 0 expsize 4
/ #

I figured out that the first execution of fs_fs__read_noderev is
parsing the /var/svn/testrepo/db/revs/0/0 file.
What is the input of the second execution? I see that it is in the wrong
format, but what is the origin of this?


Thanks a lot!

Regards,
András






Toolchain: arm-2008q3-72-arm-none-linux-gnueabi-i686-pc-linux-gnu
Cross gcc:  arm-none-linux-gnueabi-gcc
Cross cflags: -march=armv4t -mtune=arm920t

Apr configure:
   ./configure \
   --prefix=/usr \
   --host=$(CROSS_COMPILE) \
   ac_cv_file__dev_zero=yes \
   ac_cv_func_setpgrp_void=yes \
   apr_cv_process_shared_works=yes \
   apr_cv_mutex_robust_shared=no \
   apr_cv_tcp_nodelay_with_cork=yes \
   ac_cv_sizeof_struct_iovec=8 \
   apr_cv_mutex_recursive=yes \
   CFLAGS=$(CROSS_CFLAGS) \
   LDFLAGS=$(CROSS_LDFLAGS)

Apr-utils configure:
   ./configure \
   --with-apr=MY_APR_BUILD_DIR \
   --prefix=/usr \
   --host=$(CROSS_COMPILE) \
   CFLAGS=$(CROSS_CFLAGS) \
   LDFLAGS=$(CROSS_LDFLAGS); \

Subversion configure:
   ./configure \
   --with-apr=$(PACKAGES_DIR)/apr/$(TARGET_PACKAGE)/apr \
   
--with-apr-util=$(PACKAGES_DIR)/apr-util/$(TARGET_PACKAGE)/apr-util \
   --with-sqlite=$(TARGET_DEV_ROOT)/usr \
   --with-zlib=$(TARGET_DEV_ROOT)/usr \
   --host=$(CROSS_COMPILE) \
   --prefix=/usr \
   CFLAGS=$(CROSS_CFLAGS) \
   LDFLAGS=$(CROSS_LDFLAGS)

Other compiled libraries: sqlite3, zlib
I'm using the latest, 1.6.15, subversion and subversion-deps packages.
(I compiled all libraries from subversion-deps, no other installed library)





--
Takács András
Skype: wakoond
GTalk: wakoond
MSN: wako...@freestart.hu