Re: diff-optimizations-tokens branch: I think I'm going to abandon it
On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100: I am now considering to abandon the tokens-approach, for the following reasons: ... So, unless someone can convince me otherwise, I'm probably going to stop with the token approach. Because of 2), I don't think it's worth it to spend the effort needed for 1), especially because the byte-based approach already works. In other words, you're saying that the token-based approach: (b) won't be as fast as the bytes-based approach can be, and (a) requires much effort to be spent on implementing the reverse reading of tokens? (i.e., a backwards gets()) Yes. The reverse reading is quite hard (in the diff_file.c case) because of the chunked reading of files. A line may be split by a chunk boundary (it may even be split in the middle of an eol sequence (between \r and \n), and it all still needs to be canonicalized/normalized correctly (that's why we'll also need a reverse normalization function). The current forward get_next_token does this very well, and the reverse should be analogous, but I just find it hard to reason about, and to get it implemented correctly. It will take me a lot of time, and with a high probability of introducing subtle bugs for special edge cases. Any thoughts? -tokens/BRANCH-README mentions one of the advantages of the tokens approach being that the comparison is done only after whitespace and newlines have been canonicalized (if -x-w or -x--ignore-eol-style is in effect). IIRC, the -bytes approach doesn't currently take advantage of these -x flags? What is the practical effect of the fact the -bytes approach doesn't take advantage of these flags? If a file (with a moderately long history) has had all its newlines changed in rN, then I assume that your -bytes optimizations will speed up all the diffs that 'blame' performs on that file, except for the single diff between rN and its predecessor? 
Yes, I thought that would be an important advantage of the tokens approach, but as you suggest: the practical effect will most likely be limited. Indeed, only this single diff between rN and its predecessor (which may be 1 in 1000 diffs) will not benefit from prefix/suffix optimization. Even if the rate of such changes is like 1/100th (let's say you change leading tabs to spaces, and vice versa, every 100 revisions), it will hardly be noticeable.

The perfectionist in me still thinks: hey, you can also implement this normalization with the byte-based approach (in a streamy way). But that will probably be quite hard, because I'd have to take into account that I might have to read more bytes from one file than from the other file (right now I can always simply read a single byte from both files). And other than that, it will at least be algorithm duplication: it will be a (streamy) re-implementation of the normalization algorithm that's already used in the token layer. So all in all it's probably not worth it ...

Cheers, -- Johan
Re: [PATCH] Fix Makefile.svn to build APR with threads
On Wed, Dec 01, 2010 at 10:48:37AM +0530, Ramkumar Ramachandra wrote: Hi Stefan, Thanks for the suggestions. How about this? +1 A single 'Approved by: stsp' line in the log message is sufficient, you don't necessarily need the 'suggested' and 'review' lines. Thanks! [[[ Makefile.svn: Optionally allow building with threading support * tools/dev/unix-build/Makefile.svn: Add a new THREADING variable to control whether APR and sqlite should be built with threading support. Suggested by: stsp Review by: stsp ]]] Index: tools/dev/unix-build/Makefile.svn === --- tools/dev/unix-build/Makefile.svn (revision 1040690) +++ tools/dev/unix-build/Makefile.svn (working copy) @@ -233,6 +233,12 @@ fi touch $@ +ifdef THREADING +THREADS_FLAG=--enable-threads +else +THREADS_FLAG=--disable-threads +endif + # configure apr $(APR_OBJDIR)/.configured: $(APR_OBJDIR)/.retrieved cp $(APR_SRCDIR)/build/apr_hints.m4 \ @@ -246,7 +252,7 @@ $(APR_SRCDIR)/configure \ --prefix=$(PREFIX)/apr \ --enable-maintainer-mode \ - --disable-threads + $(THREADS_FLAG) touch $@ # compile apr @@ -704,6 +710,12 @@ tar -C $(SRCDIR) -zxf $(DISTDIR)/$(SQLITE_DIST) touch $@ +ifdef THREADING +THREADSAFE_FLAG=--enable-threadsafe +else +THREADSAFE_FLAG=--disable-threadsafe +endif + # configure sqlite $(SQLITE_OBJDIR)/.configured: $(SQLITE_OBJDIR)/.retrieved cd $(SQLITE_OBJDIR) \ @@ -711,7 +723,7 @@ $(SQLITE_SRCDIR)/configure \ --prefix=$(PREFIX)/sqlite \ --disable-tcl \ - --disable-threadsafe + $(THREADSAFE_FLAG) touch $@ # compile sqlite
Re: svn commit: r1040831 - in /subversion/trunk/subversion: include/svn_checksum.h libsvn_subr/checksum.c
On Wed, Dec 01, 2010 at 05:25:28AM +0200, Daniel Shahaf wrote: stef...@apache.org wrote on Tue, Nov 30, 2010 at 23:56:41 -: Author: stefan2 Date: Tue Nov 30 23:56:40 2010 New Revision: 1040831 URL: http://svn.apache.org/viewvc?rev=1040831&view=rev Log: Fix 'svnadmin verify' for format 5 repositories. During the checking process, they yield NULL for SHA1 checksums. The most robust way to fix that is to allow NULL for checksum in svn_checksum_to_cstring. This seems justified as NULL is already a valid return value instead of e.g. an empty string. So, callers may receive NULL as a result and could therefore assume that NULL is a valid input, too.

Can you explain how to trigger the bug you are fixing? Just running 'svnadmin verify' on my r1040058-created Greek repository doesn't.

* subversion/include/svn_checksum.h (svn_checksum_to_cstring): extend doc string * subversion/libsvn_subr/checksum.c (svn_checksum_to_cstring): return NULL for NULL checksums as well Modified: subversion/trunk/subversion/include/svn_checksum.h subversion/trunk/subversion/libsvn_subr/checksum.c Modified: subversion/trunk/subversion/include/svn_checksum.h URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_checksum.h?rev=1040831&r1=1040830&r2=1040831&view=diff == --- subversion/trunk/subversion/include/svn_checksum.h (original) +++ subversion/trunk/subversion/include/svn_checksum.h Tue Nov 30 23:56:40 2010 @@ -121,7 +121,7 @@ svn_checksum_to_cstring_display(const sv /** Return the hex representation of @a checksum, allocating the * string in @a pool. If @a checksum->digest is all zeros (that is, - * 0, not '0'), then return NULL. + * 0, not '0') or NULL, then return NULL. *

First, this change doesn't match the code change: the docstring change says CHECKSUM->DIGEST may be NULL, but the code change allows CHECKSUM to be NULL. Second, let's have the docstring say that NULL is only allowed by 1.7+. (Passing NULL will segfault in 1.6.) (doxygen has a @note directive.)
Yes, this certainly violates API backwards compatibility rules. A regression test that shows the problem would be nice. Thanks, Stefan

* @since New in 1.6. */ Modified: subversion/trunk/subversion/libsvn_subr/checksum.c URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/checksum.c?rev=1040831&r1=1040830&r2=1040831&view=diff == --- subversion/trunk/subversion/libsvn_subr/checksum.c (original) +++ subversion/trunk/subversion/libsvn_subr/checksum.c Tue Nov 30 23:56:40 2010 @@ -135,6 +135,9 @@ const char * svn_checksum_to_cstring(const svn_checksum_t *checksum, apr_pool_t *pool) { + if (checksum == NULL) +    return NULL; + switch (checksum->kind) { case svn_checksum_md5:
Re: svn commit: r1040832 - Port a fix for a FSFS packing race
On Wed, 2010-12-01, stef...@apache.org wrote: Port (not merge) a fix for a FSFS packing race condition from the performance branch to /trunk: There is a slight time window between finding the name of a rev file and actually opening it. If the revision in question gets packed just within this window, we will find the new (and final) file name with just one retry. * subversion/libsvn_fs_fs/fs_fs.c (open_pack_or_rev_file): retry once upon missing file error

Hi Stefan. (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note that the returned path may no longer be valid by the time you come to use it, and the reason for that. Does my patch below sound right? (2) Doesn't the exact same race exist in *all* uses of svn_fs_fs__path_rev_absolute()? There are five other calls to it, all of them using the result as a permissions reference to set the file permissions on some new file. It seems to me that those uses can fail in the same way. The root problem is the existence of a "get the path of this file" API in the presence of semantics whereby the file might be removed from that path at any time. Perhaps your fix is the best fix for this caller, but for the other callers I think creating and using a "copy file permissions from rev-file of rev R" API would be a better solution than adding re-tries there. That API can do the re-try internally. What do you think?

Patch for (1) above: [[[ Index: subversion/libsvn_fs_fs/fs_fs.h === --- subversion/libsvn_fs_fs/fs_fs.h (revision 1040662) +++ subversion/libsvn_fs_fs/fs_fs.h (working copy) @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc PERMS_REFERENCE. Temporary allocations are from POOL. */ svn_error_t *svn_fs_fs__move_into_place(const char *old_filename, const char *new_filename, const char *perms_reference, apr_pool_t *pool); -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
+/* Set *PATH to the path of REV in FS, whether in a pack file or not. + The path is not guaranteed to remain correct after the function returns, + because it is possible that the revision will become packed just after + this call, so the caller should re-try once if the path is not found. Allocate in POOL. */ svn_error_t * svn_fs_fs__path_rev_absolute(const char **path, svn_fs_t *fs, svn_revnum_t rev, apr_pool_t *pool); ]]] - Julian

Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff == --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original) +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec 1 00:15:11 2010 @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file, { svn_error_t *err; const char *path; svn_boolean_t retry = FALSE; do { err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool); /* open the revision file in buffered r/o mode */ if (! err) err = svn_io_file_open(file, path, APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool); if (err && APR_STATUS_IS_ENOENT(err->apr_err)) { /* Could not open the file. This may happen if the * file once existed but got packed later. */ svn_error_clear(err); /* if that was our 2nd attempt, leave it at that. */ if (retry) return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL, _("No such revision %ld"), rev); /* we failed for the first time. Refresh cache & retry. */ SVN_ERR(update_min_unpacked_rev(fs, pool)); retry = TRUE; } else { /* the file exists but something prevented us from opening it */ return svn_error_return(err); } } while (err); return SVN_NO_ERROR; }
Re: diff-optimizations-tokens branch: I think I'm going to abandon it
On Wed, Dec 1, 2010 at 11:44 AM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Wed, Dec 01, 2010 at 10:05:29 +0100: On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100: I am now considering to abandon the tokens-approach, for the following reasons: ... So, unless someone can convince me otherwise, I'm probably going to stop with the token approach. Because of 2), I don't think it's worth it to spend the effort needed for 1), especially because the byte-based approach already works. In other words, you're saying that the token-based approach: (b) won't be as fast as the bytes-based approach can be, and (a) requires much effort to be spent on implementing the reverse reading of tokens? (i.e., a backwards gets()) Yes. The reverse reading is quite hard (in the diff_file.c case) because of the chunked reading of files. A line may be split by a chunk boundary (it may even be split in the middle of an eol sequence (between \r and \n), and it all still needs to be canonicalized/normalized correctly (that's why we'll also need a reverse normalization function). The current forward get_next_token does this very well, and the reverse should be analogous, but I just find it hard to reason about, and to get it implemented correctly. It will take me a lot of time, and with a high probability of introducing subtle bugs for special edge cases. OK, so a reverse get_next_token() could be tricky to implement. But, worse, won't having it mean that implementors of svn_diff_fns_t can't have streamy access to their source? Since they would be required to provide sometimes a token from the start and sometimes a token from the end... 
Well, it's not streamy in our usual sense of the word, but it's double-streamy (no one requests the middle byte until either all bytes before or all bytes after it were transmitted already)

Oooh, I hadn't considered other implementors besides the internal diff_memory.c and diff_file.c. You're right, that would impose additional constraints on other implementors. I don't know if being non-streamy (or less streamy anyway) would be a problem ...

In my last commit on the -tokens branch, I added a flag open_at_end to the datasource_open function (function of svn_diff_fns_t), so the datasource could be opened at the end. Also, other supporting functions were needed: token_pushback_prefix, token_pushback_suffix (to push back tokens that were read too far when scanning for prefix/suffix) and the dreaded datasource_get_previous_token. Anyway, the high-level idea was:

1) Open datasources at the end.
2) Scan for identical suffix (getting tokens in reverse).
3) At first non-match: push back the suffix token, and note where the suffix starts.
4) Open datasources at the beginning.
5) Scan for identical prefix (getting tokens normally, but without hash).
6) At first non-match: push back the prefix token.
7) Run the old algorithm: getting tokens forward, but with hash. The get_next_token function should stop (return NULL for token) when it arrives at the suffix.

Sidestep: I just now realized that I probably don't need to have the reverse normalization algorithm for implementing get_previous_token. The call to the normalization function in get_next_token is (I think) only needed to be able to calculate the hash. But since get_previous_token doesn't need to calculate hashes, I may be able to get by without normalization there. I'd only need to normalize inside token_compare, and I *think* I can just do that forwardly, instead of backwards. Just thinking out loud here ...

So: that makes the token approach again a little bit more possible. But do we want it?
It requires a lot more from implementors of svn_diff_fns_t. OTOH, it does offer a generic prefix/suffix optimization to all implementors of svn_diff_fns_t ... Any thoughts?

-tokens/BRANCH-README mentions one of the advantages of the tokens approach being that the comparison is done only after whitespace and newlines have been canonicalized (if -x-w or -x--ignore-eol-style is in effect). IIRC, the -bytes approach doesn't currently take advantage of these -x flags? What is the practical effect of the fact the -bytes approach doesn't take advantage of these flags? If a file (with a moderately long history) has had all its newlines changed in rN, then I assume that your -bytes optimizations will speed up all the diffs that 'blame' performs on that file, except for the single diff between rN and its predecessor?

Yes, I thought that would be an important advantage of the tokens approach, but as you suggest: the practical effect will most likely be limited. Indeed, only this single diff between rN and its predecessor (which may be 1 in 1000 diffs) will not benefit from prefix/suffix optimization. Even if the rate of such changes is like 1/100th (let's say you change
Re: svn commit: r1040663 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c
Daniel Shahaf wrote: So we loop over the remaining sha1's and remove each of them... I wonder if there is room for further optimization here? e.g., does this prepare/reset the statement just once, or once per iteration?

Each iteration of this loop prepares, uses and resets a SQL statement, and also removes a pristine file from disk. So yes, there is room for further optimization of the SQL part of that. The main concern I was addressing was that the previous method was *quadratic* in the total number of pristines in the store, because for each one in the store it would scan the NODES and ACTUAL_NODE tables looking for a reference to it. I had noticed that even a no-op cleanup took a very long time on a large WC. It will help if I show some real timings. Wall clock times for 'svn cleanup' on a clean checkout of ^/subversion/branc...@1040943 on my Linux system. r1040662 build: first time = 15 minutes, second = 14.8 minutes. r1040663 build: first time = 4.4s, best of many repetitions = 0.7s. Now the algorithm is only linear time, which is a *huge* win. A 'cleanup' operation doesn't need to be blisteringly fast, so I don't think it needs more optimisation. I've edited the log message to clarify the main point, and to point out the big-WC timing improvement. - Julian

# r1040662 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    15m4.962s
user    9m0.306s
sys     6m3.967s

# r1040663 build
$ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
real    0m0.708s
user    0m0.436s
sys     0m0.212s
Re: gpg-agent branch treats PGP passphrase as repository password?
On Wed, Dec 01, 2010 at 06:50:35AM -0500, Dan Engel wrote: On Sun, 2010-10-17 at 16:03 +0200, Stefan Sperling wrote: If my understanding is correct, it seems the current code on the gpg-agent branch essentially uses the gpg-agent as a glorified password prompt. Here's what the code seems to be doing: Storing a password: Nothing happens. Retrieving a password: Step 1: svn contacts the gpg-agent to find out the passphrase for the private PGP key the agent is managing

I have not reviewed the current code on the gpg-agent branch, but was the original submitter for the gpg-agent support patch. I've wanted to return to it, but have not had the time. However, I would like to take a moment to address this concern. The name of the program (gpg-agent) may be misleading here, but all gpg-agent does is safely (securely, without swapping, etc.) cache secrets indexed by a domain. The domain is whatever the client provides and the secret is the password that the user types in. So what happens is:

Step 1: svn contacts gpg-agent, providing the username and repository URL as the domain.
Step 2: gpg-agent checks to see whether there is a password associated with that domain. If not, it prompts for one. Here, the user enters their Subversion password, NOT their PGP private key passphrase.
Step 3: gpg-agent associates the password with the provided domain and stores that in the memory cache in protected memory. There is typically a timeout (one hour, 24 hours, whatever).
Step 4: svn receives the password and sends it to the server.
Step 5: Later, the user invokes svn again (in a way that needs the server). Normally, this would be the point at which svn would prompt for another password. Instead...
Step 6: svn contacts the gpg-agent, providing the same domain (username + repository URL) as before.
Step 7: gpg-agent checks to see whether there is a password associated with that domain, and this time there is, because of step 3 before.
Now, gpg-agent just returns this password instead of prompting for a new one.
Step 8: svn receives the password and sends it to the server...

So, you see, in spite of the name (gpg-agent) there's no inherent interaction between gpg-agent and the PGP private key store, and at no time is the user prompted for their private PGP key. The purpose of the gpg-agent patch is to make it so that a user who doesn't have one of the GUI wallets available (gnome-keyring or kwallet) doesn't have to type in their Subversion password EVERY time they invoke a command against the repository. The gpg program uses gpg-agent (when so configured) for exactly the same purpose--to keep the user from having to re-type that passphrase more than once every X hours. But the agent itself is generic enough to serve a broad range of password caching needs. gpg-agent allows the user to configure how long passwords persist in protected memory. If the timeout has passed, then in step 7, gpg-agent would request the password again since it would no longer be in the memory cache. Does that all make sense?

Thanks, this makes sense. I was expecting a scheme that would persist the password on disk. Using gpg-agent as a mere runtime password cache as you describe is certainly fine and, if used right, does not expose the private key passphrase. However, I still see a potential risk here because the name gpg-agent is very misleading. It violates the principle of least surprise. How can we prevent users who misunderstand what Subversion's gpg-agent feature does from entering their private PGP key passphrase (which will then be sent to the server)? Can we control the prompt printed by gpg-agent? (Enter your Subversion password, NOT your secret PGP passphrase!) Remember that people may want to use Subversion password caches even for publicly accessible servers which are password protected with a commonly known user/password combination.
Any such server could potentially receive PGP passphrases in clear text if the gpg-agent feature is misused. Because of this concern, I'd rather have a new tool called 'svn-agent' for this purpose, licensed under Apache License v2 and distributed along with Subversion. In that case users would not make a mental association with PGP and would be very unlikely to enter their PGP passphrase. This would also avoid introducing a dependency on an external tool. It would not be very hard to write such a tool, using a protocol similar to the one used by gpg-agent. I would also support a scheme that uses PGP in the way I described, persisting passwords on disk encrypted with a PGP private key. Thanks, Stefan
Re: svn commit: r1040663 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c
Julian Foad wrote on Wed, Dec 01, 2010 at 13:06:04 +: Daniel Shahaf wrote: So we loop over the remaining sha1's and remove each of them... I wonder if there is room for further optimization here? e.g., does this prepare/reset the statement just once, or once per iteration? Each iteration of this loop prepares, uses and resets a SQL statement, and also removes a pristine file from disk. So yes there is room for further optimization of the SQL part of that. Thanks for the overview. The main concern I was addressing was that the previous method was *quadratic* in the total number of pristines in the store, because for each one in the store it would scan the NODES and ACTUAL_NODE tables looking for a reference to it. I had noticed that even a no-op cleanup took a very long time on a large WC. It will help if I show some real timings. Wall clock times for svn cleanup on a clean checkout of ^/subversion/branc...@1040943 on my Linux system. r1040662 build: first time = 15 minutes, second = 14.8 minutes. r1040663 build: first time = 4.4s, best of many repetitions = 0.7s. :-)! Now the algorithm is only linear time, which is a *huge* win. A 'cleanup' operation doesn't need to be blisteringly fast, so I don't think it needs more optimisation. I've edited the log message to clarify the main point, and to point out the big-WC timing improvement. Fair enough; let's table this additional optimization for now. We can always add it later if needed (i.e., if the time spent removing unreferenced pristines should become an issue). - Julian # r1040662 build $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/ real 15m4.962s user 9m0.306s sys 6m3.967s # r1040663 build $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/ real 0m0.708s user 0m0.436s sys 0m0.212s Thanks, Daniel
Re: Can I add NOT NULL to PRISTINE table columns?
Julian Foad julian.f...@wandisco.com writes: I imagine it should be possible to add 'NOT NULL' to these columns without performing a format bump or writing any upgrade code. Am I right? Hyrum Wright wrote: If we're already enforcing it in the C code, I see no problem with doing so in the sql schema. Additionally, I don't really see how you could add the 'NOT NULL' clause in a format bump without doing table duplication, which doesn't seem worth the trouble for something like this. Right. Philip Martin wrote: Existing working copies would not be upgraded, although I suppose the client would work. Perhaps make it a format 99 step? The software won't see any difference except it will provide a sanity check when we're developing and modifying Subversion code and testing against new WCs. I don't think there is any need to upgrade existing development WCs in this respect. I'll mention it in a comment in the format 99 step, but don't think we need to write any SQL for it. Thanks. - Julian
Re: 1.5.8 up for signing/testing
SUMMARY: +1 to release PLATFORM: Windows 7 VS 2008 SP1 Java 1.6 COMPONENTS: Apache2.2.15 APR 1.4.2 APR-UTIL1.3.9 OpenSSL 1.0.0a Neon0.29.5 ZLib (from deps) VERIFIED: md5 and sha1 sums gpg signature verified TESTED: JavaHL ra_local | ra_svn | ra_neon X fsfs RESULTS: All passed SIGNATURES: subversion-1.5.8.zip -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (Darwin) iEYEABECAAYFAkz2oiIACgkQJl34oANalqn7OACgkXfcbCr17XRqvYSFIkwGrqst 3V0AniZnJGzvU3o5ePj4k1u/OO8dMFhI =B+uM -END PGP SIGNATURE- subversion-deps-1.5.8.zip -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (Darwin) iEYEABECAAYFAkz2oi0ACgkQJl34oANalqmpgACgmek/6rW9+JJpAfKJ7wx0H4BV i2oAn0oTiXZea3TsnlcYpaE/TSg5AOeG =d74O -END PGP SIGNATURE- -- Thanks Mark Phippard http://markphip.blogspot.com/
Re: gpg-agent branch treats PGP passphrase as repository password?
On Wed, 2010-12-01 at 14:08 +0100, Stefan Sperling wrote: However, I still see a potential risk here because the name gpg-agent is very misleading. It violates the principle of least surprise. How can we prevent users misunderstanding what Subversion's gpg-agent feature does from entering their private pgp key passphrase (which will then be sent to the server)? Can we control the prompt printed by gpg-agent? (Enter your Subversion password, NOT your secret PGP passphrase!) Yes, the agent protocol provides for customized prompts, and the patch itself refers to the Subversion repository server (or something like that) in that prompt. I have no emotional investment in the gpg-agent idea (aside from the don't re-invent the wheel argument), but here's my $0.02: I think most people who know enough to use gpg agent (it's a bit more involved to set up, etc. than things like gnome-keyring) probably understand what it does well enough to not make that mistake. Also, in most corporate or enterprise environments (where the stakes are really high) Subversion will be installed and set up by administrators (who *better* know what they're doing) and used by users who may not even know that gpg-agent is running in the background. All they get is a prompt for their subversion password. I know those lines get a little more blurred in Linux-land than in Windows-land, but I think the point is still a valid one. From a purely personal point of view, I'd be happy with ANY disk or memory password cache for Subversion on Linux that is safe (security-wise) and doesn't rely on the presence of any GUI libraries or capabilities. The gpg-agent path was just the easiest one for me to implement directly. Thanks, -Dan
Re: svn commit: r1040831 - in /subversion/trunk/subversion: include/svn_checksum.h libsvn_subr/checksum.c
On 01.12.2010 04:25, Daniel Shahaf wrote: stef...@apache.org wrote on Tue, Nov 30, 2010 at 23:56:41 -: Author: stefan2 Date: Tue Nov 30 23:56:40 2010 New Revision: 1040831 URL: http://svn.apache.org/viewvc?rev=1040831&view=rev Log: Fix 'svnadmin verify' for format 5 repositories. During the checking process, they yield NULL for SHA1 checksums. The most robust way to fix that is to allow NULL for checksum in svn_checksum_to_cstring. This seems justified as NULL is already a valid return value instead of e.g. an empty string. So, callers may receive NULL as a result and could therefore assume that NULL is a valid input, too. Can you explain how to trigger the bug you are fixing? Just running 'svnadmin verify' on my r1040058-created Greek repository doesn't.

Sure:
$ svnadmin-1.5.4 create /mnt/svnroot/test
$ add pre-revprop-change hook
$ svnsync-1.5.4 init file:///mnt/svnroot/test http://svn.apache.org/repos/asf
$ svnsync-1.5.4 sync file:///mnt/svnroot/test
$ cancel after a few revs; rev 1 will already trigger the error
$ svnadmin-trunk verify /mnt/svnroot/test

* subversion/include/svn_checksum.h (svn_checksum_to_cstring): extend doc string * subversion/libsvn_subr/checksum.c (svn_checksum_to_cstring): return NULL for NULL checksums as well Modified: subversion/trunk/subversion/include/svn_checksum.h subversion/trunk/subversion/libsvn_subr/checksum.c Modified: subversion/trunk/subversion/include/svn_checksum.h URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_checksum.h?rev=1040831&r1=1040830&r2=1040831&view=diff == --- subversion/trunk/subversion/include/svn_checksum.h (original) +++ subversion/trunk/subversion/include/svn_checksum.h Tue Nov 30 23:56:40 2010 @@ -121,7 +121,7 @@ svn_checksum_to_cstring_display(const sv /** Return the hex representation of @a checksum, allocating the * string in @a pool. If @a checksum->digest is all zeros (that is, - * 0, not '0'), then return NULL. + * 0, not '0') or NULL, then return NULL.
* First, this change doesn't match the code change: the docstring change says CHECKSUM->DIGEST may be NULL, but the code change allows CHECKSUM to be NULL. Second, let's have the docstring say that NULL is only allowed by 1.7+. (Passing NULL will segfault in 1.6.) (doxygen has a @note directive.)

Done. -- Stefan^2.

* @since New in 1.6. */ Modified: subversion/trunk/subversion/libsvn_subr/checksum.c URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/checksum.c?rev=1040831&r1=1040830&r2=1040831&view=diff == --- subversion/trunk/subversion/libsvn_subr/checksum.c (original) +++ subversion/trunk/subversion/libsvn_subr/checksum.c Tue Nov 30 23:56:40 2010 @@ -135,6 +135,9 @@ const char * svn_checksum_to_cstring(const svn_checksum_t *checksum, apr_pool_t *pool) { + if (checksum == NULL) +    return NULL; + switch (checksum->kind) { case svn_checksum_md5:
Re: diff-optimizations-tokens branch: I think I'm going to abandon it
[ finally getting back to this mail; having slept on it, etc. ] Johan Corveleyn wrote on Wed, Dec 01, 2010 at 13:34:48 +0100: On Wed, Dec 1, 2010 at 11:44 AM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Wed, Dec 01, 2010 at 10:05:29 +0100: On Wed, Dec 1, 2010 at 3:38 AM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Wed, Dec 01, 2010 at 00:25:27 +0100: I am now considering to abandon the tokens-approach, for the following reasons: ... So, unless someone can convince me otherwise, I'm probably going to stop with the token approach. Because of 2), I don't think it's worth it to spend the effort needed for 1), especially because the byte-based approach already works. In other words, you're saying that the token-based approach: (b) won't be as fast as the bytes-based approach can be, and (a) requires much effort to be spent on implementing the reverse reading of tokens? (i.e., a backwards gets()) Yes. The reverse reading is quite hard (in the diff_file.c case) because of the chunked reading of files. A line may be split by a chunk boundary (it may even be split in the middle of an eol sequence (between \r and \n), and it all still needs to be canonicalized/normalized correctly (that's why we'll also need a reverse normalization function). The current forward get_next_token does this very well, and the reverse should be analogous, but I just find it hard to reason about, and to get it implemented correctly. It will take me a lot of time, and with a high probability of introducing subtle bugs for special edge cases. OK, so a reverse get_next_token() could be tricky to implement. But, worse, won't having it mean that implementors of svn_diff_fns_t can't have streamy access to their source? Since they would be required to provide sometimes a token from the start and sometimes a token from the end... 
Well, it's not streamy in our usual sense of the word, but it's double-streamy (no one requests the middle byte until either all bytes before or all bytes after it were transmitted already).

Oooh, I hadn't considered other implementors besides the internal diff_memory.c and diff_file.c. Just to clarify: diff_file.c and diff_memory.c are the only implementors of svn_diff_fns_t in the core code, correct?

You're right, that would impose additional constraints on other implementors. I don't know if being non-streamy (or less streamy, anyway) would be a problem ...

We should have asked this before, but: do we know who the other implementors / typical use-cases of svn_diff_fns_t are?

In my last commit on the -tokens branch, I added a flag open_at_end to the datasource_open function (a function of svn_diff_fns_t), so the datasource could be opened at the end. Also, other supporting functions were needed: token_pushback_prefix and token_pushback_suffix (to push back tokens that were read too far when scanning for prefix/suffix) and the dreaded datasource_get_previous_token.

Anyway, the high-level idea was:

1) Open datasources at the end.
2) Scan for identical suffix (getting tokens in reverse).
3) At the first non-match: push back the suffix token, and note where the suffix starts.
4) Open datasources at the beginning.
5) Scan for identical prefix (getting tokens normally, but without hashing).
6) At the first non-match: push back the prefix token.
7) Run the old algorithm: getting tokens forward, but with hashing. The get_next_token function should stop (return NULL for the token) when it arrives at the suffix.

Sidestep: I just now realized that I probably don't need the reverse normalization algorithm to implement get_previous_token. The call to the normalization function in get_next_token is (I think) only needed to be able to calculate the hash. But since get_previous_token doesn't need to calculate hashes, I may be able to get by without normalization there.
I'd only need to normalize inside token_compare, and I *think* I can just do that forwardly, instead of backwards. Just thinking out loud here ...

Is the normalization function the thing that collapses newlines and whitespace if -x-w or -x--ignore-eol-style is in effect? If so, another purpose of calling it might be to make suffixes that are not bytewise-identical, but only identical modulo whitespace, also be lumped into the identical suffix. (Haven't dived into the code yet; the above is based on my understanding of your high-level descriptions.)

So: that makes the token approach again a little bit more feasible. But do we want it? It requires a lot more from implementors of svn_diff_fns_t. OTOH, it does offer a generic prefix/suffix optimization to all implementors of svn_diff_fns_t ...

Another option is to add the read-backwards callbacks to the API, but designate them as optional. Then you only do the suffix half of the optimization if both/all sources support the
Re: svn commit: r1040832 - Port a fix for a FSFS packing race
On Wed, 2010-12-01, stef...@apache.org wrote:

Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec 1 00:15:11 2010
@@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
 {
   svn_error_t *err;
   const char *path;
   svn_boolean_t retry = FALSE;

I agree the code below is correct, but I found it confusing:

   do
     {
       err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);

       /* open the revision file in buffered r/o mode */
       if (! err)
         err = svn_io_file_open(file, path, APR_READ | APR_BUFFERED,
                                APR_OS_DEFAULT, pool);

       if (err && APR_STATUS_IS_ENOENT(err->apr_err))
         {
           /* Could not open the file. This may happen if the
            * file once existed but got packed later. */
           svn_error_clear(err);

           /* if that was our 2nd attempt, leave it at that. */
           if (retry)
             return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
                                      _("No such revision %ld"), rev);

           /* we failed for the first time. Refresh cache & retry. */
           SVN_ERR(update_min_unpacked_rev(fs, pool));

Philip noted that this call should be guarded by a format number check (otherwise we would assert on format-3 repositories that are missing a rev file). I've fixed that.

           retry = TRUE;
         }
       else
         {
           /* the file exists but something prevented us from opening it */
           return svn_error_return(err);

The comment doesn't indicate that the else{} block is also entered in the case where ERR is SVN_NO_ERROR. In other words, the success case is handled not by the 'return SVN_NO_ERROR' below (which in fact is never reached), but by this else{} block.

         }
     }
   while (err);

   return SVN_NO_ERROR;
 }

The error handling confused me here: it clears ERR but then checks that it's non-NULL, and right after that check (which normally means there is an error) it overwrites ERR.
I think the loop would be clearer if it were just 'while (1)'.
Re: subversion cross compile (arm)
Hi All, Hi Daniel,

Daniel, thanks a lot for your help, I'm getting closer to the origin of the issue. (I hope.) I CC'ed the dev@ list as well, so I'll summarize the problem again.

I'm cross-compiling subversion for an ARM9-based development board (mini2440). (You'll find my configure options below.) This is a totally clean environment. There are just the libraries of the toolchain (codesourcery). I built apr, apr-util, zlib, and sqlite from the subversion-deps package, and subversion itself. It is version 1.6.15.

I'm creating a new repository with svnadmin create, and after it I'm trying svn mkdir file:///... The result is:

svn: Commit failed (details follow):
svn: Corrupt node-revision '0.0.t0-0'
svn: Malformed text representation offset line in node-rev

Based on Daniel's help, I started to patch fs_fs.c, adding debug messages to it:

/ # svnadmin create /var/svn/testrepo    <=== I'm creating the repository
/ # cat /var/svn/testrepo/db/revs/0/0    <=== Checking the revision file. Seems to be OK.
PLAIN
END
ENDREP
id: 0.0.r0/17
type: dir
count: 0
text: 0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e
cpath: /

17 107
/ # svn mkdir file:///var/svn/testrepo/xxx -m m
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev    <=== the bottom of the get_node_revision_body function
fs_fs: [LINE 2140] calling read_rep_offsets '0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e'
read_rep_offsets: [LINE 1947] '0 0 4 4 2d2977d1c96f487abe4a1e202dd03b4e'    <== the string parameter of read_rep_offsets
read_rep_offsets: [LINE 1956] '0'    <=== return str of first apr_strtok
read_rep_offsets: [LINE 1973] '0'    <=== return str of second apr_strtok
read_rep_offsets: [LINE 1984] '4'    <=== return str of third apr_strtok
read_rep_offsets: [LINE 1995] '4'    <=== return str of fourth apr_strtok
read_rep_offsets: [LINE 2009] '2d2977d1c96f487abe4a1e202dd03b4e'    <=== return str of fifth apr_strtok
fs_fs: [LINE 2082] calling svn_fs_fs__read_noderev
apr_strtok: [LINE 35] '0.0.t0-0'
apr_strtok: [LINE 58] '0'
apr_strtok: [LINE 35] '0.t0-0'
apr_strtok: [LINE 58] '0'
apr_strtok: [LINE 35] 't0-0'
apr_strtok: [LINE 58] 't0-0'
fs_fs: [LINE 2140] calling read_rep_offsets '0 4 4 4621479420036127952 (null)'
read_rep_offsets: [LINE 1947] '0 4 4 4621479420036127952 (null)'
read_rep_offsets: [LINE 1956] '0'
read_rep_offsets: [LINE 1973] '4'
read_rep_offsets: [LINE 1984] '4'
read_rep_offsets: [LINE 1995] '4621479420036127952'
subversion/libsvn_fs_fs/fs_fs.c:2239: (apr_err=160004)
svn: Corrupt node-revision '0.0.t0-0'
subversion/libsvn_fs_fs/fs_fs.c:2006: (apr_err=160004)
svn: Malformed text representation offset line in node-rev
read_rep_offsets: apr_strtok #4 last_string '' string '0' str '(null)' strlen(str) 6 (APR_MD5_DIGESTSIZE*2) 32 revision 0 offset 4 size 0 expsize 4
/ #

I figured out that the first execution of fs_fs__read_noderev is parsing the /var/svn/testrepo/db/revs/0/0 file. What is the input of the second execution? I see that it is in the wrong format, but what is its origin?

Thanks a lot!

Regards,
András

Toolchain: arm-2008q3-72-arm-none-linux-gnueabi-i686-pc-linux-gnu
Cross gcc: arm-none-linux-gnueabi-gcc
Cross cflags: -march=armv4t -mtune=arm920t

Apr configure:
./configure \
  --prefix=/usr \
  --host=$(CROSS_COMPILE) \
  ac_cv_file__dev_zero=yes \
  ac_cv_func_setpgrp_void=yes \
  apr_cv_process_shared_works=yes \
  apr_cv_mutex_robust_shared=no \
  apr_cv_tcp_nodelay_with_cork=yes \
  ac_cv_sizeof_struct_iovec=8 \
  apr_cv_mutex_recursive=yes \
  CFLAGS=$(CROSS_CFLAGS) \
  LDFLAGS=$(CROSS_LDFLAGS)

Apr-utils configure:
./configure \
  --with-apr=MY_APR_BUILD_DIR \
  --prefix=/usr \
  --host=$(CROSS_COMPILE) \
  CFLAGS=$(CROSS_CFLAGS) \
  LDFLAGS=$(CROSS_LDFLAGS)

Subversion configure:
./configure \
  --with-apr=$(PACKAGES_DIR)/apr/$(TARGET_PACKAGE)/apr \
  --with-apr-util=$(PACKAGES_DIR)/apr-util/$(TARGET_PACKAGE)/apr-util \
  --with-sqlite=$(TARGET_DEV_ROOT)/usr \
  --with-zlib=$(TARGET_DEV_ROOT)/usr \
  --host=$(CROSS_COMPILE) \
  --prefix=/usr \
  CFLAGS=$(CROSS_CFLAGS) \
  LDFLAGS=$(CROSS_LDFLAGS)

Other compiled libraries:
sqlite3, zlib

I'm using the latest 1.6.15 subversion and subversion-deps packages. (I compiled all libraries from subversion-deps; no other installed library.)

--
Takács András
Skype: wakoond
GTalk: wakoond
MSN: wako...@freestart.hu