Re: Returned post for annou...@apache.org
On Thu, Feb 11, 2021, 14:36 Private List Moderation <mod-priv...@gsuite.cloud.apache.org> wrote:

> On Thu, 11 Feb 2021 at 12:15, Branko Čibej wrote:
>
>> On 11.02.2021 12:23, Stefan Sperling wrote:
>>
>>> On Thu, Feb 11, 2021 at 11:02:32AM +, Private List Moderation wrote:
>>>
>>>> Irrelevant.
>>>
>>> Given that this discussion doesn't seem to be going anywhere and the
>>> same arguments from May 2020 are just being rehashed, I guess we will
>>> simply stop using the announce@ mailing list.
>>
>> I agree. This nitpicking bureaucratic mission creep has gone way over the
>> top. We have our own announce@svn.a.o list anyway; I expect anyone who's
>> really interested is subscribed to that.
>>
>> I find it kind of ironically funny that the same moderator(s) who feel
>> they're empowered to enforce release policy don't feel that the normal
>> escalation path (i.e., bug report to dev@) is worth taking.
>
> There was a problem with the download page at the time it was checked.

What does a problem with the download page have to do with spam prevention?
Why does that problem make this spam?

> Please try to see it from the moderator's point of view.

I can only look at it from what I perceive to be the responsibility of a
moderator. And I am looking at it from that perspective.

Erik
Re: Returned post for annou...@apache.org
How can a link be more important than an announcement for a fix of an
*unauthenticated* remote DoS? Same for the KEYS file??? Don't you think
that's way out of proportion?

Erik.

On Wed, Feb 10, 2021 at 4:50 PM Private List Moderation wrote:
>
> I don't see how the missing links can be regarded as trivial.
> This obviously needs to be fixed before the announce can be accepted.
>
> At the same time, I asked for the KEYS file link to be standardised.
> There is already a KEYS file at the standard location - why not link to that
> instead?
>
> On Wed, 10 Feb 2021 at 15:35, Stefan Sperling wrote:
>>
>> Sebb, blocking our release announcements over trivialities like this
>> really is not a nice thing to do. Last time it happened in May 2020.
>> It was already discussed back then and raised with the announce@
>> moderation team.
>>
>> The Subversion PMC came to the conclusion that our handling of
>> the KEYS files is adequate for our purposes:
>> https://svn.haxx.se/dev/archive-2020-05/0156.shtml
>>
>> Please raise the issue on our dev@subversion.a.o list if it bothers you.
>> The moderation mechanism is supposed to prevent spam. Using it to enforce
>> release workflow policies amounts to misuse of your moderation privileges.
>>
>> Regards,
>> Stefan
>>
>> On Wed, Feb 10, 2021 at 03:20:41PM -, announce-ow...@apache.org wrote:
>>>
>>> Hi! This is the ezmlm program. I'm managing the
>>> annou...@apache.org mailing list.
>>>
>>> I'm working for my owner, who can be reached
>>> at announce-ow...@apache.org.
>>>
>>> I'm sorry, your message (enclosed) was not accepted by the moderator.
>>> If the moderator has made any comments, they are shown below.
>>>
>>> Sorry, but the announce cannot be accepted.
>>> The linked download page does not contain links for the version in the
>>> email.
>>>
>>> Also, the standard name for the KEYS file is KEYS - no prefix, no suffix.
>>> Please correct the download page, check it, and submit a corrected announce
>>> mail.
>>>
>>> Thanks,
>>> Sebb.
>>>
>>> Date: Wed, 10 Feb 2021 14:37:00 +0100
>>> From: Stefan Sperling
>>> To: annou...@subversion.apache.org, us...@subversion.apache.org,
>>>     dev@subversion.apache.org, annou...@apache.org
>>> Cc: secur...@apache.org, oss-secur...@lists.openwall.com,
>>>     bugt...@securityfocus.com
>>> Subject: [SECURITY][ANNOUNCE] Apache Subversion 1.10.7 released
>>> Message-ID:
>>> Reply-To: us...@subversion.apache.org
>>> Content-Type: text/plain; charset=utf-8
>>>
>>> I'm happy to announce the release of Apache Subversion 1.10.7.
>>> Please choose the mirror closest to you by visiting:
>>>
>>> https://subversion.apache.org/download.cgi#supported-releases
>>>
>>> This is a stable bugfix and security release of the Apache Subversion
>>> open source version control system.
>>>
>>> THIS RELEASE CONTAINS AN IMPORTANT SECURITY FIX:
>>>
>>> CVE-2020-17525
>>> "Remote unauthenticated denial-of-service in Subversion mod_authz_svn"
>>>
>>> The full security advisory for CVE-2020-17525 is available at:
>>> https://subversion.apache.org/security/CVE-2020-17525-advisory.txt
>>>
>>> A brief summary of this advisory follows:
>>>
>>> Subversion's mod_authz_svn module will crash if the server is using
>>> in-repository authz rules with the AuthzSVNReposRelativeAccessFile
>>> option and a client sends a request for a non-existing repository URL.
>>>
>>> This can lead to disruption for users of the service.
>>>
>>> We recommend all users to upgrade to the 1.10.7 or 1.14.1 release
>>> of the Subversion mod_dav_svn server.
>>>
>>> As a workaround, the use of in-repository authz rules files with
>>> the AuthzSVNReposRelativeAccessFile can be avoided by switching
>>> to an alternative configuration which fetches an authz rules file
>>> from the server's filesystem, rather than from an SVN repository.
>>>
>>> This issue was reported by Thomas Åkesson.
>>>
>>> SHA-512 checksums are available at:
>>>
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.tar.bz2.sha512
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.tar.gz.sha512
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.zip.sha512
>>>
>>> PGP Signatures are available at:
>>>
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.tar.bz2.asc
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.tar.gz.asc
>>> https://www.apache.org/dist/subversion/subversion-1.10.7.zip.asc
>>>
>>> For this release, the following people have provided PGP signatures:
>>>
>>>    Stefan Sperling [2048R/4F7DBAA99A59B973] with fingerprint:
>>>     8BC4 DAE0 C5A4 D65F 4044 0107 4F7D BAA9 9A59 B973
>>>    Branko Čibej [4096R/1BCA6586A347943F] with fingerprint:
>>>     BA3C 15B1 337C F0FB 222B D41A 1BCA 6586 A347 943F
>>>    Johan Corveleyn [4096R/B59CE6D6010C8AAD] wit
Re: svn commit: r1854072 - in /subversion/trunk/subversion: libsvn_subr/io.c tests/libsvn_subr/io-test.c
> By the way, I'm not sure why we carry around the "defined(__OS2__)"
> check in io.c. As far as I'm aware, no-one has ever actually tested
> Subversion on OS/2 ... these checks are probably just lifted out of APR,
> but don't do anything useful.

Maybe not tested, but there are supposedly floating OS/2 binaries around:
https://os2ports.smedley.id.au/index.php?page=subversion

Lacking an OS/2 installation, I have no idea if they actually work...

> -- Brane

Regards,

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: Migrating Subversion issues to ...
>> Hi Mark,
>>
>> I'm going to start the migration process tomorrow morning. Could you
>> please lock the tigris.org project? I think it will be OK if our issue
>> tracker is read-only for a day or so.
>
> Issues are finally migrated to ASF JIRA:
> https://issues.apache.org/jira/browse/SVN

Great! Thanks so much!

Regards,

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
>> Heh :-) I meant the branch-specific code -- not *all* of the client and
>> library! I have no idea what that means, because I didn't study the code
>> closely (yet). I'll need some directions on where to look for the
>> branch-specific code so I can try to figure out where to hook Lua in.
>
> Oh, so you want to try it?

Well, my idea would be that if we're able to address the code
additions/changes in that branch with an integration design, then it
fulfills the requirement you were talking about at the beginning of this
thread. If it doesn't work, then it might not be the solution we're
looking for.

> OK, the new code is pretty well segregated from the existing code.
> Almost all of the relevant code is in these new files:
>
> subversion/svnmover/svnmover.c
> subversion/libsvn_delta/{element,branch,editor3e,compat3e}.c

Thanks! I'll have a look.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
>> Am I right that if we were to run this experiment with the
>> move-tracking-2 branch code, that the entire client and library would
>> be subject to conversion to the higher level language?
>
> No! That would be literally years of rewriting and debugging and
> re-testing, not to mention interesting interfacing with the rest of the
> (pool-bound) code.

Heh :-) I meant the branch-specific code -- not *all* of the client and
library! I have no idea what that means, because I didn't study the code
closely (yet). I'll need some directions on where to look for the
branch-specific code so I can try to figure out where to hook Lua in.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
>> @Julian, do you have a specific area of our code that would most benefit
>> from "moving 'up' from C"? Preferably some part of code that's currently
>> very much in flux?
>
> 'svnmover' on the 'move-tracking-2' branch. It includes both 'client'
> and 'library' code, and I'm moving code freely between the two as I
> figure out what is the best layering. So it's important that a
> language would be good in both roles.

Well, Lua supports calling both ways. A call isn't a straight C call,
though (in Lua, it's a straight Lua function invocation), but a call that
follows a certain calling protocol. Going from Lua to pure C or pure C to
Lua requires a bit of glue code, much like sqlite does for its parameter
bindings.

Am I right that if we were to run this experiment with the
move-tracking-2 branch code, that the entire client and library would
be subject to conversion to the higher level language?

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
>> Hence my suggestion for Lua, which doesn't have a GIL, as far as I can
>> find. Nor does it need manual reference-keeping like is needed with
>> Python or Perl. With Lua you can have as many evaluation environments
>> as you want, instantiating them when crossing a certain API boundary to
>> be used by the library internals.
>
> I don't have direct experience with Lua, but have read/observed it for
> many years. This is something that I could get behind as an embedded
> *experimental* solution (to move "up" from lower-level C code), based on
> what I've read.

That would be the first step for any implementation, I take it -- we'd
want to evaluate the benefits to be had. If we agree to start
experimenting with Lua, the next step would be to create a high level
design. Something I might be able to spend time on.

@Julian, do you have a specific area of our code that would most benefit
from "moving 'up' from C"? Preferably some part of code that's currently
very much in flux?

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
>> These days, I suppose we'd be looking at something like Go, which can
>> be linked with C/C++ and also natively export functions that can be
>> called from C/C++.
>
> As far as I can see, Go always comes with Garbage Collection instead of
> deterministic memory management.
>
> Also, as far as I can see, Go does not go as far as Rust with what the
> compiler can check at compile time.

On the other hand, I see on rust-lang that the current state of Rust is
1.0.0-alpha2, where Lua has 22 years of experience and development.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: AW: Convenient array & hash iterators & accessors
> In the past I'd thought about embedding Python into our sources, but
> Python still (after 20 years ...) depends on a global interpreter lock
> which pretty much kills any chance of lockless thread-safe code.

Hence my suggestion for Lua, which doesn't have a GIL, as far as I can
find. Nor does it need manual reference-keeping like is needed with
Python or Perl. With Lua you can have as many evaluation environments
as you want, instantiating them when crossing a certain API boundary to
be used by the library internals.

> These days, I suppose we'd be looking at something like Go, which can be
> linked with C/C++ and also natively export functions that can be called
> from C/C++.

Do you mean that the code one writes is exported as a C function? Or that
there's a C interface? (The latter isn't better than e.g. Lua, Python and
Perl, so I assume you mean the former.)

Would it be an idea, if we really want this, to come up with a list of
requirements and nice-to-haves against which each of the languages brought
up should be measured? If we don't do that, we'll probably go on another
10 years with C only (and another 10... and another 10...)

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: Convenient array & hash iterators & accessors
>> It would make sense to design type-safe, light-weight container and
>> iterator template wrappers around the APR structures if we decided to
>> write code in C++. Since we're not, "explicit is better than
>> implicit".
>
> I understand the point. I note that "explicit" is not a binary quality:
> there are degrees of it.
>
> I suppose I want to be writing in a higher level language. Maybe I
> should just go ahead and really do so.

Exactly. There's been talk about doing so for much too long without action
(other than attempts - including my own - to find a way to "upgrade" C to
something less verbose and more expressive).

I've long been thinking that specific areas which are more-or-less
stand-alone might be a good place to start this strategy. One place that
might qualify is the piece of code that deduces the eligible revisions in
merge tracking. That's the code I'm thinking you're now working in?

What kind of language were you thinking about? One of the languages that
came to mind is 'lua', which seems to have a pretty strong focus on being
integratable with C code. For Lua there are also tools to embed the
compiled bytecode in a C library, so the entire higher-level language can
be fully encapsulated inside our libraries.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: Configuring Subversion with Berkeley DB Error: configure: error: Berkeley DB 4.0.14 or 5.x wasn't found
Hi kay,

On Thu, Feb 12, 2015 at 5:48 PM, kay wrote:
> Just to clarify, the "support security" was a typo. I meant they thought
> BDB will have better features for user authentication, privacy,
> permission and security issues.

Well, then I think your customer, or you, don't get Branko's "There's
*nothing* BDB does that FSFS can't do (better)." Because there's really
nothing that's better supported with BDB. That includes authentication,
privacy, permissions and security.

> I brought up the issue of deprecation and lack of future
> support for BDB.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Re: [svnbench] Revision: 1507876 compiled Jul 29 2013, 00:21:55 on x86_64-unknown-linux-gnu
Hi Neels,

Would it be an idea to switch the baseline of the tests to 1.8.1?
I regularly look at them, but got confused with the reported performance
gain. Just to let you know :-)

Erik.

sent from my phone
On Jul 29, 2013 2:38 AM, wrote:
> 1.7.0@1181106 vs. trunk@1507860
> Started at Mon Jul 29 00:26:13 UTC 2013
>
> *DISCLAIMER* - This tests only file://-URL access on a GNU/Linux VM.
> This is intended to measure changes in performance of the local working
> copy layer, *only*. These results are *not* generally true for everyone.
>
> Charts of this data are available at http://svn-qavm.apache.org/charts/
>
> Averaged-total results across all runs:
> ---------------------------------------
>
> Compare trunk@1507860 to 1.7.0
>       N        avg           operation
>    51/9     0.54|-34.946  TOTAL RUN
>   3K/530    1.23| +0.005  add
>  102/18     0.76| -0.205  checkout
>  408/72     0.63| -0.741  commit
>    51/9     0.86| -0.003  copy
>    51/9     0.76| -0.070  delete
>  255/45     0.12| -3.828  info
>  102/18     0.52| -1.016  merge
>   2K/516    0.84| -0.002  mkdir
>  136/21     0.92| -0.001  propdel
>  38K/6K     0.73| -0.003  proplist
>  38K/6K     0.75| -0.003  propset
>   3K/591    0.77| -0.003  ps
>  102/18     1.92| +0.009  resolve
>  102/18     0.81| -0.038  resolved
>  714/126    0.71| -0.052  status
>    51/9     0.70| -0.326  switch
>  714/126    0.77| -0.157  update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
> factor < 1 and seconds < 0 means 'trunk@1507860' is faster.
> "2/3" means: '1.7.0' has 2 timings on record, the other has 3.)
> Above totals split into separate x runs:
>
> Compare trunk@1507860,5x5 to 1.7.0,5x5
>       N        avg           operation
>    17/3     0.54|-95.838  TOTAL RUN
>   2K/456    1.25| +0.005  add
>    34/6     0.78| -0.499  checkout
>  136/24     0.64| -1.900  commit
>    17/3     0.80| -0.004  copy
>    17/3     0.78| -0.162  delete
>   85/15     0.11|-11.319  info
>    34/6     0.54| -2.567  merge
>   2K/470    0.83| -0.002  mkdir
>  136/20     0.91| -0.001  propdel
>  35K/6K     0.74| -0.002  proplist
>  36K/6K     0.76| -0.003  propset
>   2K/552    0.77| -0.002  ps
>    34/6     3.77| +0.024  resolve
>    34/6     0.80| -0.102  resolved
>  238/42     0.72| -0.125  status
>    17/3     0.73| -0.755  switch
>  238/42     0.82| -0.300  update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
> factor < 1 and seconds < 0 means 'trunk@1507860,5x5' is faster.
> "2/3" means: '1.7.0,5x5' has 2 timings on record, the other has 3.)
>
> Compare trunk@1507860,100x1 to 1.7.0,100x1
>       N        avg           operation
>    17/3     0.55| -7.301  TOTAL RUN
>  476/71     0.98| -0.000  add
>    34/6     0.57| -0.083  checkout
>  136/24     0.50| -0.254  commit
>    17/3     0.90| -0.002  copy
>    17/3     0.67| -0.039  delete
>   85/15     0.47| -0.144  info
>    34/6     0.35| -0.378  merge
>  238/46     0.89| -0.002  mkdir
>   1K/337    0.62| -0.004  proplist
>   1K/273    0.65| -0.005  propset
>  119/33     0.66| -0.005  ps
>    34/6     1.32| +0.003  resolve
>    34/6     0.91| -0.006  resolved
>  238/42     0.68| -0.024  status
>    17/3     0.49| -0.185  switch
>  238/42     0.50| -0.151  update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
> factor < 1 and seconds < 0 means 'trunk@1507860,100x1' is faster.
> "2/3" means: '1.7.0,100x1' has 2 timings on record, the other has 3.)
> Compare trunk@1507860,1x100 to 1.7.0,1x100
>       N        avg           operation
>    17/3     0.61| -1.698  TOTAL RUN
>    17/3     1.80| +0.042  add
>    34/6     0.63| -0.033  checkout
>  136/24     0.56| -0.070  commit
>    17/3     0.89| -0.002  copy
>    17/3     0.66| -0.010  delete
>   85/15     0.71| -0.020  info
>    34/6     0.41| -0.102  merge
>  629/111    0.59| -0.004  proplist
>  714/126    0.61| -0.005  propset
>    34/6     0.63| -0.004  ps
>    34/6     0.93| -0.001  resolve
>    34/6     0.67| -0.007  resolved
>  238/42     0.62| -0.008  status
>    17/3     0.56| -0.037  switch
>  238/42     0.65| -0.019  update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
> factor < 1 and seconds < 0 means 'trunk@1507860,1x100' is faster.
> "2/3" means: '1.7.0,1x100' has 2 timings on record, the other has 3.)
>
>
> More detail:
>
> Timings for 1.7.0,5x5
>    N     min     max     avg   operation (unit is seconds)
>   17  192.14  255.84  207.06   TOTAL RUN
>   2K    0.01    2.21    0.02   add
>   34    0.02    5.15    2.29   checkout
>  136    1.07   17.29    5.34   commit
>   17    0.01    0.13    0.02   copy
>   17    0.61    0.96    0.73   delete
>   85    6.32   31.69   12.70   info
>   34    5.31    8.49    5.60   merge
>   2K    0.01    0.04
Re: svnadmin upgrade output message i18n issue
One application can have multiple active code page settings on Windows.
Of course, if your example were the only option, we would not be having
this discussion.

Bye,

Erik.

sent from my phone
On May 23, 2013 6:44 PM, "Dongsheng Song" wrote:
> On Thu, May 23, 2013 at 11:38 PM, Erik Huelsmann wrote:
>> That was not my point nor the point we discussed back then. As long as
>> gettext tries to convert its translations to *any* encoding, it's
>> flawed by design, because some systems have multiple active output
>> encodings (e.g. Windows).
>
> This does not matter. If I open 2 console windows, one CP437, the
> other CP936, then svn in the CP437 window generates English (ASCII)
> output and in the CP936 window generates Chinese (GBK/GB18030) output.
Re: svnadmin upgrade output message i18n issue
Found at least one of the related discussions:
http://svn.haxx.se/dev/archive-2004-05/0078.shtml

Bye,

Erik.

On May 23, 2013 5:38 PM, "Erik Huelsmann" wrote:
>>> I think the best solution is: DO NOT convert the GETTEXT(3) returned
>>> messages, write it ***AS IS***, since GETTEXT(3) already does the
>>> correct conversion for us.
>>
>> Well, even though gettext may want us to believe otherwise, this
>> doesn't work for cross platform applications: e.g. in Windows the
>> locale for output on the console may be different from the locale for
>> other uses. Back when we went with gettext (2004?), we hashed this
>> through pretty thoroughly. I hope that discussion is still available
>> in the archives.
>
> As I said in the first email of this thread, gettext 0.18.2 and 0.14.1
> give me different behavior; it seems that gettext 0.14.1 does not do
> the correct thing. But do we still need to support this OLD and BUGGY
> version?

That was not my point, nor the point we discussed back then. As long as
gettext tries to convert its translations to *any* encoding, it's flawed
by design, because some systems have multiple active output encodings
(e.g. Windows).

Unless this design has changed between 0.14 and 0.18, gettext() is still
as broken as it was. Translating or not translating doesn't matter: it'll
just be broken on other systems. Too bad, because the rest of it is
actually pretty good.

Bye,

Erik.
Re: svnadmin upgrade output message i18n issue
>> I think the best solution is: DO NOT convert the GETTEXT(3) returned
>> messages, write it ***AS IS***, since GETTEXT(3) already does the
>> correct conversion for us.
>
> Well, even though gettext may want us to believe otherwise, this doesn't
> work for cross platform applications: e.g. in Windows the locale for
> output on the console may be different from the locale for other uses.
> Back when we went with gettext (2004?), we hashed this through pretty
> thoroughly. I hope that discussion is still available in the archives.

As I said in the first email of this thread, gettext 0.18.2 and 0.14.1
give me different behavior; it seems that gettext 0.14.1 does not do
the correct thing. But do we still need to support this OLD and BUGGY
version?

That was not my point, nor the point we discussed back then. As long as
gettext tries to convert its translations to *any* encoding, it's flawed
by design, because some systems have multiple active output encodings
(e.g. Windows).

Unless this design has changed between 0.14 and 0.18, gettext() is still
as broken as it was. Translating or not translating doesn't matter: it'll
just be broken on other systems. Too bad, because the rest of it is
actually pretty good.

Bye,

Erik.
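The "multiple active output encodings" problem can be illustrated with a
small sketch (Python is used purely for illustration; the Chinese string
below is a hypothetical translated message, not an actual Subversion
catalog entry). If gettext converts every message to one process-wide
codeset chosen at startup, a message that is representable in one
console's codepage may be unrepresentable in another's on the same system:

```python
# Illustration only (not Subversion code): one process-wide codeset cannot
# serve two consoles with different codepages, as happens on Windows.
msg = "\u7248\u672c"  # "version" in Chinese; a hypothetical translation

# A GBK (CP936) console can represent the message...
gbk_bytes = msg.encode("gbk")
assert gbk_bytes.decode("gbk") == msg

# ...but a CP437 (US/Western) console cannot: converting the catalog to a
# single codeset at startup would break output on this console.
try:
    msg.encode("cp437")
    representable_everywhere = True
except UnicodeEncodeError:
    representable_everywhere = False
assert not representable_everywhere
```

Keeping the catalog in UTF-8 (via bind_textdomain_codeset) and converting
at each output call lets the conversion target the codeset of the
particular stream, which is the approach described in this thread.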
Re: svnadmin upgrade output message i18n issue
sent from my phone
On May 23, 2013 4:43 PM, "Dongsheng Song" wrote:
>
> On Thu, May 23, 2013 at 10:06 PM, Philip Martin wrote:
>> Dongsheng Song writes:
>>
>>> On Thu, May 23, 2013 at 9:28 PM, Philip Martin wrote:
>>>> Dongsheng Song writes:
>>>>
>>>>> On Thu, May 23, 2013 at 9:11 PM, Philip Martin wrote:
>>>>>> Philip Martin writes:
>>>>>>
>>>>>>> So it appears the UTF8 to native conversion is missing from
>>>>>>> repos_notify_handler. I think repos_notify_handler should be using
>>>>>>> svn_stream_printf_from_utf8 rather than svn_stream_printf.
>>>>>>
>>>>>> I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed
>>>>>> it for 1.8.
>>>>>
>>>>> As the GETTEXT(3) man pages say, if and only if
>>>>> defined(HAVE_BIND_TEXTDOMAIN_CODESET), your commit is OK.
>>>>>
>>>>> So you should check HAVE_BIND_TEXTDOMAIN_CODESET when you use
>>>>> svn_cmdline_cstring_from_utf8.
>>>>
>>>> Are you saying there is a problem with my change? If there is a
>>>> problem, doesn't it already apply to all other uses of
>>>> svn_cmdline_cstring_from_utf8?
>>>
>>> I think so. In the subversion/libsvn_subr/nls.c file:
>>>
>>> #ifdef HAVE_BIND_TEXTDOMAIN_CODESET
>>>   bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
>>> #endif /* HAVE_BIND_TEXTDOMAIN_CODESET */
>>>
>>> bind_textdomain_codeset is only called when HAVE_BIND_TEXTDOMAIN_CODESET
>>> is defined. In this case, you can assume the GETTEXT(3) returned string
>>> is UTF-8 encoded.
>>
>> I still don't understand if you are claiming my change has a problem or
>> if there is a problem in all uses of svn_cmdline_cstring_from_utf8.
>>
>> I recall a related thread from last year:
>>
>> http://svn.haxx.se/dev/archive-2012-08/index.shtml#34
>> http://mail-archives.apache.org/mod_mbox/subversion-dev/201208.mbox/%3Cop.wilcelggnngjn5@tortoise%3E
>>
>> I think we assume that the translations are UTF-8.
>>
>> Is there some code change you think we should make?
>
> Even if ALL the translations are UTF-8, GETTEXT(3) still returns the
> string encoded in the ***current locale's codeset***.
>
> Here is a snippet from the GETTEXT(3) man pages:
>
>   In both cases, the functions also use the LC_CTYPE locale facet in
>   order to convert the translated message from the translator's
>   codeset to the ***current locale's codeset***, unless overridden by a
>   prior call to the bind_textdomain_codeset function.
>
> So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
> encoded; it is encoded in the ***current locale's codeset***.

But we call the codeset function to make sure we do not generate output
in the current locale encoding.

> I think the best solution is: DO NOT convert the GETTEXT(3) returned
> messages, write it ***AS IS***, since GETTEXT(3) already does the
> correct conversion for us.

Well, even though gettext may want us to believe otherwise, this doesn't
work for cross platform applications: e.g. in Windows the locale for
output on the console may be different from the locale for other uses.
Back when we went with gettext (2004?), we hashed this through pretty
thoroughly. I hope that discussion is still available in the archives.

Bye,

Erik.
> > > > Even ALL the translations are UTF-8, GETTEXT(3) still return the > string encoded by the ***current locale's codeset***. > > Here is sniped from the GETTEXT(3) man pages: > > In both cases, the functions also use the LC_CTYPE locale facet in > order to convert the translated message from the translator's > codeset to the ***current locale's codeset***, unless overridden by a > prior call to the bind_textdomain_codeset function. > > So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8 > coded, it it encoded to the ***current locale's codeset***. But we call the codeset function to make sure we do not generate output in the current locale encoding. > I think the best solution is: DO NOTconvert the GETTEXT(3) returned > messages, write it ***AS IS***, since GETTEXT(3) already do the > correct conversion for us. Well, even though gettext may want us to believe otherwise, this doesn't work for cross platform applications: e.g. in windows the locale for output on the console may be different from the locale for other uses. Back when we went with gettext (2004?), we've hashed this through pretty thoroughly. I hope that discussion is still available in the archives. Bye, Erik.
Re: Compressed Pristines (Design Doc)
Hi Ash,

Thanks for picking up the initiative to implement this feature.

On Thu, Mar 22, 2012 at 7:01 PM, Ivan Zhakov wrote:
> On Thu, Mar 22, 2012 at 18:30, Daniel Shahaf wrote:
>> OK, I've had a cruise through now.
>>
>> First of all I have to say it's an order of magnitude larger than what
>> I'd imagined it would be. That makes the "move it elsewhere" idea I'd
>> had less practical than I'd predicted. I'm also not intending to take
>> you up on your offer to proxy me to the doc, though thanks for making it.
>>
>> Design-wise I'm a bit surprised that the choice ended up being rolling
>> a custom file format.
>>
>> Thanks for your work.
>
> +1. I believe we should implement compressed pristines in a simple way:
> just compress the pristine files themselves, without inventing some new
> format.

Like the others, I'm surprised we seem to be going with a custom file
format. You claim source files are generally small in size and hence only
small benefits can be had from compressing them, if at all, because they
would be of sub-block size already.

To substantiate that claim, I took the pristines directory from my
Subversion working copy and did some experimenting. See the results below:

$ ls -ls uncompressed-pristines/*/*.svn-base | \
    awk '{ tot += $1; } END { print "total size: " tot; }'
total size: 188724
$ cp -Rp uncompressed-pristines/ compressed-pristines
$ gzip compressed-pristines/*/*.svn-base
$ ls -ls compressed-pristines/*/*.svn-base.gz | \
    awk '{ tot += $1; } END { print "total size: " tot; }'
total size: 52320
$ cat compressed-pristines/*/*.svn-base.gz > combined-compressed-file
$ ls -ls combined-compressed-file
41812

So, looking at the Subversion pristines in my working copy, per-file
compression shrinks the allocated blocks to 27% of the original, and
combining the compressed files into one file shrinks them to 22%. To be
honest, I doubt the complexity we'll be importing just to get from 27%
down to 22% is really worth it: the savings from plain per-file
compression are already tremendous.
Won't the creation of a custom storage format just serve to destabilize
our working copy? Do you have data which triggered you to design this
custom format?

Bye,

Erik.
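The block-allocation argument in this thread can be sketched numerically
(Python purely for illustration; the 4096-byte block size and the
synthetic file sizes are assumptions, not measurements from a real working
copy). Per-file compression wastes at most one partially filled tail block
per pristine; concatenating the compressed streams wastes one tail block
overall, mirroring the 27% vs. 22% comparison above:

```python
import zlib

BLOCK = 4096  # assumed filesystem block size (illustrative)

def allocated(size, block=BLOCK):
    # Bytes occupied on disk: file size rounded up to whole blocks.
    return (size + block - 1) // block * block

# Hypothetical pristine contents: repetitive source text compresses well.
files = [(("int x%d = %d;\n" % (i, i)) * (1200 + 100 * i)).encode()
         for i in range(20)]

raw = sum(allocated(len(f)) for f in files)

# Scheme 1: gzip each pristine separately -- one tail block per file.
per_file = sum(allocated(len(zlib.compress(f))) for f in files)

# Scheme 2: concatenate all compressed streams into one pack file --
# only one tail block in total (what a custom pack format buys).
packed = allocated(sum(len(zlib.compress(f)) for f in files))

assert packed <= per_file <= raw
```

The gap between `per_file` and `packed` is bounded by roughly one block
per file, which is why the incremental gain of a pack format shrinks as
compressed files grow past a single block.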
Fwd: Compressed Pristines
Forwarding my response back to the list...

---------- Forwarded message ----------
From: Erik Huelsmann
Date: Mon, Mar 12, 2012 at 4:14 PM
Subject: Re: Compressed Pristines
To: Johan Corveleyn

Hi Johan,

>> Has nothing to do with the property. The pristine matches the
>> repository, byte for byte. The file installed in the working copy is
>> affected by the property; not the pristine.
>
> Yes, the pristine matches the repository. But what I mean is:
>
> (on Windows):
> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style native file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file is LF-terminated (as is the file in the repos, as you
> point out).

This is correct: line endings get normalized to LF when svn:eol-style
'native' is applied.

> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file CRLF-terminated.

This is correct: the file doesn't have any transformation applied; we
preserve the input file.

> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style CRLF file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file CRLF-terminated.

This is correct: it's the normal form for files with CRLF applied (before
you ask: files with CR line ending normalization get transformed to CR
only).

> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style LF file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file is LF-terminated (as is the working-copy file).

Exactly. So what you found is that for any eol style other than native,
we use exactly that style. For native, we use LF.

HTH,

Erik.
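The four cases above can be captured in a small model. This is a
simplified sketch in Python, not Subversion's actual implementation; the
real code also handles mixed line endings and error cases differently,
and `normalize_eol` is a hypothetical helper name:

```python
def normalize_eol(content: bytes, eol_style):
    # Sketch of how the pristine/repository form of a text file follows
    # from the working file and its svn:eol-style, per the cases above.
    if eol_style is None:
        return content  # no svn:eol-style set: content preserved as-is
    # Canonical stored form: 'native' is stored as LF; the fixed styles
    # ('LF', 'CRLF', 'CR') are stored using exactly that ending.
    canonical = {"native": b"\n", "LF": b"\n",
                 "CRLF": b"\r\n", "CR": b"\r"}[eol_style]
    unified = content.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", canonical)

# The four scenarios from the message above, for a CRLF working file:
assert normalize_eol(b"a\r\nb\r\n", "native") == b"a\nb\n"
assert normalize_eol(b"a\r\nb\r\n", None) == b"a\r\nb\r\n"
assert normalize_eol(b"a\r\nb\r\n", "CRLF") == b"a\r\nb\r\n"
assert normalize_eol(b"a\r\nb\r\n", "LF") == b"a\nb\n"
```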
Re: Let's discuss about unicode compositions for filenames!
On Thu, Feb 2, 2012 at 10:59 PM, Hiroaki Nakamura wrote:
> 2012/2/3 Peter Samuelson:
>>
>>>> By proposing a client-only solution, I hope to avoid _all_ those
>>>> questions.
>>>
>>> [Branko Cibej]
>>> Can't see how that works, unless you either make the client-side
>>> solution optional, create a mapping table, or make name lookup on the
>>> server agnostic to character representation.
>>
>> Yes, I did propose a mapping table in wc.db.
>>
>> Old clients on OS X would continue to be confused; the solution is to
>> upgrade.
>
> Until all clients are upgraded, there is the possibility that NFD
> filenames get checked in to repositories. So I proposed that servers
> change filenames to NFC before checking them in to repositories.

How about checking the existence of a path to be added using NFC encoding?
If it does not exist when both the repository paths and the new path(s)
are converted to NFC, go ahead and add it using the encoding that you
were handed off the network.

Bye,

Erik
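The existence check proposed here could look roughly like this (a Python
sketch for illustration; `add_would_collide` is a hypothetical helper, not
an existing Subversion API):

```python
import unicodedata

def nfc(path):
    # Canonical composed form (NFC); NFD paths normalize to the same string.
    return unicodedata.normalize("NFC", path)

def add_would_collide(existing_paths, new_path):
    # Compare paths after NFC normalization, so an NFD-encoded accented
    # character matches its NFC-encoded counterpart; the path itself would
    # still be stored in whatever encoding came off the network.
    return nfc(new_path) in {nfc(p) for p in existing_paths}

repo = ["docs/caf\u00e9.txt"]        # NFC: precomposed U+00E9
nfd_path = "docs/cafe\u0301.txt"     # NFD: 'e' + U+0301 combining accent
assert nfd_path != repo[0]           # byte-wise, these are different paths
assert add_would_collide(repo, nfd_path)  # NFC comparison detects the clash
```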
Re: Apache, Subversion hooks, and locales
> Given that httpd is avoiding setlocale() we're pretty much left without
> locale support in mod_dav_svn.

Beware that you don't depend on setlocale() not having been called,
though: at least one of the popular mod_* modules *does* use setlocale().
(I think it was php5.)

Other than that: completely agreed.

Bye,

Erik.
Re: Does fsfs revprop packing no longer allow usage of traditional backup software?
Hi Hyrum,

On Thu, Jun 30, 2011 at 11:33 PM, Hyrum K Wright wrote:
> On Thu, Jun 30, 2011 at 3:27 PM, Peter Samuelson wrote:
>>
>> [Ivan Zhakov]
>>> It should be easy to implement editing revprops without using SQLite:
>>> if someone modifies a revprop, a non-packed revprop file is created;
>>> on read, the non-packed revprop file should be considered the more
>>> up-to-date one. On the next svnadmin pack operation these non-packed
>>> files should be merged back into the packed one.
>>
>> +1. This would basically mean there's only _one_ code path for writing
>> revprops, yes? 'svnadmin pack' gets a little more complex, but the
>> rest of libsvn_fs_fs gets simpler.
>>
>> Anyone have time to actually do this? Converting the packed format
>> from sqlite to the same format used for packed revs would be a bonus.
>
> I like this idea, but it would seem to introduce an additional stat()
> call* for every attempt to fetch a revprop, because you'd first have
> to check the "old" location, and then the packed one. As far as I can
> see, you'd have to do this in every case; in other words, there isn't
> a single-stat() short cut for the common case of non-edited revprops.
>
> -Hyrum
>
> * - I don't know why we seem to have this obsession with stat() calls
> around here, but it appears to have rubbed off on me.

Well, we've been able to increase working copy performance throughout the
lifetime of libsvn_wc-1 by working out ways to reduce the number of
apr_stat() calls. I'm not aware of a huge reason to do that on the server
side, though.

Bye,

Erik.
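The two-location lookup under discussion can be sketched as follows
(Python purely for illustration; the flat file layout and shard size below
are hypothetical simplifications, not FSFS's real on-disk format):

```python
import os
import tempfile

def read_revprops(repo_dir, rev, shard=1000):
    # Sketch of the proposed scheme: an unpacked per-revision file, if
    # present, overrides the packed shard. Worst case: two open() attempts
    # per read -- the extra stat() Hyrum points out above.
    unpacked = os.path.join(repo_dir, str(rev))
    try:
        with open(unpacked, "rb") as f:   # attempt #1: edited revprops
            return f.read()
    except FileNotFoundError:
        pass
    packed = os.path.join(repo_dir, "%d.pack" % (rev // shard))
    with open(packed, "rb") as f:         # attempt #2: the pack file
        return f.read()                   # (real code would seek to rev's slice)

d = tempfile.mkdtemp()
with open(os.path.join(d, "0.pack"), "wb") as f:
    f.write(b"packed-props")
assert read_revprops(d, 5) == b"packed-props"   # common case: never edited

with open(os.path.join(d, "5"), "wb") as f:     # a revprop edit lands here
    f.write(b"edited-props")
assert read_revprops(d, 5) == b"edited-props"   # unpacked file wins
```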
Re: svn bisect
Hi Arwin,

On Tue, Jun 21, 2011 at 9:45 AM, Arwin Arni wrote:
> 3. Will this feature be considered at all (if it is any good) or am I simply
> doing something to exercise my brain cells?

Actually, I think it'd be a good idea to have a standardized command so that all clients work alike. What I think this command may also need is a list of revs to exclude from bisection, for example because they're known to fail to compile. This has been on my wish list for a while: the scripts currently available either don't offer such a feature or don't use a commonly understood value for it. So, I'm all in favor of adding bisect one way or another.

Bye,

Erik.
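At its core, what such a command would do internally (interface invented here for illustration; this is not an existing svn feature) is a binary search over the revision range with the skip list filtered out first:

```python
def bisect(revs, is_bad, skip=()):
    """Return the first bad revision in REVS, assuming REVS starts good
    and ends bad, skipping revisions known not to build/test."""
    candidates = [r for r in revs if r not in skip]
    lo, hi = 0, len(candidates) - 1     # candidates[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid                    # first bad rev is mid or earlier
        else:
            lo = mid + 1                # first bad rev is after mid
    return candidates[lo]
```

A real client command would replace is_bad with running the user's test against a checkout of that revision.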
[PATCH] Align mod_dav_svn behaviour with ViewVC: resolve symlinks in SVNParentPath
One of my after-hours activities is to help maintain a community hosting site for Common Lisp development. During our latest system migration, I noticed that mod_dav_svn behaves oddly in the presence of symlinks: if you check http://svn.common-lisp.net/, the repository listing page is empty. However, if you go to http://svn.common-lisp.net/armedbear, you'll find that a repository is being listed. The repository is a symlink in the parent path. This works great for ViewVC and also for hosting the actual repositories, but it doesn't work out for listing the available repositories.

The patch below fixes that, but I don't know if there are explicit considerations behind the current behaviour, so I'm holding off my commit for now.

Bye,

Erik.

[[[
Check the node kind of resolved special nodes to be a directory, in order
to include symlinks pointing to directories.

* subversion/mod_dav_svn/repos.c
  (deliver): Extend 'is directory' check for inclusion in parent path
   listing to include symlinks-to-directory.
]]]

Index: subversion/mod_dav_svn/repos.c
===================================================================
--- subversion/mod_dav_svn/repos.c      (revision 1132467)
+++ subversion/mod_dav_svn/repos.c      (working copy)
@@ -3247,7 +3247,23 @@
           apr_hash_this(hi, &key, NULL, &val);
           dirent = val;

-          if (dirent->kind != svn_node_dir)
+          if (dirent->kind == svn_node_file && dirent->special)
+            {
+              svn_node_kind_t resolved_kind;
+              const char *name = key;
+
+              serr = svn_io_check_resolved_path(name, &resolved_kind,
+                                                resource->pool);
+              if (serr != NULL)
+                return dav_svn__convert_err(serr,
+                                            HTTP_INTERNAL_SERVER_ERROR,
+                                            "couldn't fetch dirents "
+                                            "of SVNParentPath",
+                                            resource->pool);
+              if (resolved_kind != svn_node_dir)
+                continue;
+            }
+          else if (dirent->kind != svn_node_dir)
             continue;

           ent->name = key;
Re: Effect of indices on SQLite (optimizer) performance
On Sat, Feb 5, 2011 at 8:25 PM, Mark Phippard wrote: > On Sat, Feb 5, 2011 at 1:05 PM, Erik Huelsmann wrote: > >> Scenario (2) takes ~0.27 seconds to evaluate in the unmodified >> database. Adding an index on (wc_id, local_relpath) makes the >> execution time drop to ~0.000156 seconds! >> >> >> Seems Philip was right :-) We need to carefully review the indices we >> have in our database to support good performance. > I wish this document were fully fleshed out, it seems like it has some > good info in it: > > http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html > > Getting indexes in place for the bulk of our reads is essential. It > seems like now would be a good time to make that a priority. Of > course adding more indexes will further slow down write speed (which > seems bad already) so maybe the above document will give ideas for > other optimizations. > > Did anyone see the tests I posted on users@ of a checkout with 5000 > files in single folder? I really thought we would be faster than 1.6 > already but we are actually several factors slower. > > My background is all with DB2 on OS/400. Something I was looking for > in SQLite docs is whether it uses hints for the number of rows in a > table. For example, DB2 optimizes a new table for 10,000 rows with > increments of 1,000 when you reach the limit. If you know you are > inserting 100,000 rows you can get a massive performance improvement > by telling DB2 to optimize for a larger size. I was wondering if > SQLite was doing something like optimizing for 100 rows or something > small. I noticed the end of the checkout is really slow which implies > it does not insert the rows fast. Maybe this is just an area where we > need to use transactions better? Their FAQ (http://www.sqlite.org/faq.html#q19) sure suggests that it's not wise to do separate inserts: the document says SQLite easily does 50k inserts per sec into a table on moderate hardware, but only roughly 60 transactions per second... 
That surely points in the direction of using transactions when we need mass inserts! I'm not sure exactly where in our code these inserts should be collected, though. Maybe one of the WC-NG regulars has an idea?

Bye,

Erik.
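The difference is easy to sketch with Python's sqlite3 module: batching a mass insert into one transaction replaces a per-row commit (and, on an on-disk database, a per-row fsync) with a single one. The in-memory database below only demonstrates the mechanics; the 50k-inserts-vs-60-transactions gap from the FAQ shows up with a real file on disk.

```python
import sqlite3

def bulk_insert(conn, rows, batched=True):
    # batched: one transaction around all inserts (what a checkout would
    # want); unbatched: commit - and, on disk, fsync - for every row.
    if batched:
        with conn:                                  # commits once at exit
            conn.executemany("insert into t values (?)",
                             ((r,) for r in rows))
    else:
        for r in rows:
            conn.execute("insert into t values (?)", (r,))
            conn.commit()

conn = sqlite3.connect(":memory:")      # use a file path to see the
conn.execute("create table t (x integer)")  # per-commit fsync cost
bulk_insert(conn, range(1000))
```

So "collecting" the inserts would amount to opening one transaction around the whole tree-walk that installs the nodes.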
Re: Effect of indices on SQLite (optimizer) performance
Now attached as text files (to be renamed to .py) to prevent the mailer software from dropping them... Bye, Erik. On Sat, Feb 5, 2011 at 7:05 PM, Erik Huelsmann wrote: > Yesterday or IRC, Bert, Philip and I were chatting about our SQLite > perf issues and how Philip's findings in the past suggested that > SQLite wasn't using its indices to optimize our queries. > > After searching and discussing its documentation, Philip suggested the > -too obvious- "maybe we have the wrong indices". > > So, I went to work with his "fake database generator script" (attached > as "test.py"). > > > The type of query we're seeing problematic performance with looks like > the one below. The essential part is the WHERE clause. > > SELECT * FROM nodes WHERE wc_id = 1 AND (local_relpath = 'foo' OR > local_relpath like 'foo%'); > > > We discussed 3 ways to achieve the effect of this query: > > 1. The query itself > 2. The query stated as a UNION of two queries > 3. Running the two parts of the UNION manually ourselves. > > Ad (1) > This query doesn't perform as we had hoped to get from using a database. > > Ad (2) > In the past, UNIONs have been explicitly removed because they were > creating temporary tables (on disk!). However, since then we have > changed our SQLite setup to create temporary tables in memory, so the > option should really be re-evaluated. > > Ad (3) > I'd hate to have to use two queries in all places in our source where > we want to run queries like these. As a result, I think this scenario > should be avoided if we can. > > > So, I've created 'perf.py' to evaluate each of these scenarios, > researching the effect on each of them under the influence of adding > different indices. > > This is my finding: > > Scenario (1) [an AND combined with a complex OR] doesn't perform well > under any circumstance. > > Scenario (2) performs differently, depending on the available indices. > > Scenario (3) performs roughly equal to scenario (2). 
> Scenario (2) takes ~0.27 seconds to evaluate in the unmodified
> database. Adding an index on (wc_id, local_relpath) makes the
> execution time drop to ~0.000156 seconds!
>
> Seems Philip was right :-) We need to carefully review the indices we
> have in our database to support good performance.
>
> Bye,
>
> Erik.

#!/usr/bin/python
import os, sqlite3, time

c = sqlite3.connect('wcx.db')
c.execute("""pragma case_sensitive_like=1""")
c.execute("""pragma foreign_keys=on""")
c.execute("""pragma synchronous=off""")
c.execute("""pragma temp_store=memory""")

start = time.clock() # cpu clock as float in secs

#c.execute("""drop index i_wc_id_rp;""")
#c.execute("""create index i_wc_id_rp on nodes (wc_id, local_relpath);""")

print c.execute(".indices")

# strategy 1
c.execute("""select * from nodes where wc_id = 1 AND (local_relpath like 'foo/%' OR local_relpath = 'foo');""");
# strategy 2
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath like 'foo/%'
#             union select * from nodes where wc_id = 1 AND local_relpath = 'foo';""")
# strategy 3
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath like 'foo/%';""")
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath = 'foo';""")

end = time.clock()
print "timing: %5f\n" % (end - start)

#!/usr/bin/python
import os, sqlite3

try:
    os.remove('wcx.db')
except:
    pass

c = sqlite3.connect('wcx.db')
c.execute("""pragma case_sensitive_like=1""")
c.execute("""pragma foreign_keys=on""")
c.execute("""pragma synchronous=off""")
c.execute("""pragma temp_store=memory""")
c.execute("""create table repository (
  id integer primary key autoincrement,
  root text unique not null,
  uuid text not null)""")
c.execute("""create index i_uuid on repository (uuid)""")
c.execute("""create index i_root on repository (root)""")
c.execute("""create table wcroot (
  id integer primary key autoincrement,
  local_
Effect of indices on SQLite (optimizer) performance
Yesterday on IRC, Bert, Philip and I were chatting about our SQLite perf issues and how Philip's findings in the past suggested that SQLite wasn't using its indices to optimize our queries. After searching and discussing its documentation, Philip suggested the -too obvious- "maybe we have the wrong indices". So, I went to work with his "fake database generator script" (attached as "test.py").

The type of query we're seeing problematic performance with looks like the one below. The essential part is the WHERE clause.

SELECT * FROM nodes WHERE wc_id = 1 AND (local_relpath = 'foo' OR local_relpath like 'foo%');

We discussed 3 ways to achieve the effect of this query:

1. The query itself
2. The query stated as a UNION of two queries
3. Running the two parts of the UNION manually ourselves.

Ad (1): This query doesn't perform as we had hoped to get from using a database.

Ad (2): In the past, UNIONs have been explicitly removed because they were creating temporary tables (on disk!). However, since then we have changed our SQLite setup to create temporary tables in memory, so the option should really be re-evaluated.

Ad (3): I'd hate to have to use two queries in all places in our source where we want to run queries like these. As a result, I think this scenario should be avoided if we can.

So, I've created 'perf.py' to evaluate each of these scenarios, researching the effect on each of them under the influence of adding different indices. These are my findings:

Scenario (1) [an AND combined with a complex OR] doesn't perform well under any circumstance.

Scenario (2) performs differently, depending on the available indices.

Scenario (3) performs roughly equal to scenario (2).

Scenario (2) takes ~0.27 seconds to evaluate in the unmodified database. Adding an index on (wc_id, local_relpath) makes the execution time drop to ~0.000156 seconds!

Seems Philip was right :-) We need to carefully review the indices we have in our database to support good performance.

Bye,

Erik.
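SQLite can also be asked directly whether an index would be used, without any timing runs. A sketch (schema reduced to the two relevant columns): EXPLAIN QUERY PLAN on the UNION form reports, per branch, whether the new (wc_id, local_relpath) index is searched or the whole table is scanned.

```python
import sqlite3

c = sqlite3.connect(":memory:")
c.execute("pragma case_sensitive_like=1")
c.execute("create table nodes (wc_id integer, local_relpath text)")
c.execute("create index i_wc_id_rp on nodes (wc_id, local_relpath)")

# One plan row per branch of the UNION: 'SEARCH ... USING INDEX' means
# the index helps; 'SCAN nodes' means a full table scan.
plan = c.execute(
    "explain query plan "
    "select * from nodes where wc_id = 1 and local_relpath = 'foo' "
    "union "
    "select * from nodes where wc_id = 1 and local_relpath like 'foo/%'"
).fetchall()
```

Running the same statement against the OR form shows why scenario (1) can't be helped: SQLite can't split the complex OR across the index on its own.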
Re: svn commit: r1028092 - /subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c
Hi Hyrum,

On Wed, Oct 27, 2010 at 11:34 PM, Hyrum K. Wright wrote:
> On Wed, Oct 27, 2010 at 3:40 PM, wrote:
>> Author: stefan2
>> Date: Wed Oct 27 20:40:53 2010
>> New Revision: 1028092
>>
>> URL: http://svn.apache.org/viewvc?rev=1028092&view=rev
>> Log:
>> Incorporate feedback I got on r985606.
>>
>> * subversion/libsvn_ra_svn/marshal.c
>>   (SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD): introduce symbolic name
>>    for an otherwise arbitrary number
>>   (read_long_string): fix docstring
>>   (read_string): use symbolic name and explain the rationale behind the
>>    special case
>>
>> Modified:
>>     subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c
>>
>> Modified: subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c
>> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c?rev=1028092&r1=1028091&r2=1028092&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c (original)
>> +++ subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c Wed Oct 27 20:40:53 2010
>> @@ -44,6 +44,12 @@
>>
>>  #define svn_iswhitespace(c) ((c) == ' ' || (c) == '\n')
>>
>> +/* If we receive data that *claims* to be followed by a very long string,
>> + * we should not trust that claim right away. But everything up to 1 MB
>> + * should be too small to be instrumental for a DOS attack. */
>> +
>> +#define SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD (0x100000)
>
> I like the name!
>
>> +
>>  /* --- CONNECTION INITIALIZATION --- */
>>
>>  svn_ra_svn_conn_t *svn_ra_svn_create_conn2(apr_socket_t *sock,
>> @@ -555,9 +561,8 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
>>
>>  /* --- READING DATA ITEMS --- */
>>
>> -/* Read LEN bytes from CONN into already-allocated structure ITEM.
>> - * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
>> - * data is allocated in POOL. */
>> +/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
>> + * POOL will be used for temporary allocations. */
>>  static svn_error_t *
>>  read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
>>                   svn_stringbuf_t *stringbuf, apr_uint64_t len)
>> @@ -593,7 +598,14 @@ static svn_error_t *read_string(svn_ra_s
>>                                  svn_ra_svn_item_t *item, apr_uint64_t len)
>>  {
>>    svn_stringbuf_t *stringbuf;
>> -  if (len > 0x100000)
>> +
>> +  /* We should not use large strings in our protocol. However, we may
>> +   * receive a claim that a very long string is going to follow. In that
>> +   * case, we start small and wait for all that data to actually show up.
>> +   * This does not fully prevent DOS attacks but makes them harder (you
>> +   * have to actually send gigabytes of data).
>
> Wow, I hadn't even considered this. Once we get this on trunk, it
> might make sense to propose a backport, since this has (potential?)
> security implications.

Actually, that was already released as a security vulnerability some years ago. The comment by Stefan makes it painfully apparent that it is, but I guess that's a good thing. Notice that he did nothing but name the constant and add the explanation. See http://subversion.apache.org/security/CAN-2004-0413-advisory.txt

This is exactly the point I was talking about when I said properties are length-limited by ra_svn (in relation to the maximum size of merge-tracking information). The actual code is a little bit different than what I remembered, because it does seem to grow the buffer once it gets past the first MiB.

Regards,

Erik.
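The strategy in read_string()/read_long_string() can be sketched outside of C (buffer-growth details simplified; THRESHOLD mirrors the role of SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD): never allocate the full claimed length up front, so that making the server hold gigabytes requires actually sending gigabytes.

```python
import io

THRESHOLD = 0x100000   # 1 MB: claims up to this size are trusted directly

def read_claimed_string(stream, claimed_len):
    """Read CLAIMED_LEN bytes, but for huge claims grow the buffer only
    as data actually arrives instead of pre-allocating it all."""
    if claimed_len <= THRESHOLD:
        data = stream.read(claimed_len)         # small: read in one go
        if len(data) != claimed_len:
            raise IOError("connection closed before string was complete")
        return data
    buf = bytearray()
    remaining = claimed_len
    while remaining:
        chunk = stream.read(min(remaining, THRESHOLD))  # bounded reads
        if not chunk:
            raise IOError("connection closed before string was complete")
        buf.extend(chunk)
        remaining -= len(chunk)
    return bytes(buf)
```

An attacker sending "string of 4 GB follows" and then nothing costs the server at most one bounded buffer, not a 4 GB allocation.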
Re: svn commit: r1026128 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c
>> -  if (! replaced && status == svn_wc__db_status_added
>> +  if (reverted
>> +      && ! replaced
>> +      && status == svn_wc__db_status_added
>>        && db_kind == svn_wc__db_kind_dir)
>>      {
>> -      /* Non-replacements have their admin area deleted. wc-1.0 */
>> +      /* Non-replaced directories have their admin area deleted. wc-1.0 */
>>        SVN_ERR(svn_wc__adm_destroy(db, local_abspath,
>>                                    cancel_func, cancel_baton, pool));
>>      }
>
> I don't think we need this block with single-db. There is no administrative
> area to remove.

This call also destroys the adm-access which may be cached in the db-handle. Removing that call makes one of our 'we should work with our old entries code' tests fail. Maybe the comment should state something to that effect?

Bye,

Erik.
Re: svn commit: r1026105 - /subversion/trunk/subversion/libsvn_wc/merge.c
Hi Stefan,

I see you're not on IRC, so you may have missed it: this commit, or the next, turned the buildslaves red.

Bye,

Erik.

On Thu, Oct 21, 2010 at 9:07 PM, wrote:
> Author: stsp
> Date: Thu Oct 21 19:07:54 2010
> New Revision: 1026105
>
> URL: http://svn.apache.org/viewvc?rev=1026105&view=rev
> Log:
> * subversion/libsvn_wc/merge.c
>   (merge_text_file): Don't leak temporary file RESULT_TARGET.
>    E.g. when a text conflict happened during an update, and the user
>    chose 'theirs-full', a file containing the diff3 merge result with
>    conflict markers was left over in .svn/tmp/ directory.
>
> Found by: someone on the #svn IRC channel, some time ago
>
> Modified:
>    subversion/trunk/subversion/libsvn_wc/merge.c
>
> Modified: subversion/trunk/subversion/libsvn_wc/merge.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_wc/merge.c?rev=1026105&r1=1026104&r2=1026105&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_wc/merge.c (original)
> +++ subversion/trunk/subversion/libsvn_wc/merge.c Thu Oct 21 19:07:54 2010
> @@ -1039,7 +1039,10 @@ merge_text_file(svn_skel_t **work_items,
>      }
>
>    if (*merge_outcome == svn_wc_merge_merged)
> -    return SVN_NO_ERROR;
> +    {
> +      SVN_ERR(svn_io_remove_file2(result_target, TRUE, scratch_pool));
> +      return SVN_NO_ERROR;
> +    }
>  }
>  else if (contains_conflicts && dry_run)
>    *merge_outcome = svn_wc_merge_conflict;
> @@ -1078,6 +1081,8 @@ merge_text_file(svn_skel_t **work_items,
>                              result_pool, scratch_pool));
>      *work_items = svn_wc__wq_merge(*work_items, work_item, result_pool);
>    }
> +  else
> +    SVN_ERR(svn_io_remove_file2(result_target, TRUE, scratch_pool));
>
>  return SVN_NO_ERROR;
> }
Determining the 'revert' output we want
Last week, I greatly simplified our 'revert' code. However, in the process, I changed the notifications from 'revert' too: the old code would send notifications for all modified nodes (including tree modifications), with a single exception: it would send a notification only for the root in the case of non-replaced added/copied/moved nodes. The change that I made is to extend 'notify-on-root-only' to added (non-copied/moved) nodes which are also replacements.

However, talking to Bert, he said he'd rather get more notifications than fewer and decide himself whether he wants to show them in his GUI. This made me think we may want to distinguish two or three notification types for a path:

* Content/props-restored paths
* Removed-from-version-control paths (invoked for non-replaced added/copied/moved paths)
* Restored paths (invoked for deleted/replaced paths)

Optionally, it would be possible to use different notifications for paths which are not op_roots, but which *are* part of a tree modification - let's call those 'derived' paths. Of course, we'd then need to decide which notifications our client would show.

Thoughts? Comments?

Bye,

Erik.
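To make the proposed distinction concrete, here is a toy enumeration of the three kinds (names invented purely for illustration; these are not actual svn_wc_notify_action_t values):

```python
from enum import Enum

class RevertNotify(Enum):
    # Hypothetical notification kinds for 'revert'. A GUI client could
    # decide per kind whether a path is worth displaying.
    CONTENT_RESTORED = 1   # content/props restored in place
    REMOVED_FROM_VC = 2    # non-replaced added/copied/moved path
    PATH_RESTORED = 3      # deleted/replaced path brought back
```

Emitting one of these per path, rather than root-only notifications, would let clients like Bert's filter instead of forcing the library to guess.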
Re: svn commit: r1022931 - in /subversion/trunk/subversion/libsvn_wc: status.c wc-queries.sql wc_db.c wc_db.h
>> - cold cache: 1.7 is almost 50% faster than 1.6
>>   1.7: 22s
>>   1.6: 42s
>>
>> - hot cache: 1.7 is just about on par with 1.6 (only 20% slower)
>>   1.7: 0.86s
>>   1.6: 0.72s
>
> What do you guys mean by "cold cache" and "hot cache"? If they mean what I
> think they mean, wouldn't "hot cache" be faster than "cold cache"?

I think they are what you think. 22 seconds is slower than < 1s, isn't it?

Bye,

Erik.
revert behaviour in the light of layered working copy changes
As Julian pointed out, I'm working on making 'revert' work with our NODES table in the layered design situation. As part of that work, I was studying the current behaviour of revert: supposedly, that's what the behaviour of the new revert should look like in simple cases.

However, one of the things I found is that revert leaves unversioned artifacts behind. While I'm aware that in some situations this is part of the policy (don't delete uncommitted changes), in the case of revert it's rather impractical, for a number of reasons:

1. The artifacts left behind can cause botched merges later on - even with our current client
2. The artifacts can lead to obstructions in the new working copy model when we're going with the model of "incremental reverts" that Julian proposed

Even if we want to prevent the deletion of uncommitted changes - which I'm going to challenge next - I think we leave behind way too many artifacts: all files and directories which were part of a copy or move tree-restructuring operation are left behind on revert. Now, the problem here is that the files are left behind even if they were unmodified - and hence reproducible - in which case no destruction of local modifications could have happened in the first place. This is why I'm now proposing that we stop leaving behind the *unchanged* files which are part of a copy or move operation.

One could argue that the same reasoning could be applied to added trees. However, in that case, you might also apply the reasoning that the subtree should stay behind unversioned: it's after all only the 'add' operation which we're reverting, and deleting the added subtree might actually destroy users' efforts. The tricky bit to the reasoning in the paragraph above is that we don't check if files have been fully changed (effectively replaced) or not, meaning that simply reverting a versioned file could in effect have the same consequences as deleting an added file.
With respect to "keeping around unversioned reverted-adds", I'm not sure what to propose. What do others think? I'm inclined to argue along the lines of "they're all delete operations", however, given our current behaviour, I also see why users wouldn't expect this behaviour. Comments? Bye, Erik.
Re: Format 20 upgrade to NODES
On Wed, Oct 6, 2010 at 1:12 PM, Julian Foad wrote: > On Wed, 2010-10-06 at 09:32 +0100, Philip Martin wrote: >> I'd like to enable NODES as a replacement for BASE_NODE and >> WORKING_NODE. This would involve bumping the format number, and old >> working copies would get automatically upgraded. > > +1 from me, ASAP. > > We're still working on the op_depth support and it's more complex than I > originally thought. It looks like doing this transition in two separate > format bumps will be more expedient. > > Please give me 24h to change the order of NODES columns first - see > separate email. +1 from me too. Bye, Erik.
Re: svn commit: r999837 - /subversion/trunk/subversion/libsvn_wc/wc-queries.sql
On Wed, Sep 22, 2010 at 11:25 PM, Greg Stein wrote:
> On Wed, Sep 22, 2010 at 05:39, wrote:
> >...
> > +++ subversion/trunk/subversion/libsvn_wc/wc-queries.sql Wed Sep 22 09:39:45 2010
> > @@ -215,7 +215,7 @@ update nodes set properties = ?3
> >  where wc_id = ?1 and local_relpath = ?2
> >    and op_depth in
> >     (select op_depth from nodes
> > -    where wc_id = ?1 and local_relpath = ?2
> > +    where wc_id = ?1 and local_relpath = ?2 and op_depth > 0
> >     order by op_depth desc
> >     limit 1);
>
> Wouldn't it be better to do:
>
>   where wc_id = ?1 and local_relpath = ?2
>     and op_depth = (select max(op_depth) from nodes
>                     where wc_id=?1 and local_relpath=?2 and op_depth > 0);
>
> It seems that eliminating the "order by" and "limit", in favor of
> max() will tell sqlite what we're really searching for: the maximal
> value.

I wrote those queries like that because Bert said max() would introduce an aggregate function - and at the time he said it, that sounded like something negative.

> Also note that the above query uses "op_depth in (...)"
>
> yet:
>
> > @@ -312,7 +312,7 @@ WHERE wc_id = ?1 AND local_relpath = ?2;
> >  update nodes set translated_size = ?3, last_mod_time = ?4
> >  where wc_id = ?1 and local_relpath = ?2
> >   and op_depth = (select op_depth from nodes
> > -                 where wc_id = ?1 and local_relpath = ?2
> > +                 where wc_id = ?1 and local_relpath = ?2 and op_depth > 0
> >                  order by op_depth desc
> >                  limit 1);
>
> This one does not. The rest of the statements you converted all use
> the "in" variant.

The "in" variant is probably better, because - especially with the op_depth > 0 restriction - the result set can probably be empty.

Bye,

Erik.
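The two formulations can be compared directly against a toy schema (reduced here to the three relevant columns): they select the same op_depth, and when the op_depth > 0 restriction leaves no candidates, max() over zero rows yields NULL, so the subquery-based WHERE matches no rows rather than misfiring.

```python
import sqlite3

c = sqlite3.connect(":memory:")
c.execute("create table nodes (wc_id int, local_relpath text, op_depth int)")
c.executemany("insert into nodes values (1, 'foo', ?)", [(0,), (2,), (4,)])

# The ORDER BY ... LIMIT 1 form and the max() form pick the same layer.
limit_form = c.execute(
    "select op_depth from nodes where wc_id = 1 and local_relpath = 'foo' "
    "and op_depth > 0 order by op_depth desc limit 1").fetchone()
max_form = c.execute(
    "select max(op_depth) from nodes where wc_id = 1 and local_relpath = 'foo' "
    "and op_depth > 0").fetchone()

# No working layer deeper than 10 exists: max() yields NULL, and the
# 'in' comparison against NULL matches nothing.
no_rows = c.execute(
    "select count(*) from nodes where local_relpath = 'foo' and op_depth in "
    "(select max(op_depth) from nodes "
    " where local_relpath = 'foo' and op_depth > 10)").fetchone()[0]
```

Either way, an empty candidate set safely turns the enclosing UPDATE into a no-op.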
Re: UTF-8 NFC/NFD paths issue
Sorry to have left the discussion running so long without contributing to it myself. The reason I started about changing the repository / fs is because it is where we store the dataset that we'll need to support forever: working copies get destroyed and checked out over and over every hour, every day. Repositories get created once and only accumulate data. > > That doesn't solve the historical revisions containing "bad" paths. My > > understanding of the problem was that we'd go into the past and > > rewrite the paths into a single, canonical form. > > > > Agreed: an out-of-band solution fixes thing historically too. > As pointed out on IRC, I think it's important to stop adding semantically the same paths to a repository. From the perspective of efficiency, it might be handy to have a normalized version stored somewhere for all paths living in the repository, but to prevent addition of differently encoded paths, such a thing isn't really required: the correct encoding can be calculated when the check happens. > Having backend enforce NFC can wait for 2.0 I suppose :) > True, but the value of that might be limited: if we required all communications to be NFC encoded, we need to take additional measures - as pointed out by Branko - to make things work on MacOS X: currently, we have MacOS X shops happily working with non-ascii characters in the paths, all NFD encoded. That would change. By the way, Julian Foad, Philip Martin, Bert Huijben and I talked through a possible solution to fix the client-side issue which becomes an option once we switch to wc-ng. The full impact of that change needs to be determined though and probably does not fit in the 1.7 timeline. If it seems it does, we'll bring it up. To recap, the change I'm proposing is that we check pathnames with NFC/D aware comparison routines upon add_file() / add_directory() inside libsvn_repos or libsvn_fs_* - of which I suspect it's easier to handle inside the latter. 
In my proposal, we don't specify a "repository normal" encoding. If performance degrades too much, we can enhance the filesystem with a normalized version which doesn't need to be recoded in order to do the comparison with the incoming path. Other than that, I don't think there's anything *required* to make us Unicode-aware on the server. It's also the change I'm proposing cmpilato implement in libsvn_fs_base as a proof of concept.

This proposal says nothing about the client side. The client side can be fixed independently from the server side, given that we can't switch to normalized paths in the protocol until 2.0: whatever paths a server sends, the client will need to use those to communicate back to the server.

Bye,

Erik.
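A toy version of the add-time check being proposed (Python's unicodedata standing in for what utf8proc would do in our C code): the directory keys its entries by NFC form, so a second add of the "same" path in a different normalization is rejected, while each path is stored exactly as the caller spelled it - no "repository normal" encoding is ever imposed.

```python
import unicodedata

class Tree:
    """Toy directory: rejects adds of names that already exist under
    any Unicode normalization, without recoding what gets stored."""
    def __init__(self):
        self._entries = {}    # NFC form -> name as originally given

    def add(self, name):
        key = unicodedata.normalize("NFC", name)
        if key in self._entries:
            raise ValueError("path already exists: %r" % name)
        self._entries[key] = name    # keep the caller's spelling

t = Tree()
t.add("cafe\u0301")   # NFD: 'e' followed by a combining acute accent
```

Adding the NFC spelling "caf\u00e9" afterwards fails the existence check, which is exactly the behaviour wanted from add_file()/add_directory().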
Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
Hi Greg, On Thu, Sep 16, 2010 at 10:47 PM, Erik Huelsmann wrote: > > > On Thu, Sep 16, 2010 at 10:40 PM, Philip Martin < > philip.mar...@wandisco.com> wrote: > >> Erik Huelsmann writes: >> >> > We're now back to a single failure. It's in the relocation-verification >> code >> > in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't >> > able to locate it, but I have to move to other business now. Hopefully >> > you'll be able to find it. >> >> It's the difference between the old STMT_UPDATE_BASE_RECURSIVE_REPO >> and the new STMT_RECURSIVE_UPDATE_NODE_REPO. The first updates >> non-null repo_ids while the second updates repo_ids that match the old >> repo_id. This makes a difference when a node has a non-null repo_id >> that doesn't match the the old repo_id. >> >> I'm not sure whether the pre-relocate db is valid, and if it is I'm >> not sure which of the relocate algorithms is correct. >> > > The latter query (the one which verifies the repo_id) is the one I wrote. I > did so intentionally: from the description of the copyfrom_* fields in the > WORKING_NODE table, I couldn't but conclude they may be referring to a > different repository. Since the new query is updating both BASE and WORKING, > I thought verification of the old repo_id to be required. Additionally, what > happens if -for whatever reason- 1 working copy contains references to > multiple repositories? The former query will rewrite everything to be part > of the same repository. Hence, I think the former query is flawed. > > I hope the original author (Greg?) has something to say about it. > Just checked who originally wrote the STMT_UPDATE_RECURSIVE_BASE/WORKING_REPO query for use with relocate. Turns out to be you indeed. The fact that the old copyfrom_* fields have a repo_id column indicates to me that you're provisioning the option to store the fact that a file has been copied off a different repository. 
The fact that you store (in BASE) a repo_id for every base node (instead of one per wc) probably means that you're provisioning to have multiple repository sources for a single wc. However, the UPDATE_RECURSIVE_BASE_REPO query doesn't take any of that into account and simply rewrites all repo_ids to be the new repo to relocate to. That doesn't seem correct though: if other nodes had different repository sources, those should probably be excluded from relocation, no? What's your view on this? Bye, Erik.
Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
On Thu, Sep 16, 2010 at 10:40 PM, Philip Martin wrote:
> Erik Huelsmann writes:
>
> > We're now back to a single failure. It's in the relocation-verification code
> > in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't
> > able to locate it, but I have to move to other business now. Hopefully
> > you'll be able to find it.
>
> It's the difference between the old STMT_UPDATE_BASE_RECURSIVE_REPO
> and the new STMT_RECURSIVE_UPDATE_NODE_REPO. The first updates
> non-null repo_ids while the second updates repo_ids that match the old
> repo_id. This makes a difference when a node has a non-null repo_id
> that doesn't match the old repo_id.
>
> I'm not sure whether the pre-relocate db is valid, and if it is I'm
> not sure which of the relocate algorithms is correct.

The latter query (the one which verifies the repo_id) is the one I wrote. I did so intentionally: from the description of the copyfrom_* fields in the WORKING_NODE table, I couldn't but conclude they may be referring to a different repository. Since the new query is updating both BASE and WORKING, I thought verification of the old repo_id to be required. Additionally, what happens if - for whatever reason - one working copy contains references to multiple repositories? The former query will rewrite everything to be part of the same repository. Hence, I think the former query is flawed.

I hope the original author (Greg?) has something to say about it.

Bye,

Erik.
Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
Hi Philip, On Thu, Sep 16, 2010 at 10:07 PM, wrote: > Author: ehu > Date: Thu Sep 16 20:07:27 2010 > New Revision: 997905 > > URL: http://svn.apache.org/viewvc?rev=997905&view=rev > Log: > Fix one of two remaining SVN_WC__NODES failures (manifesting itself twice). > > * subversion/tests/libsvn_wc/entries-compat.c > (TESTING_DATA): Add NODES (working_node) data. > > We're now back to a single failure. It's in the relocation-verification code in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't able to locate it, but I have to move to other business now. Hopefully you'll be able to find it. Bye, Erik.
UTF-8 NFC/NFD paths issue
Yesterday, I was talking to CMike about our long-standing issue with UTF-8 strings designating a certain path not necessarily being equal to other strings designating the same path. The issue has to do with NFC (composed) and NFD (decomposed) representation of Unicode characters. CMike nicely called the issue the "Erik Huelsmann issue" yesterday :-)

The issue consists of two parts:

1. The repository, which should determine that paths being added by a commit are unique, regardless of their encoding (NFC/NFD)
2. The client, which should detect that the pathnames coming in from the filesystem may differ in encoding from what's in the working copy administrative files [this is mainly an issue on the Mac: http://subversion.tigris.org/issues/show_bug.cgi?id=2464]

Mike, the thing I have been trying to find around our filesystem implementation is where an editor drive adding a path [add_directory() or add_file()] checks whether the file already exists. The check at that point should be encoding independent, for example by making all paths NFC (or NFD) before comparison. You could use utf8proc (http://www.flexiguided.de/publications.utf8proc.en.html) to do the normalization - it's very light-weight in contrast to ICU, which provides the same functionality but has a much broader scope.

The problem I was telling you about is that I was looking in libsvn_fs_base to find where the existence check is performed, but I couldn't find it. Basically what I was trying to do is: do what we do now (i.e. fail if the path exists and succeed if it doesn't), with the only difference that the paths used for comparison are guaranteed to have the same normalization - meaning they are the same byte sequence when they're equal Unicode.

Bye,

Erik.
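For readers unfamiliar with the problem, the two spellings look like this; Python's unicodedata module plays the role utf8proc would play in our C code:

```python
import unicodedata

nfd = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
nfc = "caf\u00e9"    # precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE

assert nfd != nfc    # different byte sequences for the same path name
assert unicodedata.normalize("NFC", nfd) == nfc   # equal once normalized
assert unicodedata.normalize("NFD", nfc) == nfd
```

A byte-wise existence check sees two distinct paths here; normalizing both sides to NFC (or NFD) before comparing is exactly the encoding-independent check described above.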
Fwd: svn commit: r996661 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
Julian,

This commit should remove the test failures you were experiencing on trunk with SVN_WC__NODES. At least that should give you confidence that if you see failures, you probably introduced them with local changes :-)

Bye, Erik.

-- Forwarded message --
From:
Date: Mon, Sep 13, 2010 at 9:41 PM
Subject: svn commit: r996661 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
To: comm...@subversion.apache.org

Author: ehu
Date: Mon Sep 13 19:41:00 2010
New Revision: 996661

URL: http://svn.apache.org/viewvc?rev=996661&view=rev
Log:
* subversion/tests/libsvn_wc/entries-compat.c
  (TESTING_DATA): Add NODES data.

Modified:
    subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

Modified: subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c?rev=996661&r1=996660&r2=996661&view=diff
==============================================================================
--- subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c (original)
+++ subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c Mon Sep 13 19:41:00 2010
@@ -90,6 +90,7 @@ static const char * const TESTING_DATA =
   /* ### The file_externals column in BASE_NODE is temporary, and will be
      ### removed. However, to keep the tests passing, we need to add it
      ### to the following insert statements. *Be sure to remove it*. */
+#ifndef SVN_WC__NODES_ONLY
   "insert into base_node values ("
   "  1, '', 1, '', null, 'normal', 'dir', "
   "  1, null, null, "
@@ -187,6 +188,92 @@ static const char * const TESTING_DATA =
   "  1, " TIME_1s ", '" AUTHOR_1 "', null, null, null, '()', null, null, "
   "  null); "
   "   "
+#endif
+#ifdef SVN_WC__NODES
NODE_DATA / NODES status
Today, I finished replacing all NODE_DATA queries (UPDATE/DELETE/INSERT) in wc_db.c with queries which operate on NODES. From here on, I'll start to write code to query BASE_NODE+NODES and WORKING_NODE+NODES, verifying that both tables return the same results.

There are, however, a few queries in 'entries.c' which operate directly on BASE_/WORKING_NODE. These queries will need to be migrated. However, our old entries don't have the concept of op_depths and op roots, which makes it a bit hard to migrate the entries file to the exact semantics of the NODES table. If we fix the WORKING_NODE concept to have an op_depth == 1 during migration, though, conversion of the queries in that file isn't much of a problem. Does anybody expect serious issues from all working nodes having the same op_depth?

The alternative would be to set the op_depth of each working node to the path component count of its local_relpath (making each node a stand-alone change). Now that I write the above, I think it's sanest to make each working node its own oproot. That would be roughly as simple to code as the "everything is 1" assumption.

Better ideas? Comments?

Bye, Erik.
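To make the second option concrete, here is a tiny sketch (Python for brevity; the helper name is made up) of the "each working node is its own oproot" numbering, where op_depth is simply the path component count of local_relpath:

```python
def op_depth_as_own_oproot(local_relpath):
    """Hypothetical migration rule: assign each working node an
    op_depth equal to the number of components in its local_relpath,
    making every node a stand-alone operation root."""
    if local_relpath == "":
        return 0  # the working copy root itself
    return local_relpath.count("/") + 1

# The wc root gets 0, a top-level child 1, a deeper node its depth:
assert op_depth_as_own_oproot("") == 0
assert op_depth_as_own_oproot("A") == 1
assert op_depth_as_own_oproot("A/B/file") == 3
```

The "everything is 1" alternative would just return 1 unconditionally for any non-root path; both are equally trivial to compute during migration.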
Migrating from NODE_DATA/BASE_NODE/WORKING_NODE to NODES
In r992993 the NODES table design was added. The SVN_WC__NODES conditional was created to enable known-working code for this schema. The NODES conditional will be used to flag sections which need to be looked into further for modification, just as with SINGLE_DB and NODE_DATA.

My idea is to work toward a situation where - under SVN_WC__NODES - everything is written to two tables, verifying their equality when reading the data back from them. Then, once we're able to run in that mode, we can switch to the NODES table from the current model of three tables.

I'm going to tear down SVN_WC__NODE_DATA in the process. There are no guarantees the code will remain in a working state with that conditional.

Bye, Erik.
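A rough sketch of the dual-write/verify idea, using an in-memory SQLite database with heavily stripped-down stand-ins for the real tables (the schemas and helper names here are illustrative, not the actual wc-queries.sql definitions): every write goes to both the old and the new table, and every read asserts the two agree before trusting the result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE base_node (wc_id INT, local_relpath TEXT, presence TEXT);
    CREATE TABLE nodes (wc_id INT, local_relpath TEXT, op_depth INT,
                        presence TEXT);
""")

def insert_base(wc_id, relpath, presence):
    # Transitional mode: write every BASE row to both tables.
    db.execute("INSERT INTO base_node VALUES (?, ?, ?)",
               (wc_id, relpath, presence))
    db.execute("INSERT INTO nodes VALUES (?, ?, 0, ?)",  # op_depth 0 == BASE
               (wc_id, relpath, presence))

def read_presence(wc_id, relpath):
    old = db.execute(
        "SELECT presence FROM base_node WHERE wc_id = ? AND local_relpath = ?",
        (wc_id, relpath)).fetchone()
    new = db.execute(
        "SELECT presence FROM nodes "
        "WHERE wc_id = ? AND local_relpath = ? AND op_depth = 0",
        (wc_id, relpath)).fetchone()
    # Reading verifies that the two code paths stayed in sync.
    assert old == new, "old and new tables diverged"
    return new[0]

insert_base(1, "A/file", "normal")
assert read_presence(1, "A/file") == "normal"
```

Once every read passes the equality check in practice, the old table (and the dual writes) can be dropped.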
Re: svn commit: r992886 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c
On Mon, Sep 6, 2010 at 11:35 AM, Bert Huijben wrote:
> > +  SVN_ERR(svn_sqlite__get_statement(&stmt, pdh->wcroot->sdb,
> > +                                    STMT_INSERT_NODE_DATA));
> > +
> > +  SVN_ERR(svn_sqlite__bindf(stmt, "isi", pdh->wcroot->wc_id, base,
> > +                            (apr_int64_t) 0 /* BASE */
> > +                            ));
> > +  SVN_ERR(svn_sqlite__bind_text(stmt, 4, ""));
> > +  SVN_ERR(svn_sqlite__bind_token(stmt, 5, presence_map,
> > +                                 svn_wc__db_status_normal));
> > +  SVN_ERR(svn_sqlite__bind_token(stmt, 6, kind_map,
> > +                                 svn_wc__db_kind_subdir));
>
> Why don't you use _bindf("isistt", ...) here?
> That would include all the other fields. (Other option: separate binds of
> all values)

Right. We have many situations where we could/should bind all values through ..._bindf(). This was merely duplicating what was exactly above it. I prefer the ..._bindf() version myself, but didn't want to rewrite existing code to use it. Based on your feedback, I think I just might do that anyway when in the same function.

Bye, Erik.
Re: [PROPOSAL] WC-NG: merge NODE_DATA, WORKING_NODE and BASE_NODE into a single table (NODES)
Given all the responses in the thread, I'd say we're moving to the single table for BASE and WORKING node recording.

There was a flurry of activity from me yesterday and this morning regarding NODE_DATA: that was just me flushing my queue of patches. The work isn't completely irrelevant, as it identifies the spots where the NODES table will be introduced, just as NODE_DATA had to.

Today, I'll draw up the NODES table and move the queries which had already been modified for NODE_DATA over to the NODES design. I hope to get a very long way today already. If it's not done today, I expect to be able to finish it this week. Anyone wanting to join in: let's chat on IRC.

Bye, Erik.

On Thu, Sep 2, 2010 at 11:34 PM, Erik Huelsmann wrote:
>
> As described by Julian earlier this month, Julian, Philip and I observed
> that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
> common. Notably, by introducing the NODE_DATA table, most fields from
> BASE_NODE and WORKING_NODE already moved to a common table.
>
> The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
> side of WORKING_NODE are the 2 cache fields 'translated_size' and
> 'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
> local_relpath and parent_relpath.
>
> In the end we're storing *lots* of bytes (wc_id, local_relpath and
> parent_relpath) to store 2 64-bit values.
>
> On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
> and revision. The NODE_DATA table already has the fields original_repos_id,
> original_repos_path and original_revision. When op_depth == 0, these are
> guaranteed to be empty (null), since they are for working nodes with
> copy/move source information.
> Renaming the three fields in NODE_DATA to repos_id, repos_path and
> revision, generalizing their use to include op_depth == 0 [of course
> nicely documented in the table docs], BASE_NODE would be reduced to a
> store of the dav_cache, translated_size and last_mod_time fields.
>
> By subsuming translated_size and last_mod_time into NODE_DATA, neither
> WORKING_NODE nor BASE_NODE will need to store these values anymore. This
> eliminates the entire reason for the existence of WORKING_NODE. BASE_NODE
> then only stores dav_cache. Here too, it's probably more efficient (in
> size) to store dav_cache in NODE_DATA to prevent repeated storage of
> wc_id, local_relpath and parent_relpath in BASE_NODE.
>
> In addition to the eliminated storage overhead, we'd be making things a
> little less complex for ourselves: UPDATE, INSERT and DELETE queries
> would operate on only a single table, removing the need to split updates
> across multiple statements.
>
> This week, I was discussing this change with Greg on IRC. We both have
> the feeling this should work out well. The proposal here is to merge
> (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES.
>
> Comments? Fears? Enhancements?
>
> Bye,
>
> Erik.
[PROPOSAL] WC-NG: merge NODE_DATA, WORKING_NODE and BASE_NODE into a single table (NODES)
As described by Julian earlier this month, Julian, Philip and I observed that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in common. Notably, by introducing the NODE_DATA table, most fields from BASE_NODE and WORKING_NODE already moved to a common table.

The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the side of WORKING_NODE are the 2 cache fields 'translated_size' and 'last_mod_time'. Apart from those two, there are the indexing fields wc_id, local_relpath and parent_relpath. In the end we're storing *lots* of bytes (wc_id, local_relpath and parent_relpath) just to store 2 64-bit values.

On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path and revision. The NODE_DATA table already has the fields original_repos_id, original_repos_path and original_revision. When op_depth == 0, these are guaranteed to be empty (null), since they are for working nodes with copy/move source information. Renaming the three fields in NODE_DATA to repos_id, repos_path and revision, and generalizing their use to include op_depth == 0 [of course nicely documented in the table docs], BASE_NODE would be reduced to a store of the dav_cache, translated_size and last_mod_time fields.

By subsuming translated_size and last_mod_time into NODE_DATA, neither WORKING_NODE nor BASE_NODE will need to store these values anymore. This eliminates the entire reason for the existence of WORKING_NODE. BASE_NODE then only stores dav_cache. Here too, it's probably more efficient (in size) to store dav_cache in NODE_DATA to prevent repeated storage of wc_id, local_relpath and parent_relpath in BASE_NODE.

In addition to the eliminated storage overhead, we'd be making things a little less complex for ourselves: UPDATE, INSERT and DELETE queries would operate on only a single table, removing the need to split updates across multiple statements.

This week, I was discussing this change with Greg on IRC. We both have the feeling this should work out well.
The proposal here is to merge (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES.

Comments? Fears? Enhancements?

Bye, Erik.
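As a sketch of what the merged design buys us, here is a toy SQLite schema (columns heavily reduced, names only loosely based on the real ones, purely illustrative) where BASE and WORKING state for one path live in a single NODES table keyed on op_depth, so undoing a working change is a single-table DELETE rather than an update split across multiple statements:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stripped-down guess at the merged table: one row per node per layer,
# keyed on op_depth (0 == BASE, > 0 == WORKING layers).
db.execute("""
    CREATE TABLE nodes (
        wc_id INT NOT NULL,
        local_relpath TEXT NOT NULL,
        op_depth INT NOT NULL,
        presence TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth))
""")

# BASE and WORKING state for the same path now live in one table:
db.execute("INSERT INTO nodes VALUES (1, 'A/file', 0, 'normal')")   # BASE
db.execute("INSERT INTO nodes VALUES (1, 'A/file', 1, 'deleted')")  # WORKING

# Dropping the working change is one DELETE against one table:
db.execute("DELETE FROM nodes "
           "WHERE wc_id = 1 AND local_relpath = 'A/file' AND op_depth > 0")

rows = db.execute(
    "SELECT presence FROM nodes WHERE local_relpath = 'A/file'").fetchall()
assert rows == [("normal",)]  # only the BASE row remains
```

With three separate tables, the same operation needs coordinated statements against WORKING_NODE and NODE_DATA, which is exactly the complexity the proposal removes.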
Re: svn commit: r986332 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c
>>
>> Modified:
>>    subversion/trunk/subversion/libsvn_wc/wc-queries.sql
>>    subversion/trunk/subversion/libsvn_wc/wc_db.c
>
> Your log message doesn't describe any changes in wc_db.c

Prop-edited now. Thanks.

Bye, Erik.
NODE_DATA (2nd iteration)
After lots of discussion regarding the way NODE_DATA/4th tree should work, I'm now ready to post a summary of the progress. In my last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I stated why we need this; this post is about the conclusion of what needs to happen. Also included are the first steps there.

With the advent of NODE_DATA, we distinguish node values specifically related to BASE nodes, those specifically related to "current" WORKING nodes, and those which are to be maintained for multiple levels of WORKING nodes (not only the "current" view) (the latter category is most often also shared with BASE). The respective tables will hold the columns shown below.

TABLE WORKING_NODE (
  wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath TEXT NOT NULL,
  parent_relpath TEXT,
  moved_here INTEGER,
  moved_to TEXT,
  original_repos_id INTEGER REFERENCES REPOSITORY (id),
  original_repos_path TEXT,
  original_revnum INTEGER,
  translated_size INTEGER,
  last_mod_time INTEGER,  /* an APR date/time (usec since 1970) */
  keep_local INTEGER,
  PRIMARY KEY (wc_id, local_relpath)
);
CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);

The moved_* and original_* columns are typical examples of "WORKING fields only maintained for the visible WORKING nodes": the original_* and moved_* fields are inherited from the operation root by all children that are part of the operation. The operation root will be the visible change on its own level, meaning it'll have rows in both the WORKING_NODE and NODE_DATA tables. The fact that these columns are not in the NODE_DATA table means that tree changes are not preserved across overlapping changes. This is fully compatible with what we do today: changes to higher levels destroy changes to lower levels.

The translated_size and last_mod_time columns exist in WORKING_NODE and BASE_NODE; they explicitly don't exist in NODE_DATA.
The fact that they exist in BASE_NODE is a bit of a hack: it's to prevent creation of WORKING_NODE data for every file which has keyword expansion or eol translation properties set. These columns serve only to optimize working copy scanning for changes and as such only relate to the visible WORKING_NODEs.

TABLE BASE_NODE (
  wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath TEXT NOT NULL,
  repos_id INTEGER REFERENCES REPOSITORY (id),
  repos_relpath TEXT,
  parent_relpath TEXT,
  translated_size INTEGER,
  last_mod_time INTEGER,  /* an APR date/time (usec since 1970) */
  dav_cache BLOB,
  incomplete_children INTEGER,
  file_external TEXT,
  PRIMARY KEY (wc_id, local_relpath)
);

TABLE NODE_DATA (
  wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath TEXT NOT NULL,
  op_depth INTEGER NOT NULL,
  presence TEXT NOT NULL,
  kind TEXT NOT NULL,
  checksum TEXT,
  changed_rev INTEGER,
  changed_date INTEGER,  /* an APR date/time (usec since 1970) */
  changed_author TEXT,
  depth TEXT,
  symlink_target TEXT,
  properties BLOB,
  PRIMARY KEY (wc_id, local_relpath, op_depth)
);
CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);

Which leaves the NODE_DATA structure above. The op_depth column contains the depth of the node - relative to the wc root - at which the operation was run which caused the creation of the given NODE_DATA node. In the final scheme (based on single-db), the value will be 0 for BASE and a positive integer for WORKING related data.

In order to be able to implement NODE_DATA even without having a fully functional SINGLE_DB yet, a transitional node numbering scheme needs to be devised. The following numbers will apply: BASE == 0, WORKING-this-dir == 1, WORKING-any-immediate-child == 2.

Other transitioning related remarks:
* Conditional-protected experimental sections, just like with SINGLE_DB
* Initial implementation will simply replace the current functionality of the 2 tables; from there we can work our way through whatever needs doing.
* Am I forgetting any others?

Bye, Erik.
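Under the numbering above, the "current" visible state of a node is simply the row with the highest op_depth. A toy illustration with an in-memory SQLite database (stripped-down, illustrative schema; the real NODE_DATA has many more columns):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE node_data (
        wc_id INT, local_relpath TEXT, op_depth INT, presence TEXT,
        PRIMARY KEY (wc_id, local_relpath, op_depth))
""")

# Transitional numbering from the mail: BASE == 0,
# WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
db.executemany("INSERT INTO node_data VALUES (?, ?, ?, ?)", [
    (1, "A/C/file", 0, "normal"),       # BASE
    (1, "A/C/file", 1, "not-present"),  # layer from an op on the parent dir
    (1, "A/C/file", 2, "normal"),       # layer from an op on the node itself
])

# The visible state is the deepest (most recent) layer:
row = db.execute("""
    SELECT presence FROM node_data
    WHERE wc_id = ? AND local_relpath = ?
    ORDER BY op_depth DESC LIMIT 1
""", (1, "A/C/file")).fetchone()
assert row == ("normal",)
```

The same ORDER BY op_depth DESC query keeps working unchanged once the final single-db numbering (0 for BASE, path-relative depths for WORKING) is in place.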
Re: NODE_DATA (aka fourth tree)
>> * moved_here
>> * moved_to

On IRC, we were discussing the fact that these columns are in the databases, but nobody seems to be planning to implement them for 1.7. Is that your perception too? If so, we could remove them with the upcoming schema change required for NODE_DATA.

Bye, Erik.
Re: NODE_DATA (aka fourth tree)
On Sun, Jul 11, 2010 at 1:04 AM, Greg Stein wrote:
> On Sat, Jul 10, 2010 at 17:55, Erik Huelsmann wrote:
>> ...
>> Columns to be placed in NODE_DATA:
>>
>> * wc_id
>> * local_relpath
>> * oproot_distance
>> * presence
>> * kind
>> * revnum
>
> revnum is a BASE concept, so it does not belong here. WORKING nodes do
> not have a revision until they are committed. If the node is copied
> from the repository, then the *source* of that copy needs a revision
> and path, but that is conceptually different from "revnum" (which
> identifies the rev of the node itself).
>
>> * checksum
>> * translated_size
>> * last_mod_time

Thinking about it a bit more, I think translated_size and last_mod_time are a bit odd to have in NODE_DATA - although they are part of both BASE_NODE and WORKING_NODE: they really apply only to BASE and the *current* working node, since they are part of the optimization to determine if a file has changed. Presumably, when a different layer of WORKING becomes visible, we'll be recalculating both fields. If that's the case, shouldn't we just hold onto them in their respective tables?

>> * changed_rev
>> * changed_date
>> * changed_author
>> * depth
>> * properties
>> * dav_cache
>
> dav_cache is also a BASE concept, and remains in BASE_NODE.

Agreed.

>> * symlink_target
>> * file_external
>
> I'm not sure that file_external belongs here. We certainly don't have
> it in WORKING_NODE.

I've been asking around on IRC to understand why that would apply to file_external but not to symlink_target. The difference isn't clear to me yet. Do you have anything which might help me?

>> This means, these columns stay in WORKING_NODE (next to its key, of course):
>>
>> * copyfrom_repos_id
>> * copyfrom_repos_path
>> * copyfrom_revnum
>> * moved_here
>> * moved_to
>>
>> These columns can stay in WORKING_NODE, because all children inherit
>> their values from the oproot. I.e.
a subdirectory of a copied
>> directory inherits the copy/move info, unless it's been copied/moved
>> itself, in which case it has its own copy information.
>
> Right.
>
> Also note that we can opportunistically rename the above columns to
> their wc_db API names: original_*. They would be original_repos_id,
> original_repos_relpath, original_revision.

Done. (In my local patch-in-preparation.)

Bye, Erik.
NODE_DATA (aka fourth tree)
As announced by gstein before, we've had some discussion on the NODE_DATA structure which should allow storing multiple levels of tree manipulation in our wc-db. This mail aims to describe my progress on the subject so far. Please review and comment.

Introduction
------------

What's the 4th tree about? The 4th tree is not 1 tree; instead, it's the ability to store overlapping tree changes in our WORKING tree. Take the following tree:

root
 +- A - C - file
 \- B - C - file

Then, imagine replacing A with B. All would be fine with our current single-level WORKING representation. However, if we then replace 'file' in the copied tree, a single level won't do anymore: if you revert the replacement of 'file', you want to revert to what was there when the tree was copied. The other option - which you don't want, because it would result in an inconsistent tree - would be that wc-ng reverts to what was there even before the copy operation. To be able to revert the 'file' replacement independently of the 'A' replacement, you need 2 levels of WORKING nodes for 'file': one for the direct replacement and one for the replacement that comes with replacing 'A'. By the same logic, many levels may be required to model complicated working copy changes.

What this change is not
-----------------------

This change does not include any change to the current behaviour of libsvn_wc, where modifying already-modified trees is a destructive operation. The multi-level model exists only to keep track of WORKING tree changes, not to make changes to the ACTUAL tree visible again after reverting a replaced subtree.

Proposed change
---------------

Greg made a proposal on the list some time ago which allows the required multiplicity of WORKING nodes by creating a new table: NODE_DATA. The table was proposed to hold a subset of the columns currently in the BASE_NODE and WORKING_NODE tables. The rationale for storing the BASE_NODE data in the table too is that a query for a node which doesn't have a WORKING version will simply return the BASE version.
That way, there's no need to teach the code about the absence of WORKING. Although the BASE_NODE information is put in this table, this doesn't mean the BASE_NODE and WORKING_NODE concepts are being redefined, other than allowing layered WORKING_NODE (sub)trees.

Columns to be placed in NODE_DATA:

* wc_id
* local_relpath
* oproot_distance
* presence
* kind
* revnum
* checksum
* translated_size
* last_mod_time
* changed_rev
* changed_date
* changed_author
* depth
* properties
* dav_cache
* symlink_target
* file_external

This means these columns stay in WORKING_NODE (next to its key, of course):

* copyfrom_repos_id
* copyfrom_repos_path
* copyfrom_revnum
* moved_here
* moved_to

These columns can stay in WORKING_NODE because all children inherit their values from the oproot. I.e. a subdirectory of a copied directory inherits the copy/move info, unless it's been copied/moved itself, in which case it has its own copy information.

As described before, by sorting the nodes relating to a certain path in ascending order of their oproot distance, you'd always get the 'current' WORKING state applicable to the node, if the distance between the node and the working copy root is used to identify the BASE_NODE data. Most - if not all - of the changes to the underlying table structure should stay hidden behind the wc-db API.

Relevance to 1.7
----------------

Why do we need this change now? Why can't it wait until we've finished 1.7? After all, it's just polishing the way we versioned directories in wc-1, right? Not exactly. Currently, mixed-revision working copies are modelled using an oproot for each subtree with its own revision number. That means that without this change, we effectively can't represent mixed-revision working copy trees. So, in order to achieve feature parity with 1.6, we need to realise this change before 1.7.

Well, that's basically it. Comments?

Bye, Erik.
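The revert scenario from the introduction can be sketched with two layered rows for 'file' (toy SQLite schema, purely illustrative; table and column names are made up for the sketch): reverting the direct replacement removes only the deepest layer, exposing the state the copy brought in rather than the pre-copy BASE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE working_layers (
        local_relpath TEXT, op_depth INT, presence TEXT, note TEXT,
        PRIMARY KEY (local_relpath, op_depth))
""")

# Two overlapping changes to A/C/file from the example tree:
db.executemany("INSERT INTO working_layers VALUES (?, ?, ?, ?)", [
    ("A/C/file", 1, "normal", "file as it came with 'replace A by B'"),
    ("A/C/file", 2, "normal", "direct replacement of file"),
])

# Reverting only the direct replacement drops the deepest layer...
db.execute("DELETE FROM working_layers "
           "WHERE local_relpath = ? AND op_depth = ?", ("A/C/file", 2))

# ...which exposes the state the copy brought in, not the pre-copy BASE:
row = db.execute("""
    SELECT note FROM working_layers WHERE local_relpath = ?
    ORDER BY op_depth DESC LIMIT 1
""", ("A/C/file",)).fetchone()
assert row[0] == "file as it came with 'replace A by B'"
```

With only a single WORKING level, the DELETE would have removed the node's entire working state, which is exactly the inconsistency the 4th tree avoids.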
Re: Antwort: Re: ... Re: dangerous implementation of rep-sharing cache for fsfs
On Wed, Jun 30, 2010 at 10:13 PM, Daniel Shahaf wrote:
> [ trim CC ]
>
> Mark Mielke wrote on Wed, 30 Jun 2010 at 21:37 -:
>> On 06/30/2010 05:57 AM, michael.fe...@evonik.com wrote:
>> > P.S. Thanks for the warning; we are not going to use 1.7.
>
> Did you check what is the probability of dying in a car accident?

Well, I quickly checked their website; they're in the pharma business: the business of determining the chances of dying of a pill when you consume it. That definitely explains the paranoia: they're storing lawsuit evidence in Subversion before it's actually evidence. (Hence the paranoia about the data staying *exactly* what they put in.)

>> > At the moment we are not using 1.6 either,
>> > because of the SHA-1 rep-share cache.
>
> In 1.6, representation sharing can be DISABLED.

Bye, Erik.
Re: misaligned blame output if repo has >1m revisions
Hi Philip,

On Mon, Apr 12, 2010 at 3:14 PM, Philipp Marek wrote:
> Hello Bert!
>
> On Montag, 12. April 2010, Bert Huijben wrote:
>> Well, on Windows consoles are all 80 characters wide. (You can fix this if
>> you are a frequent Command Prompt user, but most applications just assume
>> 80 characters on Windows. And in many cases Windows switches back to 80
>> characters if it detects direct screen operations.)
>>
>> The tab width is not globally configurable on Windows.
>
> Do you regularly look at the output in a console? Without piping into a file
> and looking at that with an editor?

I'm not Bert, but: actually, I haven't ever looked at blame output any other way.

Bye, Erik.
Re: misaligned blame output if repo has >1m revisions
Hi Phil,

On Mon, Apr 12, 2010 at 7:54 AM, Philipp Marek wrote:
> Hello Johan,
> hello Stefan,
>
> On Freitag, 9. April 2010, Stefan Sperling wrote:
>> On Fri, Apr 09, 2010 at 10:17:12PM +0200, Johan Corveleyn wrote:
>> > So I guess this is coming up for you guys when s.a.o reaches the 1
>> > million mark :-).
>>
>> Nice buglet. I suppose we could simply add 2 or 3 spaces of indentation
>> until people run into even higher revisions in real life? :)
>
> What do you mean, "in real life"?
>
> http://websvn.kde.org/trunk/
> Directory revision: 1113881
>
> Seems that life is fast enough ;-)

With 3 additional spaces, that would fit, wouldn't it?

Bye, Erik.
Re: svn client protocol (svn:// uri) specification + any client implementation , if any
Dear Karthik,

How about libsvn_ra_svn? (Which is an implementation of libsvn_ra.)
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/

libsvn_ra is available here:
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra/

And the headers with documentation - and for linking - can be found here:
http://svn.apache.org/repos/asf/subversion/trunk/subversion/include/

Regards, Erik.

On Mon, Mar 15, 2010 at 7:30 AM, Karthik K wrote:
> On 03/14/2010 03:11 PM, Philip Martin wrote:
>> Karthik K writes:
>>> Wondering if there is a document explaining the svn:// uri
>>> connection protocol (among other transports/protocols that svn
>>> supports, primarily interested in the read-only checkouts), and
>>> clients that implement the protocol.
>>
>> How about:
>> http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol
>
> Thanks Philip for the links to the protocol. Curious if there is any
> Apache-licensed svn client library (java, say) for the protocol mentioned
> here? Thanks.